Ambient runtime
Every entry point feeds the same intent system, so Ziri stays available whether the interaction starts from the mic, the browser, Siri, or a raw API call.
- Always-on wake word detection via openWakeWord on the Mac
- ElevenLabs Scribe realtime transcription with faster-whisper fallback
- Streaming ElevenLabs TTS, volume ducking, and pre-cached phrases to cut response latency
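The fallback chain and the "same intent system" idea can be sketched in a few lines. This is a minimal illustration, not Ziri's actual code: `Utterance`, `transcribe_with_fallback`, and `from_text` are hypothetical names. It assumes each transcriber (ElevenLabs Scribe first, faster-whisper second) is wrapped as a plain callable from audio bytes to text. No real SDK calls are shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: each transcription backend is wrapped as bytes -> text.
Transcriber = Callable[[bytes], str]


@dataclass(frozen=True)
class Utterance:
    """Hypothetical normalized input handed to the shared intent system."""
    text: str
    source: str   # entry point, e.g. "mic", "browser", "siri", "api"
    backend: str  # transcriber that produced the text ("none" for typed input)


def transcribe_with_fallback(
    audio: bytes,
    backends: Sequence[Tuple[str, Transcriber]],
    source: str = "mic",
) -> Utterance:
    """Try each backend in order and return the first non-empty transcript."""
    errors: List[str] = []
    for name, transcribe in backends:
        try:
            text = transcribe(audio).strip()
        except Exception as exc:  # dropped socket, timeout, model error
            errors.append(f"{name}: {exc!r}")
            continue
        if text:
            return Utterance(text=text, source=source, backend=name)
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("all transcribers failed: " + "; ".join(errors))


def from_text(text: str, source: str) -> Utterance:
    """Browser, Siri, and API requests arrive as text and skip transcription."""
    return Utterance(text=text.strip(), source=source, backend="none")


if __name__ == "__main__":
    # Stubs standing in for the real realtime and local transcribers.
    def scribe_stub(audio: bytes) -> str:
        raise TimeoutError("realtime socket closed")

    def whisper_stub(audio: bytes) -> str:
        return " play some jazz "

    u = transcribe_with_fallback(
        b"\x00\x01", [("scribe", scribe_stub), ("faster-whisper", whisper_stub)]
    )
    print(u)  # Utterance(text='play some jazz', source='mic', backend='faster-whisper')
```

Because every entry point produces the same `Utterance` shape, the downstream intent system never needs to know whether a request began as speech or as an HTTP call.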
Supervisor-worker orchestration
The architecture stays inspectable. A supervisor classifies each request and routes it to a domain agent, so tool use remains a visible part of the system instead of being buried in one oversized prompt.
- LangGraph flow runs `supervisor -> router -> [music|info|home|quick] -> respond`
- MusicAgent, InfoAgent, and HomeAgent use bounded ReAct loops for tool work
- Heuristic and legacy fallbacks keep the assistant alive when services fail
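The control flow above can be approximated in plain Python. This sketch mirrors the `supervisor -> router -> [music|info|home|quick] -> respond` topology but deliberately does not use LangGraph's API. The keyword table, `llm_classify`, and the `policy` callable (a stand-in for the model choosing the next tool) are all hypothetical. It shows two of the ideas: a supervisor that falls back to keyword heuristics when the model call fails, and a ReAct loop with a hard step budget.

```python
from typing import Callable, Dict, List, Optional, Tuple

ROUTES = ("music", "info", "home", "quick")

# Hypothetical keyword table for the heuristic fallback; "quick" is the catch-all.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "music": ("play", "song", "album", "pause"),
    "home": ("lights", "thermostat", "lock", "fan"),
    "info": ("weather", "news", "who", "what", "when"),
}


def heuristic_classify(text: str) -> str:
    words = text.lower().split()
    for route, keys in KEYWORDS.items():
        if any(k in words for k in keys):
            return route
    return "quick"


def supervisor(text: str, llm_classify: Optional[Callable[[str], str]] = None) -> str:
    """Prefer the model's label; fall back to keywords if it fails or returns junk."""
    if llm_classify is not None:
        try:
            route = llm_classify(text)
            if route in ROUTES:
                return route
        except Exception:
            pass  # service down: degrade to the heuristic instead of failing
    return heuristic_classify(text)


Step = Tuple[str, str]              # (action, argument); "finish" carries the answer
Observation = Tuple[str, str, str]  # (action, argument, tool result)


def bounded_react(
    text: str,
    policy: Callable[[str, List[Observation]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 4,
) -> str:
    """Act/observe loop with a step budget so an agent cannot spin forever."""
    observations: List[Observation] = []
    for _ in range(max_steps):
        action, arg = policy(text, observations)
        if action == "finish":
            return arg
        tool = tools.get(action)
        result = tool(arg) if tool else f"error: unknown tool {action!r}"
        observations.append((action, arg, result))
    return "Sorry, I couldn't finish that one."


def run(text: str, agents: Dict[str, Callable[[str], str]],
        llm_classify: Optional[Callable[[str], str]] = None) -> str:
    route = supervisor(text, llm_classify)        # supervisor
    agent = agents.get(route, agents["quick"])     # router
    return agent(text)                             # domain agent; reply goes to respond
```

Keeping the step budget explicit means a confused agent returns a short apology instead of burning tokens, and the heuristic path keeps routing alive when the classifier model is unreachable.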
Memory and observability
What separates a prototype from a daily-use assistant is whether it remembers context and whether you can see what broke. Ziri does both.
- Amazon Titan embeddings land in pgvector with HNSW search for recall
- Hybrid retrieval fuses Elasticsearch keyword matches with vector search via reciprocal rank fusion (RRF)
- Langfuse and Prometheus track token use, tool accuracy, TTFB, and routing latency
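Reciprocal rank fusion merges ranked lists without needing comparable scores. Each document receives score(d) = Σ 1 / (k + rank(d)), summed over every list that contains it. The sketch below assumes the Elasticsearch and pgvector queries have already returned ranked lists of memory IDs. The constant `k = 60` is the default from the original RRF paper; the value Ziri actually uses is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]],
    k: int = 60,
    top_n: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Fuse ranked ID lists; assumes each list contains an ID at most once."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused if top_n is None else fused[:top_n]


if __name__ == "__main__":
    keyword_hits = ["m3", "m1", "m7"]  # e.g. Elasticsearch BM25 order
    vector_hits = ["m1", "m4", "m3"]   # e.g. pgvector HNSW nearest neighbours
    fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
    print([d for d, _ in fused])  # ['m1', 'm3', 'm4', 'm7']
```

Documents that appear in both lists (`m1`, `m3`) rise to the top even if neither retriever ranked them first. This is the main reason RRF is a common choice for hybrid keyword and vector search.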
