Voice OS / Updated May 25

Ziri

An always-on voice OS that keeps common commands on a sub-100ms deterministic path and escalates only real reasoning work to a LangGraph multi-agent stack.

Creator / 2026 / Python / Always-on build


What this project proves

This project makes voice feel like infrastructure: instant on routine commands, agentic only when the request is actually hard.

Ziri is a full voice runtime, not a chat wrapper with a microphone. It runs an always-on wake word listener, streams speech in and audio out, routes requests across music, info, home, and quick-action domains, and backs the whole system with semantic memory, traces, metrics, and graceful fallbacks.

Ziri architecture diagram, captured from the GitHub README.
The architecture is explicit: always-on mic, Siri, browser, and REST all feed one runtime, then fan out into deterministic routes, domain agents, streaming TTS, memory, and observability.

Fast path (sub-100ms): recognized commands bypass the LLM entirely
Direct routes (200+): pattern-matched commands are resolved before agent routing
Test suite (272 tests): pytest coverage spans the API, orchestration, and tools
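A minimal sketch of how a deterministic fast path can sit in front of an agent stack, assuming a table of regex routes checked in order. The patterns, action strings, and the `Route`, `fast_path`, and `handle` names are hypothetical; Ziri's actual 200+ routes are not shown here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Route:
    pattern: re.Pattern[str]                 # matched against the whole normalized utterance
    handler: Callable[[re.Match[str]], str]  # turns the match into an action string


# Hypothetical routes for illustration only; not Ziri's real route table.
ROUTES: list[Route] = [
    Route(re.compile(r"(?:set )?volume (?:to )?(\d{1,3})"),
          lambda m: f"volume:{min(int(m.group(1)), 100)}"),
    Route(re.compile(r"(?:pause|stop)(?: the)? music"),
          lambda m: "music:pause"),
    Route(re.compile(r"what time is it"),
          lambda m: "info:time"),
]


def fast_path(utterance: str) -> Optional[str]:
    """Return a deterministic action if a route matches the utterance, else None."""
    text = utterance.strip().lower().rstrip("?.!")
    for route in ROUTES:
        match = route.pattern.fullmatch(text)
        if match:
            return route.handler(match)
    return None


def handle(utterance: str, escalate: Callable[[str], str]) -> str:
    """Try the route table first; only unmatched requests reach the agent stack."""
    action = fast_path(utterance)
    return action if action is not None else escalate(utterance)
```

Using `fullmatch` rather than `search` keeps a loose pattern from swallowing a longer request that actually needs reasoning; anything unmatched falls through to `escalate`, which is where an agent stack would sit.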

Ambient runtime

Every entry point feeds the same intent system, so Ziri stays available whether the interaction starts from the mic, the browser, Siri, or a raw API call.

Always-on wake word detection via openWakeWord on the Mac
ElevenLabs Scribe realtime transcription with faster-whisper fallback
Streaming ElevenLabs TTS, volume ducking, and pre-cached phrases for speed
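Realtime transcription with a local fallback is an ordered fallback chain. A minimal sketch, assuming each backend is a plain callable that takes raw audio bytes and returns text; `transcribe_with_fallback` and the backend names are hypothetical, and no ElevenLabs or faster-whisper API is called here.

```python
from typing import Callable, Sequence

Transcriber = Callable[[bytes], str]  # hypothetical: raw audio in, transcript out


def transcribe_with_fallback(
    audio: bytes, backends: Sequence[tuple[str, Transcriber]]
) -> tuple[str, str]:
    """Try each backend in order; return (backend_name, transcript) from the first usable result."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            text = backend(audio)
        except Exception as exc:  # e.g. a timeout or network error from a cloud STT service
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text.strip()
        errors.append(f"{name}: empty transcript")  # treat silence/empty output as a failure too
    raise RuntimeError("all transcription backends failed: " + "; ".join(errors))
```

In this arrangement the cloud service would be listed first and a local model second, so a dropped connection degrades latency or accuracy rather than killing the request.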

Supervisor-worker orchestration

The architecture stays inspectable. A supervisor classifies the request, routes to domain agents, and preserves tool use as a visible system instead of burying everything in one oversized prompt.

LangGraph flow runs `supervisor -> router -> [music|info|home|quick] -> respond`
MusicAgent, InfoAgent, and HomeAgent use bounded ReAct loops for tool work
Heuristic and legacy fallbacks keep the assistant alive when services fail
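A plain-Python sketch of the `supervisor -> router -> [music|info|home|quick] -> respond` shape, with a step-capped ReAct loop and a heuristic fallback. It deliberately does not use LangGraph's API. The keyword classifier, `MAX_STEPS`, the policy callable standing in for one LLM reasoning turn, and every name below are hypothetical illustrations, not Ziri's code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

MAX_STEPS = 4  # hypothetical cap on ReAct iterations per request


@dataclass
class State:
    text: str
    domain: str = "quick"
    observations: list[str] = field(default_factory=list)
    reply: str = ""


Step = tuple[str, str]            # ("call", tool_name) or ("final", answer)
Policy = Callable[[State], Step]  # stands in for one LLM reasoning turn
Tool = Callable[[str], str]
Agent = Callable[[State], State]

# Hypothetical keyword table; the source does not describe how the supervisor classifies.
DOMAIN_KEYWORDS = {
    "music": {"play", "song", "playlist"},
    "home": {"light", "lights", "thermostat", "lock"},
    "info": {"weather", "news", "who"},
}


def supervisor(state: State) -> State:
    """Classify the request into a domain; unmatched requests go to 'quick'."""
    words = set(re.findall(r"[a-z]+", state.text.lower()))
    state.domain = next((d for d, kws in DOMAIN_KEYWORDS.items() if words & kws), "quick")
    return state


def heuristic_reply(state: State) -> str:
    # Last-resort answer used when an agent fails or runs out of steps.
    return f"Sorry, I couldn't finish that {state.domain} request."


def react_agent(state: State, policy: Policy, tools: dict[str, Tool]) -> State:
    """Bounded ReAct loop: alternate policy decisions and tool calls, at most MAX_STEPS times."""
    for _ in range(MAX_STEPS):
        kind, value = policy(state)
        if kind == "final":
            state.reply = value
            return state
        state.observations.append(tools[value](state.text))  # observation feeds the next turn
    state.reply = heuristic_reply(state)  # step budget exhausted
    return state


def router(state: State, agents: dict[str, Agent]) -> State:
    """Dispatch to the domain agent; a missing agent or any exception falls back."""
    agent = agents.get(state.domain)
    if agent is None:
        state.reply = heuristic_reply(state)
        return state
    try:
        return agent(state)
    except Exception:
        state.reply = heuristic_reply(state)
        return state


def respond(state: State) -> str:
    return state.reply


def run(text: str, agents: dict[str, Agent]) -> str:
    # supervisor -> router -> [domain agent] -> respond
    return respond(router(supervisor(State(text)), agents))
```

The step cap is what makes the loop "bounded": a policy that keeps calling tools without converging still produces a spoken answer instead of hanging the voice session.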

Memory and observability

What separates a prototype from a daily-use assistant is whether it remembers context and whether you can see what broke. Ziri does both.

Amazon Titan embeddings are stored in pgvector and recalled with HNSW approximate nearest-neighbor search
Hybrid retrieval fuses Elasticsearch keyword matches with vector search via reciprocal rank fusion (RRF)
Langfuse and Prometheus track token use, tool accuracy, time to first byte (TTFB), and routing latency
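Reciprocal rank fusion scores each document by summing 1/(k + rank) over every ranked list it appears in, so items that rank well in both keyword and vector results rise to the top without having to normalize BM25 scores against cosine similarities. A minimal sketch over document IDs, assuming k = 60 (a constant commonly used with RRF); the source does not say which k Ziri uses or whether fusion runs in application code or inside Elasticsearch.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank(d)), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, with keyword hits `["m3", "m1", "m7"]` and vector hits `["m1", "m9", "m3"]`, the fused order is `m1, m3, m9, m7`: `m1` wins because it sits near the top of both lists, even though it is not first in the keyword results.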