Voice OS / Updated May 25

Ziri

An always-on voice OS that keeps common commands on a sub-100ms deterministic path and escalates only real reasoning work to a LangGraph multi-agent stack.

Creator / 2026 / Python / Always-on build

What this project proves

This project makes voice feel like infrastructure: instant on routine commands, agentic only when the request is actually hard.

Ziri is a full voice runtime, not a chat wrapper with a microphone. It runs an always-on wake word listener, streams speech in and audio out, routes requests across music, info, home, and quick-action domains, and backs the whole system with semantic memory, traces, metrics, and graceful fallbacks.

Figure: Ziri architecture diagram captured from the GitHub README.

The architecture is explicit: the always-on mic, Siri, the browser, and REST all feed one runtime, which then fans out into deterministic routes, domain agents, streaming TTS, memory, and observability.

  • Fast path: sub-100ms. Recognized commands bypass the LLM entirely.
  • Direct routes: 200+. Pattern-matched commands land before agent routing.
  • Test suite: 272 tests. pytest coverage backs the API, orchestration, and tools.
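
To make the fast path concrete, here is a minimal sketch of pattern-matched routing that runs before any agent is involved. The route table, the action strings, and the `fast_path` / `handle` helpers are all hypothetical and written for illustration; they are not taken from the Ziri codebase, which reports 200+ routes.

```python
import re
from typing import Callable, Optional

# Hypothetical direct routes: each pattern maps to a handler that needs no LLM.
DIRECT_ROUTES = [
    (re.compile(r"^(?:pause|stop)(?: the)? music$"),
     lambda m: "music.pause"),
    (re.compile(r"^set (?:the )?volume to (\d{1,3})$"),
     lambda m: f"music.volume:{min(int(m.group(1)), 100)}"),
    (re.compile(r"^turn (on|off) the (\w+) lights?$"),
     lambda m: f"home.lights:{m.group(2)}:{m.group(1)}"),
]


def normalize(utterance: str) -> str:
    """Lowercase, trim, and drop trailing punctuation so patterns stay simple."""
    return re.sub(r"[.!?]+$", "", utterance.strip().lower())


def fast_path(utterance: str) -> Optional[str]:
    """Return an action string for a recognized command, else None."""
    text = normalize(utterance)
    for pattern, handler in DIRECT_ROUTES:
        match = pattern.match(text)
        if match:
            return handler(match)
    return None


def handle(utterance: str, escalate: Callable[[str], str]) -> str:
    """Try the deterministic path first; escalate only on a miss."""
    action = fast_path(utterance)
    return action if action is not None else escalate(utterance)
```

Because a miss returns `None` rather than a guess, anything the patterns do not recognize falls through to the agent stack unchanged.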

01

Ambient runtime

Every entry point feeds the same intent system, so Ziri stays available whether the interaction starts from the mic, the browser, Siri, or a raw API call.

  • Always-on wake word detection via openWakeWord on the Mac
  • ElevenLabs Scribe realtime transcription with faster-whisper fallback
  • Streaming ElevenLabs TTS, volume ducking, and pre-cached phrases for speed
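
The transcription fallback above can be sketched as an ordered list of backends tried in turn. This is an illustration only: `hosted_stub` and `local_stub` are stand-ins for the hosted and local transcribers, and `transcribe_with_fallback` is a hypothetical helper, not an ElevenLabs or faster-whisper API.

```python
from typing import Callable, Sequence, Tuple

Transcriber = Callable[[bytes], str]


class TranscriptionError(RuntimeError):
    """Raised when every backend fails or returns an empty transcript."""


def transcribe_with_fallback(
    audio: bytes, backends: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each backend in order; return (backend_name, transcript)."""
    errors = []
    for name, backend in backends:
        try:
            text = backend(audio).strip()
        except Exception as exc:  # any backend failure triggers the fallback
            errors.append(f"{name}: {exc}")
            continue
        if text:
            return name, text
        errors.append(f"{name}: empty transcript")
    raise TranscriptionError("; ".join(errors))


# Stand-in backends (assumptions, not real client calls).
def hosted_stub(audio: bytes) -> str:
    raise ConnectionError("network unavailable")


def local_stub(audio: bytes) -> str:
    return " play some jazz "
```

Returning the backend name alongside the text makes it easy to record which path served each request.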

02

Supervisor-worker orchestration

The architecture stays inspectable. A supervisor classifies the request, routes to domain agents, and preserves tool use as a visible system instead of burying everything in one oversized prompt.

  • LangGraph flow runs `supervisor -> router -> [music|info|home|quick] -> respond`
  • MusicAgent, InfoAgent, and HomeAgent use bounded ReAct loops for tool work
  • Heuristic and legacy fallbacks keep the assistant alive when services fail
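
The flow above can be sketched in plain Python to show its shape. This deliberately does not use LangGraph: the keyword classifier stands in for the supervisor's real classification, and `MAX_STEPS`, `bounded_loop`, `make_worker`, and the reply strings are assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, str]
MAX_STEPS = 3  # assumed cap on tool iterations per worker


def supervisor(state: State) -> State:
    """Classify the request into a domain (keyword stand-in for a model)."""
    text = state["text"].lower()
    keywords = {
        "music": ("play", "song", "volume"),
        "home": ("light", "thermostat"),
        "info": ("weather", "who", "what"),
    }
    for domain, words in keywords.items():
        if any(word in text for word in words):
            return {**state, "domain": domain}
    return {**state, "domain": "quick"}


def bounded_loop(step: Callable[[int], Tuple[bool, str]],
                 max_steps: int = MAX_STEPS) -> str:
    """Run act/observe steps until done or the step budget is spent."""
    observation = "no result"
    for i in range(max_steps):
        done, observation = step(i)
        if done:
            break
    return observation


def make_worker(domain: str) -> Callable[[State], State]:
    def worker(state: State) -> State:
        def step(i: int) -> Tuple[bool, str]:
            # Stand-in for one tool call; finishes on the second iteration.
            return i >= 1, f"{domain} handled '{state['text']}' in {i + 1} step(s)"
        return {**state, "result": bounded_loop(step)}
    return worker


WORKERS = {d: make_worker(d) for d in ("music", "info", "home", "quick")}


def router(state: State) -> Callable[[State], State]:
    return WORKERS.get(state["domain"], WORKERS["quick"])


def respond(state: State) -> State:
    return {**state, "reply": state["result"]}


def run(text: str) -> State:
    """supervisor -> router -> worker -> respond, with a fallback reply."""
    state = supervisor({"text": text})
    try:
        state = router(state)(state)
    except Exception:
        state = {**state, "result": "Sorry, that service is unavailable right now."}
    return respond(state)
```

The step budget is what keeps a worker from looping indefinitely on a failing tool, and the `try`/`except` in `run` plays the role of the fallback that keeps a reply coming when a worker raises.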

03

Memory and observability

The jump from prototype to daily-use assistant depends on whether it remembers context and whether you can see what broke. Ziri does both.

  • Amazon Titan embeddings land in pgvector with HNSW search for recall
  • Hybrid retrieval fuses Elasticsearch keyword matches with vector search via reciprocal rank fusion (RRF)
  • Langfuse and Prometheus track token use, tool accuracy, TTFB, and routing latency
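
For readers unfamiliar with RRF, the fusion step can be sketched as below. Each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. The value `k = 60` is a commonly used default, not a setting confirmed by the source, and the memory IDs are made up for the example.

```python
from collections import defaultdict
from typing import List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["m1", "m2", "m3"]  # e.g. keyword search results, best first
vector_hits = ["m3", "m1", "m4"]   # e.g. vector search results, best first
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["m1", "m3", "m2", "m4"]
```

RRF only needs ranks, so keyword scores and vector distances never have to be normalized onto a shared scale.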