Tues Jan 20, 2026 10:45am PST

Show HN: Gemini-live-react – Real-time voice AI that works in the browser

Gemini Live offers real-time bidirectional voice AI, but using it in the browser is rough:

- 16kHz in, 24kHz out, browser wants 44.1/48kHz
- PCM16 endianness issues
- buffering vs latency tradeoffs
- playback gaps when chunks arrive mid-stream
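To make the sample-rate and endianness points concrete, here is a minimal sketch of the conversions involved. It is not the library's actual code: the function names (`pcm16ToFloat32`, `float32ToPcm16`, `resampleLinear`) are hypothetical, and it assumes the PCM16 stream is little-endian. Linear interpolation is used only because it is short; downsampling 48kHz mic input to 16kHz without a low-pass filter will alias, so a real pipeline would likely filter first.

```typescript
// Hypothetical helpers for illustration; not the gemini-live-react API.

// Decode little-endian PCM16 bytes into Float32 samples in [-1, 1).
// DataView makes the byte order explicit instead of relying on the host CPU.
export function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const sampleCount = Math.floor(bytes.byteLength / 2);
  const out = new Float32Array(sampleCount);
  for (let i = 0; i < sampleCount; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768; // true = little-endian
  }
  return out;
}

// Encode Float32 samples as little-endian PCM16, clamping out-of-range values.
export function float32ToPcm16(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 32768 : clamped * 32767;
    view.setInt16(i * 2, Math.round(scaled), true);
  }
  return out;
}

// Resample by linear interpolation (e.g. 24000 -> 48000 for playback).
export function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}
```

In a browser these would typically run inside or next to an AudioWorklet, with the decoded chunks scheduled back to back to avoid playback gaps.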

I built gemini-live-react, a React hook that fixes the audio DX and adds features I needed to build real AI agents:

Session recording – record transcripts, audio metadata, tool calls, browser actions, and DOM snapshots into a single JSON for debugging/replay

Workflow builder – define multi-step browser automations as a simple state machine (branching + error handling)

Smart element detection – auto-detect clickable elements so agents don’t rely on brittle selectors

Used for voice-driven web agents where the loop is: AI sees UI → decides → clicks/types → repeat
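For anyone wondering what "workflow as a state machine" can look like, below is a minimal sketch under my own assumptions. The `Workflow`, `WorkflowStep`, and `runWorkflow` names are hypothetical and are not the gemini-live-react API. Each step returns the id of the next step (branching) or `null` to finish, and an optional `onError` names the step to jump to if the step throws.

```typescript
// Hypothetical shapes for illustration; not the gemini-live-react API.
interface WorkflowStep<Ctx> {
  // Returns the id of the next step, or null to finish the workflow.
  run: (ctx: Ctx) => Promise<string | null> | string | null;
  // Step id to jump to if run() throws; without it the error propagates.
  onError?: string;
}

interface Workflow<Ctx> {
  start: string;
  steps: Record<string, WorkflowStep<Ctx>>;
}

// Runs steps until one returns null. Returns the ids visited, in order.
export async function runWorkflow<Ctx>(
  wf: Workflow<Ctx>,
  ctx: Ctx,
  maxTransitions = 100, // guard against accidental infinite loops
): Promise<string[]> {
  const visited: string[] = [];
  let current: string | null = wf.start;
  while (current !== null) {
    if (visited.length >= maxTransitions) {
      throw new Error(`Workflow exceeded ${maxTransitions} transitions`);
    }
    const step = wf.steps[current];
    if (!step) throw new Error(`Unknown step: ${current}`);
    visited.push(current);
    try {
      current = await step.run(ctx);
    } catch (err) {
      if (step.onError === undefined) throw err;
      current = step.onError;
    }
  }
  return visited;
}

// Example: branch on login state, with a shared failure step.
interface DemoCtx {
  loggedIn: boolean;
  log: string[];
}

export const demoWorkflow: Workflow<DemoCtx> = {
  start: "checkLogin",
  steps: {
    checkLogin: { run: (ctx) => (ctx.loggedIn ? "openDashboard" : "login") },
    login: {
      run: (ctx) => {
        ctx.log.push("login");
        ctx.loggedIn = true;
        return "openDashboard";
      },
      onError: "reportFailure",
    },
    openDashboard: {
      run: (ctx) => {
        ctx.log.push("dashboard");
        return null;
      },
      onError: "reportFailure",
    },
    reportFailure: {
      run: (ctx) => {
        ctx.log.push("failed");
        return null;
      },
    },
  },
};
```

In a real agent the `run` functions would wrap the click and type actions, and the returned list of visited steps is the kind of trace a session recording could store.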

Tech: React hook (~2k LOC), AudioWorklet, WS proxy (Deno/Supabase), TypeScript

GitHub: https://github.com/loffloff/gemini-live-react

npm: npm install gemini-live-react

Looking for feedback on the workflow abstraction: state machines felt right, but I'm curious what others use.
