Voice Agent
Add realtime voice chat to your generated apps with bidirectional audio over WebRTC, powered by Azure OpenAI Realtime (gpt-realtime-1.5).
Voice sessions require the Pro or Business plan. Each session costs 5 credits, charged to the app owner.
What It Does
The Voice Agent SDK gives your generated app a single-call API to start a realtime voice session. Audio flows in both directions over WebRTC with sub-second latency: the user speaks, the model responds, and the SDK handles microphone capture, encoding, and playback.
- Bidirectional audio over WebRTC: no polling, no chunked HTTP.
- Ephemeral session tokens: your Azure OpenAI keys never leave the backend.
- Detected automatically when your prompt mentions voice agents, voice bots, or voice booking.
- SDK exposed at window.genmb.voice with start, stop, and status callbacks.
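As a rough illustration of the last bullet, here is a minimal sketch of how an app might wrap the SDK. The source confirms only that `window.genmb.voice` exposes `start`, `stop`, and callbacks (including `onerror`); the options-object shape, the `onstatus` name, and the helper `createVoiceControls` are assumptions made for this example, so check the injected SDK for the real signatures.

```javascript
// Hypothetical wrapper around the Voice Agent SDK.
// Assumed (not confirmed by the docs): start() accepts an options object
// with `onstatus` and `onerror`, and may or may not return a promise.
function createVoiceControls(voice, log) {
  let active = false;

  // Shared failure path: mark the session closed and report the reason.
  const fail = (err) => {
    active = false;
    log(`error: ${err && err.message ? err.message : String(err)}`);
  };

  return {
    // Begin a session; returns true if the session is considered open.
    start() {
      if (active) return false; // avoid opening a second metered session
      active = true;
      try {
        const result = voice.start({
          onstatus: (status) => log(`status: ${status}`),
          onerror: fail,
        });
        // Handles both a promise-returning and a plain start().
        Promise.resolve(result).catch(fail);
      } catch (err) {
        fail(err);
      }
      return active;
    },
    // End the session if one is open; returns true if stop() was called.
    stop() {
      if (!active) return false;
      active = false;
      voice.stop();
      return true;
    },
    isActive: () => active,
  };
}
```

In a browser, this could be used as `const controls = createVoiceControls(window.genmb.voice, console.log)`, with `controls.start()` and `controls.stop()` bound to buttons. Guarding against a second `start()` matters here because sessions are metered per start.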
Setup
The SDK is auto-injected when the AI detects voice patterns in your prompt. You can also enable it manually from the Services panel.
Mention voice in your prompt
Or enable from the Services panel
Call window.genmb.voice.start()
Call window.genmb.voice.start() to begin a session. The SDK requests microphone permission, opens a WebRTC connection, and starts streaming.
Handle stop and errors
Call window.genmb.voice.stop() to end the session, and use the onerror callback to surface plan-gating, rate-limit, or microphone permission errors to the user.
Costs & Plan Gating
Voice sessions are metered per session start, not per turn. Once a session is open, all turns within it are covered by the same charge.
| Cost per session | 5 credits |
| Charged to | App owner, not the end user of the deployed app |
| Minimum plan | Pro |
| Failed sessions | No charge: credits are deducted only after the WebRTC handshake succeeds |
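The metering rules in this table can be sketched as a small calculation. This is only an illustration of the arithmetic, not GenMB's billing code; the `handshakeSucceeded` field name and the `creditsCharged` helper are hypothetical.

```javascript
// Credits per successfully started session, per the table above.
const CREDITS_PER_SESSION = 5;

// Credits charged to the app owner for a list of session attempts.
// Each attempt is assumed to look like { handshakeSucceeded: boolean }.
// Turns within a session are not counted: the charge is per session start.
function creditsCharged(attempts) {
  const successful = attempts.filter((a) => a.handshakeSucceeded).length;
  return successful * CREDITS_PER_SESSION;
}
```

For example, three attempts of which one failed before the WebRTC handshake would cost 2 × 5 = 10 credits, no matter how many turns each open session contained.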
Rate Limits
Voice rate limits are enforced per app to protect the upstream realtime endpoint and keep one runaway client from consuming the shared pool.
| Sessions per hour, per app | 20 |
| Window | Rolling 60-minute window (Redis-backed) |
| Over-limit response | HTTP 429 - surface to users with a brief retry message |
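To make the rolling-window behavior concrete, here is a minimal in-memory sketch of a per-app limiter with the numbers from this table. The real limiter is described as Redis-backed, so this is a stand-in for illustration only. The `createSessionLimiter` helper, the `retryAfterMs` field, and the choice not to count rejected attempts against the limit are all assumptions.

```javascript
const LIMIT = 20;                  // sessions per app, per the table above
const WINDOW_MS = 60 * 60 * 1000;  // rolling 60-minute window

// In-memory sliding-window limiter (illustrative; not the Redis-backed one).
function createSessionLimiter(limit = LIMIT, windowMs = WINDOW_MS) {
  const startsByApp = new Map(); // appId -> start timestamps in ms, oldest first

  return function tryStart(appId, now = Date.now()) {
    // Keep only the starts that still fall inside the rolling window.
    const recent = (startsByApp.get(appId) || []).filter((t) => now - t < windowMs);

    if (recent.length >= limit) {
      startsByApp.set(appId, recent);
      // Time until the oldest start leaves the window (hypothetical field).
      return { status: 429, retryAfterMs: recent[0] + windowMs - now };
    }

    recent.push(now);
    startsByApp.set(appId, recent);
    return { status: 200, retryAfterMs: 0 };
  };
}
```

Unlike a fixed hourly bucket, a rolling window frees capacity gradually: each slot reopens 60 minutes after the session start that consumed it. On the client side, a 429 is the cue to show the brief retry message rather than retrying immediately.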
Security
Voice sessions are designed so secrets never reach the browser and audio is handled in-flight only.
- The browser receives a short-lived session token; the underlying Azure OpenAI API key stays on the GenMB backend.
- Audio streams are not stored. Sessions are stateless: turn audio is processed and discarded.
- Audio uses encrypted SRTP. No plain HTTP audio paths exist.
- Browser-native permission gate: your app cannot bypass user consent.
FAQs
What model powers the Voice Agent?
Do I need to provide an API key?
How much does a voice session cost?
What plans support Voice Agent?
Are conversations recorded?