Voice Agent

Add realtime voice chat to your generated apps with bidirectional audio over WebRTC, powered by Azure OpenAI Realtime (gpt-realtime-1.5).

Voice sessions require the Pro or Business plan. Each session costs 5 credits, charged to the app owner.

What It Does

The Voice Agent SDK gives your generated app a single-call API to start a realtime voice session. Audio flows in both directions over WebRTC with sub-second latency: the user speaks, the model responds, and the SDK handles microphone capture, encoding, and playback.

  • Bidirectional audio over WebRTC: no polling and no chunked HTTP.
  • Ephemeral session tokens: the Azure OpenAI API keys never leave the GenMB backend.
  • Detected automatically when your prompt mentions voice agents, voice bots, or voice booking.
  • SDK exposed at window.genmb.voice, with start() and stop() methods and status callbacks.
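The SDK exists only after it has been injected, either by auto-detection or by the Services panel toggle. Because of that, a generated app may want to check for it before showing a voice button. The sketch below assumes a minimal shape for window.genmb.voice: a start() method (which may or may not return a Promise), a stop() method, and an assignable onerror property. The VoiceSdk interface and the getVoiceSdk helper are hypothetical names, not part of the documented SDK.

```typescript
// Assumed shape of window.genmb.voice. The docs mention start(), stop() and an
// onerror callback; these exact signatures are guesses, not the documented API.
interface VoiceSdk {
  start(): void | Promise<void>;
  stop(): void;
  onerror?: ((err: unknown) => void) | null;
}

// Returns the injected SDK, or null when it is missing (for example, Voice Agent
// is not enabled yet, or the next code save has not happened).
function getVoiceSdk(root: unknown = globalThis): VoiceSdk | null {
  if (typeof root !== "object" || root === null) return null;
  const genmb = (root as { genmb?: unknown }).genmb;
  if (typeof genmb !== "object" || genmb === null) return null;
  const voice = (genmb as { voice?: unknown }).voice;
  if (typeof voice !== "object" || voice === null) return null;
  const candidate = voice as Partial<VoiceSdk>;
  // Only treat it as usable if both entry points are functions.
  return typeof candidate.start === "function" && typeof candidate.stop === "function"
    ? (voice as VoiceSdk)
    : null;
}
```

A page could then render the voice button only when getVoiceSdk() returns a value.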

Setup

The SDK is auto-injected when the AI detects voice patterns in your prompt. You can also enable it manually from the Services panel.

1. Mention voice in your prompt

Phrases like "voice booking flow", "voice ordering", or "talk-to-AI button" trigger auto-detection during code generation.

2. Or enable it from the Services panel

Open the Services panel in the app editor sidebar, find Voice Agent, and toggle it on. The SDK is injected on the next code save.

3. Call window.genmb.voice.start()

Wire a button to window.genmb.voice.start() to begin a session. The SDK requests microphone permission, opens a WebRTC connection, and starts streaming.

4. Handle stop and errors

Use window.genmb.voice.stop() to end the session and the onerror callback to surface plan-gating, rate-limit, or microphone permission errors to the user.
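Steps 3 and 4 can be combined into a small controller that a single button toggles. This is a sketch under assumptions. The VoiceSdk shape is a guess: whether start() returns a Promise, and whether onerror is an assignable property, are not documented here. The error fields checked in classifyVoiceError are also guesses. Two values are grounded. HTTP 429 is the rate-limit status documented on this page. NotAllowedError is the standard DOMException name browsers use when the user denies microphone access. The "plan_gated" code and the status/code field names are hypothetical.

```typescript
// Hypothetical SDK shape: the docs name start(), stop() and an onerror
// callback, but these exact signatures are assumptions.
interface VoiceSdk {
  start(): void | Promise<void>;
  stop(): void;
  onerror?: ((err: unknown) => void) | null;
}

type VoiceErrorKind = "plan" | "rate-limit" | "microphone" | "unknown";
type VoiceState = "idle" | "starting" | "active";

// Maps an error to a category. "NotAllowedError" is the standard DOMException
// name for a denied microphone prompt and 429 is the documented rate-limit
// status; the status/code fields and the "plan_gated" code are hypothetical.
function classifyVoiceError(err: unknown): VoiceErrorKind {
  const e = (typeof err === "object" && err !== null ? err : {}) as {
    name?: unknown;
    status?: unknown;
    code?: unknown;
  };
  if (e.name === "NotAllowedError") return "microphone";
  if (e.status === 429) return "rate-limit";
  if (e.code === "plan_gated") return "plan";
  return "unknown";
}

const VOICE_MESSAGES: Record<VoiceErrorKind, string> = {
  plan: "Voice chat is not available on this app's current plan.",
  "rate-limit": "Voice chat is busy right now. Please try again in a few minutes.",
  microphone: "Microphone access was blocked. Allow it in your browser settings.",
  unknown: "Voice chat could not start. Please try again.",
};

// Drives a single start/stop button. Clicks while a start is in flight are
// ignored, so a double-click cannot open two sessions.
class VoiceButtonController {
  private state: VoiceState = "idle";
  private readonly sdk: VoiceSdk;
  private readonly showMessage: (msg: string) => void;

  constructor(sdk: VoiceSdk, showMessage: (msg: string) => void) {
    this.sdk = sdk;
    this.showMessage = showMessage;
    sdk.onerror = (err) => this.fail(err);
  }

  getState(): VoiceState {
    return this.state;
  }

  async toggle(): Promise<void> {
    if (this.state === "starting") return; // ignore repeat clicks
    if (this.state === "active") {
      this.sdk.stop();
      this.state = "idle";
      return;
    }
    this.state = "starting";
    try {
      await this.sdk.start();
      // onerror may have fired meanwhile; only promote a still-pending start.
      if (this.state === "starting") this.state = "active";
    } catch (err) {
      this.fail(err);
    }
  }

  private fail(err: unknown): void {
    this.state = "idle";
    this.showMessage(VOICE_MESSAGES[classifyVoiceError(err)]);
  }
}
```

In a page, you would construct the controller with the injected SDK and call toggle() from the button's click handler. The guard against repeat clicks matters because every successful start is billed as a separate session.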

Costs & Plan Gating

Voice sessions are metered per session start, not per turn. Once a session is open, all turns within it are covered by the same charge.

  • Cost per session: 5 credits
  • Charged to: the app owner, not the end user of the deployed app
  • Minimum plan: Pro
  • Failed sessions: no charge; credits are deducted only after the WebRTC handshake succeeds
See the Credits documentation for top-up packs and how the team credit pool works on Business plans.
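The billing rules above reduce to simple arithmetic. The owner is charged 5 credits for each session whose WebRTC handshake succeeded, however many turns that session contains. Failed attempts cost nothing. The sketch below estimates an owner's spend from a list of session attempts. The SessionAttempt record and both helper functions are hypothetical illustrations, not GenMB data types or APIs.

```typescript
// Hypothetical record of one session attempt, used only for estimation.
interface SessionAttempt {
  handshakeSucceeded: boolean; // credits are deducted only when this is true
  turns: number;               // the number of turns does not affect the charge
}

const CREDITS_PER_SESSION = 5;

// Total credits charged to the app owner for a batch of attempts.
function creditsCharged(attempts: SessionAttempt[]): number {
  return attempts.filter((a) => a.handshakeSucceeded).length * CREDITS_PER_SESSION;
}

// How many more sessions a given credit balance covers.
function sessionsAffordable(balance: number): number {
  return Math.floor(balance / CREDITS_PER_SESSION);
}

const example: SessionAttempt[] = [
  { handshakeSucceeded: true, turns: 12 },
  { handshakeSucceeded: true, turns: 1 },
  { handshakeSucceeded: false, turns: 0 },
];
console.log(creditsCharged(example)); // 10
console.log(sessionsAffordable(42)); // 8
```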

Rate Limits

Voice rate limits are enforced per app to protect the upstream realtime endpoint and keep one runaway client from consuming the shared pool.

  • Sessions per hour, per app: 20
  • Window: rolling 60 minutes (Redis-backed)
  • Over-limit response: HTTP 429; surface it to users with a brief retry message
For high-volume voice apps, design the UI so users explicitly start a session with a button rather than auto-starting on page load; this keeps idle visitors from consuming sessions.
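Because the window is rolling rather than reset on the hour, capacity returns gradually as old starts age out. If the window behaves like a sliding log, a slot frees up 60 minutes after the oldest start still inside it. The in-memory sketch below illustrates those semantics for 20 starts per 60 minutes. It is not GenMB's Redis-backed implementation, and the RollingWindowLimiter class is a hypothetical name. The limit is per app and shared by every visitor, so a browser cannot reproduce it locally. The server's HTTP 429 remains the only authoritative signal.

```typescript
const LIMIT = 20;                 // sessions per app
const WINDOW_MS = 60 * 60 * 1000; // rolling 60-minute window

// Sliding-log limiter: keeps only the start times inside the current window.
class RollingWindowLimiter {
  private starts: number[] = [];

  // Drops start times that have aged out of the window ending at `now`.
  private prune(now: number): void {
    this.starts = this.starts.filter((t) => now - t < WINDOW_MS);
  }

  // Records a start and returns true, or returns false if the limit is reached.
  tryStart(now: number): boolean {
    this.prune(now);
    if (this.starts.length >= LIMIT) return false;
    this.starts.push(now);
    return true;
  }

  // Milliseconds until another start would be accepted (0 if one is allowed now).
  retryAfterMs(now: number): number {
    this.prune(now);
    if (this.starts.length < LIMIT) return 0;
    return Math.min(...this.starts) + WINDOW_MS - now;
  }
}
```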

Security

Voice sessions are designed so secrets never reach the browser and audio is handled in-flight only.

Ephemeral tokens

Browser receives a short-lived session token; the underlying Azure OpenAI API key stays on the GenMB backend.

No persistence

Audio streams are not stored. Sessions are stateless: turn audio is processed and discarded.

WebRTC only

Audio uses encrypted SRTP. No plain HTTP audio paths exist.

Microphone permission

Microphone access is gated by the browser's native permission prompt; your app cannot bypass user consent.

If you capture transcripts client-side, treat them as user data and follow your privacy policy. GenMB does not transcribe or store audio on the backend.
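One way to treat client-side transcripts as user data is to keep them in memory only. The app can also cap how much is retained and clear everything when the conversation ends. The sketch below assumes the SDK delivers transcript text through some callback, whose name and payload are not specified here. The TranscriptEntry shape and the TranscriptBuffer class are hypothetical. The size and age caps are illustrative values to adjust to your own privacy policy.

```typescript
// Hypothetical transcript entry; the real SDK callback payload may differ.
interface TranscriptEntry {
  role: "user" | "assistant";
  text: string;
  at: number; // epoch milliseconds
}

// Keeps transcripts in memory only, capped in size and age, so they are not
// silently accumulated or written to persistent storage.
class TranscriptBuffer {
  private entries: TranscriptEntry[] = [];
  private readonly maxEntries: number;
  private readonly maxAgeMs: number;

  constructor(maxEntries: number, maxAgeMs: number) {
    this.maxEntries = maxEntries;
    this.maxAgeMs = maxAgeMs;
  }

  add(entry: TranscriptEntry): void {
    this.entries.push(entry);
    // Drop the oldest entries beyond the size cap.
    if (this.entries.length > this.maxEntries) {
      this.entries = this.entries.slice(this.entries.length - this.maxEntries);
    }
  }

  // Discards entries older than maxAgeMs and returns a copy of the rest.
  snapshot(now: number): TranscriptEntry[] {
    this.entries = this.entries.filter((e) => now - e.at <= this.maxAgeMs);
    return [...this.entries];
  }

  // Call when the user ends the conversation or asks for deletion.
  clear(): void {
    this.entries = [];
  }
}
```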

FAQs

What model powers the Voice Agent?
The Voice Agent uses Azure OpenAI Realtime with the gpt-realtime-1.5 model. Audio is streamed bidirectionally over WebRTC for sub-second turn-taking, so there is no typing indicator and no waiting for full text generation between turns.
Do I need to provide an API key?
No. Voice sessions route through GenMB's backend proxy with ephemeral session tokens. The Azure OpenAI API keys never reach the browser.
How much does a voice session cost?
Each session creation costs 5 credits, charged to the app owner, not the end user of the deployed app. Idle browser tabs that never start a session cost nothing.
What plans support Voice Agent?
Voice Agent is available on the Pro and Business plans. Free plan apps can include the SDK but session creation will fail with a plan-gated error until the owner upgrades.
Are conversations recorded?
No. Audio streams are not persisted. Sessions are stateless: turn audio is processed in-flight and discarded. If you need transcripts, capture them client-side from the SDK callbacks.
