Realtime Voice

Overview

Support reference for the current realtime voice server and client flows.

Runtime entrypoints

  • HTTP/WebSocket runtime:
      • `backend/src/realtimeServer.ts`
  • WebSocket orchestration:
      • `backend/src/components/realtime/RealtimeProxy.ts`
  • Provider/runtime implementations:
      • `backend/src/components/realtime/agents/realtime/OpenAIRealtimeAgent.ts`
      • `backend/src/components/realtime/agents/realtime/GoogleAIRealtimeAgent.ts`
      • `backend/src/components/realtime/agents/tts/TtsAgent.ts`
  • WebSocket path:
      • `/chat/realtime?session={sessionId}`
  • Health endpoint:
      • `GET /chat/realtime/health`
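
As a rough illustration of the WebSocket path above, a client could derive the socket URL from an HTTP base URL as follows. This is a minimal sketch: `buildRealtimeSocketUrl` and the example host are hypothetical, and only the `/chat/realtime` path and the `session` query parameter come from this reference.

```typescript
// Hypothetical helper: builds the realtime websocket URL for a session.
// Assumption: the websocket is served from the same host as the HTTP API.
function buildRealtimeSocketUrl(baseUrl: string, sessionId: string): string {
  // http -> ws and https -> wss; also drop any trailing slashes.
  const wsBase = baseUrl.replace(/^http/, "ws").replace(/\/+$/, "");
  return `${wsBase}/chat/realtime?session=${encodeURIComponent(sessionId)}`;
}
```

For example, `buildRealtimeSocketUrl("https://api.example.com/", "run_123")` yields `wss://api.example.com/chat/realtime?session=run_123`.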

Session creation endpoints

  • Public webchat voice:
      • `POST /chat/realtime/create-session/{employeeId}`
      • The widget currently passes its configured `eventId` in the `{employeeId}` path slot.
  • Internal skill voice:
      • `POST /chat/realtime/skill/create-session/{organization_id}/{skill_id}`
  • Internal AI teammate voice:
      • `POST /chat/realtime/employee/create-session/{organization_id}/{employee_id}`
  • Internal Ally voice:
      • `POST /chat/realtime/ally/create-session/{organization_id}`
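
The four routes above can be captured in one small path builder. The sketch below is illustrative only: the `SessionTarget` type and `createSessionPath` function are hypothetical names, and the paths are copied from the list above.

```typescript
// Hypothetical discriminated union covering the four create-session routes.
type SessionTarget =
  | { kind: "webchat"; employeeId: string }
  | { kind: "skill"; organizationId: string; skillId: string }
  | { kind: "employee"; organizationId: string; employeeId: string }
  | { kind: "ally"; organizationId: string };

// Returns the POST path for a given target; path segments are URL-encoded.
function createSessionPath(target: SessionTarget): string {
  const seg = encodeURIComponent;
  switch (target.kind) {
    case "webchat":
      // The widget currently passes its configured eventId in this slot.
      return `/chat/realtime/create-session/${seg(target.employeeId)}`;
    case "skill":
      return `/chat/realtime/skill/create-session/${seg(target.organizationId)}/${seg(target.skillId)}`;
    case "employee":
      return `/chat/realtime/employee/create-session/${seg(target.organizationId)}/${seg(target.employeeId)}`;
    case "ally":
      return `/chat/realtime/ally/create-session/${seg(target.organizationId)}`;
  }
}
```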

Request rules and responses

  • Employee and Ally session creation require exactly one of:
      • `conversation_id`
      • `create_new_conversation`
  • Public webchat session creation can include:
      • `contact`
      • `multi_conversations`
  • Successful create-session responses include:
      • `sessionId`
      • `isAudioAutoTurnOn`
  • Employee and Ally responses also include `conversation_id`.
  • Session IDs are one-time use and currently match the created workflow run ID.
  • Pending session configs are stored in Redis with a 20-minute TTL and deleted on the first websocket connect.
  • All realtime session routes reject `structured` workflows. The published starting step must resolve to an `agent` step with a configured TTS-capable employee.
  • Supported `locale` values are currently:
      • `en-US`
      • `ru-RU`
      • `es-ES`
  • The default `locale` fallback is `en-US`.
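
The request rules above can be sketched in a few lines. This is not the server's implementation: `resolveLocale`, `hasExactlyOneSelector`, and `PendingSessionStore` are hypothetical names, the definition of a "provided" field is an assumption, and the in-memory map only stands in for the Redis-backed store.

```typescript
// Locale list and fallback are taken from the rules above.
const SUPPORTED_LOCALES = ["en-US", "ru-RU", "es-ES"] as const;
type Locale = (typeof SUPPORTED_LOCALES)[number];

function resolveLocale(requested?: string): Locale {
  return SUPPORTED_LOCALES.find((locale) => locale === requested) ?? "en-US";
}

// Hypothetical request shape; only the two field names are documented.
interface ConversationSelector {
  conversation_id?: string;
  create_new_conversation?: boolean;
}

// Assumption: a field counts as provided when conversation_id is a
// non-empty string or create_new_conversation is true.
function hasExactlyOneSelector(body: ConversationSelector): boolean {
  const hasId = typeof body.conversation_id === "string" && body.conversation_id !== "";
  const wantsNew = body.create_new_conversation === true;
  return hasId !== wantsNew;
}

// In-memory stand-in for the Redis-backed pending session config store.
class PendingSessionStore<T> {
  private readonly entries = new Map<string, { config: T; expiresAt: number }>();
  private readonly ttlMs = 20 * 60 * 1000; // 20-minute TTL

  put(sessionId: string, config: T, nowMs: number): void {
    this.entries.set(sessionId, { config, expiresAt: nowMs + this.ttlMs });
  }

  // Returns the config at most once: the entry is deleted on first read,
  // mirroring "deleted on the first websocket connect".
  consume(sessionId: string, nowMs: number): T | undefined {
    const entry = this.entries.get(sessionId);
    this.entries.delete(sessionId);
    if (entry === undefined || entry.expiresAt <= nowMs) return undefined;
    return entry.config;
  }
}
```

Passing the clock in as `nowMs` keeps the sketch deterministic; a real store would rely on Redis key expiry instead.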

Provider routing

  • If the selected TTS model is realtime and the provider is Google:
      • `GoogleAIRealtimeAgent`
      • `GeminiLiveVoice`
  • If the selected TTS model is realtime and not Google:
      • `OpenAIRealtimeAgent`
  • If the TTS model is non-realtime but still TTS-capable:
      • `TtsAgent`
  • OpenAI realtime STT is used when an STT model is configured.
  • The websocket runtime rebuilds a `RequestContext` from `workflowRun.state.runtimeContext` and adds `organizationId`, `employeeId`, and `runId` before the session starts.
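
The routing rules above amount to a small decision function. The sketch below is an assumption-laden illustration: `TtsModelInfo`, its fields, the `"google"` provider string, and `selectAgent` are hypothetical, and throwing for a model that is not TTS-capable is a guess at how that case might be handled.

```typescript
// Hypothetical description of the selected TTS model.
interface TtsModelInfo {
  provider: string; // assumption: "google" identifies Google models
  isRealtime: boolean;
  isTtsCapable: boolean;
}

type AgentKind = "GoogleAIRealtimeAgent" | "OpenAIRealtimeAgent" | "TtsAgent";

function selectAgent(model: TtsModelInfo): AgentKind {
  if (model.isRealtime) {
    // Realtime models: Google goes to GoogleAIRealtimeAgent (listed together
    // with GeminiLiveVoice above); everything else goes to OpenAIRealtimeAgent.
    return model.provider === "google" ? "GoogleAIRealtimeAgent" : "OpenAIRealtimeAgent";
  }
  if (model.isTtsCapable) {
    return "TtsAgent";
  }
  // Assumption: session creation already rejects this case.
  throw new Error("Selected model is not TTS-capable");
}
```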

Audio and message flow

  • Clients send binary microphone audio over the websocket.
  • The websocket wire format expects mono signed `16-bit PCM` audio at `24kHz`.
  • The internal frontend explicitly resamples to `24kHz` before sending.
  • Clients can also send JSON:
      • `type: "user_message"`
      • `data.text`
      • optional `data.attachments`
  • The server validates attachments for websocket `user_message` payloads before processing them.
  • Attachment guardrails on websocket messages:
      • maximum 5 files
      • maximum 10 MB each
      • allowed MIME families: images, PDF, plain text, XML, and SQLite
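
To make the wire format and guardrails above concrete, here is a minimal client-side sketch. Everything named here is hypothetical (`floatToPcm16`, `buildUserMessage`, `attachmentErrors`, `AttachmentMeta`). The concrete MIME strings, the reading of 10 MB as 10 × 1024 × 1024 bytes, and the attachment metadata fields are assumptions, since the reference names only MIME families and limits. The sample conversion assumes the input has already been resampled to 24 kHz mono.

```typescript
// Converts float samples in [-1, 1] to signed 16-bit PCM.
// Int16Array uses platform byte order (little-endian on common hardware);
// the byte order expected by the server is not stated in this reference.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample)); // clamp out-of-range input
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  });
  return out;
}

// Builds the JSON text frame for a plain text message (no attachments).
function buildUserMessage(text: string): string {
  return JSON.stringify({ type: "user_message", data: { text } });
}

// Hypothetical attachment metadata used only for pre-send checks.
interface AttachmentMeta {
  name: string;
  mimeType: string;
  sizeBytes: number;
}

const MAX_FILES = 5;
const MAX_BYTES = 10 * 1024 * 1024; // assumption: 10 MB means 10 MiB
// Assumed concrete MIME types for the PDF, plain text, XML, and SQLite families.
const ALLOWED_MIME_TYPES = [
  "application/pdf",
  "text/plain",
  "application/xml",
  "text/xml",
  "application/vnd.sqlite3",
  "application/x-sqlite3",
];

// Returns a list of human-readable problems; an empty list means "looks OK".
function attachmentErrors(files: AttachmentMeta[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES) {
    errors.push(`too many files: ${files.length} > ${MAX_FILES}`);
  }
  for (const file of files) {
    if (file.sizeBytes > MAX_BYTES) {
      errors.push(`${file.name}: larger than 10 MB`);
    }
    const allowed = file.mimeType.startsWith("image/") || ALLOWED_MIME_TYPES.includes(file.mimeType);
    if (!allowed) {
      errors.push(`${file.name}: MIME type ${file.mimeType} is not allowed`);
    }
  }
  return errors;
}
```

Client-side checks like these only give faster feedback; the server-side validation described above remains authoritative.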

Server -> client events

  • The realtime websocket currently sends JSON events such as:
      • `session.created`
      • `speech_started`
      • `speech_stopped`
      • `response.cancelled`
      • `error`
  • Audio responses are streamed back as binary websocket frames.
  • `TtsAgent` additionally sends text-sidecar events such as:
      • `text` with `role: user|assistant`
      • `speaking.done`
  • Internal chat also listens to org websocket events from the TTS path:
      • `voice_thinking_started`
      • `voice_thinking_ended`
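
Because the realtime websocket mixes binary audio frames with JSON events, a client has to classify each incoming frame before handling it. The sketch below assumes a browser-style websocket with `binaryType` set to `"arraybuffer"`, so binary frames arrive as `ArrayBuffer` and JSON arrives as a string. `classifyFrame` and the `Frame` type are hypothetical, the event list only covers the names given above (which are examples, not an exhaustive set), and payload fields other than `type` are not documented here.

```typescript
// Event names listed in this reference for the realtime websocket.
const KNOWN_EVENT_TYPES = [
  "session.created",
  "speech_started",
  "speech_stopped",
  "response.cancelled",
  "error",
  "text",
  "speaking.done",
] as const;
type KnownEventType = (typeof KNOWN_EVENT_TYPES)[number];

type Frame =
  | { kind: "audio"; byteLength: number }
  | { kind: "event"; type: KnownEventType; payload: Record<string, unknown> }
  | { kind: "unknown" };

function classifyFrame(data: string | ArrayBuffer): Frame {
  // Binary frames carry streamed audio.
  if (typeof data !== "string") {
    return { kind: "audio", byteLength: data.byteLength };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(data);
  } catch {
    return { kind: "unknown" }; // not valid JSON
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { kind: "unknown" };
  }
  const payload = parsed as Record<string, unknown>;
  const type = KNOWN_EVENT_TYPES.find((known) => known === payload["type"]);
  return type === undefined ? { kind: "unknown" } : { kind: "event", type, payload };
}
```

Returning `unknown` rather than throwing lets a client ignore event types it does not recognize yet.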

Automation and takeover behavior

  • Realtime agents refresh automation state from the conversation while the session is running.
  • If automation is disabled on the conversation, the voice session stays connected and still saves user messages, but skips AI response generation.
  • If the organization is out of tokens when voice starts, the server sets the conversation to `waiting_for_operator` and closes the realtime websocket with code `4008`.
  • Native realtime paths also re-check token usage after each response and close with `4008` when the balance reaches zero.
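
The automation and token rules above can be read as three small decisions. The following sketch is illustrative rather than the server's code: the function and type names are hypothetical, and treating "out of tokens" as a balance of zero or less is an assumption. The close code `4008` and the `waiting_for_operator` status come from the rules above.

```typescript
const CLOSE_OUT_OF_TOKENS = 4008;

type StartDecision =
  | { accept: true }
  | { accept: false; conversationStatus: "waiting_for_operator"; closeCode: number };

// Checked once when the voice session starts.
function decideOnSessionStart(tokenBalance: number): StartDecision {
  if (tokenBalance <= 0) {
    return {
      accept: false,
      conversationStatus: "waiting_for_operator",
      closeCode: CLOSE_OUT_OF_TOKENS,
    };
  }
  return { accept: true };
}

// Checked per user turn: messages are always saved, but a response is
// generated only while automation is enabled on the conversation.
function decideOnUserTurn(automationEnabled: boolean): { saveUserMessage: true; generateResponse: boolean } {
  return { saveUserMessage: true, generateResponse: automationEnabled };
}

// Checked after each response on native realtime paths; returns the close
// code to use, or undefined to keep the session open.
function closeCodeAfterResponse(tokenBalance: number): number | undefined {
  return tokenBalance <= 0 ? CLOSE_OUT_OF_TOKENS : undefined;
}
```

On the client side, a close event with code `4008` can then be presented as a handoff to an operator rather than as a generic connection error.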
