Technical docs
# Troubleshooting Guide
Public reference generated from tech docs/troubleshooting.md.
## How to troubleshoot

1. Reproduce from system state first: workflow runs, step logs, conversation records, and current org/teammate settings.
2. If AI acted unexpectedly, checking workflow runs and logs is mandatory.
3. Ask the user to perform checks only after you have exhausted the checks available from the system side.
## High-signal checks
### AI not responding

- Check org token balance first.
- If the org uses its own model keys, verify the relevant key/model still exists and is valid.
- Confirm the workflow has a published `flow` and `starting_step_id`.
- For teammate chat, confirm the teammate still resolves a valid main chat model after org defaults and teammate overrides are applied.
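
The order of the checks above can be sketched as a small triage helper. This is only an illustration: the snapshot keys (`token_balance`, `uses_own_keys`, `key_valid`, `resolved_main_model`) are hypothetical names for values you would read from system state, not fields confirmed by this guide.

```python
from typing import Callable, Optional

# Each entry pairs a failure message with a predicate that returns True when
# the check passes. All snapshot keys except `flow` and `starting_step_id`
# are hypothetical placeholders.
Check = tuple[str, Callable[[dict], bool]]

CHECKS: list[Check] = [
    ("org token balance is exhausted",
     lambda s: s.get("token_balance", 0) > 0),
    ("org-owned model key is missing or invalid",
     lambda s: not s.get("uses_own_keys") or bool(s.get("key_valid"))),
    ("workflow has no published flow",
     lambda s: bool(s.get("flow"))),
    ("workflow has no starting_step_id",
     lambda s: bool(s.get("starting_step_id"))),
    ("teammate does not resolve a main chat model",
     lambda s: bool(s.get("resolved_main_model"))),
]


def first_failure(state: dict) -> Optional[str]:
    """Return the first failing check in guide order, or None if all pass."""
    for message, passes in CHECKS:
        if not passes(state):
            return message
    return None
```

Running the checks in this fixed order keeps the cheapest and most common cause (token balance) at the front.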
### Widget issues

- Confirm the snippet `eventId` belongs to the intended AI teammate or trigger.
- If the snippet uses the wrong host, check the org Webchat widget-domain setting plus `API_DOMAIN` and `API_PROXY_DOMAIN`.
- If the widget is blocked on a network, switching the org to the proxy widget domain is the supported fallback.
- Current feedback mismatch: the widget runtime still calls `/api/public/webchat/messages/{message_id}/reaction`, while the backend route is `/api/public/webchat/messages/{message_id}/feedback`.
- If reactions look stale in the widget, also check websocket updates: the current widget handler does not fully reconcile reaction fields from `webchat.message_updated`.
### Prompt/behavior issues

- Teammate-level instructions affect the default chat behavior.
- A prompt inside an Alloy AI node affects only that workflow step.
- The Defaults and Prompts pages are developer-only in the frontend, while the backend `/api/admin/*` routes are admin-gated. Missing UI and failing API access are not the same problem.
### Voice issues

- The teammate needs a valid main model, `tts_model`, and compatible `voice`.
- `stt_model` is required for the TTS/STT voice path.
- Realtime voice can run without `stt_model`, but it still needs valid main-model and TTS-model resolution.
- Realtime health is exposed from the separate server at `GET /health`; it reports service status plus active session count.
- Realtime session creation and websocket admission validate different things in different layers. If a session is created but the websocket fails, inspect both the create-session route and `RealtimeProxy` checks.
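
The model requirements above can be expressed as a quick pre-check. This is a minimal sketch: `main_model` is a hypothetical key for the resolved main model, and requiring `voice` on the realtime path is an assumption based on the first bullet rather than something the guide states for realtime specifically.

```python
def missing_voice_fields(config: dict, realtime: bool) -> list[str]:
    """List the resolved voice settings that are absent or empty.

    `config` is assumed to hold values after org defaults and teammate
    overrides have been applied.
    """
    # `main_model` is a hypothetical key name; `tts_model`, `voice`, and
    # `stt_model` come from the guide.
    required = ["main_model", "tts_model", "voice"]
    if not realtime:
        # Only the TTS/STT path needs a speech-to-text model.
        required.append("stt_model")
    return [name for name in required if not config.get(name)]
```

An empty result means the configuration passes this check only; it says nothing about whether the chosen `voice` is compatible with the `tts_model`.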
### Escalation and assignment issues

- AI escalation sets the conversation status to `waiting_for_operator`.
- Assign and unassign actions create operator system logs.
- Webchat conversations also emit `webchat.conversation_updated` on assign/unassign.
### Internal chat issues

- Internal chat conversation channels are `skill_chat`, `employee_chat`, `ally_chat`, and `test_chat`.
- Internal chat attachments are limited to 5 files, 10 MB each.
- Allowed attachment MIME families are images, PDF, plain text, XML, and SQLite.
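
When reproducing an attachment rejection, it can help to check the count and size limits locally first. The sketch below assumes "10 MB" means 10 × 1024 × 1024 bytes, which the guide does not specify, and it leaves out MIME checks because the exact MIME strings are not listed here.

```python
MAX_FILES = 5
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB interpreted as 10 MiB


def attachment_errors(sizes: list[int]) -> list[str]:
    """Check a batch of attachment sizes (in bytes) against the stated limits."""
    errors = []
    if len(sizes) > MAX_FILES:
        errors.append(f"too many files: {len(sizes)} > {MAX_FILES}")
    for index, size in enumerate(sizes):
        if size > MAX_BYTES:
            errors.append(f"file {index} exceeds 10 MB")
    return errors
```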
### Feedback issues

- The feedback API accepts `like`, `dislike`, or `null`.
- Feedback is blocked for contact/operator messages.
- Assistant and system messages can still be eligible.
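
The two rules above can be mirrored in a small helper for sanity-checking a failing feedback request. This is an illustrative sketch: the sender labels `contact` and `operator` are assumed identifiers, and the backend may apply further eligibility rules not covered here.

```python
from typing import Optional

ALLOWED_VALUES = {"like", "dislike", None}  # None stands in for JSON null
BLOCKED_SENDERS = {"contact", "operator"}   # assumed sender identifiers


def feedback_rejection(value: Optional[str], sender: str) -> Optional[str]:
    """Return a reason the feedback would be rejected, or None if neither rule applies."""
    if value not in ALLOWED_VALUES:
        return "invalid feedback value"
    if sender in BLOCKED_SENDERS:
        return "feedback is blocked for contact/operator messages"
    return None
```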
### API trigger issues

- Check workflow runs first, then any trigger/snippet metadata.
- Do not assume a trigger record's `step_id` controls execution. The current runtime starts from the workflow's published `flow` and `starting_step_id`.
- Treat `trigger_id`, `trigger_url`, and snippet `eventId` values as invocation breadcrumbs, not as the source of the execution graph.
- Current code creates `logs/webhooks/`, but the public trigger route does not append request bodies there. Do not rely on webhook-body log files.
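
To make the `step_id` point concrete, the following sketch resolves the start step from the workflow alone and only reports whether the trigger record disagrees. The dict shapes are assumptions for illustration; only the `flow`, `starting_step_id`, and `step_id` names come from this guide.

```python
def resolve_start(workflow: dict, trigger: dict) -> dict:
    """Describe where execution would start, ignoring the trigger's step_id."""
    start = workflow.get("starting_step_id")
    trigger_step = trigger.get("step_id")
    return {
        # A run needs both a published flow and a starting step.
        "runnable": bool(workflow.get("flow")) and bool(start),
        "start_step_id": start,
        # A mismatch is a breadcrumb worth noting, not a change in behavior.
        "trigger_step_id_differs": trigger_step is not None and trigger_step != start,
    }
```

If `trigger_step_id_differs` is true, that explains why a user may expect a different entry point than the one the run actually used.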
### Workflow and scheduler issues

- Distinguish job status from workflow-run status. A job marked `completed` only means the workflow run was created and queued successfully.
- If a scheduler appears to have fired with no useful outcome, inspect both the scheduler row and the linked workflow run. `last_scheduled_at` is advanced when the daemon claims the scheduler, even if no downstream run is created.
- API-triggered runs are labeled `API` mostly on the frontend side. Backend `channel=api` filtering is not fully reliable.
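
The scheduler case above reduces to a three-way triage. The sketch below is a hypothetical helper: it assumes the scheduler row and linked run are available as dicts, and that the run exposes a `status` key, which this guide does not confirm.

```python
from typing import Optional


def triage_scheduler(scheduler: dict, run: Optional[dict]) -> str:
    """Classify a scheduler that 'fired with no useful outcome'."""
    if scheduler.get("last_scheduled_at") is None:
        return "scheduler has not been claimed yet"
    if run is None:
        # last_scheduled_at advances on claim, so this state is possible.
        return "scheduler was claimed but no workflow run exists"
    # `status` is an assumed key on the run record.
    return f"inspect workflow run status: {run.get('status')}"
```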
### Omni issues

- The current Omni queue backend is still hard-limited to `api` and `web_chat` conversations.
- If the UI channel-category toggle suggests broader coverage, treat it as presentation-only and verify the underlying queue response before debugging deeper.
### Storage issues

- AI teammates can only access folders explicitly shared with them.
- Storage uploads allow: `png`, `jpg`, `jpeg`, `gif`, `webp`, `svg`, `pdf`, `docx`, `xlsx`, `pptx`, `txt`, `htm`, `html`, `md`, `csv`, `json`.
- Max upload size is 10 MB per file.
- Individual files cannot be uploaded to storage root. Upload them inside a folder, or upload a folder tree at root.
- Folder drag-and-drop depends on `webkitGetAsEntry`, so reproduce folder-drop issues in Chromium first.
- The docs do not currently confirm a separate backend size cap for `PUT /storage/file/content`; avoid asserting one without reproducing it from code or runtime.
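
Before escalating an upload failure, the stated upload rules can be checked locally. This is a sketch under assumptions: extension matching is treated as case-insensitive, "10 MB" is read as 10 × 1024 × 1024 bytes, and the target is given as a slash-separated path relative to storage root. None of these details are confirmed by the guide.

```python
from pathlib import PurePosixPath

ALLOWED_EXTENSIONS = {
    "png", "jpg", "jpeg", "gif", "webp", "svg", "pdf", "docx",
    "xlsx", "pptx", "txt", "htm", "html", "md", "csv", "json",
}
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB interpreted as 10 MiB


def upload_errors(path: str, size: int) -> list[str]:
    """Check one file upload against the documented storage rules."""
    target = PurePosixPath(path.lstrip("/"))
    errors = []
    # Compare only the final extension, lowercased (assumed behavior).
    if target.suffix.lower().lstrip(".") not in ALLOWED_EXTENSIONS:
        errors.append("extension not allowed")
    if size > MAX_BYTES:
        errors.append("file exceeds 10 MB")
    # A path with a single component is a file placed directly at root.
    if len(target.parts) < 2:
        errors.append("individual files cannot be uploaded to storage root")
    return errors
```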
## Access checks

- Developer-only: Models, Voices, Prompts, Defaults, and admin APIs.
- Org-admin gated: Skills and Logs.
- If controls are missing, verify the user's role before debugging the UI.
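
The gating above can be captured as a lookup to run before any UI debugging. The role identifiers `developer` and `org_admin` are hypothetical labels, and the sketch deliberately assumes no role hierarchy because the guide does not describe one.

```python
from typing import Optional

# Area names come from the guide; role identifiers are hypothetical labels.
REQUIRED_ROLE = {
    "Models": "developer",
    "Voices": "developer",
    "Prompts": "developer",
    "Defaults": "developer",
    "Skills": "org_admin",
    "Logs": "org_admin",
}


def missing_role(area: str, roles: set[str]) -> Optional[str]:
    """Return the role the user lacks for an area, or None if access is expected."""
    required = REQUIRED_ROLE.get(area)
    if required is None or required in roles:
        return None
    return required
```

A non-None result points to a role problem, so the missing control is expected behavior rather than a UI bug.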