Technical docs
Troubleshooting Guide
How to troubleshoot
- Reproduce from system state first: workflow runs, step logs, conversation records, and current org/teammate settings.
- If the AI acted unexpectedly, always check the workflow runs and step logs before anything else.
- Ask the user to perform checks only after you have exhausted the checks available from the system side.
High-signal checks
AI not responding
- Check org token balance first.
- If the org uses its own model keys, verify the relevant key/model still exists and is valid.
- Confirm the workflow has a published `flow` and `starting_step_id`.
- For teammate chat, confirm the teammate still resolves a valid main chat model after org defaults and teammate overrides are applied.
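The order of the checks above can be captured as a small triage helper. This is a minimal sketch, not part of the product: the `ChatSnapshot` type and all of its field names are hypothetical, and treating a balance of zero or less as exhausted is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatSnapshot:
    # Hypothetical fields; fill them from whatever system-side data you can read.
    token_balance: int
    uses_own_model_keys: bool
    model_key_valid: bool
    has_published_flow: bool
    starting_step_id: Optional[str]
    resolved_main_chat_model: Optional[str]


def first_failing_check(s: ChatSnapshot) -> Optional[str]:
    """Return the first failing check, in the order listed above, or None."""
    if s.token_balance <= 0:  # assumption: zero or negative means exhausted
        return "org token balance is exhausted"
    if s.uses_own_model_keys and not s.model_key_valid:
        return "org-owned model key/model is missing or invalid"
    if not s.has_published_flow or not s.starting_step_id:
        return "workflow lacks a published flow or starting_step_id"
    if not s.resolved_main_chat_model:
        return "teammate does not resolve a valid main chat model"
    return None
```

Returning only the first failure keeps the cheapest check (token balance) ahead of the ones that require reading workflow or teammate configuration.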
Widget issues
- Confirm the snippet `eventId` belongs to the intended AI teammate or trigger.
- If the snippet uses the wrong host, check the org Webchat widget-domain setting plus `API_DOMAIN` and `API_PROXY_DOMAIN`.
- If the widget is blocked on a network, switching the org to the proxy widget domain is the supported fallback.
- Current feedback mismatch: the widget runtime still calls `/api/public/webchat/messages/{message_id}/reaction`, while the backend route is `/api/public/webchat/messages/{message_id}/feedback`.
- If reactions look stale in the widget, also check websocket updates: the current widget handler does not fully reconcile reaction fields from `webchat.message_updated`.
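When reading a browser network trace, the route mismatch described above can be spotted mechanically. The sketch below uses only the two paths quoted in this section. The regular expression and the function name are illustrative assumptions, and the path is expected without a query string.

```python
import re

# Both suffixes come from the text above; the pattern itself is an assumption.
_FEEDBACK_ROUTE = re.compile(
    r"^/api/public/webchat/messages/([^/]+)/(reaction|feedback)$"
)


def classify_feedback_path(path: str) -> str:
    """Say whether a request path is the widget's call or the backend's route."""
    match = _FEEDBACK_ROUTE.match(path)
    if match is None:
        return "not a feedback call"
    if match.group(2) == "reaction":
        return "widget runtime path (backend route ends in /feedback)"
    return "backend route"
```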
Prompt/behavior issues
- Teammate-level instructions affect the default chat behavior.
- A prompt inside an Alloy AI node affects only that workflow step.
- The Defaults and Prompts pages are developer-only in the frontend, while the backend `/api/admin/*` routes are admin-gated. A missing UI page and a failing API call are not the same problem, so diagnose them separately.
Voice issues
- The teammate needs a valid main model, `tts_model`, and compatible `voice`.
- `stt_model` is required for the TTS/STT voice path.
- Realtime voice can run without `stt_model`, but it still needs valid main-model and TTS-model resolution.
- Realtime health is exposed from the separate server at `GET /health`; it reports service status plus active session count.
- Realtime session creation and websocket admission validate different things in different layers. If a session is created but the websocket fails, inspect both the create-session route and `RealtimeProxy` checks.
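The model requirements for the two voice paths can be summarized as a presence check. This is a sketch under stated assumptions: the mode names `tts_stt` and `realtime` and the `main_model` key are hypothetical labels, and the function only checks that a value resolved, not that the voice is actually compatible with the TTS model.

```python
from typing import Dict, List, Optional


def missing_voice_settings(mode: str, resolved: Dict[str, Optional[str]]) -> List[str]:
    """List the settings that did not resolve for the given voice path."""
    # Needed on both paths according to the notes above.
    required = ["main_model", "tts_model", "voice"]
    if mode == "tts_stt":
        required.append("stt_model")  # only the TTS/STT path needs STT
    elif mode != "realtime":
        raise ValueError(f"unknown voice mode: {mode!r}")
    return [name for name in required if not resolved.get(name)]
```

For example, a teammate with no `stt_model` passes the `realtime` check but is reported as missing `stt_model` on the `tts_stt` path.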
Escalation and assignment issues
- AI escalation sets the conversation status to `waiting_for_operator`.
- Assign and unassign actions create operator system logs.
- Webchat conversations also emit `webchat.conversation_updated` on assign/unassign.
Internal chat issues
- Internal chat conversation channels are `skill_chat`, `employee_chat`, `ally_chat`, and `test_chat`.
- Internal chat attachments are limited to 5 files, 10 MB each.
- Allowed attachment MIME families are images, PDF, plain text, XML, and SQLite.
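To reproduce an attachment rejection locally, the limits above can be approximated with a pre-check. This is a sketch only: the exact MIME strings are assumptions (the source names families, not literal types), and "10 MB" is assumed to mean 10 * 1024 * 1024 bytes.

```python
from typing import List, Tuple

MAX_FILES = 5
MAX_BYTES = 10 * 1024 * 1024  # assumption: 10 MB is interpreted as 10 MiB

# Assumed concrete types for the PDF, plain text, XML, and SQLite families.
ALLOWED_MIME = {
    "application/pdf",
    "text/plain",
    "application/xml",
    "text/xml",
    "application/vnd.sqlite3",
    "application/x-sqlite3",
}


def attachment_problems(files: List[Tuple[str, str, int]]) -> List[str]:
    """Check (name, mime_type, size_bytes) tuples against the documented limits."""
    problems = []
    if len(files) > MAX_FILES:
        problems.append(f"{len(files)} files exceeds the limit of {MAX_FILES}")
    for name, mime, size in files:
        if not (mime.startswith("image/") or mime in ALLOWED_MIME):
            problems.append(f"{name}: MIME type {mime} is not in an allowed family")
        if size > MAX_BYTES:
            problems.append(f"{name}: {size} bytes exceeds 10 MB")
    return problems
```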
Feedback issues
- The feedback API accepts `like`, `dislike`, or `null`.
- Feedback is blocked for contact/operator messages.
- Assistant and system messages can still be eligible.
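The two feedback rules above reduce to a short check. In this sketch the role strings `contact` and `operator` are assumed spellings, and a `None` result only means that neither rule rejects the request, not that the backend is guaranteed to accept it.

```python
from typing import Optional

ALLOWED_VALUES = ("like", "dislike", None)
BLOCKED_ROLES = {"contact", "operator"}  # assumed role identifiers


def feedback_rejection_reason(value: Optional[str], message_role: str) -> Optional[str]:
    """Return why feedback would be rejected by the documented rules, or None."""
    if value not in ALLOWED_VALUES:
        return f"unsupported feedback value: {value!r}"
    if message_role in BLOCKED_ROLES:
        return f"feedback is blocked for {message_role} messages"
    return None
```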
API trigger issues
- Check workflow runs first, then any trigger/snippet metadata.
- Do not assume a trigger record's `step_id` controls execution. Current runtime starts from the workflow's published `flow` and `starting_step_id`.
- Treat `trigger_id`, `trigger_url`, and snippet `eventId` values as invocation breadcrumbs, not as the source of the execution graph.
- Current code creates `logs/webhooks/`, but the public trigger route does not append request bodies there. Do not rely on webhook-body log files.
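The point about `step_id` can be made concrete with a small comparison helper. This is a sketch, assuming the workflow and trigger records are available as plain dictionaries. The dictionary shapes are hypothetical, and only the `flow`, `starting_step_id`, and `step_id` names come from the notes above.

```python
from typing import Any, Dict, Optional


def effective_starting_step(workflow: Dict[str, Any]) -> Optional[str]:
    """Return the step a run starts from: the workflow's own starting step."""
    if not workflow.get("flow"):
        return None  # nothing published, so nothing can start
    return workflow.get("starting_step_id")


def trigger_step_is_misleading(workflow: Dict[str, Any], trigger: Dict[str, Any]) -> bool:
    """True when the trigger's step_id differs from where execution really starts."""
    trigger_step = trigger.get("step_id")
    return trigger_step is not None and trigger_step != effective_starting_step(workflow)
```

A `True` result is a hint that someone reading the trigger record may expect a different entry point than the one the run actually used.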
Workflow and scheduler issues
- Distinguish job status from workflow-run status. A job marked `completed` only means the workflow run was created and queued successfully.
- If a scheduler appears to have fired with no useful outcome, inspect both the scheduler row and the linked workflow run. `last_scheduled_at` is advanced when the daemon claims the scheduler, even if no downstream run is created.
- API-triggered runs get their `API` label mostly in the frontend, and backend `channel=api` filtering is not fully reliable.
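The distinction between scheduler, job, and run state can be sketched as a single decision function. The parameter names other than `last_scheduled_at` are hypothetical, and treating an empty `last_scheduled_at` as "never claimed" is an assumption.

```python
from typing import Optional


def diagnose_scheduled_run(
    last_scheduled_at: Optional[str],
    job_status: Optional[str],
    run_status: Optional[str],
) -> str:
    """Summarize where to look next for a scheduler that seemed to do nothing."""
    if run_status is None:
        if last_scheduled_at is None:
            # Assumption: an empty timestamp means the daemon never claimed it.
            return "scheduler has not been claimed by the daemon"
        return "scheduler was claimed, but no linked workflow run exists"
    # A completed job only says the run was created and queued.
    return f"judge the outcome by the run status ({run_status}), not the job status ({job_status})"
```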
Omni issues
- The current Omni queue backend is still hard-limited to `api` and `web_chat` conversations.
- If the UI channel-category toggle suggests broader coverage, treat it as presentation-only and verify the underlying queue response before debugging deeper.
Storage issues
- AI teammates can only access folders explicitly shared with them.
- Storage uploads allow: `png`, `jpg`, `jpeg`, `gif`, `webp`, `svg`, `pdf`, `docx`, `xlsx`, `pptx`, `txt`, `htm`, `html`, `md`, `csv`, `json`.
- Max upload size is 10 MB per file.
- Individual files cannot be uploaded to storage root. Upload them inside a folder, or upload a folder tree at root.
- Folder drag-and-drop depends on `webkitGetAsEntry`, so reproduce folder-drop issues in Chromium first.
- The docs do not currently confirm a separate backend size cap for `PUT /storage/file/content`; avoid asserting one without reproducing it from code or runtime.
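The documented upload rules can be checked before attempting a reproduction. This is a minimal sketch: the function name is hypothetical, "10 MB" is assumed to mean 10 * 1024 * 1024 bytes, and case-insensitive extension matching is an assumption. It covers only the upload rules listed above and says nothing about `PUT /storage/file/content`.

```python
from pathlib import PurePosixPath
from typing import List

ALLOWED_EXTENSIONS = {
    "png", "jpg", "jpeg", "gif", "webp", "svg", "pdf", "docx",
    "xlsx", "pptx", "txt", "htm", "html", "md", "csv", "json",
}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: 10 MB is interpreted as 10 MiB


def upload_problems(target_path: str, size_bytes: int) -> List[str]:
    """Check one file upload against the documented storage rules."""
    path = PurePosixPath(target_path.lstrip("/"))
    problems = []
    extension = path.suffix.lower().lstrip(".")  # assumes case-insensitive matching
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"extension '{extension}' is not allowed")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the 10 MB limit")
    if len(path.parts) < 2:
        problems.append("individual files cannot be uploaded to the storage root")
    return problems
```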
Access checks
- Developer-only: Models, Voices, Prompts, Defaults, and admin APIs.
- Org-admin gated: Skills and Logs.
- If controls are missing, verify the user's role before debugging the UI.
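The gating listed above can be kept as a lookup table for quick reference during triage. The area keys and the gate labels `developer` and `org_admin` are hypothetical identifiers. The sketch does not model whether one role implies another.

```python
from typing import Optional

# Keys and gate labels are illustrative, not identifiers from the codebase.
REQUIRED_GATE = {
    "models": "developer",
    "voices": "developer",
    "prompts": "developer",
    "defaults": "developer",
    "admin_api": "developer",
    "skills": "org_admin",
    "logs": "org_admin",
}


def required_gate(area: str) -> Optional[str]:
    """Return the gate documented for an area, or None if it is not listed."""
    return REQUIRED_GATE.get(area.strip().lower())
```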