Technical docs

Troubleshooting Guide

How to troubleshoot

Reproduce from system state first: workflow runs, step logs, conversation records, and current org/teammate settings.
If the AI acted unexpectedly, always check the workflow runs and step logs.
Ask the user to perform checks only after you have exhausted the checks available from the system side.

Check org token balance first.
If the org uses its own model keys, verify the relevant key/model still exists and is valid.
Confirm the workflow has a published `flow` and `starting_step_id`.
For teammate chat, confirm the teammate still resolves a valid main chat model after org defaults and teammate overrides are applied.
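The four checks above can be run as one preflight pass. The sketch below is illustrative only: every field name on `PreflightInput` is hypothetical, and it assumes that a teammate override takes precedence over the org default with no fallback when the override is invalid. Verify that precedence against the real resolution code before relying on it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class PreflightInput:
    # All field names are hypothetical; map them to the real org, workflow,
    # and teammate records pulled from the system.
    token_balance: int
    uses_own_keys: bool
    key_is_valid: bool
    published_flow: Optional[dict]
    starting_step_id: Optional[str]
    org_default_model: Optional[str]
    teammate_model_override: Optional[str]
    available_models: Set[str] = field(default_factory=set)


def preflight(p: PreflightInput) -> List[str]:
    """Return a list of failed checks, in the order the guide lists them."""
    problems = []
    # Assumption: a balance of zero or less blocks AI execution.
    if p.token_balance <= 0:
        problems.append("org token balance is empty")
    if p.uses_own_keys and not p.key_is_valid:
        problems.append("org-owned model key is missing or invalid")
    if not p.published_flow or not p.starting_step_id:
        problems.append("workflow has no published flow or starting_step_id")
    # Assumption: the teammate override wins over the org default.
    model = p.teammate_model_override or p.org_default_model
    if model is None or model not in p.available_models:
        problems.append("teammate does not resolve a valid main chat model")
    return problems
```

An empty list means none of these four causes applies, and the investigation should move on to runs and step logs.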

Confirm the snippet `eventId` belongs to the intended AI teammate or trigger.
If the snippet uses the wrong host, check the org Webchat widget-domain setting plus `API_DOMAIN` and `API_PROXY_DOMAIN`.
If the widget is blocked on a network, switching the org to the proxy widget domain is the supported fallback.
Current feedback mismatch: the widget runtime still calls `/api/public/webchat/messages/{message_id}/reaction`, while the backend route is `/api/public/webchat/messages/{message_id}/feedback`.
If reactions look stale in the widget, also check websocket updates: the current widget handler does not fully reconcile reaction fields from `webchat.message_updated`.
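When scanning network or access logs for the feedback mismatch, it can help to classify each request path. The helper below is a sketch: the two path templates come from the text above, while the function name and return labels are made up for illustration.

```python
import re
from typing import Optional, Tuple

# Both path templates are taken from this guide; everything else is hypothetical.
_FEEDBACK_PATH = re.compile(
    r"^/api/public/webchat/messages/(?P<message_id>[^/]+)/(?P<action>reaction|feedback)$"
)


def classify_feedback_path(path: str) -> Optional[Tuple[str, str]]:
    """Return (message_id, label), or None if the path is not a feedback call."""
    match = _FEEDBACK_PATH.match(path)
    if match is None:
        return None
    if match.group("action") == "feedback":
        label = "backend_route"
    else:
        # The widget runtime calls this path, but the backend does not define it.
        label = "widget_only_path"
    return match.group("message_id"), label
```

A log full of `widget_only_path` hits with no matching `backend_route` hits is consistent with the mismatch described above.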

Teammate-level instructions affect the default chat behavior.
A prompt inside an Alloy AI node affects only that workflow step.
The Defaults and prompt pages are shown only to developers in the frontend, while the backend `/api/admin/*` routes are gated on admin access. A missing UI page and a failing API call are therefore separate problems; diagnose them separately.

The teammate needs a valid main model, `tts_model`, and compatible `voice`.
`stt_model` is required for the TTS/STT voice path.
Realtime voice can run without `stt_model`, but it still needs valid main-model and TTS-model resolution.
Realtime health is exposed from the separate server at `GET /health`; it reports service status plus active session count.
Realtime session creation and websocket admission validate different things in different layers. If a session is created but the websocket fails, inspect both the create-session route and `RealtimeProxy` checks.
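The voice requirements above reduce to a per-path list of required settings. This is a minimal sketch under assumptions: the path names `tts_stt` and `realtime` and the `main_model` key are hypothetical labels, and `voice` is treated as required on both paths. It checks presence only; it cannot tell whether a voice is compatible with the chosen TTS model.

```python
from typing import Dict, List, Optional


def missing_voice_settings(settings: Dict[str, Optional[str]], path: str) -> List[str]:
    """List the settings that are absent or empty for the given voice path."""
    if path not in ("tts_stt", "realtime"):
        raise ValueError(f"unknown voice path: {path}")
    # Needed on both paths according to the guide (voice on realtime is an assumption).
    required = ["main_model", "tts_model", "voice"]
    if path == "tts_stt":
        # Only the TTS/STT path needs a speech-to-text model.
        required.append("stt_model")
    return [name for name in required if not settings.get(name)]
```

For example, a teammate with a main model, `tts_model`, and `voice` but no `stt_model` passes for `realtime` and reports `["stt_model"]` for `tts_stt`.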

AI escalation sets the conversation status to `waiting_for_operator`.
Assign and unassign actions create operator system logs.
Webchat conversations also emit `webchat.conversation_updated` on assign/unassign.

Internal chat conversation channels are `skill_chat`, `employee_chat`, `ally_chat`, and `test_chat`.
Internal chat attachments are limited to 5 files, 10 MB each.
Allowed attachment MIME families are images, PDF, plain text, XML, and SQLite.
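To reproduce an attachment rejection offline, the limits above can be expressed as a small validator. Treat it as a sketch: the exact MIME strings for XML and SQLite, and the reading of 10 MB as 10 * 1024 * 1024 bytes, are assumptions that should be checked against the backend.

```python
from typing import List, Tuple

MAX_FILES = 5
MAX_BYTES = 10 * 1024 * 1024  # assumption: binary megabytes

# Assumption: concrete MIME types standing in for the families named in the guide.
ALLOWED_EXACT = {
    "application/pdf",
    "text/plain",
    "application/xml",
    "text/xml",
    "application/vnd.sqlite3",
    "application/x-sqlite3",
}


def attachment_errors(files: List[Tuple[str, str, int]]) -> List[str]:
    """Validate (name, mime_type, size_bytes) tuples against the stated limits."""
    errors = []
    if len(files) > MAX_FILES:
        errors.append(f"too many files: {len(files)} > {MAX_FILES}")
    for name, mime, size in files:
        if not (mime.startswith("image/") or mime in ALLOWED_EXACT):
            errors.append(f"{name}: MIME type {mime} not allowed")
        if size > MAX_BYTES:
            errors.append(f"{name}: {size} bytes exceeds {MAX_BYTES}")
    return errors
```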

If a person messages an AI teammate through Telegram and the teammate does not respond, check whether that Telegram channel or direct chat is allowed for the teammate. The likely cause is that the channel or chat was never added to the whitelist. Ask the user to enable it in Alloy.

Check workflow runs first, then trigger and snippet metadata.
Do not assume a trigger record's `step_id` controls execution. Current runtime starts from the workflow's published `flow` and `starting_step_id`.
Treat `trigger_id`, `trigger_url`, and snippet `eventId` values as invocation breadcrumbs, not as the source of the execution graph.
Current code creates `logs/webhooks/`, but the public trigger route does not append request bodies there. Do not rely on webhook-body log files.
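The rule that execution starts from the workflow, not the trigger, can be stated as code. The sketch below is not the runtime's implementation: it assumes a hypothetical shape where `flow` is a dict keyed by step id, and it accepts a trigger record only to show that the record is ignored.

```python
from typing import Any, Dict, Optional


def resolve_entry_step(
    workflow: Dict[str, Any],
    trigger: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the step id execution starts from, per the published workflow."""
    flow = workflow.get("flow")
    start = workflow.get("starting_step_id")
    if not flow:
        raise ValueError("workflow has no published flow")
    if not start:
        raise ValueError("workflow has no starting_step_id")
    # Assumption: the flow maps step ids to step definitions.
    if start not in flow:
        raise ValueError(f"starting_step_id {start!r} is not in the published flow")
    # The trigger (including any step_id on it) is deliberately not consulted.
    return start
```

If the step a trigger record points at differs from the value this returns, the trigger's `step_id` is stale metadata rather than evidence of a routing bug.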

Distinguish job status from workflow-run status. A job marked `completed` only means the workflow run was created and queued successfully.
If a scheduler appears to have fired with no useful outcome, inspect both the scheduler row and the linked workflow run. `last_scheduled_at` is advanced when the daemon claims the scheduler, even if no downstream run is created.
The `API` label on API-triggered runs is applied mostly on the frontend. Backend `channel=api` filtering is not fully reliable, so do not depend on it alone to find these runs.
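The job-versus-run distinction can be captured as a short triage function. This is a sketch with hypothetical names: only the job status `completed` comes from this guide, and the return codes and any run status values passed in are placeholders.

```python
from typing import Optional


def triage(job_status: str, run_status: Optional[str]) -> str:
    """Decide where to look next, given a job status and its linked run status.

    Pass run_status=None when no linked workflow run could be found.
    """
    if job_status != "completed":
        # The run was never confirmed as created and queued.
        return "check_job"
    if run_status is None:
        # For schedulers, last_scheduled_at can advance without a run existing.
        return "check_scheduler_and_missing_run"
    # A completed job says nothing about the outcome; the run status does.
    return "check_run"
```

The point of the last branch is that `completed` on the job never closes the investigation by itself.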

The current Omni queue backend is still hard-limited to `api` and `web_chat` conversations.
If the UI channel-category toggle suggests broader coverage, treat it as presentation-only and verify the underlying queue response before debugging deeper.

AI teammates can only access folders explicitly shared with them.
Storage uploads allow: `png`, `jpg`, `jpeg`, `gif`, `webp`, `svg`, `pdf`, `docx`, `xlsx`, `pptx`, `txt`, `htm`, `html`, `md`, `csv`, `json`.
Max upload size is 10 MB per file.
Individual files cannot be uploaded to the storage root. Upload them inside a folder, or upload a folder tree at the root.
Folder drag-and-drop depends on `webkitGetAsEntry`, so reproduce folder-drop issues in Chromium first.
The docs do not currently confirm a separate backend size cap for `PUT /storage/file/content`; avoid asserting one without reproducing it from code or runtime.
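The upload rules above (extension allowlist, 10 MB cap, no loose files at the root) can be checked locally before asking a user to retry. The following is a sketch, not the backend's validator: the function name is hypothetical, 10 MB is assumed to mean 10 * 1024 * 1024 bytes, and it does not model folder-tree uploads or `PUT /storage/file/content`.

```python
from pathlib import PurePosixPath
from typing import List

# Extension list copied from this guide.
ALLOWED_EXTENSIONS = {
    "png", "jpg", "jpeg", "gif", "webp", "svg", "pdf", "docx",
    "xlsx", "pptx", "txt", "htm", "html", "md", "csv", "json",
}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: binary megabytes


def upload_errors(path: str, size_bytes: int) -> List[str]:
    """Check a single-file upload, where path is relative to the storage root."""
    errors = []
    parts = PurePosixPath(path.lstrip("/")).parts
    if len(parts) < 2:
        # No parent folder means the file would land in the storage root.
        errors.append("files cannot be uploaded to the storage root")
    extension = PurePosixPath(path).suffix.lower().lstrip(".")
    if extension not in ALLOWED_EXTENSIONS:
        errors.append(f"extension '{extension}' is not allowed")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append(f"{size_bytes} bytes exceeds {MAX_UPLOAD_BYTES}")
    return errors
```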