Deployment & Data Residency
Overview
Alloy can run as a fully managed service or be self-hosted on your own infrastructure. This document explains the deployment options available to you across two dimensions: where your infrastructure and data live, and where model (LLM) calls run. Together they let you balance speed of setup against control over your data.
Where your infrastructure and data live
Choose the level of isolation that matches your security and compliance requirements.
| Option | What it is | Who operates it | Availability |
|---|---|---|---|
| Shared Cloud | Runs on Alloy's shared, multi-tenant infrastructure | Alloy | All plans |
| Dedicated VPC | Isolated network boundary for your organization; runtime and data are not shared with other customers | Alloy | Enterprise |
| Bring Your Own Cloud (BYOC) | The full Alloy platform is deployed inside your cloud account (AWS, GCP, or Azure); your business data stays in your cloud account | Alloy (remotely, in your cloud) | Enterprise |
| On-Premises | Alloy ships as a Docker container that you install on your own servers or in your own data center; no dependency on any Alloy-operated infrastructure | You (self-managed) | Enterprise |
Dedicated VPC, BYOC, and On-Premises are available on Enterprise plans.
For every option except On-Premises, Alloy manages provisioning, scaling, and monitoring — so increasing data control does not add operational burden on your team. With On-Premises, your team operates the platform.
Model (LLM) calls
These options govern the LLM calls Alloy makes itself — both when running Alloy-managed agents and when a workflow calls a model. You control where those calls run.
External agents connected to Alloy (Claude, Codex, Cursor, and similar) call their own LLM endpoints, so these options do not apply to them.
All three model options in the table below are available on all plans.
| Option | Where calls run | Keys & billing | What reaches the endpoint |
|---|---|---|---|
| Alloy-Managed LLM | Models and infrastructure provided by Alloy (default) | Alloy | All model-visible data is processed through Alloy's infrastructure and model providers |
| Your Own Provider | A commercial provider you connect with your own keys, such as AWS Bedrock, Google Vertex, Azure OpenAI, or an aggregator such as OpenRouter | You (your keys / account) | All model-visible data is sent to the provider you choose, under your account and contracts; the provider's endpoint is public and sits outside your network |
| Self-Hosted Models | Open-weight models you run on your own hardware via a local inference server — Ollama, vLLM, LM Studio, or TGI | You (your hardware) | All model-visible data stays inside your environment; nothing is sent to an external provider |
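Before relying on a self-hosted model, it can help to confirm that the inference server actually responds. The sketch below is illustrative only and is not part of Alloy. It assumes the server exposes an OpenAI-compatible `/v1/models` route, which vLLM's OpenAI-compatible server does; other servers may differ, so check their documentation. The helper names `models_url` and `list_models` and the example base URL are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def models_url(base_url: str) -> str:
    """Build the model-listing URL, assuming an OpenAI-compatible server."""
    return base_url.rstrip("/") + "/v1/models"


def list_models(base_url: str, timeout: float = 5.0) -> Optional[List[str]]:
    """Return the model IDs the server reports, or None if it cannot be reached.

    Assumes the response body has the OpenAI-style shape {"data": [{"id": ...}]}.
    """
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers connection failures, timeouts, and HTTP errors;
        # ValueError covers malformed URLs and non-JSON responses.
        return None
    if not isinstance(payload, dict):
        return None
    return [m["id"] for m in payload.get("data", []) if isinstance(m, dict) and "id" in m]


if __name__ == "__main__":
    # Hypothetical address; replace with your inference server's base URL.
    print(list_models("http://localhost:8000"))
```

A `None` result only shows that the server is unreachable from the machine running the script; it says nothing about reachability from Alloy's side.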
How the two dimensions work together
- Self-Hosted Models pair naturally with BYOC or On-Premises — both your data and your model calls stay inside your environment, with nothing sent to an external provider.
- Your Own Provider keeps billing and governance under your account, but model calls still run on a public endpoint outside your network.
- Shared Cloud uses the Alloy-Managed LLM by default for the simplest possible start.
- Self-Hosted Models require Alloy to reach your inference server, so with Shared Cloud or Dedicated VPC that endpoint must be network-reachable from Alloy.
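The pairing rules above can be written down as a small lookup. This is a sketch for reasoning about combinations, not an Alloy API: the names `Infra`, `Llm`, `Summary`, and `summarize` are hypothetical, and the logic only restates what the two tables and the list above say.

```python
from dataclasses import dataclass
from enum import Enum


class Infra(Enum):
    SHARED_CLOUD = "Shared Cloud"
    DEDICATED_VPC = "Dedicated VPC"
    BYOC = "Bring Your Own Cloud (BYOC)"
    ON_PREMISES = "On-Premises"


class Llm(Enum):
    ALLOY_MANAGED = "Alloy-Managed LLM"
    OWN_PROVIDER = "Your Own Provider"
    SELF_HOSTED = "Self-Hosted Models"


@dataclass(frozen=True)
class Summary:
    enterprise_only: bool          # infrastructure option needs an Enterprise plan
    operated_by_alloy: bool        # Alloy handles provisioning, scaling, monitoring
    all_in_your_environment: bool  # both data and model calls stay in your environment
    endpoint_must_be_reachable_from_alloy: bool


def summarize(infra: Infra, llm: Llm) -> Summary:
    """Restate the documented properties of one infra/LLM combination."""
    self_hosted = llm is Llm.SELF_HOSTED
    return Summary(
        enterprise_only=infra is not Infra.SHARED_CLOUD,
        operated_by_alloy=infra is not Infra.ON_PREMISES,
        all_in_your_environment=self_hosted
        and infra in (Infra.BYOC, Infra.ON_PREMISES),
        endpoint_must_be_reachable_from_alloy=self_hosted
        and infra in (Infra.SHARED_CLOUD, Infra.DEDICATED_VPC),
    )


if __name__ == "__main__":
    for infra in Infra:
        for llm in Llm:
            print(infra.value, "+", llm.value, "->", summarize(infra, llm))
```

For example, `summarize(Infra.BYOC, Llm.SELF_HOSTED)` reports that everything stays in your environment, while `summarize(Infra.SHARED_CLOUD, Llm.SELF_HOSTED)` flags that your inference endpoint must be reachable from Alloy.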