Deployment Guide
Deploy Agent Swarm to production with Docker Compose — volumes, networking, secrets, and persistent storage
Docker Compose (Recommended)
Docker Compose is the easiest way to deploy a full swarm with the API, workers, and lead agent. You can go from zero to a running swarm in under 5 minutes.
Prerequisites
Before you start, make sure you have:
- Docker & Docker Compose installed (install guide)
- A Claude Code OAuth token — run `claude setup-token` in your terminal to get one
- An API key — any secret string you choose (all services share this key for authentication)
Step 1: Download the Compose File
```shell
curl -O https://raw.githubusercontent.com/desplega-ai/agent-swarm/main/docker-compose.example.yml
mv docker-compose.example.yml docker-compose.yml
```

Or if you have the repo cloned:

```shell
cp docker-compose.example.yml docker-compose.yml
```

Step 2: Create Your .env File
Create a .env file in the same directory as docker-compose.yml:
```shell
# ---- Required ----
API_KEY=your-secret-api-key
CLAUDE_CODE_OAUTH_TOKEN=your-oauth-token  # Run `claude setup-token` to get this
SECRETS_ENCRYPTION_KEY=  # Run `openssl rand -base64 32` and paste here. See "Encryption Key" below.

# ---- Optional ----
GITHUB_TOKEN=ghp_xxxx  # For git operations inside agents
GITHUB_EMAIL=you@example.com
GITHUB_NAME=Your Name
SWARM_URL=localhost  # Base domain for service discovery
```

You can pass multiple OAuth tokens for load balancing: `CLAUDE_CODE_OAUTH_TOKEN=token1,token2,token3`
If you leave SECRETS_ENCRYPTION_KEY blank on a brand-new deploy, the API will auto-generate one and write it to the swarm_api volume at .encryption-key. For production, set it explicitly so you control where it lives and can back it up alongside your other secrets. See Encryption Key.
Step 3: Generate Stable Agent IDs
Each agent needs a stable UUID that persists across restarts. This is critical for task resume — if an agent restarts, it uses its AGENT_ID to pick up paused tasks.
The example compose file has placeholder UUIDs. Replace them with your own:
```shell
# Generate one UUID per agent
uuidgen  # → paste into lead AGENT_ID
uuidgen  # → paste into worker-1 AGENT_ID
uuidgen  # → paste into worker-2 AGENT_ID
```

Edit docker-compose.yml and replace each service's `AGENT_ID` value.
Step 4: Start the Swarm
```shell
docker compose up -d
```

The API service starts first. Workers and lead wait until the API health check passes before starting (via `depends_on: condition: service_healthy`).
Step 5: Verify It's Running
```shell
# Check all services are up
docker compose ps

# Check API health
curl http://localhost:3013/health

# List registered agents (replace YOUR_API_KEY with your actual key)
curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  http://localhost:3013/api/agents | jq '.agents[] | {name, status, isLead}'
```

You should see the lead and workers listed with `status: "idle"`.
If you are deploying Codex workers with ChatGPT OAuth instead of OPENAI_API_KEY, follow Provider Auth: Codex OAuth after the API is up, then restart those workers.
What's Included
The example docker-compose.yml sets up:
| Service | Role | Port | Template |
|---|---|---|---|
| api | MCP HTTP server + SQLite DB | 3013 | — |
| lead | Coordinator agent | 3020 | official/lead |
| worker-1 | Task executor | 3021 | official/coder |
| worker-2 | Task executor | 3022 | official/coder |
| worker-content-writer | Content specialist | 3026 | official/content-writer |
| worker-content-reviewer | Content reviewer | 3027 | official/content-reviewer |
| worker-content-strategist | Content strategist | 3028 | official/content-strategist |
The content agents are optional — remove them from docker-compose.yml if you don't need content workflows.
Volumes & Persistence
The swarm uses Docker named volumes to persist data across restarts and upgrades. Getting volumes right is essential — without them, you lose your database and agent workspaces on every restart.
Volume Architecture
```
Docker Volume   → Container Path        → What It Stores
─────────────────────────────────────────────────────────────────────
swarm_api       → /app                  → SQLite DB (all swarm state)
swarm_logs      → /logs                 → Session logs (all agents)
swarm_shared    → /workspace/shared     → Shared workspace (all agents)
swarm_lead      → /workspace/personal   → Lead's private workspace
swarm_worker_1  → /workspace/personal   → Worker 1's private workspace
swarm_worker_2  → /workspace/personal   → Worker 2's private workspace
```

What Each Volume Stores
| Volume | Critical? | Backup? | Description |
|---|---|---|---|
| `swarm_api` | Yes | Yes | Contains the SQLite database (`agent-swarm-db.sqlite`) with all tasks, agents, schedules, and configuration. Losing this = losing all swarm state. |
| `swarm_logs` | No | Optional | Session logs from all agents. Useful for debugging. Can be recreated. |
| `swarm_shared` | Moderate | Recommended | Shared workspace. Agents store research, plans, and memory under `/workspace/shared/{thoughts,memory,downloads,misc}/$AGENT_ID`. All agents can read each other's files. |
| `swarm_<agent>` | Low | No | Personal workspace per agent (`/workspace/personal`). Isolated — only that agent can see it. Contains cloned repos, working files, and agent-specific state. |
How Workspaces Work
Each agent container has two workspace directories:
- `/workspace/shared` — Mounted from `swarm_shared`. All agents share this volume. By convention, each agent writes only to its own subdirectory (e.g., `/workspace/shared/thoughts/<agent-id>/`) but can read from any agent's directory. This is how agents share research, plans, and context.
- `/workspace/personal` — Mounted from a per-agent volume (e.g., `swarm_worker_1`). Only that agent can see this. Used for cloned git repos, working files, and private state.
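The shared-directory convention can be sketched in plain shell — the agent ID and file contents here are made up for illustration, and a temp directory stands in for /workspace/shared:

```shell
# Each agent writes under its own subdirectory of the shared volume,
# but can read any other agent's files.
AGENT_ID="3fa85f64-5717-4562-b3fc-2c963f66afa6"  # example UUID
SHARED=$(mktemp -d)                              # stands in for /workspace/shared
mkdir -p "$SHARED/thoughts/$AGENT_ID"
printf 'research: evaluate caching options\n' > "$SHARED/thoughts/$AGENT_ID/plan.md"

# Any other agent can read it back:
cat "$SHARED/thoughts/$AGENT_ID/plan.md"
```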
Backing Up the Database
The swarm_api volume contains your SQLite database — the single source of truth for all swarm state. Back it up regularly.
```shell
# Back up the database
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite

# Restore from backup
docker compose down
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /backup/agent-swarm-db-backup.sqlite /app/agent-swarm-db.sqlite
docker compose up -d
```

A database backup is useless without the matching encryption key. Back up both together.
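As a quick structural sanity check on a backup (not a full integrity check), you can verify the file starts with the SQLite magic header — a sketch using only standard shell tools:

```shell
# Every SQLite 3 database file begins with the string "SQLite format 3".
is_sqlite_db() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Usage:
#   is_sqlite_db agent-swarm-db-backup.sqlite && echo "looks like a SQLite file"
```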
Encryption Key
Agent Swarm encrypts all swarm_config rows marked as secrets (OAuth tokens, API keys, webhook signing secrets) at rest using AES-256-GCM. The master key is resolved on every boot in this order:
1. `SECRETS_ENCRYPTION_KEY` env var (recommended for production)
2. `SECRETS_ENCRYPTION_KEY_FILE` env var (path to a file containing the key — useful with Docker secrets or k8s `Secret` volume mounts)
3. `<data-dir>/.encryption-key` file on the API's data volume
4. Auto-generated on first boot, but only when the database does not yet contain any encrypted secret rows. Existing databases with encrypted rows fail closed if no key is found, instead of silently generating a different key and orphaning your secrets.
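The lookup order can be illustrated with a shell sketch — this mirrors the list above, the real server does the equivalent internally, and `DATA_DIR` here is a stand-in for the API's data directory:

```shell
# Hypothetical sketch of the boot-time key resolution described above.
resolve_encryption_key() {
  if [ -n "${SECRETS_ENCRYPTION_KEY:-}" ]; then
    printf '%s\n' "$SECRETS_ENCRYPTION_KEY"            # 1. inline env var
  elif [ -n "${SECRETS_ENCRYPTION_KEY_FILE:-}" ] && [ -f "$SECRETS_ENCRYPTION_KEY_FILE" ]; then
    cat "$SECRETS_ENCRYPTION_KEY_FILE"                 # 2. key file
  elif [ -f "${DATA_DIR:-.}/.encryption-key" ]; then
    cat "${DATA_DIR:-.}/.encryption-key"               # 3. on-disk key
  else
    return 1  # 4. no key: auto-generate on a fresh DB, fail closed otherwise
  fi
}
```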
Generating a key
Use either format — both are accepted everywhere (env var, file, on-disk):
```shell
# Recommended: 43-character base64 string
openssl rand -base64 32

# Equivalent: 64-character hex string
openssl rand -hex 32
```

Do NOT use `openssl rand -base64 39` (or any size other than 32). That produces a 52-character string that decodes to 39 bytes, which the server will reject at boot with `Invalid encryption key ... got 39 bytes after base64 decode`. The number passed to `openssl rand` is the decoded byte count, and AES-256 requires exactly 32.
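Before pasting a key into .env, you can confirm it decodes to exactly 32 bytes — a quick local check using only openssl and base64:

```shell
# A valid AES-256 key decodes to exactly 32 bytes.
KEY=$(openssl rand -base64 32)
printf '%s' "$KEY" | base64 -d | wc -c   # → 32
```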
Backing up the key
The encryption key is just as critical as your database backup. Losing the key while keeping the database means you lose every encrypted secret with no recovery path — you would have to manually re-add every OAuth token, API key, and webhook secret in the swarm.
Back up both together, every time:
```shell
# Back up DB and key in one shot (Docker Compose deployment)
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine sh -c '
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite &&
  cp /app/.encryption-key /backup/encryption-key-backup 2>/dev/null || true
'
```

If you set `SECRETS_ENCRYPTION_KEY` via env var instead, store the value itself in your secrets manager (1Password, Vault, AWS Secrets Manager, etc.) — treat it with the same rigor as your database backup.
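To run the combined backup on a schedule, a cron entry along these lines works — the path /etc/cron.d/agent-swarm-backup, the /srv/backups directory, and the 03:00 timing are all examples to adapt:

```shell
# /etc/cron.d/agent-swarm-backup — nightly combined backup (example)
0 3 * * * root docker run --rm -v swarm_api:/app -v /srv/backups:/backup alpine sh -c 'cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite && cp /app/.encryption-key /backup/encryption-key-backup 2>/dev/null || true'
```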
Common mistakes
| Mistake | Symptom | Fix |
|---|---|---|
| Used `openssl rand -base64 39` | `got 39 bytes after base64 decode` on boot | Regenerate with `-base64 32` (or `-hex 32`) |
| Wrapped value in quotes in `.env` | Same as above (extra characters change the decoded length) | Remove the quotes — `.env` lines are `KEY=value`, no quoting needed |
| Trailing newline or whitespace | Same as above | The server trims surrounding whitespace, but if your secrets manager injected embedded characters, regenerate |
| Changed key between deploys | Decryption errors when reading existing secrets | Restore the original key — key rotation is not yet supported (planned) |
| Lost the key entirely | Decryption errors on every secret read | Manually delete encrypted rows via the dashboard or swarm_config API and re-add them under the new key |
Reserved key names: SECRETS_ENCRYPTION_KEY and API_KEY are blocked from being stored in the DB config store (HTTP, MCP, and direct DB layers all reject them, case-insensitive). They must come from the environment.
First-time migration from plaintext secrets
If you upgraded from a pre-1.67 deploy that stored secrets in plaintext without setting SECRETS_ENCRYPTION_KEY ahead of time, the API auto-generates a .encryption-key on the data volume and writes a one-time plaintext backup at <db-path>.backup.secrets-YYYY-MM-DD.env before encrypting the existing rows. Delete that backup file immediately after verifying your new encryption key is safely backed up — it contains every secret in plaintext.
Environment Variables
These are the key variables for Docker Compose deployment. For the complete reference, see Environment Variables.
Required (in .env)
| Variable | Description |
|---|---|
| `API_KEY` | Shared authentication key between API and agents |
| `CLAUDE_CODE_OAUTH_TOKEN` | OAuth token from `claude setup-token`. Supports comma-separated values for load balancing. |
| `SECRETS_ENCRYPTION_KEY` | Master key for encrypting swarm_config secrets at rest. See Encryption Key for generation and backup guidance. |
Per-Agent (in docker-compose.yml)
| Variable | Description |
|---|---|
| `AGENT_ID` | Stable UUID. Keep the same across restarts for task resume. |
| `AGENT_ROLE` | `lead` or `worker` |
| `TEMPLATE_ID` | Template for the initial profile (e.g., `official/coder`, `official/lead`). Applied on first boot only. |
| `MCP_BASE_URL` | API server URL. Use `http://api:3013` when in the same Docker network. |
Optional (in .env)
| Variable | Description |
|---|---|
| `GITHUB_TOKEN` | Personal access token for git operations |
| `GITHUB_EMAIL` | Git commit email |
| `GITHUB_NAME` | Git commit name |
| `SWARM_URL` | Base domain for service discovery (default: `localhost`) |
| `SLACK_BOT_TOKEN` | Enable Slack integration (also needs `SLACK_APP_TOKEN`) |
| `SLACK_DISABLE` | Set to `true` to disable Slack (default: `false`) |
Secrets Encryption
Starting with v1.67.0, swarm_config secrets are encrypted at rest using AES-256-GCM. The docker-compose.example.yml includes a Docker secrets block for the encryption key:
```shell
# Generate the key file (one-time)
openssl rand -base64 32 > ./encryption_key
chmod 600 ./encryption_key
```

The compose file mounts this as a Docker secret at `/run/secrets/encryption_key` and sets `SECRETS_ENCRYPTION_KEY_FILE` accordingly. If you prefer an inline env var, use `SECRETS_ENCRYPTION_KEY` instead.
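The wiring just described looks roughly like this in compose terms — a sketch consistent with the description, though the exact layout in docker-compose.example.yml may differ:

```yaml
secrets:
  encryption_key:
    file: ./encryption_key

services:
  api:
    secrets:
      - encryption_key
    environment:
      - SECRETS_ENCRYPTION_KEY_FILE=/run/secrets/encryption_key
```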
Back up the encryption key alongside your SQLite database. Losing it means losing all encrypted secrets with no recovery path. See Environment Variables — Secrets Encryption for details.
Adding More Workers
To scale the swarm, copy an existing worker block in docker-compose.yml:
- Give it a new service name (e.g., `worker-3`)
- Generate a new `AGENT_ID` UUID: `uuidgen`
- Pick a new host port (e.g., `3023:3000`)
- Add a new personal volume (e.g., `swarm_worker_3:/workspace/personal`)
- Declare the new volume at the bottom of the file under `volumes:`
```yaml
worker-3:
  image: "ghcr.io/desplega-ai/agent-swarm-worker:latest"
  platform: linux/amd64
  pull_policy: always
  stop_grace_period: 60s
  depends_on:
    api:
      condition: service_healthy
  environment:
    - CLAUDE_CODE_OAUTH_TOKEN=${CLAUDE_CODE_OAUTH_TOKEN}
    - API_KEY=${API_KEY}
    - AGENT_ID=YOUR-NEW-UUID-HERE
    - AGENT_ROLE=worker
    - TEMPLATE_ID=official/coder
    - MCP_BASE_URL=http://api:3013
    - YOLO=true
    - GITHUB_TOKEN=${GITHUB_TOKEN:-}
    - GITHUB_EMAIL=${GITHUB_EMAIL:-}
    - GITHUB_NAME=${GITHUB_NAME:-}
    - SWARM_URL=${SWARM_URL:-localhost}
  ports:
    - "3023:3000"
  volumes:
    - swarm_logs:/logs
    - swarm_shared:/workspace/shared
    - swarm_worker_3:/workspace/personal
  restart: unless-stopped

volumes:
  # ... existing volumes ...
  swarm_worker_3:
```

If using `HARNESS_PROVIDER=pi` for this worker, do not include `CLAUDE_CODE_OAUTH_TOKEN` — pass `OPENROUTER_API_KEY` or `ANTHROPIC_API_KEY` instead. Claude credentials in the environment will override the pi-mono provider. See Harness Configuration.
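For a pi-based worker, the environment block would differ along these lines — a sketch based on the note above, with everything else in the service definition unchanged:

```yaml
environment:
  - HARNESS_PROVIDER=pi
  - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}  # or ANTHROPIC_API_KEY
  # No CLAUDE_CODE_OAUTH_TOKEN here: Claude credentials in the
  # environment would override the pi-mono provider.
  - API_KEY=${API_KEY}
  - AGENT_ID=YOUR-NEW-UUID-HERE
  - AGENT_ROLE=worker
  - MCP_BASE_URL=http://api:3013
```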
Then:
```shell
docker compose up -d worker-3
```

ARM Compatibility (Apple Silicon)
All services include `platform: linux/amd64` to avoid `no matching manifest for linux/arm64/v8` errors on Apple Silicon Macs. The Docker images are built for linux/amd64 and run via Rosetta emulation.
Graceful Shutdown & Task Resume
The docker-compose example uses stop_grace_period: 60s to allow graceful task pause during deployments. When a container receives SIGTERM:
- In-progress tasks are paused (not failed)
- Task state and progress are preserved
- After restart, paused tasks are automatically resumed with context
Configuration
```
# Grace period before force-pausing tasks (milliseconds)
SHUTDOWN_TIMEOUT=30000

# Docker compose stop grace period (should be >= SHUTDOWN_TIMEOUT + buffer)
stop_grace_period: 60s
```

Resume Behavior
When a worker starts, it:
- Registers with the MCP server
- Checks for paused tasks assigned to its `AGENT_ID`
- Resumes each paused task with original context and progress
Best Practices
- Use stable Agent IDs — Set an explicit `AGENT_ID` for each worker
- Save progress regularly — Workers should call `store-progress` during long tasks
- Test deployments — Verify tasks resume correctly in staging first
Docker Worker (Standalone)
Run individual Claude workers in containers without Compose.
Pull from Registry
```shell
docker pull ghcr.io/desplega-ai/agent-swarm-worker:latest
```

Run
```shell
docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker
```

With Custom System Prompt
```shell
docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -e SYSTEM_PROMPT="You are a Python specialist" \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker
```

Server Deployment (systemd)
Deploy the MCP server to a Linux host with systemd.
Prerequisites
- Linux with systemd
- Bun installed (`curl -fsSL https://bun.sh/install | bash`)
Install
```shell
git clone https://github.com/desplega-ai/agent-swarm.git
cd agent-swarm
sudo bun deploy/install.ts
```

This will:
- Copy files to `/opt/agent-swarm`
- Create a `.env` file (edit it to set `API_KEY`)
- Install a systemd service with health checks every 30s
- Start the service on port 3013
Management
```shell
sudo systemctl status agent-swarm    # Check status
sudo journalctl -u agent-swarm -f    # View logs
sudo systemctl restart agent-swarm   # Restart
sudo systemctl stop agent-swarm      # Stop
```

Update
```shell
git pull
sudo bun deploy/update.ts
```

Related
- Getting Started — Initial setup and first deployment
- Environment Variables — All configuration options
- Architecture Overview — How the system components fit together
- Task Lifecycle — Graceful shutdown and task resume behavior