
Deployment Guide

Deploy Agent Swarm to production with Docker Compose — volumes, networking, secrets, and persistent storage

Docker Compose is the easiest way to deploy a full swarm: the API, workers, and a lead agent. You can go from zero to a running swarm in under 5 minutes.

Prerequisites

Before you start, make sure you have:

  • Docker & Docker Compose installed (install guide)
  • A Claude Code OAuth token — run claude setup-token in your terminal to get one
  • An API key — any secret string you choose (all services share this key for authentication)

Step 1: Download the Compose File

curl -O https://raw.githubusercontent.com/desplega-ai/agent-swarm/main/docker-compose.example.yml
mv docker-compose.example.yml docker-compose.yml

Or if you have the repo cloned:

cp docker-compose.example.yml docker-compose.yml

Step 2: Create Your .env File

Create a .env file in the same directory as docker-compose.yml:

.env
# ---- Required ----
API_KEY=your-secret-api-key
CLAUDE_CODE_OAUTH_TOKEN=your-oauth-token   # Run `claude setup-token` to get this
SECRETS_ENCRYPTION_KEY=                     # Run `openssl rand -base64 32` and paste here. See "Encryption Key" below.

# ---- Optional ----
GITHUB_TOKEN=ghp_xxxx                      # For git operations inside agents
GITHUB_EMAIL=you@example.com
GITHUB_NAME=Your Name
SWARM_URL=localhost                         # Base domain for service discovery

You can pass multiple OAuth tokens for load balancing: CLAUDE_CODE_OAUTH_TOKEN=token1,token2,token3

If you leave SECRETS_ENCRYPTION_KEY blank on a brand-new deploy, the API will auto-generate one and write it to the swarm_api volume at .encryption-key. For production, set it explicitly so you control where it lives and can back it up alongside your other secrets. See Encryption Key.

Step 3: Generate Stable Agent IDs

Each agent needs a stable UUID that persists across restarts. This is critical for task resume — if an agent restarts, it uses its AGENT_ID to pick up paused tasks.

The example compose file has placeholder UUIDs. Replace them with your own:

# Generate one UUID per agent
uuidgen  # → paste into lead AGENT_ID
uuidgen  # → paste into worker-1 AGENT_ID
uuidgen  # → paste into worker-2 AGENT_ID

Edit docker-compose.yml and replace each service's AGENT_ID value.

Step 4: Start the Swarm

docker compose up -d

The API service starts first. The lead and workers wait to start until the API's health check passes (via depends_on with condition: service_healthy).
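In compose syntax, that gating looks roughly like this (a sketch; the example file's actual healthcheck command and timings may differ):

```yaml
services:
  api:
    # ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3013/health"]
      interval: 10s
      timeout: 5s
      retries: 5

  worker-1:
    # ...
    depends_on:
      api:
        condition: service_healthy  # blocks startup until the API is healthy
```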

Step 5: Verify It's Running

# Check all services are up
docker compose ps

# Check API health
curl http://localhost:3013/health

# List registered agents (replace YOUR_API_KEY with your actual key)
curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  http://localhost:3013/api/agents | jq '.agents[] | {name, status, isLead}'

You should see the lead and workers listed with status: "idle".

If you are deploying Codex workers with ChatGPT OAuth instead of OPENAI_API_KEY, follow Provider Auth: Codex OAuth after the API is up, then restart those workers.


What's Included

The example docker-compose.yml sets up:

| Service | Role | Port | Template |
| --- | --- | --- | --- |
| api | MCP HTTP server + SQLite DB | 3013 | |
| lead | Coordinator agent | 3020 | official/lead |
| worker-1 | Task executor | 3021 | official/coder |
| worker-2 | Task executor | 3022 | official/coder |
| worker-content-writer | Content specialist | 3026 | official/content-writer |
| worker-content-reviewer | Content reviewer | 3027 | official/content-reviewer |
| worker-content-strategist | Content strategist | 3028 | official/content-strategist |

The content agents are optional — remove them from docker-compose.yml if you don't need content workflows.


Volumes & Persistence

The swarm uses Docker named volumes to persist data across restarts and upgrades. Getting volumes right is essential — without them, you lose your database and agent workspaces on every restart.

Volume Architecture

Docker Volume            → Container Path        → What It Stores
─────────────────────────────────────────────────────────────────────
swarm_api                → /app                  → SQLite DB (all swarm state)
swarm_logs               → /logs                 → Session logs (all agents)
swarm_shared             → /workspace/shared     → Shared workspace (all agents)
swarm_lead               → /workspace/personal   → Lead's private workspace
swarm_worker_1           → /workspace/personal   → Worker 1's private workspace
swarm_worker_2           → /workspace/personal   → Worker 2's private workspace

What Each Volume Stores

| Volume | Critical? | Backup? | Description |
| --- | --- | --- | --- |
| swarm_api | Yes | Yes | Contains the SQLite database (agent-swarm-db.sqlite) with all tasks, agents, schedules, and configuration. Losing this = losing all swarm state. |
| swarm_logs | No | Optional | Session logs from all agents. Useful for debugging. Can be recreated. |
| swarm_shared | Moderate | Recommended | Shared workspace. Agents store research, plans, and memory under /workspace/shared/{thoughts,memory,downloads,misc}/$AGENT_ID. All agents can read each other's files. |
| swarm_<agent> | Low | No | Personal workspace per agent (/workspace/personal). Isolated; only that agent can see it. Contains cloned repos, working files, and agent-specific state. |

How Workspaces Work

Each agent container has two workspace directories:

  • /workspace/shared — Mounted from swarm_shared. All agents share this volume. By convention, each agent writes only to its own subdirectory (e.g., /workspace/shared/thoughts/<agent-id>/) but can read from any agent's directory. This is how agents share research, plans, and context.

  • /workspace/personal — Mounted from a per-agent volume (e.g., swarm_worker_1). Only that agent can see this. Used for cloned git repos, working files, and private state.
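The convention can be sketched in a few lines of shell. This is illustrative only: inside a real agent container the shared root is /workspace/shared and AGENT_ID comes from compose; here both default to throwaway values so the sketch runs anywhere.

```shell
SHARED="${SHARED:-$(mktemp -d)}"          # stand-in for /workspace/shared
AGENT_ID="${AGENT_ID:-demo-agent}"        # set by compose in a real container

mkdir -p "$SHARED/thoughts/$AGENT_ID"     # write only under your own subdirectory...
echo "research notes" > "$SHARED/thoughts/$AGENT_ID/plan.md"

cat "$SHARED"/thoughts/*/plan.md          # ...but read from any agent's
```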

Backing Up the Database

The swarm_api volume contains your SQLite database — the single source of truth for all swarm state. Back it up regularly.

# Back up the database
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite

# Restore from backup
docker compose down
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /backup/agent-swarm-db-backup.sqlite /app/agent-swarm-db.sqlite
docker compose up -d

A database backup is useless without the matching encryption key. Back up both together.


Encryption Key

Agent Swarm encrypts all swarm_config rows marked as secrets (OAuth tokens, API keys, webhook signing secrets) at rest using AES-256-GCM. The master key is resolved on every boot in this order:

  1. SECRETS_ENCRYPTION_KEY env var (recommended for production)
  2. SECRETS_ENCRYPTION_KEY_FILE env var (path to a file containing the key — useful with Docker secrets or k8s Secret volume mounts)
  3. <data-dir>/.encryption-key file on the API's data volume
  4. Auto-generated on first boot only when the database does not yet contain any encrypted secret rows. Existing databases with encrypted rows fail closed if no key is found, instead of silently generating a different key and orphaning your secrets.
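As a sketch, the resolution order behaves like this shell function (illustrative, not the server's actual code; DATA_DIR stands in for the API's data directory):

```shell
resolve_encryption_key() {
  if [ -n "${SECRETS_ENCRYPTION_KEY:-}" ]; then                      # 1. inline env var
    printf '%s' "$SECRETS_ENCRYPTION_KEY"
  elif [ -f "${SECRETS_ENCRYPTION_KEY_FILE:-/nonexistent}" ]; then   # 2. key file
    cat "$SECRETS_ENCRYPTION_KEY_FILE"
  elif [ -f "${DATA_DIR:-/app}/.encryption-key" ]; then              # 3. on-disk key
    cat "${DATA_DIR:-/app}/.encryption-key"
  else
    return 1   # 4. fail closed unless the DB has no encrypted rows yet
  fi
}
```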

Generating a key

Use either format — both are accepted everywhere (env var, file, on-disk):

# Recommended: base64 string (decodes to 32 bytes)
openssl rand -base64 32

# Equivalent: 64-character hex string
openssl rand -hex 32

Do NOT use openssl rand -base64 39 (or any size other than 32). That produces a 52-character string that decodes to 39 bytes, which the server will reject at boot with Invalid encryption key ... got 39 bytes after base64 decode. The number passed to openssl rand is the decoded byte count, and AES-256 requires exactly 32.
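Before pasting a key into .env, you can confirm it decodes to exactly 32 bytes:

```shell
key=$(openssl rand -base64 32)
printf '%s' "$key" | base64 -d | wc -c   # must report 32
```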

Backing up the key

The encryption key is just as critical as your database backup. Losing the key while keeping the database means you lose every encrypted secret with no recovery path — you would have to manually re-add every OAuth token, API key, and webhook secret in the swarm.

Back up both together, every time:

# Back up DB and key in one shot (Docker Compose deployment)
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine sh -c '
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite &&
  cp /app/.encryption-key /backup/encryption-key-backup 2>/dev/null || true
'

If you set SECRETS_ENCRYPTION_KEY via env var instead, store the value itself in your secrets manager (1Password, Vault, AWS Secrets Manager, etc.) — treat it with the same rigor as your database backup.

Common mistakes

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Used openssl rand -base64 39 | got 39 bytes after base64 decode on boot | Regenerate with -base64 32 (or -hex 32) |
| Wrapped the value in quotes in .env | Same as above (extra characters change the decoded length) | Remove the quotes; .env lines are KEY=value, no quoting needed |
| Trailing newline or whitespace | Same as above | The server trims surrounding whitespace, but if your secrets manager injected embedded characters, regenerate |
| Changed the key between deploys | Decryption errors when reading existing secrets | Restore the original key; key rotation is not yet supported (planned) |
| Lost the key entirely | Decryption errors on every secret read | Manually delete encrypted rows via the dashboard or swarm_config API and re-add them under the new key |

Reserved key names: SECRETS_ENCRYPTION_KEY and API_KEY are blocked from being stored in the DB config store (HTTP, MCP, and direct DB layers all reject them, case-insensitive). They must come from the environment.

First-time migration from plaintext secrets

If you upgraded from a pre-1.67 deploy that stored secrets in plaintext without setting SECRETS_ENCRYPTION_KEY ahead of time, the API auto-generates a .encryption-key on the data volume and writes a one-time plaintext backup at <db-path>.backup.secrets-YYYY-MM-DD.env before encrypting the existing rows. Delete that backup file immediately after verifying your new encryption key is safely backed up — it contains every secret in plaintext.


Environment Variables

These are the key variables for Docker Compose deployment. For the complete reference, see Environment Variables.

Required (in .env)

| Variable | Description |
| --- | --- |
| API_KEY | Shared authentication key between API and agents |
| CLAUDE_CODE_OAUTH_TOKEN | OAuth token from claude setup-token. Supports comma-separated values for load balancing. |
| SECRETS_ENCRYPTION_KEY | Master key for encrypting swarm_config secrets at rest. See Encryption Key for generation and backup guidance. |

Per-Agent (in docker-compose.yml)

| Variable | Description |
| --- | --- |
| AGENT_ID | Stable UUID. Keep the same across restarts for task resume. |
| AGENT_ROLE | lead or worker |
| TEMPLATE_ID | Template for initial profile (e.g., official/coder, official/lead). Applied on first boot only. |
| MCP_BASE_URL | API server URL. Use http://api:3013 when in the same Docker network. |

Optional (in .env)

| Variable | Description |
| --- | --- |
| GITHUB_TOKEN | Personal access token for git operations |
| GITHUB_EMAIL | Git commit email |
| GITHUB_NAME | Git commit name |
| SWARM_URL | Base domain for service discovery (default: localhost) |
| SLACK_BOT_TOKEN | Enable Slack integration (also needs SLACK_APP_TOKEN) |
| SLACK_DISABLE | Set to true to disable Slack (default: false) |

Secrets Encryption

Starting with v1.67.0, swarm_config secrets are encrypted at rest using AES-256-GCM. The docker-compose.example.yml includes a Docker secrets block for the encryption key:

# Generate the key file (one-time)
openssl rand -base64 32 > ./encryption_key
chmod 600 ./encryption_key

The compose file mounts this as a Docker secret at /run/secrets/encryption_key and sets SECRETS_ENCRYPTION_KEY_FILE accordingly. If you prefer an inline env var, use SECRETS_ENCRYPTION_KEY instead.
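Wired together in compose syntax, that looks roughly like this (a sketch; the names are assumed to match docker-compose.example.yml):

```yaml
secrets:
  encryption_key:
    file: ./encryption_key          # the file generated above

services:
  api:
    secrets:
      - encryption_key              # mounted at /run/secrets/encryption_key
    environment:
      - SECRETS_ENCRYPTION_KEY_FILE=/run/secrets/encryption_key
```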

Back up the encryption key alongside your SQLite database. Losing it means losing all encrypted secrets with no recovery path. See Environment Variables — Secrets Encryption for details.


Adding More Workers

To scale the swarm, copy an existing worker block in docker-compose.yml:

  1. Give it a new service name (e.g., worker-3)
  2. Generate a new AGENT_ID UUID: uuidgen
  3. Pick a new host port (e.g., 3023:3000)
  4. Add a new personal volume (e.g., swarm_worker_3:/workspace/personal)
  5. Declare the new volume at the bottom of the file under volumes:
docker-compose.yml (add a new worker)
  worker-3:
    image: "ghcr.io/desplega-ai/agent-swarm-worker:latest"
    platform: linux/amd64
    pull_policy: always
    stop_grace_period: 60s
    depends_on:
      api:
        condition: service_healthy
    environment:
      - CLAUDE_CODE_OAUTH_TOKEN=${CLAUDE_CODE_OAUTH_TOKEN}
      - API_KEY=${API_KEY}
      - AGENT_ID=YOUR-NEW-UUID-HERE
      - AGENT_ROLE=worker
      - TEMPLATE_ID=official/coder
      - MCP_BASE_URL=http://api:3013
      - YOLO=true
      - GITHUB_TOKEN=${GITHUB_TOKEN:-}
      - GITHUB_EMAIL=${GITHUB_EMAIL:-}
      - GITHUB_NAME=${GITHUB_NAME:-}
      - SWARM_URL=${SWARM_URL:-localhost}
    ports:
      - "3023:3000"
    volumes:
      - swarm_logs:/logs
      - swarm_shared:/workspace/shared
      - swarm_worker_3:/workspace/personal
    restart: unless-stopped

volumes:
  # ... existing volumes ...
  swarm_worker_3:

If using HARNESS_PROVIDER=pi for this worker, do not include CLAUDE_CODE_OAUTH_TOKEN — pass OPENROUTER_API_KEY or ANTHROPIC_API_KEY instead. Claude credentials in the environment will override the pi-mono provider. See Harness Configuration.

Then:

docker compose up -d worker-3

ARM Compatibility (Apple Silicon)

All services include platform: linux/amd64 to avoid no matching manifest for linux/arm64/v8 errors on Apple Silicon Macs. The Docker images are built for linux/amd64 and run via Rosetta emulation.


Graceful Shutdown & Task Resume

The docker-compose example uses stop_grace_period: 60s to allow graceful task pause during deployments. When a container receives SIGTERM:

  1. In-progress tasks are paused (not failed)
  2. Task state and progress are preserved
  3. After restart, paused tasks are automatically resumed with context

Configuration

# Grace period before force-pausing tasks (milliseconds)
SHUTDOWN_TIMEOUT=30000

# Docker compose stop grace period (should be >= SHUTDOWN_TIMEOUT + buffer)
stop_grace_period: 60s

Resume Behavior

When a worker starts, it:

  1. Registers with the MCP server
  2. Checks for paused tasks assigned to its AGENT_ID
  3. Resumes each paused task with original context and progress

Best Practices

  • Use stable Agent IDs — Set explicit AGENT_ID for each worker
  • Save progress regularly — Workers should call store-progress during long tasks
  • Test deployments — Verify tasks resume correctly in staging first

Docker Worker (Standalone)

Run individual Claude workers in containers without Compose.

Pull from Registry

docker pull ghcr.io/desplega-ai/agent-swarm-worker:latest

Run

docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker

With Custom System Prompt

docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -e SYSTEM_PROMPT="You are a Python specialist" \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker

Server Deployment (systemd)

Deploy the MCP server to a Linux host with systemd.

Prerequisites

  • Linux with systemd
  • Bun installed (curl -fsSL https://bun.sh/install | bash)

Install

git clone https://github.com/desplega-ai/agent-swarm.git
cd agent-swarm
sudo bun deploy/install.ts

This will:

  • Copy files to /opt/agent-swarm
  • Create .env file (edit to set API_KEY)
  • Install systemd service with health checks every 30s
  • Start the service on port 3013
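The installed unit has roughly this shape (an illustrative sketch only; the actual unit written by deploy/install.ts may differ, and the paths and start command here are assumptions):

```ini
[Unit]
Description=Agent Swarm MCP server
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/opt/agent-swarm
EnvironmentFile=/opt/agent-swarm/.env
ExecStart=/usr/bin/env bun start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```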

Management

sudo systemctl status agent-swarm    # Check status
sudo journalctl -u agent-swarm -f    # View logs
sudo systemctl restart agent-swarm   # Restart
sudo systemctl stop agent-swarm      # Stop

Update

git pull
sudo bun deploy/update.ts
