
Deployment Guide

Deploy Agent Swarm to production with Docker Compose — volumes, networking, secrets, and persistent storage

Docker Compose is the easiest way to deploy a full swarm: the API, workers, and a lead agent. You can go from zero to a running swarm in under 5 minutes.

Prerequisites

Before you start, make sure you have:

  • Docker & Docker Compose installed (install guide)
  • A Claude Code OAuth token — run claude setup-token in your terminal to get one
  • An API key — any secret string you choose (all services share this key for authentication)

Step 1: Download the Compose File

curl -O https://raw.githubusercontent.com/desplega-ai/agent-swarm/main/docker-compose.example.yml
mv docker-compose.example.yml docker-compose.yml

Or if you have the repo cloned:

cp docker-compose.example.yml docker-compose.yml

Step 2: Create Your .env File

Create a .env file in the same directory as docker-compose.yml:

.env
# ---- Required ----
API_KEY=your-secret-api-key
CLAUDE_CODE_OAUTH_TOKEN=your-oauth-token   # Run `claude setup-token` to get this
SECRETS_ENCRYPTION_KEY=                     # Run `openssl rand -base64 32` and paste here. See "Encryption Key" below.

# ---- Optional ----
GITHUB_TOKEN=ghp_xxxx                      # For git operations inside agents
GITHUB_EMAIL=you@example.com
GITHUB_NAME=Your Name
SWARM_URL=localhost                         # Base domain for service discovery

You can pass multiple OAuth tokens for load balancing: CLAUDE_CODE_OAUTH_TOKEN=token1,token2,token3

If you leave SECRETS_ENCRYPTION_KEY blank on a brand-new deploy, the API will auto-generate one and write it to the swarm_api volume at .encryption-key. For production, set it explicitly so you control where it lives and can back it up alongside your other secrets. See Encryption Key.

Step 3: Generate Stable Agent IDs

Each agent needs a stable UUID that persists across restarts. This is critical for task resume — if an agent restarts, it uses its AGENT_ID to pick up paused tasks.

The example compose file has placeholder UUIDs. Replace them with your own:

# Generate one UUID per agent
uuidgen  # → paste into lead AGENT_ID
uuidgen  # → paste into worker-1 AGENT_ID
uuidgen  # → paste into worker-2 AGENT_ID

Edit docker-compose.yml and replace each service's AGENT_ID value.

Step 4: Start the Swarm

docker compose up -d

The API service starts first. The lead and workers wait to start until the API's health check passes (via depends_on with condition: service_healthy).
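In compose syntax, that gating looks roughly like this (a sketch; the example file's actual healthcheck command and timings may differ):

```yaml
services:
  api:
    # ...
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3013/health"]
      interval: 10s
      timeout: 5s
      retries: 5

  worker-1:
    # ...
    depends_on:
      api:
        condition: service_healthy  # blocks startup until the API is healthy
```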

Step 5: Verify It's Running

# Check all services are up
docker compose ps

# Check API health
curl http://localhost:3013/health

# List registered agents (replace YOUR_API_KEY with your actual key)
curl -s -H "Authorization: Bearer YOUR_API_KEY" \
  http://localhost:3013/api/agents | jq '.agents[] | {name, status, isLead}'

You should see the lead and workers listed with status: "idle".

If you are deploying Codex workers with ChatGPT OAuth instead of OPENAI_API_KEY, follow Provider Auth: Codex OAuth after the API is up, then restart those workers.


What's Included

The example docker-compose.yml sets up:

| Service | Role | Port | Template |
| --- | --- | --- | --- |
| api | MCP HTTP server + SQLite DB | 3013 | |
| lead | Coordinator agent | 3020 | official/lead |
| worker-1 | Task executor | 3021 | official/coder |
| worker-2 | Task executor | 3022 | official/coder |
| worker-content-writer | Content specialist | 3026 | official/content-writer |
| worker-content-reviewer | Content reviewer | 3027 | official/content-reviewer |
| worker-content-strategist | Content strategist | 3028 | official/content-strategist |

The content agents are optional — remove them from docker-compose.yml if you don't need content workflows.


Volumes & Persistence

The swarm uses Docker named volumes to persist data across restarts and upgrades. Getting volumes right is essential — without them, you lose your database and agent workspaces on every restart.

Volume Architecture

Docker Volume            → Container Path        → What It Stores
─────────────────────────────────────────────────────────────────────
swarm_api                → /app                  → SQLite DB (all swarm state)
swarm_logs               → /logs                 → Session logs (all agents)
swarm_shared             → /workspace/shared     → Shared workspace (all agents)
swarm_lead               → /workspace/personal   → Lead's private workspace
swarm_worker_1           → /workspace/personal   → Worker 1's private workspace
swarm_worker_2           → /workspace/personal   → Worker 2's private workspace

What Each Volume Stores

| Volume | Critical? | Backup? | Description |
| --- | --- | --- | --- |
| swarm_api | Yes | Yes | Contains the SQLite database (agent-swarm-db.sqlite) with all tasks, agents, schedules, and configuration. Losing this = losing all swarm state. |
| swarm_logs | No | Optional | Session logs from all agents. Useful for debugging. Can be recreated. |
| swarm_shared | Moderate | Recommended | Shared workspace. Agents store research, plans, and memory under /workspace/shared/{thoughts,memory,downloads,misc}/$AGENT_ID. All agents can read each other's files. |
| swarm_<agent> | Low | No | Personal workspace per agent (/workspace/personal). Isolated; only that agent can see it. Contains cloned repos, working files, and agent-specific state. |

How Workspaces Work

Each agent container has two workspace directories:

  • /workspace/shared — Mounted from swarm_shared. All agents share this volume. By convention, each agent writes only to its own subdirectory (e.g., /workspace/shared/thoughts/<agent-id>/) but can read from any agent's directory. This is how agents share research, plans, and context.

  • /workspace/personal — Mounted from a per-agent volume (e.g., swarm_worker_1). Only that agent can see this. Used for cloned git repos, working files, and private state.
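The convention can be sketched in a few lines of shell. This is illustrative only: inside a real agent container the shared root is /workspace/shared and AGENT_ID comes from compose; here both default to throwaway values so the sketch runs anywhere.

```shell
SHARED="${SHARED:-$(mktemp -d)}"          # stand-in for /workspace/shared
AGENT_ID="${AGENT_ID:-demo-agent}"        # set by compose in a real container

mkdir -p "$SHARED/thoughts/$AGENT_ID"     # write only under your own subdirectory...
echo "research notes" > "$SHARED/thoughts/$AGENT_ID/plan.md"

cat "$SHARED"/thoughts/*/plan.md          # ...but read from any agent's
```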

Backing Up the Database

The swarm_api volume contains your SQLite database — the single source of truth for all swarm state. Back it up regularly.

# Back up the database
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite

# Restore from backup
docker compose down
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine \
  cp /backup/agent-swarm-db-backup.sqlite /app/agent-swarm-db.sqlite
docker compose up -d

A database backup is useless without the matching encryption key. Back up both together.


Encryption Key

Agent Swarm encrypts all swarm_config rows marked as secrets (OAuth tokens, API keys, webhook signing secrets) at rest using AES-256-GCM. The master key is resolved on every boot in this order:

  1. SECRETS_ENCRYPTION_KEY env var (recommended for production)
  2. SECRETS_ENCRYPTION_KEY_FILE env var (path to a file containing the key — useful with Docker secrets or k8s Secret volume mounts)
  3. <data-dir>/.encryption-key file on the API's data volume
  4. Auto-generated on first boot only when the database does not yet contain any encrypted secret rows. Existing databases with encrypted rows fail closed if no key is found, instead of silently generating a different key and orphaning your secrets.
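As a sketch, the resolution order behaves like this shell function (illustrative, not the server's actual code; DATA_DIR stands in for the API's data directory):

```shell
resolve_encryption_key() {
  if [ -n "${SECRETS_ENCRYPTION_KEY:-}" ]; then                      # 1. inline env var
    printf '%s' "$SECRETS_ENCRYPTION_KEY"
  elif [ -f "${SECRETS_ENCRYPTION_KEY_FILE:-/nonexistent}" ]; then   # 2. key file
    cat "$SECRETS_ENCRYPTION_KEY_FILE"
  elif [ -f "${DATA_DIR:-/app}/.encryption-key" ]; then              # 3. on-disk key
    cat "${DATA_DIR:-/app}/.encryption-key"
  else
    return 1   # 4. fail closed unless the DB has no encrypted rows yet
  fi
}
```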

Generating a key

Use either format — both are accepted everywhere (env var, file, on-disk):

# Recommended: base64 string (decodes to 32 bytes)
openssl rand -base64 32

# Equivalent: 64-character hex string
openssl rand -hex 32

Do NOT use openssl rand -base64 39 (or any size other than 32). That produces a 52-character string that decodes to 39 bytes, which the server will reject at boot with Invalid encryption key ... got 39 bytes after base64 decode. The number passed to openssl rand is the decoded byte count, and AES-256 requires exactly 32.
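Before pasting a key into .env, you can confirm it decodes to exactly 32 bytes:

```shell
key=$(openssl rand -base64 32)
printf '%s' "$key" | base64 -d | wc -c   # must report 32
```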

Backing up the key

The encryption key is just as critical as your database backup. Losing the key while keeping the database means you lose every encrypted secret with no recovery path — you would have to manually re-add every OAuth token, API key, and webhook secret in the swarm.

Back up both together, every time:

# Back up DB and key in one shot (Docker Compose deployment)
docker run --rm -v swarm_api:/app -v $(pwd):/backup alpine sh -c '
  cp /app/agent-swarm-db.sqlite /backup/agent-swarm-db-backup.sqlite &&
  cp /app/.encryption-key /backup/encryption-key-backup 2>/dev/null || true
'

If you set SECRETS_ENCRYPTION_KEY via env var instead, store the value itself in your secrets manager (1Password, Vault, AWS Secrets Manager, etc.) — treat it with the same rigor as your database backup.

Common mistakes

| Mistake | Symptom | Fix |
| --- | --- | --- |
| Used openssl rand -base64 39 | got 39 bytes after base64 decode on boot | Regenerate with -base64 32 (or -hex 32) |
| Wrapped the value in quotes in .env | Same as above (extra characters change the decoded length) | Remove the quotes; .env lines are KEY=value, no quoting needed |
| Trailing newline or whitespace | Same as above | The server trims surrounding whitespace, but if your secrets manager injected embedded characters, regenerate |
| Changed the key between deploys | Decryption errors when reading existing secrets | Restore the original key; key rotation is not yet supported (planned) |
| Lost the key entirely | Decryption errors on every secret read | Manually delete encrypted rows via the dashboard or swarm_config API and re-add them under the new key |

Reserved key names: SECRETS_ENCRYPTION_KEY and API_KEY are blocked from being stored in the DB config store (HTTP, MCP, and direct DB layers all reject them, case-insensitive). They must come from the environment.

First-time migration from plaintext secrets

If you upgraded from a pre-1.67 deploy that stored secrets in plaintext without setting SECRETS_ENCRYPTION_KEY ahead of time, the API auto-generates a .encryption-key on the data volume and writes a one-time plaintext backup at <db-path>.backup.secrets-YYYY-MM-DD.env before encrypting the existing rows. Delete that backup file immediately after verifying your new encryption key is safely backed up — it contains every secret in plaintext.


Environment Variables

These are the key variables for Docker Compose deployment. For the complete reference, see Environment Variables.

Required (in .env)

| Variable | Description |
| --- | --- |
| API_KEY | Shared authentication key between API and agents |
| CLAUDE_CODE_OAUTH_TOKEN | OAuth token from claude setup-token. Supports comma-separated values for load balancing. |
| SECRETS_ENCRYPTION_KEY | Master key for encrypting swarm_config secrets at rest. See Encryption Key for generation and backup guidance. |

Per-Agent (in docker-compose.yml)

| Variable | Description |
| --- | --- |
| AGENT_ID | Stable UUID. Keep the same across restarts for task resume. |
| AGENT_ROLE | lead or worker |
| TEMPLATE_ID | Template for initial profile (e.g., official/coder, official/lead). Applied on first boot only. |
| MCP_BASE_URL | API server URL. Use http://api:3013 when in the same Docker network. |

Optional (in .env)

| Variable | Description |
| --- | --- |
| GITHUB_TOKEN | Personal access token for git operations |
| GITHUB_EMAIL | Git commit email |
| GITHUB_NAME | Git commit name |
| SWARM_URL | Base domain for service discovery (default: localhost) |
| SLACK_BOT_TOKEN | Enable Slack integration (also needs SLACK_APP_TOKEN) |
| SLACK_DISABLE | Set to true to disable Slack (default: false) |

Secrets Encryption

Starting with v1.67.0, swarm_config secrets are encrypted at rest using AES-256-GCM. The docker-compose.example.yml includes a Docker secrets block for the encryption key:

# Generate the key file (one-time)
openssl rand -base64 32 > ./encryption_key
chmod 600 ./encryption_key

The compose file mounts this as a Docker secret at /run/secrets/encryption_key and sets SECRETS_ENCRYPTION_KEY_FILE accordingly. If you prefer an inline env var, use SECRETS_ENCRYPTION_KEY instead.
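Wired together in compose syntax, that looks roughly like this (a sketch; the names are assumed to match docker-compose.example.yml):

```yaml
secrets:
  encryption_key:
    file: ./encryption_key          # the file generated above

services:
  api:
    secrets:
      - encryption_key              # mounted at /run/secrets/encryption_key
    environment:
      - SECRETS_ENCRYPTION_KEY_FILE=/run/secrets/encryption_key
```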

Back up the encryption key alongside your SQLite database. Losing it means losing all encrypted secrets with no recovery path. See Environment Variables — Secrets Encryption for details.


Adding More Workers

To scale the swarm, copy an existing worker block in docker-compose.yml:

  1. Give it a new service name (e.g., worker-3)
  2. Generate a new AGENT_ID UUID: uuidgen
  3. Pick a new host port (e.g., 3023:3000)
  4. Add a new personal volume (e.g., swarm_worker_3:/workspace/personal)
  5. Declare the new volume at the bottom of the file under volumes:
docker-compose.yml (add a new worker)
  worker-3:
    image: "ghcr.io/desplega-ai/agent-swarm-worker:latest"
    platform: linux/amd64
    pull_policy: always
    stop_grace_period: 60s
    depends_on:
      api:
        condition: service_healthy
    environment:
      - CLAUDE_CODE_OAUTH_TOKEN=${CLAUDE_CODE_OAUTH_TOKEN}
      - API_KEY=${API_KEY}
      - AGENT_ID=YOUR-NEW-UUID-HERE
      - AGENT_ROLE=worker
      - TEMPLATE_ID=official/coder
      - MCP_BASE_URL=http://api:3013
      - YOLO=true
      - GITHUB_TOKEN=${GITHUB_TOKEN:-}
      - GITHUB_EMAIL=${GITHUB_EMAIL:-}
      - GITHUB_NAME=${GITHUB_NAME:-}
      - SWARM_URL=${SWARM_URL:-localhost}
    ports:
      - "3023:3000"
    volumes:
      - swarm_logs:/logs
      - swarm_shared:/workspace/shared
      - swarm_worker_3:/workspace/personal
    restart: unless-stopped

volumes:
  # ... existing volumes ...
  swarm_worker_3:

If using HARNESS_PROVIDER=pi for this worker, do not include CLAUDE_CODE_OAUTH_TOKEN — pass OPENROUTER_API_KEY or ANTHROPIC_API_KEY instead. Claude credentials in the environment will override the pi-mono provider. See Harness Configuration.

Then:

docker compose up -d worker-3

ARM Compatibility (Apple Silicon)

All services include platform: linux/amd64 to avoid no matching manifest for linux/arm64/v8 errors on Apple Silicon Macs. The Docker images are built for linux/amd64 and run via Rosetta emulation.


Graceful Shutdown & Task Resume

The docker-compose example uses stop_grace_period: 60s to allow graceful task pause during deployments. When a container receives SIGTERM:

  1. In-progress tasks are paused (not failed)
  2. Task state and progress are preserved
  3. After restart, paused tasks are automatically resumed with context

Configuration

# Grace period before force-pausing tasks (milliseconds)
SHUTDOWN_TIMEOUT=30000

# Docker compose stop grace period (should be >= SHUTDOWN_TIMEOUT + buffer)
stop_grace_period: 60s

Resume Behavior

When a worker starts, it:

  1. Registers with the MCP server
  2. Checks for paused tasks assigned to its AGENT_ID
  3. Resumes each paused task with original context and progress

Best Practices

  • Use stable Agent IDs — Set explicit AGENT_ID for each worker
  • Save progress regularly — Workers should call store-progress during long tasks
  • Test deployments — Verify tasks resume correctly in staging first

Docker Worker (Standalone)

Run individual Claude workers in containers without Compose.

Pull from Registry

docker pull ghcr.io/desplega-ai/agent-swarm-worker:latest

Run

docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker

With Custom System Prompt

docker run --rm -it \
  -e CLAUDE_CODE_OAUTH_TOKEN=your-token \
  -e API_KEY=your-api-key \
  -e SYSTEM_PROMPT="You are a Python specialist" \
  -v ./logs:/logs \
  -v ./work:/workspace \
  ghcr.io/desplega-ai/agent-swarm-worker

Server Deployment (systemd)

Deploy the MCP server to a Linux host with systemd.

Prerequisites

  • Linux with systemd
  • Bun installed (curl -fsSL https://bun.sh/install | bash)

Install

git clone https://github.com/desplega-ai/agent-swarm.git
cd agent-swarm
sudo bun deploy/install.ts

This will:

  • Copy files to /opt/agent-swarm
  • Create .env file (edit to set API_KEY)
  • Install systemd service with health checks every 30s
  • Start the service on port 3013
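The installed unit has roughly this shape (an illustrative sketch only; the actual unit written by deploy/install.ts may differ, and the paths and start command here are assumptions):

```ini
[Unit]
Description=Agent Swarm MCP server
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=/opt/agent-swarm
EnvironmentFile=/opt/agent-swarm/.env
ExecStart=/usr/bin/env bun start
Restart=on-failure

[Install]
WantedBy=multi-user.target
```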

Management

sudo systemctl status agent-swarm    # Check status
sudo journalctl -u agent-swarm -f    # View logs
sudo systemctl restart agent-swarm   # Restart
sudo systemctl stop agent-swarm      # Stop

Update

git pull
sudo bun deploy/update.ts
