Adding a Harness Provider

Implement a new harness provider (like claude, pi, or codex) — the full adapter contract, reference implementations, and wiring checklist

This guide documents the harness-provider contract used by agent-swarm workers, the three reference implementations (claude, pi, codex), and every hook that must be wired when adding a new provider.

A harness provider is the runtime that actually drives an LLM: it owns the subprocess (or in-process SDK) that reads the user prompt, talks to the model, invokes tools, and streams events back. The swarm treats providers as plug-ins behind a single TypeScript interface (ProviderAdapter). Workers select one at boot via HARNESS_PROVIDER.

For configuring an existing provider, see Harness Configuration. This guide is for implementing a new one.

Supported today: claude (Anthropic Claude Code CLI), pi (pi-mono, in-process via @earendil-works/pi-coding-agent), codex (OpenAI Codex SDK hosted inside a per-task subprocess), devin (Cognition Devin via /sessions), claude-managed (Anthropic Managed Agents — sessions execute in Anthropic's cloud sandbox), opencode (in-process @opencode-ai/sdk server with SSE event mapping; experimental, rolling out across DES-295 → DES-299).


1. The ProviderAdapter contract

Source of truth: src/providers/types.ts.

export interface ProviderAdapter {
  readonly name: string;
  createSession(config: ProviderSessionConfig): Promise<ProviderSession>;
  canResume(sessionId: string): Promise<boolean>;
  formatCommand(commandName: string): string;
}

export interface ProviderSession {
  readonly sessionId: string | undefined;
  onEvent(listener: (event: ProviderEvent) => void): void;
  waitForCompletion(): Promise<ProviderResult>;
  abort(): Promise<void>;
}

What each member does

| Member | Responsibility |
| --- | --- |
| name | Short identifier used in logs ("claude", "pi", "codex"). |
| createSession(config) | Spin up a new session for a task. Must not block on model output: return a ProviderSession immediately and stream events asynchronously. |
| canResume(sessionId) | Return true iff the provider can continue a previous session by ID. Used when the runner resumes paused work. |
| formatCommand(commandName) | Map a swarm slash-command (e.g. "review-pr") to the form the underlying CLI expects (e.g. /review-pr, /skill:review-pr, or inlined via skill resolver). |
| session.onEvent | Register a listener. The adapter must call every registered listener for every ProviderEvent. |
| session.waitForCompletion() | Resolve once the session ends; returns { exitCode, sessionId, cost, output, isError, errorCategory, failureReason }. |
| session.abort() | Cancel in flight (SIGTERM, SDK abort signal, etc.). Must be idempotent. |

ProviderSessionConfig inputs

interface ProviderSessionConfig {
  prompt: string;           // user prompt
  systemPrompt: string;     // composed system prompt
  model: string;            // resolved model id or ""
  role: string;             // "lead" | "worker" | ...
  agentId: string;
  taskId: string;
  apiUrl: string;           // swarm API base URL for callbacks
  apiKey: string;           // swarm API key
  cwd: string;              // workspace dir
  /** @deprecated Always undefined — native resume removed in the 2026-05-28 plan. */
  resumeSessionId?: string;
  iteration?: number;
  logFile: string;          // jsonl log path
  additionalArgs?: string[];
  env?: Record<string, string>;
}

ProviderEvent (the normalized stream)

Every provider translates its native events into this tagged union:

type ProviderEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant" | "user"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "result"; cost: CostData; output?: string; isError: boolean; errorCategory?: string }
  | { type: "error"; message: string; category?: string }
  | { type: "raw_log"; content: string }
  | { type: "raw_stderr"; content: string }
  | { type: "custom"; name: string; data: unknown }
  | { type: "context_usage"; contextUsedTokens: number; contextTotalTokens: number; contextPercent: number; outputTokens: number }
  | { type: "compaction"; preCompactTokens: number; compactTrigger: "auto" | "manual"; contextTotalTokens: number };

The runner consumes these events (not the provider's native ones) to post progress to the swarm API, detect tool loops, charge cost, and update task state. Implementing this translation is the bulk of the work for a new provider.
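
To make the "consume normalized events only" idea concrete, here is a minimal sketch of a consumer that pairs tool_start and tool_end events by toolCallId and reports calls that never finished. The trimmed SketchEvent type and the findUnfinishedToolCalls helper are hypothetical and exist only for illustration; they are not part of the runner.

```typescript
// Trimmed copy of the ProviderEvent union; only the variants this sketch needs.
type SketchEvent =
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "message"; role: "assistant" | "user"; content: string };

// Hypothetical helper: returns the names of tools that started but never ended.
function findUnfinishedToolCalls(events: SketchEvent[]): string[] {
  const open = new Map<string, string>(); // toolCallId -> toolName
  for (const ev of events) {
    switch (ev.type) {
      case "tool_start":
        open.set(ev.toolCallId, ev.toolName);
        break;
      case "tool_end":
        open.delete(ev.toolCallId);
        break;
      case "message":
        break; // not relevant to tool pairing
    }
  }
  return Array.from(open.values());
}

const unfinished = findUnfinishedToolCalls([
  { type: "tool_start", toolCallId: "a", toolName: "store-progress", args: {} },
  { type: "tool_start", toolCallId: "b", toolName: "memory-search", args: {} },
  { type: "tool_end", toolCallId: "a", toolName: "store-progress", result: "ok" },
]);
console.log(unfinished); // only the "memory-search" call is still open
```

Because the consumer never sees provider-native events, the same helper would work unchanged for any adapter that emits the tagged union faithfully.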


2. Reference implementations

| File | Transport | Auth |
| --- | --- | --- |
| src/providers/claude-adapter.ts | Bun.spawn of claude CLI with --output-format stream-json, JSONL stdout parsing | CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY |
| src/providers/pi-mono-adapter.ts | In-process via createAgentSession from @earendil-works/pi-coding-agent; no subprocess | ANTHROPIC_API_KEY / OPENROUTER_API_KEY / ~/.pi/agent/auth.json; or, when MODEL_OVERRIDE=amazon-bedrock/*, anything the AWS SDK default chain accepts (see pi-mono + Amazon Bedrock below) |
| src/providers/codex-adapter.ts | Parent adapter spawns src/commands/codex-session-runner.ts, sends session config over stdin, and receives line-delimited events/results over stdout; the child hosts @openai/codex-sdk Codex / Thread and streams ThreadEvents back | OPENAI_API_KEY or ChatGPT OAuth stored in swarm_config.codex_oauth (see §6) |
| src/providers/claude-managed-adapter.ts | Anthropic SDK client.beta.sessions.events.stream (SSE); session executes in Anthropic's managed cloud sandbox, worker is a thin relay | ANTHROPIC_API_KEY plus pre-existing MANAGED_AGENT_ID + MANAGED_ENVIRONMENT_ID from one-time claude-managed-setup CLI (see §13) |

pi-mono + Amazon Bedrock auth

pi-mono routes to Amazon Bedrock when MODEL_OVERRIDE=amazon-bedrock/<model-id> (e.g. amazon-bedrock/anthropic.claude-sonnet-4-20250514-v1:0). Bedrock authenticates through the AWS SDK's default credential chain — not a single API key — so agent-swarm delegates credential resolution to the SDK. The boot credential gate detects the amazon-bedrock/ prefix and short-circuits to satisfiedBy: "sdk-delegated" without inspecting any AWS env var or file.

Accepted credential sources — anything the AWS SDK accepts:

  • AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (+ optional AWS_SESSION_TOKEN)
  • AWS_PROFILE resolved against ~/.aws/credentials and ~/.aws/config
  • AWS SSO sessions configured in ~/.aws/config
  • EC2 IMDS instance role / ECS task role
  • Web-identity / OIDC (AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN)
  • credential_process and assume-role chains

AWS_REGION (or AWS_DEFAULT_REGION) is required by the SDK and must be a Bedrock-enabled region.

Failure mode. No credential-wait parking — the worker claims tasks immediately. If creds are missing, the first Bedrock inference call fails with the AWS SDK's own error (e.g. CredentialsProviderError: Could not load credentials from any providers), which surfaces in the session log via the existing error path (scrubbed through scrubSecrets). Treat this the same way you treat a stale codex auth.json: the underlying SDK is the source of truth.

This is intentionally the same pattern codex uses for OAuth credentials (codexAuthFileExists → presenceCheckOk in src/commands/provider-credentials.ts): a presence-only gate, with runtime validation owned by the adapter/SDK and no upstream call from agent-swarm. pi-ai does not currently expose a Bedrock-specific credential check we could call; if it does in the future, the live-test branch can be upgraded without touching the boot gate.
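
The prefix short-circuit described in this subsection can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in provider-credentials.ts: the GateResult shape, the checkPiCredentials name, and the "env" label are hypothetical, and the ~/.pi/agent/auth.json check is omitted to keep the sketch free of filesystem access. Only the "sdk-delegated" value and the amazon-bedrock/ prefix come from this guide.

```typescript
// Hypothetical result shape for a presence-only credential gate.
type GateResult =
  | { ok: true; satisfiedBy: "sdk-delegated" | "env" }
  | { ok: false; reason: string };

const BEDROCK_PREFIX = "amazon-bedrock/";

// Hypothetical helper: decides whether a pi worker may start claiming tasks.
function checkPiCredentials(env: Record<string, string | undefined>): GateResult {
  const modelOverride = env["MODEL_OVERRIDE"] ?? "";
  if (modelOverride.startsWith(BEDROCK_PREFIX)) {
    // Bedrock: do not inspect AWS env vars or files here.
    // The AWS SDK resolves credentials when the first inference call is made.
    return { ok: true, satisfiedBy: "sdk-delegated" };
  }
  if (env["ANTHROPIC_API_KEY"] || env["OPENROUTER_API_KEY"]) {
    return { ok: true, satisfiedBy: "env" };
  }
  return { ok: false, reason: "no pi credentials found" };
}
```

The point of the design is that the gate only answers "is it plausible that credentials exist?"; whether they actually work is left to the first real call.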

Cost & context tracking

Every adapter emits one CostData row per CLI invocation and one or more context_usage events per session. The dollar value is recomputed server-side against the seeded pricing table (Phase 2 of the cost-tracking plan); the resulting costSource enum is surfaced in the UI. The unified input + cache + output context formula (Phase 9) replaces every adapter's previous per-provider arithmetic so cross-provider percent comparisons make sense.

Full story: Cost & context computation.

All four reference adapters in the table above are wired together by the factory at src/providers/index.ts:

export async function createProviderAdapter(provider: string): Promise<ProviderAdapter> {
  switch (provider) {
    case "claude":
      return new ClaudeAdapter();
    case "pi": {
      const { PiMonoAdapter } = await import("./pi-mono-adapter");
      return new PiMonoAdapter();
    }
    case "codex":
      return new CodexAdapter();
    case "claude-managed":
      return new ClaudeManagedAdapter();
    default:
      throw new Error(`Unknown HARNESS_PROVIDER: "${provider}". Supported: claude, pi, codex, claude-managed`);
  }
}

The runner reads the resolved HARNESS_PROVIDER and awaits the factory at boot. The async factory is intentional: providers with module-level side effects (notably pi) are lazy-loaded only when selected, so unused adapters cannot crash unrelated workers during startup. The resolved value comes from swarm_config overlaid on process.env, with this precedence (highest first):

  1. swarm_config HARNESS_PROVIDER — repo > agent > global, via /api/config/resolved
  2. process.env.HARNESS_PROVIDER — container env
  3. "claude" — default

Resolution lives in src/utils/harness-provider.ts (resolveHarnessProvider); it's threaded through fetchResolvedEnv in src/commands/runner.ts so credential pool selection uses the same resolved value.
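
The three-level precedence can be summarized in a few lines. This is a minimal sketch, not the body of resolveHarnessProvider: the function name and parameters are hypothetical, the swarm_config value is assumed to be already collapsed across repo > agent > global, and treating empty strings as "not set" is an assumption.

```typescript
// Hypothetical helper mirroring the precedence list: swarm_config, then env, then "claude".
function resolveProviderSketch(
  swarmConfigValue: string | undefined,
  envValue: string | undefined,
): string {
  // `||` (rather than `??`) means empty strings fall through to the next source.
  return swarmConfigValue || envValue || "claude";
}

console.log(resolveProviderSketch("codex", "pi")); // swarm_config wins over env
```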

Switching providers without restart

The worker re-evaluates the resolved provider on each poll iteration (throttled to ~10s). When the value changes — typically because an operator wrote a new swarm_config row, or called PATCH /api/agents/{id}/harness-provider (which mirrors the value into swarm_config at scope=agent) — the worker:

  1. Creates a fresh adapter via createProviderAdapter(resolvedProvider).
  2. Updates state.harnessProvider.
  3. Rebuilds basePrompt so traits-driven prompt sections (e.g. local-environment vs. cloud-managed) match the new provider.
  4. Resets the cached cred_status snapshot so the dashboard shows credential health for the new adapter.

In-flight task sessions hold their own ProviderSession reference and continue on the old adapter unaffected; only future spawns pick up the swap. Failures during reconciliation (network blip, invalid value, adapter init error) log a warning and stay on the current provider — the worker is never wedged by a bad config.

Invalid HARNESS_PROVIDER values are rejected at write time (validateConfigValue in src/be/swarm-config-guard.ts), so a typo via PUT /api/config or the set-config MCP tool returns 400 instead of being silently stored.
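
The "swap on change, stay put on failure" behavior might look like the sketch below. All names here (SketchAdapter, WorkerState, reconcileProvider) are hypothetical, and the factory is injected as a synchronous function purely to keep the example short; the real factory is async and the real reconcile step also rebuilds basePrompt and resets cred_status.

```typescript
interface SketchAdapter { readonly name: string }
interface WorkerState { harnessProvider: string; adapter: SketchAdapter }

// Hypothetical reconcile step. Returns true only when the adapter was swapped.
function reconcileProvider(
  state: WorkerState,
  resolvedProvider: string,
  createAdapter: (provider: string) => SketchAdapter,
  warn: (msg: string) => void = console.warn,
): boolean {
  if (resolvedProvider === state.harnessProvider) return false; // nothing changed
  try {
    const next = createAdapter(resolvedProvider);
    state.adapter = next;
    state.harnessProvider = resolvedProvider;
    return true; // only sessions spawned from now on use the new adapter
  } catch (err) {
    // Keep the current provider: a bad config value must not wedge the worker.
    warn(`provider swap to "${resolvedProvider}" failed: ${String(err)}`);
    return false;
  }
}
```

State is mutated only after the new adapter has been created successfully, which is what keeps a failed swap from leaving the worker half-switched.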

Per-task outputSchema support

Tasks may carry an optional JSON Schema on outputSchema that the agent's final output must conform to. Enforcement depends on the harness:

| Provider | Supported | Notes |
| --- | --- | --- |
| claude | Yes | Via MCP store-progress; CLI extraction fallback if missed |
| claude-managed | Yes | Via MCP store-progress |
| codex | Yes | Via MCP store-progress |
| opencode | Yes | Via MCP store-progress |
| pi (pi-mono) | Yes | Via MCP store-progress |
| devin | Yes | Via MCP store-progress when HAS_MCP=true; otherwise the runner validates direct providerOutput before finishing the task |

The primary enforcement point is the MCP store-progress tool: a non-conforming output fails the tool call and the agent is asked to retry. For providers that return direct providerOutput to the runner, ensureTaskFinished() now performs the same JSON-parse + schema-validation check before persisting task.output; if validation fails, the task is marked failed instead of silently storing invalid structured output.
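
The parse-then-validate check for direct providerOutput can be illustrated with a deliberately tiny stand-in. The sketch below only verifies that the output is a JSON object containing every key listed in required; real JSON Schema validation covers far more and would normally use a validator library. MiniSchema, Validation, and validateProviderOutput are hypothetical names, not the ones used by ensureTaskFinished().

```typescript
// Drastically simplified schema: only the `required` keyword is honored.
interface MiniSchema { required?: string[] }

type Validation =
  | { valid: true; value: Record<string, unknown> }
  | { valid: false; error: string };

function validateProviderOutput(rawOutput: string, schema: MiniSchema): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawOutput);
  } catch {
    return { valid: false, error: "output is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { valid: false, error: "output is not a JSON object" };
  }
  const value = parsed as Record<string, unknown>;
  const missing = (schema.required ?? []).filter((key) => !(key in value));
  return missing.length === 0
    ? { valid: true, value }
    : { valid: false, error: `missing required keys: ${missing.join(", ")}` };
}
```

A caller following the behavior described above would mark the task failed on valid: false rather than persisting the raw string.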


3. How a harness run fits into a task lifecycle

Every harness run is scoped to exactly one task. The runner owns the task's lifecycle and gives the adapter only what it needs to execute that single unit of work.

The flow

All steps live in src/commands/runner.ts.

| Step | What it does |
| --- | --- |
| pollForTrigger | GET /api/poll until a Trigger is returned |
| buildPromptForTrigger | Constructs the user prompt from the trigger |
| fetchRelevantMemories | Enriches context (memory vectors + installed skills) |
| buildSystemPrompt | Composes the full system prompt (see §6) |
| spawnProviderProcess | Calls adapter.createSession(config) |
| session.onEvent(...) | Each ProviderEvent → API call (progress, logs, cost, context) |
| session.waitForCompletion | Blocks until the adapter resolves |
| ensureTaskFinished | POST /api/tasks/{id}/finish |
| syncProfileFilesToServer | Session-end FS → DB sync of the agent's self-editable files (see below). Runs for every hasLocalEnvironment harness. |

One task → one session. The runner tracks concurrent tasks in state.activeTasks: Map<taskId, RunningTask> up to MAX_CONCURRENT_TASKS. An adapter never needs to multiplex tasks internally — if the worker should handle two at once, the runner spawns two sessions.

Session-end identity / config sync (FS → DB)

When a session finishes, the runner syncs the agent's self-editable files back to the API so edits the agent made during the session persist into its profile (context_versions):

  • SOUL.md / IDENTITY.md / TOOLS.md / HEARTBEAT.md (bundled identity update)
  • the CLAUDE.md source — provider-dependent (see below)
  • the agent-managed section of /workspace/start-up.sh (setupScript)

The sync lives in src/commands/profile-sync.ts (syncProfileFilesToServer) and is called from checkCompletedProcesses in runner.ts. The decision to sync is made per finished session, using the provider / local-env trait snapshotted on RunningTask at spawn time — not the mutable global state.hasLocalEnvironment. The runner lets an in-flight session finish on its original adapter after a live provider swap, so reading the global would skip a session that started local (worker since flipped remote) or sync stale local files after a remote session finished (worker since flipped local). The sync fires when any finished session in the batch ran in a local environment:

| Provider | Local environment | Session-end sync |
| --- | --- | --- |
| claude, pi, codex, opencode | yes | Yes, when the finished session ran on this provider |
| devin, claude-managed | no | No, these have no /workspace FS |

  • CLAUDE.md source is provider-routed (resolveClaudeMdPath). claude edits its personal file at ~/.claude/CLAUDE.md (also synced by the Claude Stop hook); every other local harness (codex/pi/opencode) edits /workspace/CLAUDE.md — the file the runner materializes from the claudeMd DB field at boot and that the base-prompt truncation notice points them to. An all-claude batch syncs the personal file (the runner is a backstop and never overwrites it with the stale workspace copy); any non-claude session in the batch routes to /workspace/CLAUDE.md. This closes the previously-open non-Claude CLAUDE.md FS → DB gap.
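
The per-batch decision and the CLAUDE.md routing can be sketched as a single pure function. The field and function names are hypothetical (this is not resolveClaudeMdPath), and the sketch assumes that remote sessions in a mixed batch are simply ignored when choosing the path, which the text above does not spell out.

```typescript
// Traits snapshotted on the running task at spawn time (field names are hypothetical).
interface FinishedSession { provider: string; hasLocalEnvironment: boolean }

const PERSONAL_CLAUDE_MD = "~/.claude/CLAUDE.md";
const WORKSPACE_CLAUDE_MD = "/workspace/CLAUDE.md";

// Returns null when no session in the batch ran locally (nothing to sync),
// otherwise the CLAUDE.md path the session-end sync should read.
function planProfileSync(batch: FinishedSession[]): string | null {
  const local = batch.filter((s) => s.hasLocalEnvironment);
  if (local.length === 0) return null;
  const allClaude = local.every((s) => s.provider === "claude");
  return allClaude ? PERSONAL_CLAUDE_MD : WORKSPACE_CLAUDE_MD;
}
```

Note that the function reads only the snapshot carried by each finished session, never a mutable worker-level flag, which is the property the paragraph above depends on.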

Notes:

  • Harness-agnostic, runner-driven. Because the sync runs in the runner — at the single point where every completed session converges, including crashes — it does not depend on the harness emitting a shutdown event. This is what makes it reliable for codex/opencode (which previously had no sync path) and for pi (whose in-extension session_shutdown sync could silently not-fire). The Claude plugin Stop hook (src/hooks/hook.ts) and the pi extension (src/providers/pi-mono-extension.ts) still run their own sync; the runner-level call is the authoritative backstop.
  • Idempotent. The profile route only writes a new context_versions row when the content hash changes (updateAgentProfile in src/be/db.ts), so a redundant sync — pi's extension + runner double-POST, or an unchanged file — collapses to a no-op.
  • Non-fatal but visible. A failed sync never fails the task, but unlike the original copies it checks resp.ok and logs a scrubbed warning instead of silently swallowing the error.
  • Consequence (the "Rule 19" reversion). Because the FS is now the authoritative source at session end for all local-environment harnesses, a Lead update-profile edit to another agent's SOUL.md/IDENTITY.md/setupScript (which writes only the DB row, not the remote container's FS) is reverted on that agent's next session end. Previously only claude exhibited this; now pi (reliably), codex, and opencode do too. Pushing Lead edits all the way to a remote agent's FS is a separate, unsolved problem.

What fields on the task become ProviderSessionConfig

Assembled around runner.ts L1582–1596 and L3093–3133:

| ProviderSessionConfig field | Comes from |
| --- | --- |
| prompt | buildPromptForTrigger(trigger, task, memories, ...), optionally prefixed with a bounded follow-up context preamble when task.parentTaskId is set |
| systemPrompt | buildSystemPrompt() + task.additionalSystemPrompt (see §6) |
| model | task.model → MODEL_OVERRIDE env → "" |
| agentId, taskId, apiUrl, apiKey | Worker env + task row |
| cwd | task.dir → repoContext.clonePath → process.cwd() (L3050–3072) |
| resumeSessionId | Deprecated, always undefined. The runner stopped threading native session resume in the 2026-05-28 plan; follow-up continuity flows entirely through the context preamble prepended to prompt. |
| iteration | Retry counter (runner state) |
| logFile | ${logDir}/${timestamp}-${taskIdSlice}.jsonl (L3102), see §4 |
| env | Merged worker env + task-provided env |
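
The model and cwd rows describe fallback chains, which can be written out as below. This is a sketch of the chains only; TaskSketch, resolveModel, and resolveCwd are hypothetical names, and treating an empty string as "unset" is an assumption.

```typescript
interface TaskSketch { model?: string; dir?: string }

// task.model, then the MODEL_OVERRIDE env value, then "" (let the provider pick its default).
function resolveModel(task: TaskSketch, modelOverride: string | undefined): string {
  return task.model || modelOverride || "";
}

// task.dir, then the repo clone path, then the worker's current directory.
function resolveCwd(task: TaskSketch, clonePath: string | undefined, processCwd: string): string {
  return task.dir || clonePath || processCwd;
}

console.log(resolveModel({}, undefined) === ""); // true: nothing set, so the adapter sees ""
```

An adapter should therefore be prepared to receive an empty model string and fall back to its own default.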

API endpoints touched per task

The runner (not the adapter) owns all of these. They are listed here so you know what the adapter's event stream is ultimately driving:

  • POST /api/tasks/{id}/progress — human-readable progress (L402–410)
  • POST /api/tasks/{id}/context — on context_usage, compaction, completion (L1777, L1797, L1900)
  • POST /api/session-logs — on raw_log (L983, see §4)
  • POST /api/events/batch — tool/session events (L1648)
  • PUT /api/tasks/{id}/claude-session — on session_init (L1037)
  • POST /api/session-costs — on result (L1014)
  • POST /api/active-sessions / DELETE /api/active-sessions/by-task/{id} — L1178, L1199
  • POST /api/tasks/{id}/pause / resume — L711, L797
  • POST /api/tasks/{id}/finish — L579
  • GET /cancelled-tasks?taskId=... — L2909 (also polled by adapter-side hooks)

Resume semantics

Native session resume (the claude --resume <UUID> CLI flag and codex.resumeThread(id) SDK call) was removed in the 2026-05-28 deprecation plan (thoughts/taras/plans/2026-05-28-deprecate-native-resume.md). The reasoning: native resume relied on an on-disk transcript that disappears when the worker container restarts (deploy, OOM, autoscaler reschedule), and the harness then either errored out or silently spawned a context-less session. The bounded context preamble survives any worker restart because it is rebuilt from the parent-task chain held in the API DB.

  • Parent → child continuity is now single-layered: the runner prepends a bounded preamble (cap ~2000 tokens, see src/commands/context-preamble.ts) for all providers when task.parentTaskId is set, and adapters always spawn a fresh harness session. resolveResumeSession is preserved as an observability shim that logs which session ids would have been used; it never returns a resumeSessionId.
  • Pause → resume: a paused task restarts with a new session, re-applying task.progress via buildResumePrompt (L809–829); if the task is also part of a parent chain, the context-preamble path is applied before execution resumes.
  • Adapter behavior: claude / claude-managed / codex each warn and ignore any stray resumeSessionId they receive; the runner no longer sets one. CodexAdapter.canResume() returns false unconditionally.

What your adapter owes the task: emit session_init as early as possible (so the runner can persist the provider session id), emit tool_start/tool_end faithfully (so the UI shows the agent's work), and emit a result with populated CostData before your waitForCompletion() resolves.


4. Raw session logs & the task details page

The logFile field in ProviderSessionConfig is not optional decoration — it is the system of record for what happened inside a run. The task details UI reads from this pipeline.

Path convention

${LOG_DIR:-/logs}/<sessionId>/<timestamp>-<taskId8>.jsonl

Constructed at runner.ts:3102. LOG_DIR defaults to /logs in Docker workers, so the effective path is /workspace/logs/<sessionId>/<...>.jsonl. The runner writes the first line — a metadata record — at L3120 before spawning.

What each adapter writes to logFile

All three open the file with Bun.file(config.logFile).writer() and append JSONL lines.

  • Claude (claude-adapter.ts):
    • Raw NDJSON stdout from the Claude CLI is piped through (L284).
    • stderr wrapped as {type: "stderr", content, timestamp} (L321–323).
    • File handle closed at L329.
  • Pi-mono (pi-mono-adapter.ts):
    • Every normalized ProviderEvent is written as {...event, timestamp} inside emit() (L167–187). Closed in runSession's finally at L327.
  • Codex (codex-adapter.ts):
    • Every ProviderEvent written with timestamp in emit() (L347–374).
    • Raw SDK ThreadEvent mirrored as raw_log at L466.
    • Closed at L734.

Secret scrubbing is mandatory at every log egress. Import scrubSecrets from src/utils/secret-scrubber.ts and wrap every string before writing. All three reference adapters do this; a new adapter must too (see claude L284, L294, L303, L320, L344; pi L170–173; codex L352–355).

How raw logs reach the task details page

The .jsonl file on disk is the adapter-side dump. The UI does not read the file directly — it reads the DB-backed copy.

adapter emits raw_log  ─▶  runner flushLogBuffer  ─▶  POST /api/session-logs
                                                              │
                                                              ▼
                                                   session_logs (SQLite)
                                                              │
                                                              ▼
              ui    ─ useTaskSessionLogs ─▶ GET /api/tasks/{id}/session-logs
                                                              │
                                                              ▼
                                                    <SessionLogViewer />

  • Runner upload: runner.ts:1814–1833. Only raw_log triggers the remote push; raw_stderr is pretty-printed to worker stdout only (L1834–1836).
  • API write: src/http/session-data.ts L135–153 → createSessionLogs in src/be/db.
  • API read: same file L155–166, GET /api/tasks/{taskId}/session-logs → getSessionLogsByTaskId.
  • UI hook: useTaskSessionLogs in ui/src/api/hooks/use-tasks.ts, consumed at ui/src/pages/tasks/[id]/page.tsx:42 and rendered by <SessionLogViewer /> at ui/src/components/shared/session-log-viewer.tsx.

What this means for a new provider

Your emit() implementation must do three things for the UI to light up:

  1. Write every event to logFile as JSONL (for offline diagnostics / /workspace/logs/).
  2. Emit a raw_log ProviderEvent for anything the user might want to inspect in the task details page. The runner will upload it.
  3. Run every string through scrubSecrets before emitting or writing.

Tool calls (tool_start/tool_end) are not shown via session-logs — they go to /api/events/batch. You still need to emit them, but they reach the UI through a different channel (the agent's tool-activity timeline).
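
Put together, an emit() that satisfies the three requirements could look roughly like this. It is a sketch under assumptions: scrubSketch is a stand-in that masks a made-up "sk-" token shape (the real scrubSecrets rules are not reproduced here), the log file is replaced by an in-memory array, and the TextEvent type is a simplified subset of ProviderEvent.

```typescript
// Simplified event shape: just the text-carrying variants.
interface TextEvent { type: "raw_log" | "message"; content: string }

// Stand-in for scrubSecrets from src/utils/secret-scrubber.ts (illustrative only).
function scrubSketch(text: string): string {
  return text.replace(/sk-[A-Za-z0-9]+/g, "[REDACTED]");
}

class EmitSketch {
  readonly logLines: string[] = []; // stands in for the JSONL file writer
  private listeners: Array<(e: TextEvent) => void> = [];

  onEvent(listener: (e: TextEvent) => void): void {
    this.listeners.push(listener);
  }

  emit(event: TextEvent): void {
    // 1. Scrub every string before it leaves the adapter.
    const safe: TextEvent = { ...event, content: scrubSketch(event.content) };
    // 2. Append one JSON object per line to the log.
    this.logLines.push(JSON.stringify({ ...safe, timestamp: new Date().toISOString() }));
    // 3. Hand the scrubbed event to every registered listener (the runner).
    for (const listener of this.listeners) listener(safe);
  }
}
```

Scrubbing once at the top of emit() means both egress paths, the file and the listeners, receive the same sanitized value.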


5. Exposing the swarm MCP to the runtime

This is the single most important integration point after event translation. The swarm MCP server is how the agent actually interacts with the swarm: store progress, offer subtasks, read/write memory, request human input, cancel itself. A provider that runs code but cannot call swarm MCP tools is effectively a read-only model invocation — it will never drive real swarm behavior.

Where the MCP server lives

The swarm API exposes its MCP server at {apiUrl}/mcp. Tools are defined under src/tools/.

How each reference adapter wires it

| Provider | Wiring | File |
| --- | --- | --- |
| Claude | Discovers an existing .mcp.json (walking up from cwd), injects X-Source-Task-Id into the agent-swarm entry, writes a per-session copy to /tmp/mcp-<taskId>.json, launches CLI with --mcp-config <path> --strict-mcp-config. | claude-adapter.ts L117–184, L251 |
| Pi | Constructs McpHttpClient(apiUrl, apiKey, agentId, taskId), calls listTools(), wraps each as a pi-mono ToolDefinition with prefix mcp__<name>__. | pi-mono-adapter.ts L410–421, pi-mono-mcp-client.ts L53 |
| Codex | buildCodexConfig registers mcp_servers["agent-swarm"] = { url: "{apiUrl}/mcp", http_headers: { Authorization, X-Agent-ID, X-Source-Task-Id }, bearer_token_env_var, startup_timeout_sec }. Passed to new Codex({ config }). | codex-adapter.ts L132–243, L857–861 |

Required headers

Whatever the transport, the adapter MUST set these three headers on the swarm MCP connection:

  • Authorization: Bearer ${apiKey}
  • X-Agent-ID: <agentId>
  • X-Source-Task-Id: <taskId> — so nested tool calls attribute back to the right task
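
As a small illustration, the three headers can be built in one place and reused by whatever transport the adapter uses. The buildSwarmMcpHeaders helper is a hypothetical name; only the header names and the Bearer format come from the list above.

```typescript
// Hypothetical helper: builds the headers every swarm MCP connection must carry.
function buildSwarmMcpHeaders(apiKey: string, agentId: string, taskId: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "X-Agent-ID": agentId,
    "X-Source-Task-Id": taskId, // lets nested tool calls attribute back to this task
  };
}

const mcpHeaders = buildSwarmMcpHeaders("key-123", "agent-1", "task-9");
console.log(Object.keys(mcpHeaders).length); // 3
```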

Key tools a harness will call mid-run

Pretty labels at runner.ts:254–317. Representative list:

  • store-progress — structured progress updates + memories
  • offer-task / send-task — delegate to another agent
  • cancel-task / poll-task — lifecycle
  • memory-search / memory-get / inject-learning — shared memory
  • get-task-details, post-message, read-messages — inter-agent coordination
  • request-human-input — HITL gates
  • trigger-workflow — start a workflow DAG

Fallback when MCP is unavailable

Each reference adapter fails open (the run continues without MCP tools), but the agent is effectively blind to the swarm:

  • Pi: try/catch around discovery (pi-mono-adapter.ts L409–424) — on failure, customTools = [].
  • Codex: adds the agent-swarm entry unconditionally; failures fetching installed servers are non-fatal (codex-adapter.ts L217–229).
  • Claude: createSessionMcpConfig returns null if nothing found; CLI runs without --mcp-config.

Even if the in-process MCP connection fails, the runner and the adapter's own swarm-event hooks still talk to the swarm HTTP API directly for lifecycle, heartbeat, and cancellation polling — see src/providers/codex-swarm-events.ts and src/providers/pi-mono-extension.ts L1–50. That is the safety net; MCP is what lets the model call tools.


6. System prompt composition & delivery

ProviderSessionConfig.systemPrompt is the full assembled system prompt, not a fragment. Your adapter's job is to hand it to the underlying runtime verbatim — not to add preamble.

How it is built

Composed by getBasePrompt(args) at src/prompts/base-prompt.ts:55–225 and orchestrated by buildSystemPrompt() at runner.ts:2269–2286. The pieces, in order:

| Source | Section | Resolver |
| --- | --- | --- |
| Template system.session.{lead\|worker} | Base role prompt | resolveTemplateAsync L61–63 |
| Template system.agent.worker.slack | Slack section (workers only) | L66–72 |
| agentSoulMd + agentIdentityMd + name/description | ## Your Identity | L75–90 |
| Installed skills list | Skill summary | L93–96 |
| Installed MCP servers list | MCP summary | L99–101 |
| Repo CLAUDE.md (from cwd) | ## Repository Context | L111–118 (read via readClaudeMd at runner.ts:77–86) |
| Templates system.agent.agent_fs, ...services, ...artifacts | Conditional suffixes | L157–188 |
| agentClaudeMd | ## Agent Instructions (truncated) | L197–207 |
| agentToolsMd | ## Your Tools & Capabilities (truncated) | L210–219 |
| SYSTEM_PROMPT env / --system-prompt | Appended | runner.ts:2321–2323 |
| task.additionalSystemPrompt | Appended per-task | runner.ts:3093–3097 |

Truncation caps: BOOTSTRAP_MAX_CHARS=20_000 per section, BOOTSTRAP_TOTAL_MAX_CHARS=150_000 total (base-prompt.ts L16–19).
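
To show how a per-section cap and a total cap interact, here is a minimal sketch. The two constants come from the line above; everything else is an assumption: the capSections name, the plain prefix truncation, and the in-order budget accounting are illustrative and may differ from what base-prompt.ts does (for example, the real code adds a truncation notice).

```typescript
const BOOTSTRAP_MAX_CHARS = 20_000;
const BOOTSTRAP_TOTAL_MAX_CHARS = 150_000;

// Hypothetical helper: trims each section to the per-section cap, and stops
// handing out characters once the total budget is spent.
function capSections(
  sections: string[],
  perSection: number = BOOTSTRAP_MAX_CHARS,
  total: number = BOOTSTRAP_TOTAL_MAX_CHARS,
): string[] {
  let remaining = total;
  const out: string[] = [];
  for (const section of sections) {
    const allowed = Math.min(perSection, remaining);
    const kept = section.slice(0, allowed);
    out.push(kept);
    remaining -= kept.length;
  }
  return out;
}

console.log(capSections(["abcdef", "ghijkl"], 4, 6)); // [ "abcd", "gh" ]
```

With the small caps in the usage line, the first section is limited by the per-section cap and the second by what is left of the total budget.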

Identity sources (soulMd, identityMd, claudeMd, toolsMd, heartbeatMd) are fetched from GET /me at runner.ts:2427–2449. If missing, defaults come from the template, then from the generators in src/prompts/defaults.ts, then pushed back to the server.

If the runner reuses an existing repo clone that has local changes, ensureRepoForTask() now auto-stashes that work before refreshing from origin. The resulting swarm-autostash refs are threaded into repoContext.autoStashes and appended to the base prompt so the active session can restore them deliberately instead of silently losing or ignoring dirty work.

Template resolution goes over HTTP (configureHttpResolver(apiUrl, apiKey) at runner.ts:2206) to obey the API/worker DB boundary. Prompt files under src/prompts/ must remain pure — no bun:sqlite or src/be/db imports. Enforced by scripts/check-db-boundary.sh.

How each adapter delivers the prompt

  • Claude — CLI flag: cmd.push("--append-system-prompt", this.config.systemPrompt) at claude-adapter.ts:245–247.
  • Codex — written into AGENTS.md in cwd inside a <swarm_system_prompt> block. The Codex SDK has no --append-system-prompt equivalent; this file is the only channel. Entry: writeCodexAgentsMd(config.cwd, config.systemPrompt) at codex-adapter.ts:761; implementation at codex-agents-md.ts:56–119. If an AGENTS.md exists, the block is prepended; otherwise stacked atop any CLAUDE.md. Cleanup in the session finally at codex-adapter.ts:738.
  • Pi — SDK parameter: new DefaultResourceLoader({ appendSystemPrompt: [config.systemPrompt], ... }) passed via CreateAgentSessionOptions at pi-mono-adapter.ts:508–519.

What this means for a new provider

Pick the delivery path your runtime supports, in order of preference:

  1. A dedicated system-prompt argument (flag, SDK field) — cleanest, no filesystem side effects.
  2. A file-based convention (like Codex's AGENTS.md) — fine, but you must clean up in finally and not clobber user files.
  3. Prepending to the user prompt — last resort; may confuse the model.

If your runtime has a distinct prompt shape (e.g. a different preamble format), add a subdirectory under src/prompts/<foo>/ and invoke it from your adapter, but keep the merge logic in base-prompt.ts as the single source of truth.


7. Skills (how the three providers handle them)

Skills are the swarm's portable procedural knowledge — reusable markdown files invoked as slash-commands like /review-pr, /implement-issue, /create-pr. Each provider surfaces them differently; your adapter must implement formatCommand(name) and may need a resolver if your runtime doesn't have native support.

The three patterns

Claude. The CLI already knows about skills installed under ~/.claude/skills/<name>/SKILL.md. The adapter just returns /<name>.

formatCommand(name: string): string { return `/${name}`; }

See claude-adapter.ts:601.

Pi. Pi-mono supports skills but namespaces them:

formatCommand(name: string): string { return `/skill:${name}`; }

See pi-mono-adapter.ts:541. Skills live at ~/.pi/agent/skills/<name>/SKILL.md.

Codex. The Codex SDK has no skill mechanism, so the adapter resolves the skill itself before calling the model:

  • src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the user prompt.
  • Reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md.
  • Inlines the SKILL.md content into the prompt before calling thread.runStreamed.
  • formatCommand(name) simply returns /${name} so the swarm can still emit the canonical form; the resolver does the work.

This is the template to follow if your provider lacks native skill support.
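
A resolver in this style can be quite small. The sketch below is not the code in codex-skill-resolver.ts: inlineLeadingSkill is a hypothetical name, the skill reader is injected so the example does not touch the filesystem, and the way the skill body and the remaining prompt are joined is an assumption.

```typescript
// Hypothetical resolver: if the prompt starts with /<name>, replace that token
// with the skill's content; otherwise return the prompt unchanged.
function inlineLeadingSkill(
  prompt: string,
  readSkill: (name: string) => string | undefined,
): string {
  const match = /^\/([A-Za-z0-9_-]+)(?:\s+([\s\S]*))?$/.exec(prompt.trim());
  if (!match) return prompt; // no leading slash-command
  const skillName = match[1] ?? "";
  const rest = match[2] ?? "";
  const body = readSkill(skillName);
  if (body === undefined) return prompt; // unknown skill: leave the prompt untouched
  return rest ? `${body}\n\n${rest}` : body;
}

const demoSkills: Record<string, string> = { "review-pr": "Review the PR carefully." };
console.log(inlineLeadingSkill("/review-pr #42", (name) => demoSkills[name]));
```

Returning the prompt unchanged for unknown names keeps ordinary prompts that happen to start with a slash from being mangled.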

Where skill files come from

The swarm syncs skills into each provider's skill dir at container boot in docker-entrypoint.sh L764–803. The same SKILL.md content is copied into:

  • ~/.claude/skills/<name>/SKILL.md
  • ~/.pi/agent/skills/<name>/SKILL.md
  • ~/.codex/skills/<name>/SKILL.md

When adding a new provider, extend this block with ~/.<foo>/skills/<name>/SKILL.md (or whatever path your runtime expects).


8. Adding a new provider — step by step

Assume you are adding a provider called foo. Follow every step; "optional" is called out where true.

Step 1 — Scaffold the adapter

Create src/providers/foo-adapter.ts:

import type {
  ProviderAdapter,
  ProviderEvent,
  ProviderResult,
  ProviderSession,
  ProviderSessionConfig,
} from "./types";

export class FooAdapter implements ProviderAdapter {
  readonly name = "foo";

  async createSession(config: ProviderSessionConfig): Promise<ProviderSession> {
    return new FooSession(config);
  }

  async canResume(_sessionId: string): Promise<boolean> {
    return false; // or true if your SDK/CLI supports resume
  }

  formatCommand(commandName: string): string {
    return `/${commandName}`; // adjust to your runtime's convention
  }
}

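class FooSession implements ProviderSession {
  sessionId: string | undefined;
  private listeners: Array<(e: ProviderEvent) => void> = [];
  private readonly abortController = new AbortController();
  private readonly completion: Promise<ProviderResult>;

  constructor(private config: ProviderSessionConfig) {
    // Start the run without awaiting it so createSession() returns immediately.
    this.completion = this.start();
  }

  onEvent(listener: (e: ProviderEvent) => void) { this.listeners.push(listener); }
  private emit(event: ProviderEvent) { for (const l of this.listeners) l(event); }

  // Resolves when the native stream ends.
  waitForCompletion(): Promise<ProviderResult> { return this.completion; }

  // Idempotent: aborting an already-aborted controller is a no-op.
  async abort(): Promise<void> { this.abortController.abort(); }

  private async start(): Promise<ProviderResult> {
    // Spawn the CLI or call the SDK here, pass this.abortController.signal to it,
    // translate native events via this.emit(...), and return the final ProviderResult.
    throw new Error("FooSession.start() is not implemented yet");
  }
}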

Step 2 — Register in the factory

Edit src/providers/index.ts:

import { FooAdapter } from "./foo-adapter";
// ...
case "foo": return new FooAdapter();

Update the error message (Supported: claude, pi, codex, foo) so the unknown-provider error stays accurate.
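One way to keep that message from drifting is to derive it from the same list the factory checks. The sketch below assumes a hypothetical createAdapter function and a stand-in AdapterLike type. The real factory in src/providers/index.ts may be shaped differently, and its provider list is longer than the illustrative one here.

```typescript
// Hypothetical factory sketch; the real function in src/providers/index.ts may differ.
interface AdapterLike {
  readonly name: string;
}

// Illustrative list only, matching the error message quoted above.
const SUPPORTED = ["claude", "pi", "codex", "foo"] as const;
type SupportedProvider = (typeof SUPPORTED)[number];

function isSupported(value: string): value is SupportedProvider {
  return (SUPPORTED as readonly string[]).includes(value);
}

function createAdapter(provider: string): AdapterLike {
  if (!isSupported(provider)) {
    // Build the message from the list so it cannot go stale.
    throw new Error(`Unknown provider "${provider}". Supported: ${SUPPORTED.join(", ")}`);
  }
  return { name: provider }; // stand-in for `new FooAdapter()` and friends
}
```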

Step 3 — Translate native events to ProviderEvent

This is the heart of the adapter. For each event your SDK / CLI emits, decide which ProviderEvent type to produce:

  • Session start → session_init { sessionId } (set this.sessionId first, then emit).
  • Assistant text → message { role: "assistant", content }.
  • Tool call start/end → tool_start / tool_end with { toolCallId, toolName, args|result }.
  • Turn/usage stats → context_usage.
  • Auto-compaction → compaction.
  • Any provider-specific event with no direct mapping → custom { name, data } (e.g. Codex uses custom for codex.reasoning and codex.todo_list).
  • Terminal → result { cost, output, isError, errorCategory } then resolve waitForCompletion().
  • Non-fatal diagnostics → raw_log / raw_stderr.
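The mapping above can be sketched as a single pure function, which also makes it easy to unit test. Everything named here is illustrative: FooNativeEvent is an invented native event shape, and SketchEvent is a simplified stand-in for the real ProviderEvent union in src/providers/types.ts.

```typescript
// Simplified stand-ins for ProviderEvent; the real union lives in src/providers/types.ts.
type SketchEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "custom"; name: string; data: unknown };

// Hypothetical native events emitted by the imaginary "foo" runtime.
type FooNativeEvent =
  | { kind: "started"; id: string }
  | { kind: "text"; text: string }
  | { kind: "tool_call"; id: string; tool: string; input: unknown }
  | { kind: "tool_done"; id: string; tool: string; output: unknown }
  | { kind: "thinking"; text: string };

function translate(native: FooNativeEvent): SketchEvent {
  switch (native.kind) {
    case "started":
      return { type: "session_init", sessionId: native.id };
    case "text":
      return { type: "message", role: "assistant", content: native.text };
    case "tool_call":
      return { type: "tool_start", toolCallId: native.id, toolName: native.tool, args: native.input };
    case "tool_done":
      return { type: "tool_end", toolCallId: native.id, toolName: native.tool, result: native.output };
    case "thinking":
      // No direct mapping: fall back to a namespaced custom event.
      return { type: "custom", name: "foo.reasoning", data: { text: native.text } };
  }
}
```

Keeping the translator free of I/O means the session class only has to set sessionId, call translate, and emit the result.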


Step 4 — Emit CostData

The result event must carry a populated CostData so the swarm can track spend:

emit({
  type: "result",
  cost: {
    sessionId: this.sessionId!,
    taskId: config.taskId,
    agentId: config.agentId,
    totalCostUsd, inputTokens, outputTokens,
    cacheReadTokens, cacheWriteTokens,
    durationMs, numTurns, model, isError: false,
  },
  isError: false,
});

If your SDK returns tokens but not USD, compute cost from a pricing table (see src/providers/codex-models.ts for the Codex model/pricing resolver pattern).
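A minimal sketch of such a pricing table follows. The model ID, the field names, and the prices are placeholders invented for illustration. They are not taken from codex-models.ts and are not real prices.

```typescript
// Hypothetical pricing table in USD per one million tokens. Placeholder numbers only.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheWritePerMTok: number;
}

const FOO_PRICING = new Map<string, ModelPricing>([
  ["foo-large", { inputPerMTok: 2, outputPerMTok: 8, cacheReadPerMTok: 0.5, cacheWritePerMTok: 2.5 }],
]);

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

function computeCostUsd(model: string, usage: TokenUsage): number {
  const pricing = FOO_PRICING.get(model);
  if (!pricing) return 0; // unknown model: report zero rather than guess a price

  const microDollars =
    usage.inputTokens * pricing.inputPerMTok +
    usage.outputTokens * pricing.outputPerMTok +
    usage.cacheReadTokens * pricing.cacheReadPerMTok +
    usage.cacheWriteTokens * pricing.cacheWritePerMTok;
  return microDollars / 1_000_000;
}
```

Returning 0 for an unknown model is a design choice for the sketch. A real adapter might prefer to log a warning so missing pricing entries are noticed.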

Step 5 — Model selection

ProviderSessionConfig.model is set by the runner from opts.model || process.env.MODEL_OVERRIDE || "" and may be overridden per task (task.model). Decide:

  • What is the provider default when model === ""? Read it from a provider-specific env var (CODEX_DEFAULT_MODEL is the existing convention).
  • Do you accept shortnames ("sonnet", "gpt-5") and expand to full IDs? If so, build a resolver — see resolveCodexModel in src/providers/codex-models.ts and resolveModel in src/providers/pi-mono-adapter.ts.
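Both decisions can live in one small resolver. The sketch below is an assumption-laden example, not a copy of resolveCodexModel: FOO_DEFAULT_MODEL, the alias names, and the model IDs are all hypothetical. The environment is passed in as a plain object so the function stays pure.

```typescript
// Hypothetical alias table and fallback; FOO_DEFAULT_MODEL mirrors the CODEX_DEFAULT_MODEL convention.
const FOO_MODEL_ALIASES = new Map<string, string>([
  ["large", "foo-large-2025"],
  ["small", "foo-small-2025"],
]);
const FOO_FALLBACK_MODEL = "foo-large-2025";

function resolveFooModel(requested: string, env: Record<string, string | undefined>): string {
  // Precedence: explicit request, then provider-specific env default, then built-in fallback.
  const candidate = requested.trim() || env["FOO_DEFAULT_MODEL"]?.trim() || FOO_FALLBACK_MODEL;
  // Expand shortnames; full IDs pass through untouched.
  return FOO_MODEL_ALIASES.get(candidate) ?? candidate;
}
```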

Step 6 — Credentials & auth

Three patterns are already in the codebase; pick the one that fits your provider:

Claude-style. Validate at adapter start; throw a clear error if missing. Example: validateClaudeCredentials() in src/providers/claude-adapter.ts.

Codex ChatGPT-style. See §9 below. This is the most involved path but required for desktop-login-style flows.

Pi-style. Reads ~/.pi/agent/auth.json. If your SDK looks up its own auth file, the adapter may not need to do anything beyond ensuring the file exists at worker boot.

Secret scrubbing: any credential you log must go through the project's scrubber. See src/utils/secret-scrubber.ts and the CLAUDE.md "Secret scrubbing" section.
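For the Claude-style pattern, validation can be as small as the sketch below. FOO_API_KEY and validateFooCredentials are hypothetical names, and this is not the body of validateClaudeCredentials(). Note that the error names the variable but never echoes its value, so nothing secret reaches the logs.

```typescript
// Hypothetical Claude-style credential check for an imaginary "foo" provider.
function validateFooCredentials(env: Record<string, string | undefined>): string {
  const apiKey = env["FOO_API_KEY"]?.trim();
  if (!apiKey) {
    // Name the missing variable, never print a credential value.
    throw new Error("FOO_API_KEY is not set. Export it before starting a worker with HARNESS_PROVIDER=foo.");
  }
  return apiKey;
}
```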

Step 7 — MCP server injection

Every adapter fetches per-agent MCP servers from the swarm API and wires them into the provider:

GET {apiUrl}/api/agents/{agentId}/mcp-servers?resolveSecrets=true
Authorization: Bearer {apiKey}

Then:

  • Claude writes /tmp/mcp-<taskId>.json and passes it via --mcp-config (claude-adapter.ts L46–183).
  • Pi instantiates an McpHttpClient per HTTP/SSE server and registers tools prefixed mcp__<name>__ (pi-mono-adapter.ts L408–493).
  • Codex builds a structured mcp_servers object for new Codex({ config }) (codex-adapter.ts L132–243).

Always include the swarm's own MCP server with an X-Source-Task-Id header so nested tool calls attribute back correctly.
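The request and the header rule can be sketched as two pure helpers. The function names are hypothetical, and the shape of the response body is deliberately left out because it is not specified here. The swarm MCP connection may also need other headers (see §5); only X-Source-Task-Id is shown.

```typescript
// Hypothetical helpers; names are illustrative and not taken from the adapters.
interface HttpRequestSpec {
  url: string;
  headers: Record<string, string>;
}

function buildMcpServersRequest(apiUrl: string, agentId: string, apiKey: string): HttpRequestSpec {
  const base = apiUrl.replace(/\/+$/, ""); // tolerate a trailing slash on apiUrl
  return {
    url: `${base}/api/agents/${encodeURIComponent(agentId)}/mcp-servers?resolveSecrets=true`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Returns a copy of the headers with the task attribution header added.
function withSourceTaskId(headers: Record<string, string>, taskId: string): Record<string, string> {
  return { ...headers, "X-Source-Task-Id": taskId };
}
```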

Step 8 — Swarm event hooks (cancellation, heartbeat, tool-loop detection)

The adapter is responsible for polling swarm-side signals during a run. Pattern files:

  • Codex: src/providers/codex-swarm-events.ts — throttled fireAndForget fetches for cancel/heartbeat/activity/context-usage, attached via this.listeners.push(...) inside the session.
  • Pi: src/providers/pi-mono-extension.ts: createSwarmHooksExtension, passed into DefaultResourceLoader({ extensionFactories: [swarmExtension] }).
  • Claude: external hook process reads a task file (/tmp/agent-swarm-task-<pid>.json) written by claude-adapter.ts L31–43; hook logic lives under src/hooks/ (e.g. tool-loop-detection.ts).

Tool-loop detection (src/hooks/tool-loop-detection.ts::checkToolLoop) is reusable — call it from your event translator when you see tool_start.
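The throttled fire-and-forget idea used for heartbeats and activity pings can be sketched as follows. createThrottled is a hypothetical helper, not the implementation in codex-swarm-events.ts. The clock is injectable so the behavior can be tested without timers.

```typescript
// Hypothetical throttle helper for swarm signals; the real helpers may differ.
function createThrottled(
  send: () => void,
  minIntervalMs: number,
  now: () => number = Date.now,
): () => boolean {
  let lastSentAt = Number.NEGATIVE_INFINITY;
  return () => {
    const t = now();
    if (t - lastSentAt < minIntervalMs) return false; // too soon: drop this signal
    lastSentAt = t;
    try {
      send();
    } catch {
      // Fire-and-forget: a failed signal must never break the run.
    }
    return true;
  };
}
```

In a session, the returned function would typically be called from an event listener, so every native event becomes a cheap opportunity to send a heartbeat.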

Step 9 — Skills and slash-commands

Skills are the swarm's portable procedural knowledge (/review-pr, /implement-issue, etc.). Each provider handles them differently:

  • Claude: native slash-commands, so formatCommand(name) => "/" + name (claude-adapter.ts L601).
  • Pi: prefixed, formatCommand(name) => "/skill:" + name (pi-mono-adapter.ts L541); resolved from ~/.pi/agent/skills/<name>/SKILL.md.
  • Codex: no native skills support → src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the prompt, reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md, and inlines it before calling thread.runStreamed. The system prompt is delivered by writing AGENTS.md into cwd (see codex-agents-md.ts).

Pick the pattern that matches your runtime; if your provider has no native skill mechanism, follow the Codex inline-resolver pattern.

The swarm syncs skill files into each provider's skill dir at container boot in docker-entrypoint.sh (copies SKILL.md into ~/.claude/skills/, ~/.pi/agent/skills/, and ~/.codex/skills/). Add an entry for your provider there.

Step 10 — Worker bootstrap (Docker entrypoint)

Edit docker-entrypoint.sh:

  1. Credential validation branch — mirror the pattern used for pi (L7–12), codex (L13–71), or claude (L72–79).
  2. Binary reachability check — add a block similar to the CODEX_BINARY / CLAUDE_BINARY checks (L87–108).
  3. Skill sync — extend the skill-copy block (L764–803) with your provider's skill directory.
  4. If your provider has a CLI binary, install it in Dockerfile.worker.

Step 11 — Login CLI (only if OAuth)

If your provider needs a user-interactive OAuth flow (like Codex's ChatGPT login), add a CLI command:

  1. Implement PKCE + local callback server. Reference functions in src/providers/codex-oauth/flow.ts: createAuthorizationFlow (URL with code_challenge_method=S256), startLocalOAuthServer (node:http on 127.0.0.1:1455/auth/callback), exchangeAuthorizationCode.
  2. Add storage helpers at src/providers/<foo>-oauth/storage.ts that PUT /api/config with { scope: "global", key: "<foo>_oauth", value: JSON.stringify(creds), isSecret: true }. See storeCodexOAuth and getValidCodexOAuth (with auto-refresh) in src/providers/codex-oauth/storage.ts.
  3. Add the CLI command at src/commands/<foo>-login.ts. Reference: src/commands/codex-login.ts — uses promptHiddenInput for masked API-key entry and attempts to auto-open the browser via open / start / xdg-open by platform.
  4. Register the command in src/cli.tsx (non-UI command: console.log + process.exit(0) style) and update COMMAND_HELP.
  5. In docker-entrypoint.sh, restore credentials at boot by fetching them from /api/config/resolved?includeSecrets=true&key=<foo>_oauth and writing the provider's expected auth-file format (see the codex block, L13–71, for the jq-based reshape).
  6. At adapter session-creation time, re-fetch-and-refresh as a fallback if the token is expired (see codex-adapter.ts L810–844).
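The PKCE part of item 1 is small enough to sketch with node:crypto. The helper names and the options object below are hypothetical and are not the signatures in codex-oauth/flow.ts or pkce.ts. The authorize endpoint and client ID are parameters because they depend on your provider.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to 43 base64url characters, inside the 43-128 range RFC 7636 allows.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 challenge: base64url(SHA-256(verifier)), without padding.
function createCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// Hypothetical options shape; adjust to whatever your provider's OAuth server expects.
interface AuthorizationUrlOptions {
  authorizeEndpoint: string;
  clientId: string;
  redirectUri: string;
  challenge: string;
  state: string;
}

function buildAuthorizationUrl(opts: AuthorizationUrlOptions): string {
  const url = new URL(opts.authorizeEndpoint);
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", opts.clientId);
  url.searchParams.set("redirect_uri", opts.redirectUri);
  url.searchParams.set("code_challenge", opts.challenge);
  url.searchParams.set("code_challenge_method", "S256");
  url.searchParams.set("state", opts.state);
  return url.toString();
}
```

The verifier stays in memory on the CLI side and is sent only in the token exchange. The callback handler should also compare the returned state with the one it generated before exchanging the code.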

Step 12 — Types & enums

Update union-type entries that enumerate providers:

  • src/types.ts: HarnessProvider union.
  • templates/schema.ts — template provider enum.
  • Any migration that stores a provider column — do not modify existing migrations; create a new one under src/be/migrations/ if a schema update is needed.

Step 13 — Prompts

If your provider benefits from a distinct system-prompt shape, add src/prompts/<foo>/ mirroring src/prompts/claude/ and src/prompts/codex/. Wire it into the adapter's createSession. Prompt files must remain pure (no DB imports) — the DB boundary is enforced by scripts/check-db-boundary.sh.

Step 14 — Tests

Add at minimum:

  • src/tests/<foo>-adapter.test.ts — unit tests for event translation (feed fake native events, assert emitted ProviderEvents).
  • src/tests/<foo>-oauth.test.ts (if OAuth) — PKCE helpers, storage round-trip, token refresh.

Existing analogs: src/tests/codex-*.test.ts, src/tests/claude-*.test.ts, src/tests/pi-*.test.ts. Tests must use isolated SQLite files and clean up -wal / -shm in afterAll (see the "Unit tests" block in CLAUDE.md).

Step 15 — Documentation

  • Update CLAUDE.md: add foo to the HARNESS_PROVIDER accepted values list and document any required env vars.
  • Update this guide's "Reference implementations" table.
  • Update the README's "Multi-provider" line.
  • Update Harness Configuration with the new provider's setup instructions.
  • If the provider needs new HTTP endpoints, regenerate OpenAPI: bun run docs:openapi.

9. Codex OAuth: the full reference flow

This is documented separately because it is the most involved integration.

┌────────────────┐   codex-login CLI      ┌──────────────────┐
│ User's laptop  │ ─── PKCE auth URL ──▶  │ auth.openai.com  │
│                │ ◀── code (state) ───── │ OAuth server     │
│                │                        └──────────────────┘
│ Local callback │         ▲
│ :1455/auth/... │─────────┘
└────────┬───────┘
         │ exchangeAuthorizationCode

┌────────────────┐
│  Creds JSON    │
│  (tokens)      │
└────────┬───────┘
         │ PUT /api/config { key:"codex_oauth", isSecret:true }

┌────────────────────────────┐
│ swarm_config (encrypted)   │
└────────┬───────────────────┘
         │ docker-entrypoint.sh fetches at boot

┌────────────────────────────┐
│ ~/.codex/auth.json (0600)  │
└────────────────────────────┘

Key files: src/providers/codex-oauth/{flow.ts,storage.ts,auth-json.ts,pkce.ts,types.ts}, src/commands/codex-login.ts, docker-entrypoint.sh L13–71, adapter fallback codex-adapter.ts L810–844.

The shape conversion (our flat {access, refresh, expires, accountId} → Codex CLI's {auth_mode: "chatgpt", tokens: {...}}) lives in credentialsToAuthJson() at src/providers/codex-oauth/auth-json.ts L37–49.

See also: Codex OAuth setup guide.


10. Pre-PR checklist for a new provider

Run before opening the PR (per CLAUDE.md):

bun run lint:fix
bun run tsc:check
bun test
bash scripts/check-db-boundary.sh
bun run docs:openapi   # only if you added HTTP endpoints

Manual verification:

  • HARNESS_PROVIDER=foo bun run src/cli.tsx worker starts and connects.
  • A trivial task ("Say hi") runs to completion and posts progress + cost.
  • cancel-task via MCP actually aborts the in-flight run.
  • docker build -f Dockerfile.worker . succeeds.
  • Full E2E with Docker (see CLAUDE.md "E2E testing with Docker") with -e HARNESS_PROVIDER=foo.
  • For OAuth providers: run bun run src/cli.tsx <foo>-login end-to-end, then boot a worker in Docker and verify it picks up the stored creds.

11. Files to touch — quick checklist

Concern | File(s)
Adapter implementation | src/providers/<foo>-adapter.ts
Factory registration | src/providers/index.ts
Types / enums | src/types.ts, templates/schema.ts
Prompts (optional) | src/prompts/<foo>/
OAuth (optional) | src/providers/<foo>-oauth/*, src/commands/<foo>-login.ts
Setup CLI (optional, e.g. claude-managed) | src/commands/<foo>-setup.ts
CLI wiring | src/cli.tsx (COMMAND_HELP, command routing)
Docker bootstrap | docker-entrypoint.sh, Dockerfile.worker
Hooks (optional) | src/hooks/*, src/providers/<foo>-swarm-events.ts
Skills resolver (if provider lacks native support) | src/providers/<foo>-skill-resolver.ts
Models / pricing (optional) | src/providers/<foo>-models.ts
Tests | src/tests/<foo>-*.test.ts
Integrations UI (optional) | ui/src/lib/integrations-catalog.ts
Docs | CLAUDE.md, README.md, this guide, Harness Configuration

claude-managed reference files: src/providers/claude-managed-adapter.ts, src/providers/claude-managed-swarm-events.ts, src/providers/claude-managed-models.ts, src/commands/claude-managed-setup.ts, ui/src/lib/integrations-catalog.ts.


12. Claude Managed Agents — pre-existing Agent + Environment pattern

claude-managed is the first reference adapter where the session runtime executes outside the worker container: the worker only opens an SSE stream against client.beta.sessions.events.stream and relays normalized events to the runner. This forces a few design decisions that are worth calling out — they apply to any future provider with a similar "managed cloud session" shape (Devin's /sessions API is the closest existing analog).

a. We don't agents.create at runtime

Anthropic's beta API has a 1:1 Agent ↔ identity model. Calling client.beta.agents.create(...) from each worker on each task would (a) leak agents into the customer's account at the rate of one per task, and (b) make skill / tool inventory non-deterministic per session. Instead, we treat the Agent and Environment as persistent infrastructure, created once during operator onboarding and persisted by ID. The adapter only ever calls client.beta.sessions.create({ agent: MANAGED_AGENT_ID, environment_id: MANAGED_ENVIRONMENT_ID, ... }).

b. The claude-managed-setup CLI

bun run src/cli.tsx claude-managed-setup

Implementation: src/commands/claude-managed-setup.ts. Behavior:

  1. Reads ANTHROPIC_API_KEY from .env / env (or prompts).
  2. Creates the Environment (client.beta.environments.create) — the long-lived sandbox configuration (allowed networks, default packages, persistent volumes).
  3. Uploads each plugin/commands/*.md skill via client.beta.skills.create (one-shot per skill content hash; the CLI dedupes against an existing inventory).
  4. Creates the Agent (client.beta.agents.create) and attaches the freshly uploaded skills.
  5. PUT /api/config persists MANAGED_AGENT_ID + MANAGED_ENVIRONMENT_ID into swarm_config so deployed workers (and the integrations UI) can restore them at boot.
  6. Re-run with --force to recreate (rare — only if upstream rotates IDs).

This is the shape every future "managed cloud session" provider should follow: setup-once → IDs live in swarm_config → workers fail-fast at boot if absent.
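The fail-fast half of that shape might look like the sketch below. requireManagedIds is a hypothetical helper, and the config is passed in as a plain key/value object, so nothing here assumes how the worker actually loads swarm_config.

```typescript
// Hypothetical boot-time check; the worker's real loading path may differ.
interface ManagedIds {
  agentId: string;
  environmentId: string;
}

function requireManagedIds(config: Record<string, string | undefined>): ManagedIds {
  const agentId = config["MANAGED_AGENT_ID"]?.trim();
  const environmentId = config["MANAGED_ENVIRONMENT_ID"]?.trim();

  const missing: string[] = [];
  if (!agentId) missing.push("MANAGED_AGENT_ID");
  if (!environmentId) missing.push("MANAGED_ENVIRONMENT_ID");

  if (!agentId || !environmentId) {
    // Fail at boot with an actionable message instead of failing on the first task.
    throw new Error(`Missing ${missing.join(", ")}. Run claude-managed-setup before starting the worker.`);
  }
  return { agentId, environmentId };
}
```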

c. System prompt in the user message + prompt-cache breakpoint

Managed-agents has no system field on sessions.create. The closest analog is client.beta.sessions.events.send({ events: [{ type: "user.message", content: [...] }] }). We compose the swarm's full assembled systemPrompt as the first content block and the per-task prompt as the second:

[
  { type: "text", text: <full system prompt + agent identity + skills>, cache_control: { type: "ephemeral" } },
  { type: "text", text: `User request:\n\n${prompt}` },  // no cache_control
]

The cache_control: { type: "ephemeral" } marker on the first block creates a prompt-cache breakpoint — Anthropic caches everything up to that boundary across sessions for the same agent, so subsequent tasks for the same agent re-use the static prefix at cache-read pricing. The per-task block sits after the breakpoint and is allowed to differ without invalidating the cache.

This is enforced by composeManagedUserMessage in claude-managed-adapter.ts; a test in src/tests/claude-managed-adapter.test.ts asserts that the prefix is byte-identical.
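A stripped-down sketch of that composition is shown below. composeUserMessageBlocks and TextBlock are illustrative names, not the adapter's actual function or the SDK's block type. The point is only that the first block depends on the system prompt alone, so it serializes identically for every task of the same agent.

```typescript
// Illustrative block type; the SDK's real text-block type is different (see section f).
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function composeUserMessageBlocks(systemPrompt: string, prompt: string): [TextBlock, TextBlock] {
  return [
    // Static prefix: ends at the cache breakpoint.
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    // Per-task suffix: sits after the breakpoint and carries no cache_control.
    { type: "text", text: `User request:\n\n${prompt}` },
  ];
}
```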

d. X-Source-Task-Id is dropped

§5 (MCP integration) requires every adapter to set X-Source-Task-Id on the swarm MCP connection, but claude-managed cannot: the MCP servers are configured server-side on the Anthropic-managed Agent (not per-session), and the SDK doesn't expose a per-session HTTP-header override. Instead, we pass the task ID via metadata.swarmTaskId on sessions.create, and the swarm MCP tools accept task_id as an explicit tool argument when the header is missing. New providers that hit the same constraint should follow this fallback.
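On the tool side, the fallback amounts to "header first, explicit argument second". The sketch below uses a hypothetical resolveTaskId helper with plain objects for headers and tool arguments. It is not the swarm MCP server's actual code.

```typescript
// Hypothetical resolution order for task attribution inside an MCP tool handler.
function resolveTaskId(
  headers: Record<string, string | undefined>,
  args: { task_id?: string },
): string | undefined {
  // HTTP header names are case-insensitive, so compare in lower case.
  const entry = Object.entries(headers).find(([key]) => key.toLowerCase() === "x-source-task-id");
  const fromHeader = entry?.[1]?.trim();
  const fromArgs = args.task_id?.trim();
  // Prefer the header; fall back to the explicit tool argument.
  return fromHeader || fromArgs || undefined;
}
```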

e. Skill upload via beta.skills.create

Skills are not synced into a filesystem path on the worker (the worker doesn't run the model). They're uploaded once during claude-managed-setup via client.beta.skills.create({ content, name, ... }) and referenced by ID on the Agent. The skill content is the same plugin/commands/*.md body that other providers copy into ~/.claude/skills/<name>/SKILL.md — so the source of truth stays in the repo.

f. SDK shape deviations to be aware of

The Anthropic Beta SDK has a few non-obvious surface differences from the conventional Anthropic messages.create API. The header comments in src/providers/claude-managed-adapter.ts (top-of-file block, ~L13–35) document them in detail, but the highlights for anyone reading the code:

  • Resource type is github_repository, not github_repo. The SDK type is BetaGitHubRepositoryResource and the literal field is type: "github_repository".
  • events.send takes { events: [...] } — an array, not a single event arg. The naming makes it look like events.send(event) would work; it would not.
  • Session status enum is 'rescheduling' | 'running' | 'idle' | 'terminated'. "Archived" is not a status — it's signaled by archived_at !== null. canResume() therefore rejects on terminated or non-null archived_at.
  • cache_control is a runtime-honored field that's NOT in the TS definition for BetaManagedAgentsTextBlock. We attach it via a typed extension and cast on the way out so the runtime payload includes it.
  • events.stream returns an AsyncIterable, not a Promise of an array — iterate with for await.
  • events.list is a PagePromise that's also AsyncIterable over historical session events; the resume path uses it to pre-fetch + dedupe against the live stream.
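The status rule in the third bullet can be captured in a small predicate. SessionSnapshot and isResumable are illustrative names, and archived_at is assumed to be a timestamp string or null. The real canResume() also has to fetch the session through the SDK, which is omitted here.

```typescript
// Status values as listed above; "archived" is deliberately not one of them.
type SessionStatus = "rescheduling" | "running" | "idle" | "terminated";

// Illustrative subset of a session object; archived_at is assumed to be string | null.
interface SessionSnapshot {
  status: SessionStatus;
  archived_at: string | null;
}

function isResumable(session: SessionSnapshot): boolean {
  // Reject terminated sessions and anything that has been archived.
  return session.status !== "terminated" && session.archived_at === null;
}
```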

13. Further reading
