Adding a Harness Provider
Implement a new harness provider (like claude, pi, or codex) — the full adapter contract, reference implementations, and wiring checklist
This guide documents the harness-provider contract used by agent-swarm workers, the three reference implementations (claude, pi, codex), and every hook that must be wired when adding a new provider.
A harness provider is the runtime that actually drives an LLM: it owns the subprocess (or in-process SDK) that reads the user prompt, talks to the model, invokes tools, and streams events back. The swarm treats providers as plug-ins behind a single TypeScript interface (ProviderAdapter). Workers select one at boot via HARNESS_PROVIDER.
For configuring an existing provider, see Harness Configuration. This guide is for implementing a new one.
Supported today: claude (Anthropic Claude Code CLI), pi (pi-mono, in-process via @earendil-works/pi-coding-agent), codex (OpenAI Codex SDK hosted inside a per-task subprocess), devin (Cognition Devin via /sessions), claude-managed (Anthropic Managed Agents — sessions execute in Anthropic's cloud sandbox), opencode (in-process @opencode-ai/sdk server with SSE event mapping; experimental, rolling out across DES-295 → DES-299).
1. The ProviderAdapter contract
Source of truth: src/providers/types.ts.
export interface ProviderAdapter {
readonly name: string;
createSession(config: ProviderSessionConfig): Promise<ProviderSession>;
canResume(sessionId: string): Promise<boolean>;
formatCommand(commandName: string): string;
}
export interface ProviderSession {
readonly sessionId: string | undefined;
onEvent(listener: (event: ProviderEvent) => void): void;
waitForCompletion(): Promise<ProviderResult>;
abort(): Promise<void>;
}
What each member does
| Member | Responsibility |
|---|---|
| name | Short identifier used in logs ("claude", "pi", "codex"). |
| createSession(config) | Spin up a new session for a task. Must not block on model output: return a ProviderSession immediately and stream events asynchronously. |
| canResume(sessionId) | Return true iff the provider can continue a previous session by ID. Used when the runner resumes paused work. |
| formatCommand(commandName) | Map a swarm slash-command (e.g. "review-pr") to the form the underlying CLI expects (e.g. /review-pr, /skill:review-pr, or inlined via skill resolver). |
| session.onEvent | Register a listener. The adapter must call every registered listener for every ProviderEvent. |
| session.waitForCompletion() | Resolve once the session ends; returns { exitCode, sessionId, cost, output, isError, errorCategory, failureReason }. |
| session.abort() | Cancel in flight (SIGTERM, SDK abort signal, etc.). Must be idempotent. |
ProviderSessionConfig inputs
interface ProviderSessionConfig {
prompt: string; // user prompt
systemPrompt: string; // composed system prompt
model: string; // resolved model id or ""
role: string; // "lead" | "worker" | ...
agentId: string;
taskId: string;
apiUrl: string; // swarm API base URL for callbacks
apiKey: string; // swarm API key
cwd: string; // workspace dir
/** @deprecated Always undefined — native resume removed in the 2026-05-28 plan. */
resumeSessionId?: string;
iteration?: number;
logFile: string; // jsonl log path
additionalArgs?: string[];
env?: Record<string, string>;
}
ProviderEvent (the normalized stream)
Every provider translates its native events into this tagged union:
type ProviderEvent =
| { type: "session_init"; sessionId: string }
| { type: "message"; role: "assistant" | "user"; content: string }
| { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
| { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
| { type: "result"; cost: CostData; output?: string; isError: boolean; errorCategory?: string }
| { type: "error"; message: string; category?: string }
| { type: "raw_log"; content: string }
| { type: "raw_stderr"; content: string }
| { type: "custom"; name: string; data: unknown }
| { type: "context_usage"; contextUsedTokens: number; contextTotalTokens: number; contextPercent: number; outputTokens: number }
| { type: "compaction"; preCompactTokens: number; compactTrigger: "auto" | "manual"; contextTotalTokens: number };
The runner consumes these events (not the provider's native ones) to post progress to the swarm API, detect tool loops, charge cost, and update task state. Implementing this translation is the bulk of the work for a new provider.
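To make the consumer side concrete, here is a minimal sketch of how code on the runner side might branch on the normalized union. The `describeEvent` helper is hypothetical, the union is abridged, and `CostData` is reduced to the single field the sketch reads; the real runner does considerably more per event.

```typescript
// Abridged copy of the ProviderEvent union; CostData is cut down to the one
// field this sketch reads (an assumption, not the real type).
type CostData = { totalCostUsd: number };
type ProviderEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant" | "user"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "result"; cost: CostData; output?: string; isError: boolean }
  | { type: "error"; message: string; category?: string }
  | { type: "raw_log"; content: string };

// Hypothetical helper: turn one normalized event into a progress line.
function describeEvent(event: ProviderEvent): string {
  switch (event.type) {
    case "session_init":
      return `session ${event.sessionId} started`;
    case "message":
      return `${event.role}: ${event.content}`;
    case "tool_start":
      return `tool ${event.toolName} started (${event.toolCallId})`;
    case "tool_end":
      return `tool ${event.toolName} finished (${event.toolCallId})`;
    case "result":
      return `${event.isError ? "failed" : "done"}, cost $${event.cost.totalCostUsd.toFixed(4)}`;
    case "error":
      return `error: ${event.message}`;
    case "raw_log":
      return `log: ${event.content}`;
    default: {
      // Exhaustiveness check: adding a union member without a matching case
      // turns this assignment into a compile-time error.
      const unreachable: never = event;
      return String(unreachable);
    }
  }
}
```

Because the switch is exhaustive over the tagged union, a provider that only ever emits well-formed `ProviderEvent` values needs no provider-specific handling downstream.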
2. Reference implementations
| File | Transport | Auth |
|---|---|---|
| src/providers/claude-adapter.ts | Bun.spawn of claude CLI with --output-format stream-json, JSONL stdout parsing | CLAUDE_CODE_OAUTH_TOKEN or ANTHROPIC_API_KEY |
| src/providers/pi-mono-adapter.ts | In-process via createAgentSession from @earendil-works/pi-coding-agent; no subprocess | ANTHROPIC_API_KEY / OPENROUTER_API_KEY / ~/.pi/agent/auth.json; or, when MODEL_OVERRIDE=amazon-bedrock/*, anything the AWS SDK default chain accepts (see pi-mono + Amazon Bedrock below) |
| src/providers/codex-adapter.ts | Parent adapter spawns src/commands/codex-session-runner.ts, sends session config over stdin, and receives line-delimited events/results over stdout; the child hosts @openai/codex-sdk Codex / Thread and streams ThreadEvents back | OPENAI_API_KEY or ChatGPT OAuth stored in swarm_config.codex_oauth (see §6) |
| src/providers/claude-managed-adapter.ts | Anthropic SDK client.beta.sessions.events.stream (SSE); session executes in Anthropic's managed cloud sandbox, worker is a thin relay | ANTHROPIC_API_KEY plus pre-existing MANAGED_AGENT_ID + MANAGED_ENVIRONMENT_ID from one-time claude-managed-setup CLI (see §13) |
pi-mono + Amazon Bedrock auth
pi-mono routes to Amazon Bedrock when MODEL_OVERRIDE=amazon-bedrock/<model-id> (e.g. amazon-bedrock/anthropic.claude-sonnet-4-20250514-v1:0). Bedrock authenticates through the AWS SDK's default credential chain — not a single API key — so agent-swarm delegates credential resolution to the SDK. The boot credential gate detects the amazon-bedrock/ prefix and short-circuits to satisfiedBy: "sdk-delegated" without inspecting any AWS env var or file.
Accepted credential sources — anything the AWS SDK accepts:
- AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (+ optional AWS_SESSION_TOKEN)
- AWS_PROFILE resolved against ~/.aws/credentials and ~/.aws/config
- AWS SSO sessions configured in ~/.aws/config
- EC2 IMDS instance role / ECS task role
- Web-identity / OIDC (AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN)
- credential_process and assume-role chains
AWS_REGION (or AWS_DEFAULT_REGION) is required by the SDK and must be a Bedrock-enabled region.
Failure mode. No credential-wait parking — the worker claims tasks immediately. If creds are missing, the first Bedrock inference call fails with the AWS SDK's own error (e.g. CredentialsProviderError: Could not load credentials from any providers), which surfaces in the session log via the existing error path (scrubbed through scrubSecrets). Treat this the same way you treat a stale codex auth.json: the underlying SDK is the source of truth.
This is intentionally the same pattern codex uses for OAuth credentials (codexAuthFileExists → presenceCheckOk in src/commands/provider-credentials.ts) — presence-only gate, runtime validation owned by the adapter/SDK, no upstream call from agent-swarm. pi-ai does not currently expose a Bedrock-specific credential check we could call; if it does in the future, the live-test branch can be upgraded without touching the boot gate.
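The presence-only gate can be pictured as follows. This is a sketch under stated assumptions: `checkPiCredentials`, the `GateResult` shape, and the `"env"` value are hypothetical names for illustration; only the `amazon-bedrock/` prefix and the `"sdk-delegated"` value come from the text above, and the `~/.pi/agent/auth.json` check is omitted to keep the example free of file I/O.

```typescript
type GateResult =
  | { ok: true; satisfiedBy: "sdk-delegated" | "env" }
  | { ok: false; reason: string };

const BEDROCK_PREFIX = "amazon-bedrock/";

// Hypothetical gate; not the real provider-credentials.ts API.
function checkPiCredentials(
  modelOverride: string | undefined,
  env: Record<string, string | undefined>,
): GateResult {
  // Bedrock: delegate to the AWS SDK default chain and inspect nothing here.
  if (modelOverride?.startsWith(BEDROCK_PREFIX)) {
    return { ok: true, satisfiedBy: "sdk-delegated" };
  }
  // Non-Bedrock: presence-only check on the API-key env vars.
  if (env.ANTHROPIC_API_KEY || env.OPENROUTER_API_KEY) {
    return { ok: true, satisfiedBy: "env" };
  }
  return { ok: false, reason: "no pi credentials found" };
}
```

Note that the Bedrock branch returns before any AWS variable is read, which is what lets the worker claim tasks immediately and leave validation to the first inference call.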
Cost & context tracking
Every adapter emits one CostData row per CLI invocation and one or more context_usage events per session. The dollar value is recomputed server-side against the seeded pricing table (Phase 2 of the cost-tracking plan); the resulting costSource enum is surfaced in the UI. The unified input + cache + output context formula (Phase 9) replaces every adapter's previous per-provider arithmetic so cross-provider percent comparisons make sense.
Full story: Cost & context computation.
The four reference adapters listed in the table at the start of this section are wired together by the factory at src/providers/index.ts:
export async function createProviderAdapter(provider: string): Promise<ProviderAdapter> {
switch (provider) {
case "claude":
return new ClaudeAdapter();
case "pi": {
const { PiMonoAdapter } = await import("./pi-mono-adapter");
return new PiMonoAdapter();
}
case "codex":
return new CodexAdapter();
case "claude-managed":
return new ClaudeManagedAdapter();
default:
throw new Error(`Unknown HARNESS_PROVIDER: "${provider}". Supported: claude, pi, codex, claude-managed`);
}
}
The runner reads the resolved HARNESS_PROVIDER and awaits the factory at boot. The async factory is intentional: providers with module-level side effects (notably pi) are lazy-loaded only when selected, so unused adapters cannot crash unrelated workers during startup. The resolved value comes from swarm_config overlaid on process.env, with this precedence (highest first):
1. swarm_config HARNESS_PROVIDER: repo > agent > global, via /api/config/resolved
2. process.env.HARNESS_PROVIDER: container env
3. "claude": default
Resolution lives in src/utils/harness-provider.ts (resolveHarnessProvider); it's threaded through fetchResolvedEnv in src/commands/runner.ts so credential pool selection uses the same resolved value.
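The precedence rule can be sketched as a pure function. This is an illustration only: the `ConfigRow` shape and `resolveProvider` name are assumptions, and the real `resolveHarnessProvider` obtains the already-resolved value from `/api/config/resolved` rather than walking rows itself.

```typescript
type Scope = "repo" | "agent" | "global";
interface ConfigRow { scope: Scope; key: string; value: string }

// Highest-precedence scope first, as described in the list above.
const SCOPE_ORDER: Scope[] = ["repo", "agent", "global"];

// Hypothetical resolver: swarm_config rows win over the container env,
// and "claude" is the final default.
function resolveProvider(rows: ConfigRow[], envValue: string | undefined): string {
  for (const scope of SCOPE_ORDER) {
    const row = rows.find((r) => r.scope === scope && r.key === "HARNESS_PROVIDER");
    if (row && row.value) return row.value;
  }
  return envValue || "claude";
}
```

For example, an agent-scoped row of `codex` beats both a global row of `pi` and a container env of `claude`.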
Switching providers without restart
The worker re-evaluates the resolved provider on each poll iteration (throttled to ~10s). When the value changes — typically because an operator wrote a new swarm_config row, or called PATCH /api/agents/{id}/harness-provider (which mirrors the value into swarm_config at scope=agent) — the worker:
- Creates a fresh adapter via createProviderAdapter(resolvedProvider).
- Updates state.harnessProvider.
- Rebuilds basePrompt so traits-driven prompt sections (e.g. local-environment vs. cloud-managed) match the new provider.
- Resets the cached cred_status snapshot so the dashboard shows credential health for the new adapter.
In-flight task sessions hold their own ProviderSession reference and continue on the old adapter unaffected; only future spawns pick up the swap. Failures during reconciliation (network blip, invalid value, adapter init error) log a warning and stay on the current provider — the worker is never wedged by a bad config.
Invalid HARNESS_PROVIDER values are rejected at write time (validateConfigValue in src/be/swarm-config-guard.ts), so a typo via PUT /api/config or the set-config MCP tool returns 400 instead of being silently stored.
Per-task outputSchema support
Tasks may carry an optional JSON Schema on outputSchema that the agent's final output must conform to. Enforcement depends on the harness:
| Provider | Supported | Notes |
|---|---|---|
| claude | Yes | Via MCP store-progress; CLI extraction fallback if missed |
| claude-managed | Yes | Via MCP store-progress |
| codex | Yes | Via MCP store-progress |
| opencode | Yes | Via MCP store-progress |
| pi (pi-mono) | Yes | Via MCP store-progress |
| devin | Yes | Via MCP store-progress when HAS_MCP=true; otherwise the runner validates direct providerOutput before finishing the task |
The primary enforcement point is the MCP store-progress tool: a non-conforming output fails the tool call and the agent is asked to retry. For providers that return direct providerOutput to the runner, ensureTaskFinished() now performs the same JSON-parse + schema-validation check before persisting task.output; if validation fails, the task is marked failed instead of silently storing invalid structured output.
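The JSON-parse plus schema-validation step might look roughly like the sketch below. It is deliberately limited to a tiny subset of JSON Schema (object type, required keys, primitive property types) so it stays self-contained; `MiniSchema` and `validateOutput` are hypothetical names, and a real check would presumably use a full JSON Schema validator.

```typescript
// Tiny subset of JSON Schema, for illustration only.
interface MiniSchema {
  type: "object";
  required?: string[];
  properties?: Record<string, { type: "string" | "number" | "boolean" }>;
}

type Validation =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; error: string };

// Hypothetical validator: parse first, then check shape.
function validateOutput(raw: string, schema: MiniSchema): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, error: "output is not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, error: "output is not a JSON object" };
  }
  const value = parsed as Record<string, unknown>;
  for (const key of schema.required ?? []) {
    if (!(key in value)) return { ok: false, error: `missing required key "${key}"` };
  }
  for (const [key, spec] of Object.entries(schema.properties ?? {})) {
    if (key in value && typeof value[key] !== spec.type) {
      return { ok: false, error: `key "${key}" must be ${spec.type}` };
    }
  }
  return { ok: true, value };
}
```

A failed result maps onto the behavior described above: the tool call fails (MCP path) or the task is marked failed (direct providerOutput path) rather than persisting invalid structured output.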
3. How a harness run fits into a task lifecycle
Every harness run is scoped to exactly one task. The runner owns the task's lifecycle and gives the adapter only what it needs to execute that single unit of work.
The flow
All steps live in src/commands/runner.ts.
| Step | What it does |
|---|---|
| pollForTrigger | GET /api/poll until a Trigger is returned |
| buildPromptForTrigger | Constructs the user prompt from the trigger |
| fetchRelevantMemories | Enriches context (memory vectors + installed skills) |
| buildSystemPrompt | Composes the full system prompt (see §6) |
| spawnProviderProcess | Calls adapter.createSession(config) |
| session.onEvent(...) | Each ProviderEvent → API call (progress, logs, cost, context) |
| session.waitForCompletion | Blocks until the adapter resolves |
| ensureTaskFinished | POST /api/tasks/{id}/finish |
| syncProfileFilesToServer | Session-end FS → DB sync of the agent's self-editable files (see below). Runs for every hasLocalEnvironment harness. |
One task → one session. The runner tracks concurrent tasks in state.activeTasks: Map<taskId, RunningTask> up to MAX_CONCURRENT_TASKS. An adapter never needs to multiplex tasks internally — if the worker should handle two at once, the runner spawns two sessions.
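The one-task-one-session bookkeeping can be pictured with a small slot tracker. This is a sketch only: `TaskSlots` is a hypothetical class, and the real runner keeps richer `RunningTask` records in `state.activeTasks`.

```typescript
interface RunningTask { taskId: string; provider: string }

// Hypothetical tracker mirroring a Map<taskId, RunningTask> with a cap.
class TaskSlots {
  private active = new Map<string, RunningTask>();
  private max: number;

  constructor(max: number) {
    this.max = max;
  }

  // Returns false when the task is already running or the cap is reached.
  tryStart(task: RunningTask): boolean {
    if (this.active.has(task.taskId) || this.active.size >= this.max) return false;
    this.active.set(task.taskId, task);
    return true;
  }

  // Frees the slot once the session for this task has completed.
  finish(taskId: string): void {
    this.active.delete(taskId);
  }

  get size(): number {
    return this.active.size;
  }
}
```

The point for adapter authors is that concurrency lives entirely in this layer: each successful `tryStart` corresponds to one `createSession` call, so a session never has to multiplex tasks.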
Session-end identity / config sync (FS → DB)
When a session finishes, the runner syncs the agent's self-editable files back to the API so edits the agent made during the session persist into its profile (context_versions):
- SOUL.md / IDENTITY.md / TOOLS.md / HEARTBEAT.md (bundled identity update)
- the CLAUDE.md source, which is provider-dependent (see below)
- the agent-managed section of /workspace/start-up.sh (setupScript)
The sync lives in src/commands/profile-sync.ts (syncProfileFilesToServer) and is called from checkCompletedProcesses in runner.ts. The decision to sync is made per finished session, using the provider / local-env trait snapshotted on RunningTask at spawn time — not the mutable global state.hasLocalEnvironment. The runner lets an in-flight session finish on its original adapter after a live provider swap, so reading the global would skip a session that started local (worker since flipped remote) or sync stale local files after a remote session finished (worker since flipped local). The sync fires when any finished session in the batch ran in a local environment:
| Provider | local environment | Session-end sync |
|---|---|---|
| claude, pi, codex, opencode | yes | Yes, when the finished session ran on this provider |
| devin, claude-managed | no | No, these have no /workspace FS |
CLAUDE.md source is provider-routed (resolveClaudeMdPath). claude edits its personal file at ~/.claude/CLAUDE.md (also synced by the Claude Stop hook); every other local harness (codex/pi/opencode) edits /workspace/CLAUDE.md, the file the runner materializes from the claudeMd DB field at boot and that the base-prompt truncation notice points them to. An all-claude batch syncs the personal file (the runner is a backstop and never overwrites it with the stale workspace copy); any non-claude session in the batch routes to /workspace/CLAUDE.md. This closes the previously-open non-Claude CLAUDE.md FS → DB gap.
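The routing rule can be sketched as below. `pickClaudeMdPath` is a hypothetical stand-in for `resolveClaudeMdPath`, whose real signature is not shown here; the sketch also assumes that remote sessions (devin, claude-managed) are ignored when deciding whether a batch is "all-claude".

```typescript
type Provider = "claude" | "pi" | "codex" | "opencode" | "devin" | "claude-managed";

// Providers that run in a local environment, per the table above.
const LOCAL: ReadonlySet<Provider> = new Set<Provider>(["claude", "pi", "codex", "opencode"]);

// Returns the CLAUDE.md path to sync for a batch of finished sessions, or
// null when no session in the batch ran in a local environment.
function pickClaudeMdPath(finished: Provider[], homeDir: string): string | null {
  const local = finished.filter((p) => LOCAL.has(p));
  if (local.length === 0) return null;
  const allClaude = local.every((p) => p === "claude");
  return allClaude ? `${homeDir}/.claude/CLAUDE.md` : "/workspace/CLAUDE.md";
}
```

The provider list passed in should come from the per-session snapshot taken at spawn time, not from the worker's current provider, for the reasons given above.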
Notes:
- Harness-agnostic, runner-driven. Because the sync runs in the runner (at the single point where every completed session converges, including crashes), it does not depend on the harness emitting a shutdown event. This is what makes it reliable for codex/opencode (which previously had no sync path) and for pi (whose in-extension session_shutdown sync could silently not fire). The Claude plugin Stop hook (src/hooks/hook.ts) and the pi extension (src/providers/pi-mono-extension.ts) still run their own sync; the runner-level call is the authoritative backstop.
- Idempotent. The profile route only writes a new context_versions row when the content hash changes (updateAgentProfile in src/be/db.ts), so a redundant sync (pi's extension + runner double-POST, or an unchanged file) collapses to a no-op.
- Non-fatal but visible. A failed sync never fails the task, but unlike the original copies it checks resp.ok and logs a scrubbed warning instead of silently swallowing the error.
- Consequence (the "Rule 19" reversion). Because the FS is now the authoritative source at session end for all local-environment harnesses, a Lead update-profile edit to another agent's SOUL.md / IDENTITY.md / setupScript (which writes only the DB row, not the remote container's FS) is reverted on that agent's next session end. Previously only claude exhibited this; now pi (reliably), codex, and opencode do too. Pushing Lead edits all the way to a remote agent's FS is a separate, unsolved problem.
What fields on the task become ProviderSessionConfig
Assembled around runner.ts L1582–1596 and L3093–3133:
| ProviderSessionConfig field | Comes from |
|---|---|
| prompt | buildPromptForTrigger(trigger, task, memories, ...), optionally prefixed with a bounded follow-up context preamble when task.parentTaskId is set |
| systemPrompt | buildSystemPrompt() + task.additionalSystemPrompt (see §6) |
| model | task.model → MODEL_OVERRIDE env → "" |
| agentId, taskId, apiUrl, apiKey | Worker env + task row |
| cwd | task.dir → repoContext.clonePath → process.cwd() (L3050–3072) |
| resumeSessionId | Deprecated, always undefined. The runner stopped threading native session resume in the 2026-05-28 plan; follow-up continuity flows entirely through the context preamble prepended to prompt. |
| iteration | Retry counter (runner state) |
| logFile | ${logDir}/${timestamp}-${taskIdSlice}.jsonl (L3102), see §4 |
| env | Merged worker env + task-provided env |
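The two fallback chains in the table (model and cwd) reduce to simple first-non-empty selection. The sketch below uses hypothetical `TaskRow` and `RepoContext` shapes and takes the process working directory as a parameter so it stays environment-independent.

```typescript
interface TaskRow { model?: string; dir?: string }
interface RepoContext { clonePath?: string }

// task.model → MODEL_OVERRIDE env → "" (empty means "provider default").
function resolveModel(task: TaskRow, modelOverride: string | undefined): string {
  return task.model || modelOverride || "";
}

// task.dir → repoContext.clonePath → process.cwd()
function resolveCwd(task: TaskRow, repo: RepoContext | undefined, processCwd: string): string {
  return task.dir || repo?.clonePath || processCwd;
}
```

An adapter should therefore be prepared to receive `model === ""` and substitute its own default.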
API endpoints touched per task
The runner (not the adapter) owns all of these. They are listed here so you know what the adapter's event stream is ultimately driving:
- POST /api/tasks/{id}/progress: human-readable progress (L402–410)
- POST /api/tasks/{id}/context: on context_usage, compaction, completion (L1777, L1797, L1900)
- POST /api/session-logs: on raw_log (L983, see §4)
- POST /api/events/batch: tool/session events (L1648)
- PUT /api/tasks/{id}/claude-session: on session_init (L1037)
- POST /api/session-costs: on result (L1014)
- POST /api/active-sessions / DELETE /api/active-sessions/by-task/{id}: L1178, L1199
- POST /api/tasks/{id}/pause / resume: L711, L797
- POST /api/tasks/{id}/finish: L579
- GET /cancelled-tasks?taskId=...: L2909 (also polled by adapter-side hooks)
Resume semantics
Native session resume (the claude --resume <UUID> CLI flag and codex.resumeThread(id) SDK call) was removed in the 2026-05-28 deprecation plan (thoughts/taras/plans/2026-05-28-deprecate-native-resume.md). The reasoning: native resume relied on an on-disk transcript that disappears when the worker container restarts (deploy, OOM, autoscaler reschedule), and the harness then either errored out or silently spawned a context-less session. The bounded context preamble survives any worker restart because it is rebuilt from the parent-task chain held in the API DB.
- Parent → child continuity is now single-layered: the runner prepends a bounded preamble (cap ~2000 tokens, see src/commands/context-preamble.ts) for all providers when task.parentTaskId is set, and adapters always spawn a fresh harness session. resolveResumeSession is preserved as an observability shim that logs which session ids would have been used; it never returns a resumeSessionId.
- Pause → resume: a paused task restarts with a new session, re-applying task.progress via buildResumePrompt (L809–829); if the task is also part of a parent chain, the context-preamble path is applied before execution resumes.
- Adapter behavior: claude / claude-managed / codex each warn and ignore any stray resumeSessionId they receive; the runner no longer sets one. CodexAdapter.canResume() returns false unconditionally.
What your adapter owes the task: emit session_init as early as possible (so the runner can persist the provider session id), emit tool_start/tool_end faithfully (so the UI shows the agent's work), and emit a result with populated CostData before your waitForCompletion() resolves.
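For intuition about the bounded preamble mentioned under Resume semantics, here is a rough sketch. Everything in it is an assumption for illustration: the `ParentSummary` shape, the "Follow-up context" wording, and especially the four-characters-per-token estimate; the actual logic lives in src/commands/context-preamble.ts and may differ.

```typescript
interface ParentSummary { taskId: string; summary: string }

const MAX_PREAMBLE_TOKENS = 2000;

// Crude token estimate (assumption): roughly four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Walks the parent chain in the given order and stops before exceeding the cap.
function buildPreamble(chain: ParentSummary[], maxTokens: number = MAX_PREAMBLE_TOKENS): string {
  const parts: string[] = [];
  let used = 0;
  for (const parent of chain) {
    const entry = `[task ${parent.taskId}] ${parent.summary}`;
    const cost = estimateTokens(entry);
    if (used + cost > maxTokens) break;
    parts.push(entry);
    used += cost;
  }
  return parts.length > 0 ? `Follow-up context:\n${parts.join("\n")}\n\n` : "";
}

// The preamble is prepended to the user prompt; the adapter sees one string.
function withPreamble(prompt: string, chain: ParentSummary[], maxTokens?: number): string {
  return buildPreamble(chain, maxTokens) + prompt;
}
```

From the adapter's point of view nothing changes: continuity arrives inside `config.prompt`, and every session starts fresh.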
4. Raw session logs & the task details page
The logFile field in ProviderSessionConfig is not optional decoration — it is the system of record for what happened inside a run. The task details UI reads from this pipeline.
Path convention
${LOG_DIR:-/logs}/<sessionId>/<timestamp>-<taskId8>.jsonl
Constructed at runner.ts:3102. LOG_DIR defaults to /logs in Docker workers, so the effective path is /workspace/logs/<sessionId>/<...>.jsonl. The runner writes the first line (a metadata record) at L3120 before spawning.
What each adapter writes to logFile
All three open the file with Bun.file(config.logFile).writer() and append JSONL lines.
- Claude (claude-adapter.ts):
  - Raw NDJSON stdout from the Claude CLI is piped through (L284).
  - stderr wrapped as {type: "stderr", content, timestamp} (L321–323).
  - File handle closed at L329.
- Pi-mono (pi-mono-adapter.ts):
  - Every normalized ProviderEvent is written as {...event, timestamp} inside emit() (L167–187). Closed in runSession's finally at L327.
- Codex (codex-adapter.ts):
  - Every ProviderEvent written with timestamp in emit() (L347–374).
  - Raw SDK ThreadEvent mirrored as raw_log at L466.
  - Closed at L734.
Secret scrubbing is mandatory at every log egress. Import scrubSecrets from src/utils/secret-scrubber.ts and wrap every string before writing. All three reference adapters do this; a new adapter must too (see claude L284, L294, L303, L320, L344; pi L170–173; codex L352–355).
How raw logs reach the task details page
The .jsonl file on disk is the adapter-side dump. The UI does not read the file directly — it reads the DB-backed copy.
adapter emits raw_log ─▶ runner flushLogBuffer ─▶ POST /api/session-logs
│
▼
session_logs (SQLite)
│
▼
ui ─ useTaskSessionLogs ─▶ GET /api/tasks/{id}/session-logs
│
▼
<SessionLogViewer />
- Runner upload: runner.ts:1814–1833. Only raw_log triggers the remote push; raw_stderr is pretty-printed to worker stdout only (L1834–1836).
- API write: src/http/session-data.ts L135–153 → createSessionLogs in src/be/db.
- API read: same file L155–166, GET /api/tasks/{taskId}/session-logs → getSessionLogsByTaskId.
- UI hook: useTaskSessionLogs in ui/src/api/hooks/use-tasks.ts, consumed at ui/src/pages/tasks/[id]/page.tsx:42 and rendered by <SessionLogViewer /> at ui/src/components/shared/session-log-viewer.tsx.
What this means for a new provider
Your emit() implementation must do three things for the UI to light up:
1. Write every event to logFile as JSONL (for offline diagnostics under /workspace/logs/).
2. Emit a raw_log ProviderEvent for anything the user might want to inspect in the task details page. The runner will upload it.
3. Run every string through scrubSecrets before emitting or writing.
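The three duties above can be combined in a single `emit()`. The sketch below is illustrative, not the reference implementation: `EventSink` is hypothetical, the in-memory `lines` array stands in for the `logFile` writer, and `scrub` is a simplistic stand-in for `scrubSecrets`, whose real behavior is defined in src/utils/secret-scrubber.ts.

```typescript
type LogEvent = { type: "raw_log" | "message"; content: string };

// Stand-in scrubber (assumption): masks each known secret value verbatim.
function scrub(text: string, secrets: string[]): string {
  return secrets.reduce((acc, s) => (s ? acc.split(s).join("[REDACTED]") : acc), text);
}

class EventSink {
  readonly lines: string[] = []; // stands in for the logFile writer
  private listeners: Array<(e: LogEvent) => void> = [];
  private secrets: string[];
  private now: () => string;

  constructor(secrets: string[], now: () => string) {
    this.secrets = secrets;
    this.now = now;
  }

  onEvent(listener: (e: LogEvent) => void): void {
    this.listeners.push(listener);
  }

  emit(event: LogEvent): void {
    // Scrub first so neither the file nor any listener sees a secret.
    const safe: LogEvent = { ...event, content: scrub(event.content, this.secrets) };
    // Append one JSONL line with a timestamp.
    this.lines.push(JSON.stringify({ ...safe, timestamp: this.now() }));
    // Fan out to every registered listener (the runner uploads raw_log events).
    for (const listener of this.listeners) listener(safe);
  }
}
```

Scrubbing before both the write and the fan-out means there is exactly one place where an unscrubbed string can exist, which keeps the egress rule easy to audit.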
Tool calls (tool_start/tool_end) are not shown via session-logs — they go to /api/events/batch. You still need to emit them, but they reach the UI through a different channel (the agent's tool-activity timeline).
5. Exposing the swarm MCP to the runtime
This is the single most important integration point after event translation. The swarm MCP server is how the agent actually interacts with the swarm: store progress, offer subtasks, read/write memory, request human input, cancel itself. A provider that runs code but cannot call swarm MCP tools is effectively a read-only model invocation — it will never drive real swarm behavior.
Where the MCP server lives
The swarm API exposes its MCP server at {apiUrl}/mcp. Tools are defined under src/tools/.
How each reference adapter wires it
| Provider | Wiring | File |
|---|---|---|
| Claude | Discovers an existing .mcp.json (walking up from cwd), injects X-Source-Task-Id into the agent-swarm entry, writes a per-session copy to /tmp/mcp-<taskId>.json, launches CLI with --mcp-config <path> --strict-mcp-config. | claude-adapter.ts L117–184, L251 |
| Pi | Constructs McpHttpClient(apiUrl, apiKey, agentId, taskId), calls listTools(), wraps each as a pi-mono ToolDefinition with prefix mcp__<name>__. | pi-mono-adapter.ts L410–421, pi-mono-mcp-client.ts L53 |
| Codex | buildCodexConfig registers mcp_servers["agent-swarm"] = { url: "{apiUrl}/mcp", http_headers: { Authorization, X-Agent-ID, X-Source-Task-Id }, bearer_token_env_var, startup_timeout_sec }. Passed to new Codex({ config }). | codex-adapter.ts L132–243, L857–861 |
Required headers
Whatever the transport, the adapter MUST set these three headers on the swarm MCP connection:
- Authorization: Bearer ${apiKey}
- X-Agent-ID: <agentId>
- X-Source-Task-Id: <taskId> (so nested tool calls attribute back to the right task)
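A small helper that derives the MCP endpoint and the three required headers from the session config might look like this. `buildMcpConnection` and `McpAuth` are hypothetical names, and stripping a trailing slash from `apiUrl` is an extra precaution added for the sketch rather than documented behavior.

```typescript
interface McpAuth { apiUrl: string; apiKey: string; agentId: string; taskId: string }

// Hypothetical helper: builds the swarm MCP URL and required headers.
function buildMcpConnection(auth: McpAuth): { url: string; headers: Record<string, string> } {
  return {
    // The swarm MCP server is exposed at {apiUrl}/mcp.
    url: `${auth.apiUrl.replace(/\/+$/, "")}/mcp`,
    headers: {
      Authorization: `Bearer ${auth.apiKey}`,
      "X-Agent-ID": auth.agentId,
      "X-Source-Task-Id": auth.taskId,
    },
  };
}
```

Whatever transport your runtime uses (a config file, an SDK option, or an HTTP client), the same object can be serialized into it.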
Key tools a harness will call mid-run
Pretty labels at runner.ts:254–317. Representative list:
- store-progress: structured progress updates + memories
- offer-task / send-task: delegate to another agent
- cancel-task / poll-task: lifecycle
- memory-search / memory-get / inject-learning: shared memory
- get-task-details, post-message, read-messages: inter-agent coordination
- request-human-input: HITL gates
- trigger-workflow: start a workflow DAG
Fallback when MCP is unavailable
Each reference adapter fails open (the run continues without MCP tools), but the agent is effectively blind to the swarm:
- Pi: try/catch around discovery (pi-mono-adapter.ts L409–424); on failure, customTools = [].
- Codex: adds the agent-swarm entry unconditionally; failures fetching installed servers are non-fatal (codex-adapter.ts L217–229).
- Claude: createSessionMcpConfig returns null if nothing found; CLI runs without --mcp-config.
Even if the in-process MCP connection fails, the runner and the adapter's own swarm-event hooks still talk to the swarm HTTP API directly for lifecycle, heartbeat, and cancellation polling — see src/providers/codex-swarm-events.ts and src/providers/pi-mono-extension.ts L1–50. That is the safety net; MCP is what lets the model call tools.
6. System prompt composition & delivery
ProviderSessionConfig.systemPrompt is the full assembled system prompt, not a fragment. Your adapter's job is to hand it to the underlying runtime verbatim — not to add preamble.
How it is built
Composed by getBasePrompt(args) at src/prompts/base-prompt.ts:55–225 and orchestrated by buildSystemPrompt() at runner.ts:2269–2286. The pieces, in order:
| Source | Section | Resolver |
|---|---|---|
| Template system.session.{lead\|worker} | Base role prompt | resolveTemplateAsync L61–63 |
| Template system.agent.worker.slack | Slack section (workers only) | L66–72 |
| agentSoulMd + agentIdentityMd + name/description | ## Your Identity | L75–90 |
| Installed skills list | Skill summary | L93–96 |
| Installed MCP servers list | MCP summary | L99–101 |
| Repo CLAUDE.md (from cwd) | ## Repository Context | L111–118 (read via readClaudeMd at runner.ts:77–86) |
| Templates system.agent.agent_fs, ...services, ...artifacts | Conditional suffixes | L157–188 |
| agentClaudeMd | ## Agent Instructions (truncated) | L197–207 |
| agentToolsMd | ## Your Tools & Capabilities (truncated) | L210–219 |
| SYSTEM_PROMPT env / --system-prompt | Appended | runner.ts:2321–2323 |
| task.additionalSystemPrompt | Appended per-task | runner.ts:3093–3097 |
Truncation caps: BOOTSTRAP_MAX_CHARS=20_000 per section, BOOTSTRAP_TOTAL_MAX_CHARS=150_000 total (base-prompt.ts L16–19).
Identity sources (soulMd, identityMd, claudeMd, toolsMd, heartbeatMd) are fetched from GET /me at runner.ts:2427–2449. If missing, defaults come from the template, then from the generators in src/prompts/defaults.ts, then pushed back to the server.
If the runner reuses an existing repo clone that has local changes, ensureRepoForTask() now auto-stashes that work before refreshing from origin. The resulting swarm-autostash refs are threaded into repoContext.autoStashes and appended to the base prompt so the active session can restore them deliberately instead of silently losing or ignoring dirty work.
Template resolution goes over HTTP (configureHttpResolver(apiUrl, apiKey) at runner.ts:2206) to obey the API/worker DB boundary. Prompt files under src/prompts/ must remain pure — no bun:sqlite or src/be/db imports. Enforced by scripts/check-db-boundary.sh.
How each adapter delivers the prompt
- Claude, CLI flag: cmd.push("--append-system-prompt", this.config.systemPrompt) at claude-adapter.ts:245–247.
- Codex, written into AGENTS.md in cwd inside a <swarm_system_prompt> block. The Codex SDK has no --append-system-prompt equivalent; this file is the only channel. Entry: writeCodexAgentsMd(config.cwd, config.systemPrompt) at codex-adapter.ts:761; implementation at codex-agents-md.ts:56–119. If an AGENTS.md exists, the block is prepended; otherwise stacked atop any CLAUDE.md. Cleanup in the session finally at codex-adapter.ts:738.
- Pi, SDK parameter: new DefaultResourceLoader({ appendSystemPrompt: [config.systemPrompt], ... }) passed via CreateAgentSessionOptions at pi-mono-adapter.ts:508–519.
What this means for a new provider
Pick the delivery path your runtime supports, in order of preference:
1. A dedicated system-prompt argument (flag, SDK field): cleanest, no filesystem side effects.
2. A file-based convention (like Codex's AGENTS.md): fine, but you must clean up in finally and not clobber user files.
3. Prepending to the user prompt: last resort; may confuse the model.
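If you take the file-based route, the inject-and-clean-up cycle can be kept as two pure string functions. This is a sketch, not the codex-agents-md.ts implementation: `injectBlock` and `stripBlock` are hypothetical names, and only the `<swarm_system_prompt>` tag name comes from the text above. Reading and writing the actual file is left to the caller.

```typescript
const OPEN = "<swarm_system_prompt>";
const CLOSE = "</swarm_system_prompt>";
const SEPARATOR = "\n\n";

// Remove a previously injected block, leaving the user's content untouched.
function stripBlock(fileText: string): string {
  const start = fileText.indexOf(OPEN);
  const end = fileText.indexOf(CLOSE);
  if (start === -1 || end === -1 || end < start) return fileText;
  const rest = fileText.slice(end + CLOSE.length);
  // Drop only the separator that injectBlock added, nothing more.
  return fileText.slice(0, start) + (rest.startsWith(SEPARATOR) ? rest.slice(SEPARATOR.length) : rest);
}

// Prepend a fresh block; stripping first keeps repeated runs from stacking.
function injectBlock(fileText: string, systemPrompt: string): string {
  return `${OPEN}\n${systemPrompt}\n${CLOSE}${SEPARATOR}${stripBlock(fileText)}`;
}
```

Calling `stripBlock` on the file contents in the session's `finally` restores the user's original text, which is the "do not clobber user files" requirement in practice.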
If your runtime has a distinct prompt shape (e.g. a different preamble format), add a subdirectory under src/prompts/<foo>/ and invoke it from your adapter, but keep the merge logic in base-prompt.ts as the single source of truth.
7. Skills (how the three providers handle them)
Skills are the swarm's portable procedural knowledge — reusable markdown files invoked as slash-commands like /review-pr, /implement-issue, /create-pr. Each provider surfaces them differently; your adapter must implement formatCommand(name) and may need a resolver if your runtime doesn't have native support.
The three patterns
Claude. The CLI already knows about skills installed under ~/.claude/skills/<name>/SKILL.md. The adapter just returns /<name>.
formatCommand(name: string): string { return `/${name}`; }
Pi. Pi-mono supports skills but namespaces them:
formatCommand(name: string): string { return `/skill:${name}`; }
See pi-mono-adapter.ts:541. Skills live at ~/.pi/agent/skills/<name>/SKILL.md.
Codex. The Codex SDK has no skill mechanism, so the adapter resolves the skill itself before calling the model:
1. src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the user prompt.
2. Reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md.
3. Inlines the SKILL.md content into the prompt before calling thread.runStreamed.
4. formatCommand(name) simply returns /${name} so the swarm can still emit the canonical form; the resolver does the work.
This is the template to follow if your provider lacks native skill support.
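A resolver in this style could be sketched as follows. The function name, the regular expression, and the way arguments are appended are all assumptions made for illustration; the real codex-skill-resolver.ts may format the inlined prompt differently. File access is injected as a callback so the sketch has no filesystem dependency.

```typescript
// Returns the SKILL.md content for a skill name, or null when it is missing.
type ReadSkill = (name: string) => string | null;

// Matches a leading "/<name>" optionally followed by whitespace and arguments.
const SLASH_COMMAND = /^\/([a-z0-9][a-z0-9-]*)(?:\s+([\s\S]*))?$/i;

// Hypothetical resolver: inline the skill body in place of the slash-command.
function resolveSkillPrompt(prompt: string, readSkill: ReadSkill): string {
  const match = SLASH_COMMAND.exec(prompt.trim());
  if (!match) return prompt; // not a slash-command, pass through unchanged
  const name = match[1] ?? "";
  const args = match[2] ?? "";
  const skill = readSkill(name);
  if (skill === null) return prompt; // unknown skill, pass through unchanged
  const body = skill.trim();
  return args ? `${body}\n\nArguments: ${args}` : body;
}
```

Falling through to the original prompt on any miss keeps the resolver safe to run on every prompt, including ones that merely start with a path such as `/usr/bin/foo`.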
Where skill files come from
The swarm syncs skills into each provider's skill dir at container boot in docker-entrypoint.sh L764–803. The same SKILL.md content is copied into:
- ~/.claude/skills/<name>/SKILL.md
- ~/.pi/agent/skills/<name>/SKILL.md
- ~/.codex/skills/<name>/SKILL.md
When adding a new provider, extend this block with ~/.<foo>/skills/<name>/SKILL.md (or whatever path your runtime expects).
8. Adding a new provider — step by step
Assume you are adding a provider called foo. Follow every step; "optional" is called out where true.
Step 1 — Scaffold the adapter
Create src/providers/foo-adapter.ts:
import type {
ProviderAdapter,
ProviderEvent,
ProviderResult,
ProviderSession,
ProviderSessionConfig,
} from "./types";
export class FooAdapter implements ProviderAdapter {
readonly name = "foo";
async createSession(config: ProviderSessionConfig): Promise<ProviderSession> {
return new FooSession(config);
}
async canResume(_sessionId: string): Promise<boolean> {
return false; // or true if your SDK/CLI supports resume
}
formatCommand(commandName: string): string {
return `/${commandName}`; // adjust to your runtime's convention
}
}
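class FooSession implements ProviderSession {
  sessionId: string | undefined;
  private listeners: Array<(e: ProviderEvent) => void> = [];
  private abortController = new AbortController();
  private completion: Promise<ProviderResult>;
  constructor(private config: ProviderSessionConfig) {
    // Start in the background so createSession() returns without waiting on model output.
    this.completion = this.start();
  }
  onEvent(listener: (e: ProviderEvent) => void) { this.listeners.push(listener); }
  private emit(event: ProviderEvent) { for (const l of this.listeners) l(event); }
  // Resolves when the native stream ends.
  waitForCompletion(): Promise<ProviderResult> { return this.completion; }
  // Idempotent: aborting an already-aborted controller is a no-op.
  async abort(): Promise<void> { this.abortController.abort(); }
  private async start(): Promise<ProviderResult> {
    // Spawn the CLI or call the SDK, pass this.abortController.signal where supported,
    // translate native events to this.emit(...), and return the final ProviderResult.
    throw new Error("FooSession.start() is not implemented yet");
  }
}
Step 2 — Register in the factory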
Edit src/providers/index.ts:
import { FooAdapter } from "./foo-adapter";
// ...
case "foo": return new FooAdapter();
Update the error message (Supported: claude, pi, codex, claude-managed, foo) so the unknown-provider error stays accurate.
Step 3 — Translate native events to ProviderEvent
This is the heart of the adapter. For each event your SDK / CLI emits, decide which ProviderEvent type to produce:
- Session start → session_init { sessionId } (set this.sessionId first, then emit).
- Assistant text → message { role: "assistant", content }.
- Tool call start/end → tool_start / tool_end with { toolCallId, toolName, args|result }.
- Turn/usage stats → context_usage.
- Auto-compaction → compaction.
- Any provider-specific event with no direct mapping → custom { name, data } (e.g. Codex uses custom for codex.reasoning and codex.todo_list).
- Terminal → result { cost, output, isError, errorCategory } then resolve waitForCompletion().
- Non-fatal diagnostics → raw_log / raw_stderr.
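The mapping above can be sketched as a single translation function. Everything named here is hypothetical: FooNativeEvent stands in for whatever your SDK emits, and SketchEvent is a trimmed-down stand-in for the real ProviderEvent union in src/providers/types.ts, whose exact fields may differ.

```typescript
// Simplified stand-in for ProviderEvent; the canonical union lives in src/providers/types.ts.
type SketchEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "custom"; name: string; data: unknown };

// Hypothetical native events emitted by the imaginary "foo" SDK.
type FooNativeEvent =
  | { kind: "started"; id: string }
  | { kind: "text"; text: string }
  | { kind: "tool_begin"; callId: string; tool: string; input: unknown }
  | { kind: "tool_done"; callId: string; tool: string; output: unknown }
  | { kind: "thinking"; summary: string };

function translate(native: FooNativeEvent): SketchEvent {
  switch (native.kind) {
    case "started":
      return { type: "session_init", sessionId: native.id };
    case "text":
      return { type: "message", role: "assistant", content: native.text };
    case "tool_begin":
      return { type: "tool_start", toolCallId: native.callId, toolName: native.tool, args: native.input };
    case "tool_done":
      return { type: "tool_end", toolCallId: native.callId, toolName: native.tool, result: native.output };
    case "thinking":
      // No direct mapping, so fall back to a namespaced custom event.
      return { type: "custom", name: "foo.thinking", data: { summary: native.summary } };
  }
}
```

Keeping the translator a pure function makes it easy to unit-test by feeding fake native events, independent of the subprocess or SDK plumbing.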
Reference code:
- Claude JSONL branching: src/providers/claude-adapter.ts around L368–470.
- Codex ThreadEvent dispatcher: src/providers/codex-adapter.ts around L463–618.
- Pi AgentSessionEvent dispatcher: src/providers/pi-mono-adapter.ts around L161+.
Step 4 — Emit CostData
The result event must carry a populated CostData so the swarm can track spend:
emit({
  type: "result",
  cost: {
    sessionId: this.sessionId!,
    taskId: config.taskId,
    agentId: config.agentId,
    totalCostUsd, inputTokens, outputTokens,
    cacheReadTokens, cacheWriteTokens,
    durationMs, numTurns, model, isError: false,
  },
  isError: false,
});
If your SDK returns tokens but not USD, compute cost from a pricing table (see src/providers/codex-models.ts for the Codex model/pricing resolver pattern).
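For a provider that reports only token counts, the USD figure might be derived along these lines. This is a minimal sketch, not the Codex resolver: the ModelPricing shape, the per-million-token convention, and the prices in the usage note are assumptions.

```typescript
// Hypothetical pricing entry, expressed in USD per one million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheWritePerMTok: number;
}

// Token counts as they would appear in CostData.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

function computeCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  // Convert a raw token count into dollars for one price tier.
  const tier = (tokens: number, pricePerMTok: number): number => (tokens / 1e6) * pricePerMTok;
  return (
    tier(usage.inputTokens, pricing.inputPerMTok) +
    tier(usage.outputTokens, pricing.outputPerMTok) +
    tier(usage.cacheReadTokens, pricing.cacheReadPerMTok) +
    tier(usage.cacheWriteTokens, pricing.cacheWritePerMTok)
  );
}
```

With made-up prices of 2, 8, 0.5, and 2.5 USD per million tokens, a run with 1,000,000 input, 500,000 output, and 2,000,000 cache-read tokens would cost 2 + 4 + 1 = 7 USD.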
Step 5 — Model selection
ProviderSessionConfig.model is set by the runner from opts.model || process.env.MODEL_OVERRIDE || "" and may be overridden per task (task.model). Decide:
- What is the provider default when model === ""? Read it from a provider-specific env var (CODEX_DEFAULT_MODEL is the existing convention).
- Do you accept shortnames ("sonnet", "gpt-5") and expand to full IDs? If so, build a resolver; see resolveCodexModel in src/providers/codex-models.ts and resolveModel in src/providers/pi-mono-adapter.ts.
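Both decisions can be folded into one small resolver. The sketch below is an assumption-laden illustration, not a copy of resolveCodexModel: FOO_DEFAULT_MODEL, the alias names, and the model IDs are all placeholders.

```typescript
type EnvLike = Record<string, string | undefined>;

// Hypothetical shortname table; the IDs are placeholders, not real model names.
const FOO_MODEL_ALIASES = new Map<string, string>([
  ["fast", "foo-fast-1"],
  ["smart", "foo-smart-1"],
]);

// Used only when neither the task nor the environment names a model.
const FOO_FALLBACK_MODEL = "foo-smart-1";

function resolveFooModel(requested: string, env: EnvLike): string {
  // An empty string means "no explicit model": fall back to the provider-specific env var.
  const fromEnv = (env["FOO_DEFAULT_MODEL"] ?? "").trim();
  const candidate = requested.trim() || fromEnv || FOO_FALLBACK_MODEL;
  // Expand shortnames; anything else is passed through unchanged as a full ID.
  return FOO_MODEL_ALIASES.get(candidate) ?? candidate;
}
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the resolver trivially testable.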
Step 6 — Credentials & auth
Three patterns are already in the codebase; pick the one that fits your provider:
Claude-style. Validate at adapter start; throw a clear error if missing. Example: validateClaudeCredentials() in src/providers/claude-adapter.ts.
Codex ChatGPT-style. See §9 below. This is the most involved path, but it is required for desktop-login-style flows.
Pi-style. Reads ~/.pi/agent/auth.json. If your SDK looks up its own auth file, the adapter may not need to do anything beyond ensuring the file exists at worker boot.
Secret scrubbing: any credential you log must go through the project's scrubber. See src/utils/secret-scrubber.ts and the CLAUDE.md "Secret scrubbing" section.
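A Claude-style check for a key-based provider might look like the sketch below. FOO_API_KEY is a hypothetical variable name, and the function is not modeled on the actual body of validateClaudeCredentials; it only illustrates failing early with an actionable message that never echoes the secret.

```typescript
type EnvLike = Record<string, string | undefined>;

// Returns the trimmed key, or throws a clear error before any session is created.
function validateFooCredentials(env: EnvLike): string {
  const key = (env["FOO_API_KEY"] ?? "").trim();
  if (key === "") {
    // Name the missing variable, but never include credential values in the message.
    throw new Error(
      "foo provider: FOO_API_KEY is not set. Export it (or store it in the swarm config) before starting the worker.",
    );
  }
  return key;
}
```

In a worker this would typically be called once at adapter start with process.env, so a misconfigured container fails at boot instead of on the first task.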
Step 7 — MCP server injection
Every adapter fetches per-agent MCP servers from the swarm API and wires them into the provider:
GET {apiUrl}/api/agents/{agentId}/mcp-servers?resolveSecrets=true
Authorization: Bearer {apiKey}
Then:
- Claude writes /tmp/mcp-<taskId>.json and passes it via --mcp-config (claude-adapter.ts L46–183).
- Pi instantiates an McpHttpClient per HTTP/SSE server and registers tools prefixed mcp__<name>__ (pi-mono-adapter.ts L408–493).
- Codex builds a structured mcp_servers object for new Codex({ config }) (codex-adapter.ts L132–243).
Always include the swarm's own MCP server with an X-Source-Task-Id header so nested tool calls attribute back correctly.
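The request and the extra swarm entry could be assembled roughly as follows. The endpoint path and the X-Source-Task-Id header come from the text above; the McpServerEntry shape, the "agent-swarm" entry name, and the use of a Bearer header on the swarm MCP connection are assumptions for illustration.

```typescript
interface McpRequest {
  url: string;
  headers: Record<string, string>;
}

// Builds the GET request that lists an agent's MCP servers with secrets resolved.
function buildMcpServersRequest(apiUrl: string, agentId: string, apiKey: string): McpRequest {
  const base = apiUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/api/agents/${encodeURIComponent(agentId)}/mcp-servers?resolveSecrets=true`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Hypothetical shape for one HTTP MCP server handed to the provider runtime.
interface McpServerEntry {
  name: string;
  url: string;
  headers: Record<string, string>;
}

// The swarm's own MCP server, tagged so nested tool calls attribute to the task.
function swarmMcpEntry(swarmMcpUrl: string, apiKey: string, taskId: string): McpServerEntry {
  return {
    name: "agent-swarm",
    url: swarmMcpUrl,
    headers: { Authorization: `Bearer ${apiKey}`, "X-Source-Task-Id": taskId },
  };
}
```

The adapter would append the swarmMcpEntry result to whatever the API returned before translating the list into the provider's native MCP configuration.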
Step 8 — Swarm event hooks (cancellation, heartbeat, tool-loop detection)
The adapter is responsible for polling swarm-side signals during a run. Pattern files:
- Codex: src/providers/codex-swarm-events.ts, with throttled fireAndForget fetches for cancel/heartbeat/activity/context-usage, attached via this.listeners.push(...) inside the session.
- Pi: src/providers/pi-mono-extension.ts, where createSwarmHooksExtension is passed into DefaultResourceLoader({ extensionFactories: [swarmExtension] }).
- Claude: external hook process reads a task file (/tmp/agent-swarm-task-<pid>.json) written by claude-adapter.ts L31–43; hook logic lives under src/hooks/ (e.g. tool-loop-detection.ts).
Tool-loop detection (src/hooks/tool-loop-detection.ts::checkToolLoop) is reusable — call it from your event translator when you see tool_start.
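The throttling idea behind the Codex pattern can be sketched generically. This helper is an illustration written for this guide, not the code in codex-swarm-events.ts; the name throttle and the injectable clock are assumptions.

```typescript
// Wraps `fn` so it runs at most once per `intervalMs`.
// `now` is injectable so the behaviour can be tested without real timers.
function throttle(
  fn: () => void,
  intervalMs: number,
  now: () => number = Date.now,
): () => boolean {
  let lastRun = Number.NEGATIVE_INFINITY;
  return () => {
    const current = now();
    if (current - lastRun < intervalMs) return false; // too soon, skip this call
    lastRun = current;
    fn();
    return true;
  };
}
```

A session could create one throttled heartbeat per run and invoke it from every translated event; the wrapped function would start a fetch and swallow its errors, so a slow or failing swarm API never blocks event translation.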
Step 9 — Skills and slash-commands
Skills are the swarm's portable procedural knowledge (/review-pr, /implement-issue, etc.). Each provider handles them differently:
- Claude: native slash-commands, so formatCommand(name) => "/" + name (claude-adapter.ts L601).
- Pi: prefixed, formatCommand(name) => "/skill:" + name (pi-mono-adapter.ts L541); resolved from ~/.pi/agent/skills/<name>/SKILL.md.
- Codex: no native skills support → src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the prompt, reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md, and inlines it before calling thread.runStreamed. The system prompt is delivered by writing AGENTS.md into cwd (see codex-agents-md.ts).
Pick the approach that matches your runtime; if your provider has no native skill mechanism, follow the Codex inline-resolver pattern.
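An inline resolver in the spirit of the Codex pattern might look like this. It is a simplified sketch, not the contents of codex-skill-resolver.ts: the accepted name characters, the way the skill body and the remaining prompt are joined, and the injected SkillReader are all assumptions.

```typescript
// Looks up a skill body by name; returns undefined when no such skill exists.
// In a real adapter this would read <skills dir>/<name>/SKILL.md from disk.
type SkillReader = (skillName: string) => string | undefined;

function inlineSkill(prompt: string, readSkill: SkillReader): string {
  // Match a leading "/<name>", optionally followed by whitespace and the rest of the prompt.
  const match = /^\/([A-Za-z0-9][A-Za-z0-9_-]*)(?:\s+([\s\S]*))?$/.exec(prompt.trim());
  if (!match) return prompt;
  const skillName = match[1] ?? "";
  const rest = match[2] ?? "";
  const body = readSkill(skillName);
  if (body === undefined) return prompt; // unknown skill: leave the prompt untouched
  return rest === "" ? body : `${body}\n\n${rest}`;
}
```

Because the reader is injected, the same function works against the filesystem in production and against an in-memory map in unit tests.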
The swarm syncs skill files into each provider's skill dir at container boot in docker-entrypoint.sh (copies SKILL.md into ~/.claude/skills/, ~/.pi/agent/skills/, and ~/.codex/skills/). Add an entry for your provider there.
Step 10 — Worker bootstrap (Docker entrypoint)
Edit docker-entrypoint.sh:
- Credential validation branch: mirror the pattern used for pi (L7–12), codex (L13–71), or claude (L72–79).
- Binary reachability check: add a block similar to the CODEX_BINARY / CLAUDE_BINARY checks (L87–108).
- Skill sync: extend the skill-copy block (L764–803) with your provider's skill directory.
- If your provider has a CLI binary, install it in Dockerfile.worker.
Step 11 — Login CLI (only if OAuth)
If your provider needs a user-interactive OAuth flow (like Codex's ChatGPT login), add a CLI command:
- Implement PKCE + local callback server. Reference: src/providers/codex-oauth/flow.ts, which provides createAuthorizationFlow (URL with code_challenge_method=S256), startLocalOAuthServer (node:http on 127.0.0.1:1455/auth/callback), and exchangeAuthorizationCode.
- Add storage helpers at src/providers/<foo>-oauth/storage.ts that PUT /api/config with { scope: "global", key: "<foo>_oauth", value: JSON.stringify(creds), isSecret: true }. See storeCodexOAuth and getValidCodexOAuth (with auto-refresh) in src/providers/codex-oauth/storage.ts.
- Add the CLI command at src/commands/<foo>-login.ts. Reference: src/commands/codex-login.ts, which uses promptHiddenInput for masked API-key entry and attempts to auto-open the browser via open / start / xdg-open depending on the platform.
- Register the command in src/cli.tsx (non-UI command: console.log + process.exit(0) style) and update COMMAND_HELP.
- In docker-entrypoint.sh, restore credentials at boot by fetching them from /api/config/resolved?includeSecrets=true&key=<foo>_oauth and writing the provider's expected auth-file format (see the codex block, L13–71, for the jq-based reshape).
- At adapter session-creation time, re-fetch-and-refresh as a fallback if the token is expired (see codex-adapter.ts L810–844).
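The storage payload and the expiry check from the steps above can be sketched as two pure helpers. The credential shape (with expires as epoch milliseconds), the 60-second skew, and both function names are assumptions; the request body fields follow the PUT /api/config description above.

```typescript
// Hypothetical flat credential shape for the imaginary "foo" provider.
interface FooOAuthCreds {
  access: string;
  refresh: string;
  expires: number; // assumed to be epoch milliseconds
}

interface ConfigPutBody {
  scope: "global";
  key: string;
  value: string;
  isSecret: true;
}

// Builds the body for PUT /api/config so the creds are stored as a global secret.
function buildOAuthConfigBody(provider: string, creds: FooOAuthCreds): ConfigPutBody {
  return {
    scope: "global",
    key: `${provider}_oauth`,
    value: JSON.stringify(creds),
    isSecret: true,
  };
}

// Treat a token as expired slightly early so a run never starts with one about to lapse.
function needsRefresh(creds: FooOAuthCreds, nowMs: number, skewMs: number = 60000): boolean {
  return creds.expires - skewMs <= nowMs;
}
```

An adapter would call needsRefresh at session-creation time and, when it returns true, run the refresh exchange and persist the new credentials with buildOAuthConfigBody before starting the run.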
Step 12 — Types & enums
Update union-type entries that enumerate providers:
- src/types.ts: HarnessProvider union.
- templates/schema.ts: template provider enum.
- Any migration that stores a provider column: do not modify existing migrations; create a new one under src/be/migrations/ if a schema update is needed.
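One way to make a missed enum update fail loudly is an exhaustive switch over the union. The sketch below uses an illustrative subset of providers and a hypothetical ~/.foo/skills path; the real HarnessProvider union in src/types.ts lists every supported provider and may be declared differently.

```typescript
// Illustrative subset only; not the actual contents of src/types.ts.
const HARNESS_PROVIDERS = ["claude", "pi", "codex", "foo"] as const;
type HarnessProviderSketch = (typeof HARNESS_PROVIDERS)[number];

// Runtime guard for values that arrive as plain strings (env vars, API payloads).
function isHarnessProvider(value: string): value is HarnessProviderSketch {
  return (HARNESS_PROVIDERS as readonly string[]).indexOf(value) !== -1;
}

// Exhaustive switch: adding a union member without a matching case fails type-checking.
function skillDirFor(provider: HarnessProviderSketch): string {
  switch (provider) {
    case "claude": return "~/.claude/skills";
    case "pi": return "~/.pi/agent/skills";
    case "codex": return "~/.codex/skills";
    case "foo": return "~/.foo/skills"; // hypothetical path for the new provider
    default: {
      const unreachable: never = provider;
      throw new Error(`Unhandled provider: ${String(unreachable)}`);
    }
  }
}
```

With this pattern, the compiler points at every switch that still needs a branch as soon as the new name is added to the union.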
Step 13 — Prompts
If your provider benefits from a distinct system-prompt shape, add src/prompts/<foo>/ mirroring src/prompts/claude/ and src/prompts/codex/. Wire it into the adapter's createSession. Prompt files must remain pure (no DB imports) — the DB boundary is enforced by scripts/check-db-boundary.sh.
Step 14 — Tests
Add at minimum:
- src/tests/<foo>-adapter.test.ts: unit tests for event translation (feed fake native events, assert emitted ProviderEvents).
- src/tests/<foo>-oauth.test.ts (if OAuth): PKCE helpers, storage round-trip, token refresh.
Existing analogs: src/tests/codex-*.test.ts, src/tests/claude-*.test.ts, src/tests/pi-*.test.ts. Tests must use isolated SQLite files and clean up -wal / -shm in afterAll (see the "Unit tests" block in CLAUDE.md).
Step 15 — Documentation
- Update CLAUDE.md: add foo to the HARNESS_PROVIDER accepted values list and document any required env vars.
- Update this guide's "Reference implementations" table.
- Update the README's "Multi-provider" line.
- Update Harness Configuration with the new provider's setup instructions.
- If the provider needs new HTTP endpoints, regenerate OpenAPI: bun run docs:openapi.
9. Codex OAuth: the full reference flow
This is documented separately because it is the most involved integration.
┌────────────────┐    codex-login CLI     ┌──────────────────┐
│ User's laptop  │ ─── PKCE auth URL ──▶  │ auth.openai.com  │
│                │ ◀── code (state) ───── │ OAuth server     │
│                │                        └──────────────────┘
│ Local callback │         ▲
│ :1455/auth/... │─────────┘
└────────┬───────┘
         │ exchangeAuthorizationCode
         ▼
┌────────────────┐
│ Creds JSON     │
│ (tokens)       │
└────────┬───────┘
         │ PUT /api/config { key:"codex_oauth", isSecret:true }
         ▼
┌────────────────────────────┐
│ swarm_config (encrypted)   │
└────────┬───────────────────┘
         │ docker-entrypoint.sh fetches at boot
         ▼
┌────────────────────────────┐
│ ~/.codex/auth.json (0600)  │
└────────────────────────────┘
Key files: src/providers/codex-oauth/{flow.ts,storage.ts,auth-json.ts,pkce.ts,types.ts}, src/commands/codex-login.ts, docker-entrypoint.sh L13–71, adapter fallback codex-adapter.ts L810–844.
The shape conversion (our flat {access, refresh, expires, accountId} → Codex CLI's {auth_mode: "chatgpt", tokens: {...}}) lives in credentialsToAuthJson() at src/providers/codex-oauth/auth-json.ts L37–49.
See also: Codex OAuth setup guide.
10. Pre-PR checklist for a new provider
Run before opening the PR (per CLAUDE.md):
bun run lint:fix
bun run tsc:check
bun test
bash scripts/check-db-boundary.sh
bun run docs:openapi # only if you added HTTP endpoints
Manual verification:
- HARNESS_PROVIDER=foo bun run src/cli.tsx worker starts and connects.
- A trivial task ("Say hi") runs to completion and posts progress + cost.
- cancel-task via MCP actually aborts the in-flight run.
- docker build -f Dockerfile.worker . succeeds.
- Full E2E with Docker (see CLAUDE.md "E2E testing with Docker") with -e HARNESS_PROVIDER=foo.
- For OAuth providers: run bun run src/cli.tsx <foo>-login end-to-end, then boot a worker in Docker and verify it picks up the stored creds.
11. Files to touch — quick checklist
| Concern | File(s) |
|---|---|
| Adapter implementation | src/providers/<foo>-adapter.ts |
| Factory registration | src/providers/index.ts |
| Types / enums | src/types.ts, templates/schema.ts |
| Prompts (optional) | src/prompts/<foo>/ |
| OAuth (optional) | src/providers/<foo>-oauth/*, src/commands/<foo>-login.ts |
| Setup CLI (optional, e.g. claude-managed) | src/commands/<foo>-setup.ts |
| CLI wiring | src/cli.tsx (COMMAND_HELP, command routing) |
| Docker bootstrap | docker-entrypoint.sh, Dockerfile.worker |
| Hooks (optional) | src/hooks/*, src/providers/<foo>-swarm-events.ts |
| Skills resolver (if provider lacks native support) | src/providers/<foo>-skill-resolver.ts |
| Models / pricing (optional) | src/providers/<foo>-models.ts |
| Tests | src/tests/<foo>-*.test.ts |
| Integrations UI (optional) | ui/src/lib/integrations-catalog.ts |
| Docs | CLAUDE.md, README.md, this guide, Harness Configuration |
claude-managed reference files: src/providers/claude-managed-adapter.ts, src/providers/claude-managed-swarm-events.ts, src/providers/claude-managed-models.ts, src/commands/claude-managed-setup.ts, ui/src/lib/integrations-catalog.ts.
12. Claude Managed Agents — pre-existing Agent + Environment pattern
claude-managed is the first reference adapter where the session runtime executes outside the worker container: the worker only opens an SSE stream against client.beta.sessions.events.stream and relays normalized events to the runner. This forces a few design decisions that are worth calling out — they apply to any future provider with a similar "managed cloud session" shape (Devin's /sessions API is the closest existing analog).
a. We don't call agents.create at runtime
Anthropic's beta API has a 1:1 Agent ↔ identity model. Calling client.beta.agents.create(...) from each worker on each task would (a) leak agents into the customer's account at the rate of one per task, and (b) make skill / tool inventory non-deterministic per session. Instead, we treat the Agent and Environment as persistent infrastructure, created once during operator onboarding and persisted by ID. The adapter only ever calls client.beta.sessions.create({ agent: MANAGED_AGENT_ID, environment_id: MANAGED_ENVIRONMENT_ID, ... }).
b. The claude-managed-setup CLI
bun run src/cli.tsx claude-managed-setup
Implementation: src/commands/claude-managed-setup.ts. Behavior:
- Reads ANTHROPIC_API_KEY from .env / env (or prompts).
- Creates the Environment (client.beta.environments.create), the long-lived sandbox configuration (allowed networks, default packages, persistent volumes).
- Uploads each plugin/commands/*.md skill via client.beta.skills.create (one-shot per skill content hash; the CLI dedupes against an existing inventory).
- Creates the Agent (client.beta.agents.create) and attaches the freshly uploaded skills.
- PUT /api/config persists MANAGED_AGENT_ID + MANAGED_ENVIRONMENT_ID into swarm_config so deployed workers (and the integrations UI) can restore them at boot.
- Re-run with --force to recreate (rare; only needed if upstream rotates IDs).
This is the shape every future "managed cloud session" provider should follow: setup-once → IDs live in swarm_config → workers fail-fast at boot if absent.
c. System prompt in the user message + prompt-cache breakpoint
Managed-agents has no system field on sessions.create. The closest analog is client.beta.sessions.events.send({ events: [{ type: "user.message", content: [...] }] }). We compose the swarm's full assembled systemPrompt as the first content block and the per-task prompt as the second:
[
  { type: "text", text: <full system prompt + agent identity + skills>, cache_control: { type: "ephemeral" } },
  { type: "text", text: `User request:\n\n${prompt}` }, // no cache_control
]
The cache_control: { type: "ephemeral" } marker on the first block creates a prompt-cache breakpoint: Anthropic caches everything up to that boundary across sessions for the same agent, so subsequent tasks for the same agent re-use the static prefix at cache-read pricing. The per-task block sits after the breakpoint and is allowed to differ without invalidating the cache.
This is enforced by composeManagedUserMessage in claude-managed-adapter.ts (asserted byte-identical-prefix in src/tests/claude-managed-adapter.test.ts).
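The two-block layout can be expressed as a small pure function. This is a sketch in the spirit of composeManagedUserMessage, not its actual source: the local TextBlock type and the function name are assumptions, and only the block order and the cache_control placement come from the description above.

```typescript
// Local block type; cache_control is optional because only the static prefix carries it.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function composeUserMessageSketch(systemPrompt: string, prompt: string): [TextBlock, TextBlock] {
  return [
    // Static prefix: must be byte-identical across tasks for the cache breakpoint to hit.
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    // Per-task suffix: sits after the breakpoint and may vary freely.
    { type: "text", text: `User request:\n\n${prompt}` },
  ];
}
```

Anything task-specific (timestamps, task IDs, the request itself) has to stay out of the first block, otherwise every task produces a different prefix and the cache never hits.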
d. X-Source-Task-Id is dropped
The MCP integration §5 calls out that every adapter MUST set X-Source-Task-Id on the swarm MCP connection. claude-managed cannot. The MCP servers are configured server-side on the Anthropic-managed Agent (not per-session), and the SDK doesn't expose a per-session HTTP-header override. We instead pass the task ID via metadata.swarmTaskId on sessions.create, and the swarm MCP tools accept task_id as an explicit tool argument when the header is missing. New providers that hit the same constraint should follow this fallback.
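On the receiving side, the fallback amounts to preferring the header and accepting the explicit argument when it is absent. The sketch below is an assumption about how such a lookup could be written; the function name and the HeaderMap type are hypothetical, and only the header name and the task_id argument come from the text above.

```typescript
type HeaderMap = Record<string, string | undefined>;

// Resolves the originating task: header first, explicit tool argument second.
function resolveSourceTaskId(
  headers: HeaderMap,
  toolArgs: { task_id?: string },
): string | undefined {
  // HTTP header names are case-insensitive, so compare in lower case.
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === "x-source-task-id") {
      const headerValue = (headers[key] ?? "").trim();
      if (headerValue !== "") return headerValue;
    }
  }
  const fromArgs = (toolArgs.task_id ?? "").trim();
  return fromArgs === "" ? undefined : fromArgs;
}
```

Keeping the header as the primary source means adapters that can set it are unaffected, while managed-session providers still attribute tool calls correctly.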
e. Skill upload via beta.skills.create
Skills are not synced into a filesystem path on the worker (the worker doesn't run the model). They're uploaded once during claude-managed-setup via client.beta.skills.create({ content, name, ... }) and referenced by ID on the Agent. The skill content is the same plugin/commands/*.md body that other providers copy into ~/.claude/skills/<name>/SKILL.md — so the source of truth stays in the repo.
f. SDK shape deviations to be aware of
The Anthropic Beta SDK has a few non-obvious surface differences from the conventional Anthropic messages.create API. The header comments in src/providers/claude-managed-adapter.ts (top-of-file block, ~L13–35) document them in detail; the highlights for anyone reading the code are:
- Resource type is github_repository, not github_repo. The SDK type is BetaGitHubRepositoryResource and the literal field is type: "github_repository".
- events.send takes { events: [...] } (an array), not a single event arg. The naming makes it look like events.send(event) would work; it would not.
- Session status enum is 'rescheduling' | 'running' | 'idle' | 'terminated'. "Archived" is not a status; it's signaled by archived_at !== null. canResume() therefore rejects on terminated or non-null archived_at.
- cache_control is a runtime-honored field that's NOT in the TS definition for BetaManagedAgentsTextBlock. We attach it via a typed extension and cast on the way out so the runtime payload includes it.
- events.stream returns an AsyncIterable, not a Promise of an array; iterate with for await.
- events.list is a PagePromise that's also AsyncIterable over historical session events; the resume path uses it to pre-fetch + dedupe against the live stream.
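The resume rule from the status bullet reduces to a two-condition check. The SessionSnapshot type below is a local assumption that keeps only the two fields the rule depends on; the real SDK session object carries more.

```typescript
type SessionStatus = "rescheduling" | "running" | "idle" | "terminated";

// Minimal view of a session: just the fields the resume decision needs.
interface SessionSnapshot {
  status: SessionStatus;
  archived_at: string | null;
}

// A session is resumable unless it is terminated or has been archived.
function isResumable(session: SessionSnapshot): boolean {
  return session.status !== "terminated" && session.archived_at === null;
}
```

A canResume implementation would fetch the session by ID, return isResumable(...) on success, and return false if the lookup fails.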
13. Further reading
- Harness Configuration — how to use the existing providers.
- Codex OAuth setup — end-user OAuth flow.
- CLAUDE.md: project-wide rules, especially "Architecture invariants" (DB boundary) and "Secret scrubbing".
- src/providers/types.ts: canonical interface definitions.
- src/providers/codex-adapter.ts: the most feature-complete reference (OAuth + skills resolver + hooks + model resolver).