Implement a new harness provider (like claude, pi, or codex) — the full adapter contract, reference implementations, and wiring checklist

This guide documents the harness-provider contract used by agent-swarm workers, the three reference implementations (claude, pi, codex), and every hook that must be wired when adding a new provider.

A harness provider is the runtime that actually drives an LLM: it owns the subprocess (or in-process SDK) that reads the user prompt, talks to the model, invokes tools, and streams events back. The swarm treats providers as plug-ins behind a single TypeScript interface (ProviderAdapter). Workers select one at boot via HARNESS_PROVIDER.

For configuring an existing provider, see Harness Configuration. This guide is for implementing a new one.

Supported today: claude (Anthropic Claude Code CLI), pi (pi-mono, in-process via @earendil-works/pi-coding-agent), codex (OpenAI Codex SDK hosted inside a per-task subprocess), devin (Cognition Devin via /sessions), claude-managed (Anthropic Managed Agents — sessions execute in Anthropic's cloud sandbox), opencode (in-process @opencode-ai/sdk server with SSE event mapping; experimental, rolling out across DES-295 → DES-299).

1. The `ProviderAdapter` contract

Source of truth: src/providers/types.ts.

export interface ProviderAdapter {
  readonly name: string;
  createSession(config: ProviderSessionConfig): Promise<ProviderSession>;
  canResume(sessionId: string): Promise<boolean>;
  formatCommand(commandName: string): string;
}

export interface ProviderSession {
  readonly sessionId: string | undefined;
  onEvent(listener: (event: ProviderEvent) => void): void;
  waitForCompletion(): Promise<ProviderResult>;
  abort(): Promise<void>;
}

What each member does

Member	Responsibility
`name`	Short identifier used in logs (`"claude"`, `"pi"`, `"codex"`).
`createSession(config)`	Spin up a new session for a task. Must not block on model output — return a `ProviderSession` immediately and stream events asynchronously.
`canResume(sessionId)`	Return `true` iff the provider can continue a previous session by ID. Used when the runner resumes paused work.
`formatCommand(commandName)`	Map a swarm slash-command (e.g. `"review-pr"`) to the form the underlying CLI expects (e.g. `/review-pr`, `/skill:review-pr`, or inlined via skill resolver).
`session.onEvent`	Register a listener. The adapter must call every registered listener for every `ProviderEvent`.
`session.waitForCompletion()`	Resolve once the session ends; returns `{ exitCode, sessionId, cost, output, isError, errorCategory, failureReason }`.
`session.abort()`	Cancel in flight (SIGTERM, SDK abort signal, etc.). Must be idempotent.
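As a rough illustration of the session half of the contract, the sketch below shows listener fan-out, a completion promise, and an idempotent abort(). The SketchSession class, the trimmed SketchEvent / SketchResult types, and the exit code of 1 on abort are assumptions made for this example; they are not taken from src/providers/types.ts.

```typescript
// Trimmed local stand-ins for ProviderEvent / ProviderResult (hypothetical).
type SketchEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant" | "user"; content: string }
  | { type: "result"; isError: boolean; output?: string };

interface SketchResult {
  exitCode: number;
  sessionId: string | undefined;
  isError: boolean;
  output?: string;
}

class SketchSession {
  sessionId: string | undefined = undefined;
  private listeners: Array<(event: SketchEvent) => void> = [];
  private finished = false;
  private resolveDone: (result: SketchResult) => void = () => {};
  private readonly done: Promise<SketchResult>;

  constructor() {
    this.done = new Promise<SketchResult>((resolve) => {
      this.resolveDone = resolve;
    });
  }

  onEvent(listener: (event: SketchEvent) => void): void {
    this.listeners.push(listener);
  }

  // Deliver every event to every registered listener; ignore events after the result.
  emit(event: SketchEvent): void {
    if (this.finished) return;
    if (event.type === "session_init") this.sessionId = event.sessionId;
    for (const listener of this.listeners) listener(event);
    if (event.type === "result") {
      this.finished = true;
      this.resolveDone({
        exitCode: event.isError ? 1 : 0, // assumed mapping, for illustration only
        sessionId: this.sessionId,
        isError: event.isError,
        output: event.output,
      });
    }
  }

  waitForCompletion(): Promise<SketchResult> {
    return this.done;
  }

  // Idempotent: once a result has been emitted, further calls do nothing.
  async abort(): Promise<void> {
    this.emit({ type: "result", isError: true, output: "aborted" });
  }
}
```

A real adapter would also cancel its subprocess or SDK call inside abort(); the sketch only covers the bookkeeping side.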

`ProviderSessionConfig` inputs

interface ProviderSessionConfig {
  prompt: string;           // user prompt
  systemPrompt: string;     // composed system prompt
  model: string;            // resolved model id or ""
  role: string;             // "lead" | "worker" | ...
  agentId: string;
  taskId: string;
  apiUrl: string;           // swarm API base URL for callbacks
  apiKey: string;           // swarm API key
  cwd: string;              // workspace dir
  /** @deprecated Always undefined — native resume removed in the 2026-05-28 plan. */
  resumeSessionId?: string;
  iteration?: number;
  logFile: string;          // jsonl log path
  additionalArgs?: string[];
  env?: Record<string, string>;
}

`ProviderEvent` (the normalized stream)

Every provider translates its native events into this tagged union:

type ProviderEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant" | "user"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "result"; cost: CostData; output?: string; isError: boolean; errorCategory?: string }
  | { type: "error"; message: string; category?: string }
  | { type: "raw_log"; content: string }
  | { type: "raw_stderr"; content: string }
  | { type: "custom"; name: string; data: unknown }
  | { type: "context_usage"; contextUsedTokens: number; contextTotalTokens: number; contextPercent: number; outputTokens: number }
  | { type: "compaction"; preCompactTokens: number; compactTrigger: "auto" | "manual"; contextTotalTokens: number };

The runner consumes these events (not the provider's native ones) to post progress to the swarm API, detect tool loops, charge cost, and update task state. Implementing this translation is the bulk of the work for a new provider.
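The translation step can be pictured as a pure mapping function. The sketch below assumes a hypothetical NativeEvent shape (kind, callId, tool, and so on) invented for this example; only the target shapes follow the ProviderEvent union shown above, and only a subset of its variants is covered.

```typescript
// Hypothetical native event shape for an imaginary runtime.
type NativeEvent =
  | { kind: "started"; id: string }
  | { kind: "text"; text: string }
  | { kind: "tool_call"; callId: string; tool: string; input: unknown }
  | { kind: "tool_result"; callId: string; tool: string; output: unknown }
  | { kind: "failed"; reason: string };

// Subset of the normalized ProviderEvent union.
type NormalizedEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant" | "user"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "error"; message: string; category?: string };

function translateNativeEvent(native: NativeEvent): NormalizedEvent {
  switch (native.kind) {
    case "started":
      return { type: "session_init", sessionId: native.id };
    case "text":
      return { type: "message", role: "assistant", content: native.text };
    case "tool_call":
      return { type: "tool_start", toolCallId: native.callId, toolName: native.tool, args: native.input };
    case "tool_result":
      return { type: "tool_end", toolCallId: native.callId, toolName: native.tool, result: native.output };
    case "failed":
      return { type: "error", message: native.reason };
  }
}
```

Keeping the mapping free of side effects makes it easy to unit-test separately from the transport code.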

2. Reference implementations

File	Transport	Auth
`src/providers/claude-adapter.ts`	`Bun.spawn` of `claude` CLI with `--output-format stream-json`, JSONL stdout parsing	`CLAUDE_CODE_OAUTH_TOKEN` or `ANTHROPIC_API_KEY`
`src/providers/pi-mono-adapter.ts`	In-process via `createAgentSession` from `@earendil-works/pi-coding-agent`; no subprocess	`ANTHROPIC_API_KEY` / `OPENROUTER_API_KEY` / `~/.pi/agent/auth.json` — or, when `BEDROCK_AUTH_MODE=sdk` or `MODEL_OVERRIDE=amazon-bedrock/*`, AWS SDK default chain (probed via `ListFoundationModels`; see pi-mono + Amazon Bedrock below)
`src/providers/codex-adapter.ts`	Parent adapter spawns `src/commands/codex-session-runner.ts`, sends session config over stdin, and receives line-delimited events/results over stdout; the child hosts `@openai/codex-sdk` `Codex` / `Thread` and streams `ThreadEvent`s back	`OPENAI_API_KEY` or ChatGPT OAuth stored in `swarm_config.codex_oauth` (see §6)
`src/providers/claude-managed-adapter.ts`	Anthropic SDK `client.beta.sessions.events.stream` (SSE); session executes in Anthropic's managed cloud sandbox, worker is a thin relay	`ANTHROPIC_API_KEY` plus pre-existing `MANAGED_AGENT_ID` + `MANAGED_ENVIRONMENT_ID` from one-time `claude-managed-setup` CLI (see §13)

pi-mono + Amazon Bedrock auth

Mode selection

Bedrock SDK mode is active when either:

BEDROCK_AUTH_MODE=sdk is set in swarm_config (explicit), or
BEDROCK_AUTH_MODE is absent and MODEL_OVERRIDE starts with amazon-bedrock/ (prefix-inference fallback, which preserves the earlier behavior).

BEDROCK_AUTH_MODE=bearer is a declared/validated value reserved for future bearer-token support; for now, workers in bearer mode fall through to the standard credential check (key / auth.json).

BEDROCK_AUTH_MODE is a validated optional swarm_config key (values: sdk | bearer; see src/be/swarm-config-guard.ts) and a reloadable env key (see src/commands/runner.ts).
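The mode-selection rules above can be condensed into one small function. This is a sketch only: resolveBedrockMode and the "inactive" label are names made up for this example, and an empty string is treated as "absent" by assumption.

```typescript
type BedrockMode = "sdk" | "bearer" | "inactive";

interface BedrockModeInputs {
  BEDROCK_AUTH_MODE?: string;
  MODEL_OVERRIDE?: string;
}

function resolveBedrockMode(cfg: BedrockModeInputs): BedrockMode {
  const mode = cfg.BEDROCK_AUTH_MODE || "";
  if (mode === "sdk") return "sdk"; // explicit
  if (mode === "bearer") return "bearer"; // reserved; falls through to the standard credential check
  const model = cfg.MODEL_OVERRIDE || "";
  // Prefix inference applies only when BEDROCK_AUTH_MODE is absent.
  if (mode === "" && model.startsWith("amazon-bedrock/")) return "sdk";
  return "inactive";
}

// Only SDK mode triggers the AWS credential probe.
function usesSdkProbe(cfg: BedrockModeInputs): boolean {
  return resolveBedrockMode(cfg) === "sdk";
}
```

Note that an explicit bearer value suppresses prefix inference, so a worker with BEDROCK_AUTH_MODE=bearer and a Bedrock model override does not run the SDK probe in this sketch.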

Credential probe

When Bedrock SDK mode is active, the worker runs a real ListFoundationModels call via @aws-sdk/client-bedrock (dynamically imported — the API binary never loads the SDK):

Success → ready: true, satisfiedBy: "sdk-delegated". The worker proceeds to claim tasks.
Failure → ready: false with a classified error hint. The worker parks in credential-wait until credentials are fixed.

Error categories classified by classifyAwsSdkError (src/utils/aws-error-classifier.ts):

Category	Trigger example	Hint
`aws-auth`	`ExpiredTokenException`, `CredentialsProviderError`	Run `aws sso login` or refresh credentials
`aws-throttle`	`ThrottlingException`, `ServiceQuotaExceededException`	Wait / request quota increase
`aws-access`	`AccessDeniedException: not authorized`	Check IAM policy for `bedrock:*`
`aws-model`	`ValidationException`, `ResourceNotFoundException`	Check `MODEL_OVERRIDE` and region
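To make the table concrete, here is a minimal name-based lookup. It is not the real classifyAwsSdkError; classifyAwsErrorSketch, the "unknown" category, and its hint text are assumptions added for this example, and the real classifier may inspect more than the error name.

```typescript
type AwsErrorCategory = "aws-auth" | "aws-throttle" | "aws-access" | "aws-model" | "unknown";

// Error names taken from the table above.
const CATEGORY_BY_ERROR_NAME = new Map<string, AwsErrorCategory>([
  ["ExpiredTokenException", "aws-auth"],
  ["CredentialsProviderError", "aws-auth"],
  ["ThrottlingException", "aws-throttle"],
  ["ServiceQuotaExceededException", "aws-throttle"],
  ["AccessDeniedException", "aws-access"],
  ["ValidationException", "aws-model"],
  ["ResourceNotFoundException", "aws-model"],
]);

const HINT_BY_CATEGORY: Record<AwsErrorCategory, string> = {
  "aws-auth": "Run `aws sso login` or refresh credentials",
  "aws-throttle": "Wait / request quota increase",
  "aws-access": "Check IAM policy for `bedrock:*`",
  "aws-model": "Check `MODEL_OVERRIDE` and region",
  unknown: "Unclassified AWS error", // hypothetical fallback, not from the source table
};

function classifyAwsErrorSketch(err: { name?: string }): { category: AwsErrorCategory; hint: string } {
  const category = CATEGORY_BY_ERROR_NAME.get(err.name || "") || "unknown";
  return { category, hint: HINT_BY_CATEGORY[category] };
}
```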

Accepted credential sources

Anything the AWS SDK accepts:

AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (+ optional AWS_SESSION_TOKEN)
AWS_PROFILE resolved against ~/.aws/credentials and ~/.aws/config
AWS SSO sessions configured in ~/.aws/config
EC2 IMDS instance role / ECS task role
Web-identity / OIDC (AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN)
credential_process and assume-role chains

Configuration reference

Key	Values	Default
`BEDROCK_AUTH_MODE`	`sdk` \| `bearer`	inferred from `MODEL_OVERRIDE` prefix
`AWS_REGION`	any Bedrock-enabled region	required — unset reports a not-ready Bedrock state (no region is fabricated)

AWS_REGION must be set explicitly so enumeration runs against the same region as inference. When it is unset, the worker reports a not-ready Bedrock state with a "set AWS_REGION" hint and does not guess a region.

Live model enumeration

The credential probe also produces the usable model set, region-scoped to AWS_REGION. Usable = harness-drivable ∩ AWS-invocable:

AWS-invocable — the union of:
- ListFoundationModels filtered to on-demand TEXT models that are ACTIVE (base foundation-model ids), and
- ListInferenceProfiles ids — the cross-region inference-profile ids (us. / eu. / apac. / au. / global.). The newest Claude models on Bedrock are invocable only through an inference profile and never appear in ListFoundationModels, so this union is what keeps the current Claude models available.
Harness-drivable — the subset pi-ai's Converse harness can drive, from getModels("amazon-bedrock"). Each id is a valid pi-ai id (base or profile) and round-trips through MODEL_OVERRIDE=amazon-bedrock/<id>.

Ids are matched exactly and the pi-ai id is stored/displayed (the id the harness can drive). Models the harness can't drive — and harness models the account can't invoke — are both excluded, so the picker never surfaces an id that would fail with invalid model identifier at inference time.

ListFoundationModels returns models that exist in the region, not strictly those the account has enabled access to; the on-demand/ACTIVE filter narrows it, but base access-grant is not fully enumerable from the catalog. The inference-profile union is what keeps the list accurate for the current models.
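The "usable = harness-drivable ∩ AWS-invocable" rule reduces to a set union followed by an exact-match filter. The sketch below assumes the three id lists have already been fetched; usableBedrockModels is a hypothetical helper name, and the example ids in the usage note are placeholders rather than real model ids.

```typescript
function usableBedrockModels(
  foundationModelIds: string[], // from ListFoundationModels (on-demand, TEXT, ACTIVE)
  inferenceProfileIds: string[], // from ListInferenceProfiles
  harnessDrivableIds: string[], // from getModels("amazon-bedrock")
): string[] {
  // AWS-invocable is the union of base ids and inference-profile ids.
  const invocable = new Set<string>([...foundationModelIds, ...inferenceProfileIds]);
  // Keep the harness id (exact match), in harness order.
  return harnessDrivableIds.filter((id) => invocable.has(id));
}
```

For instance, with foundation ids ["base-a"], profile ids ["us.profile-b"], and harness ids ["base-a", "us.profile-b", "harness-only-c"], the result keeps the first two and drops "harness-only-c".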

The worker reports the intersected list to the API via the existing PUT /api/agents/:id/credential-status channel as an optional bedrock block inside cred_status (no new DB column — rides the migration 055 JSON column). The block carries: { region, probedAt, ready, models: [{id, name}], error? }. It refreshes at boot and on a throttled ~5-minute interval, so access enabled after boot appears without a worker restart.

Bedrock probe card (Credentials tab)

A dedicated AWS Bedrock card appears in the Credentials tab for all pi-harness agents. It renders a read-only ready/blocked/pending classification with parity to the main credentials card, plus region, probe timestamp, usable model count, and error text when blocked.

🟢 Ready — SDK credential chain valid; models enumerated.
🔴 Blocked — probe failed; error text shown; worker is parked at credential-wait.
⚫ Pending — worker hasn't reported yet (booting, or Bedrock mode not active).

The dashboard model picker for the pi harness:

Prefers the live list when the worker has reported it.
Falls back to the static modelsdev-cache.json snapshot until the first worker report arrives.
Is never blank — there is always at least the snapshot list to choose from.
Surfaces the probe failure reason as subtext when a worker reported but its probe failed (ready:false), instead of a silently disabled group.

Cost & context tracking

Every adapter emits one CostData row per CLI invocation and one or more context_usage events per session. The dollar value is recomputed server-side against the seeded pricing table (Phase 2 of the cost-tracking plan); the resulting costSource enum is surfaced in the UI. The unified input + cache + output context formula (Phase 9) replaces every adapter's previous per-provider arithmetic so cross-provider percent comparisons make sense.

Full story: Cost & context computation.

The four reference adapters listed above are wired together by the factory at src/providers/index.ts:

export async function createProviderAdapter(provider: string): Promise<ProviderAdapter> {
  switch (provider) {
    case "claude":
      return new ClaudeAdapter();
    case "pi": {
      const { PiMonoAdapter } = await import("./pi-mono-adapter");
      return new PiMonoAdapter();
    }
    case "codex":
      return new CodexAdapter();
    case "claude-managed":
      return new ClaudeManagedAdapter();
    default:
      throw new Error(`Unknown HARNESS_PROVIDER: "${provider}". Supported: claude, pi, codex, claude-managed`);
  }
}

The runner reads the resolved HARNESS_PROVIDER and awaits the factory at boot. The async factory is intentional: providers with module-level side effects (notably pi) are lazy-loaded only when selected, so unused adapters cannot crash unrelated workers during startup. The resolved value comes from swarm_config overlaid on process.env, with this precedence (highest first):

swarm_config HARNESS_PROVIDER — repo > agent > global, via /api/config/resolved
process.env.HARNESS_PROVIDER — container env
"claude" — default

Resolution lives in src/utils/harness-provider.ts (resolveHarnessProvider); it's threaded through fetchResolvedEnv in src/commands/runner.ts so credential pool selection uses the same resolved value.
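The precedence chain can be sketched as a single fallback expression. This is not the body of resolveHarnessProvider; resolveProviderSketch and the ScopedValues shape are assumptions for this example, and empty strings are treated as unset.

```typescript
// Values of the HARNESS_PROVIDER key at each swarm_config scope (hypothetical shape).
interface ScopedValues {
  repo?: string;
  agent?: string;
  global?: string;
}

function resolveProviderSketch(scoped: ScopedValues, envValue: string | undefined): string {
  // swarm_config (repo > agent > global) beats the container env, which beats the default.
  return scoped.repo || scoped.agent || scoped.global || envValue || "claude";
}
```

For example, resolveProviderSketch({ agent: "pi" }, "codex") yields "pi", while resolveProviderSketch({}, undefined) falls back to "claude".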

Switching providers without restart

The worker re-evaluates the resolved provider on each poll iteration (throttled to ~10s). When the value changes — typically because an operator wrote a new swarm_config row, or called PATCH /api/agents/{id}/harness-provider (which mirrors the value into swarm_config at scope=agent) — the worker:

Creates a fresh adapter via createProviderAdapter(resolvedProvider).
Updates state.harnessProvider.
Rebuilds basePrompt so traits-driven prompt sections (e.g. local-environment vs. cloud-managed) match the new provider.
Resets the cached cred_status snapshot so the dashboard shows credential health for the new adapter.

In-flight task sessions hold their own ProviderSession reference and continue on the old adapter unaffected; only future spawns pick up the swap. Failures during reconciliation (network blip, invalid value, adapter init error) log a warning and stay on the current provider — the worker is never wedged by a bad config.

Invalid HARNESS_PROVIDER values are rejected at write time (validateConfigValue in src/be/swarm-config-guard.ts), so a typo via PUT /api/config or the set-config MCP tool returns 400 instead of being silently stored.

Per-task `outputSchema` support

Tasks may carry an optional JSON Schema on outputSchema that the agent's final output must conform to. Enforcement depends on the harness:

Provider	Supported	Notes
`claude`	Yes	Via MCP `store-progress`; CLI extraction fallback if missed
`claude-managed`	Yes	Via MCP `store-progress`
`codex`	Yes	Via MCP `store-progress`
`opencode`	Yes	Via MCP `store-progress`
`pi` (`pi-mono`)	Yes	Via MCP `store-progress`
`devin`	Yes	Via MCP `store-progress` when `HAS_MCP=true`; otherwise the runner validates direct `providerOutput` before finishing the task

The primary enforcement point is the MCP store-progress tool: a non-conforming output fails the tool call and the agent is asked to retry. For providers that return direct providerOutput to the runner, ensureTaskFinished() now performs the same JSON-parse + schema-validation check before persisting task.output; if validation fails, the task is marked failed instead of silently storing invalid structured output.

Reasoning / effort control

PATCH /api/agents/{id}/runtime accepts an optional reasoning_effort field — a normalized, closed enum off | low | medium | high | xhigh | max — persisted like MODEL_OVERRIDE as the agent-scoped swarm_config key REASONING_EFFORT_OVERRIDE. The runner resolves it independently of the model/modelTier axis and populates ProviderSessionConfig.reasoningEffort; when unset, every adapter behaves exactly as it does today (no fleet-wide default is injected). minimal remains excluded because Codex *-codex models reject it. max is capability-gated and Codex-only: non-Codex harnesses filter it even when an upstream model snapshot advertises it.

src/providers/reasoning-effort.ts owns capability gating (reasoningCapability(harness, model)) and per-harness translation (applyReasoningEffort(harness, model, level), a discriminated union telling each adapter what to merge). Capability data is hybrid: the models.dev reasoning_options snapshot (src/providers/modelsdev-reasoning.json, derived from the canonical src/be/modelsdev-cache.json) wins where present; otherwise a hand-authored {low, medium, high} fallback, plus a small harness-specific override table for quirks the cache doesn't encode. The runtime route validates the requested level against this lookup and 400s unsupported combos with { error, harness, model, level, allowed }.
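The gating rules described so far can be sketched as follows. This is a simplified illustration, not reasoningCapability itself: allowedEffortLevels, checkEffort, and the error string are hypothetical, and the harness-specific override table is deliberately left out.

```typescript
type EffortLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";

// Hand-authored fallback used when the snapshot has no entry for the model.
const FALLBACK_LEVELS: EffortLevel[] = ["low", "medium", "high"];

function allowedEffortLevels(harness: string, advertised?: EffortLevel[]): EffortLevel[] {
  // Snapshot data wins where present; otherwise use the fallback.
  const base = advertised && advertised.length > 0 ? advertised : FALLBACK_LEVELS;
  // `max` is Codex-only, even when an upstream snapshot advertises it.
  return base.filter((level) => level !== "max" || harness === "codex");
}

type EffortCheck =
  | { ok: true }
  | {
      ok: false;
      status: 400;
      body: { error: string; harness: string; model: string; level: EffortLevel; allowed: EffortLevel[] };
    };

function checkEffort(harness: string, model: string, level: EffortLevel, advertised?: EffortLevel[]): EffortCheck {
  const allowed = allowedEffortLevels(harness, advertised);
  if (allowed.indexOf(level) !== -1) return { ok: true };
  // Mirrors the 400 body shape { error, harness, model, level, allowed }.
  return { ok: false, status: 400, body: { error: "unsupported reasoning_effort", harness, model, level, allowed } };
}
```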

Provider	Transport	Notes
`claude`	`CLAUDE_CODE_EFFORT_LEVEL` env var	`off` on a legacy budget_tokens-capable model sets `MAX_THINKING_TOKENS=0` instead (the effort env is omitted). No CLI flag is used — `--effort` is buggy in `-p` mode. Precedence: if an operator's `additionalArgs` includes `--effort`, the CLI flag wins over `CLAUDE_CODE_EFFORT_LEVEL` (Claude CLI's own precedence) — the existing "`additionalArgs` is an escape hatch" behavior, not special-cased here.
`codex`	`model_reasoning_effort` config field	`off` maps to `'none'`; `max` passes through for capability-advertising models such as GPT-5.6. `show_raw_agent_reasoning` stays pinned `false` regardless of the effort level — higher effort costs reasoning tokens (visible via `reasoning_output_tokens` cost telemetry) but produces no visible reasoning trace in the dashboard. `-codex` (non-`max`) models reject `xhigh`; `-codex-max` models accept it.
`pi`	`thinkingLevel` session option	Top-level sibling of `model` on `CreateAgentSessionOptions`; pi's native vocabulary already includes `off`.
`opencode`	Provider-keyed `options` in the per-task `opencode.json`	`anthropic/` models: `thinking.budgetTokens` (an internal numeric translation, not a user-facing knob). `openrouter/` models: `reasoning.effort`. OpenAI-compatible models: `reasoningEffort`. `off` omits reasoning keys entirely (a noop application) — Opencode has no explicit off switch.

Each adapter reports the level it actually applied via ProviderResult.appliedReasoningEffort (null when applyReasoningEffort() returned a capability-rejected noop). The runner forwards that into agents.cred_status.latestModel.reasoningEffort, which the dashboard surfaces in the agent runtime editor, the harness credential-status tooltip, and the agents-list Model column (a compact [|||]-style badge — more bars mean higher effort).

See thoughts/taras/research/2026-05-26-agent-reasoning-effort-runtime-control.md for the full cross-harness normalization derivation.

3. How a harness run fits into a task lifecycle

Every harness run is scoped to exactly one task. The runner owns the task's lifecycle and gives the adapter only what it needs to execute that single unit of work.

The flow

All steps live in src/commands/runner.ts.

Step	What it does
`pollForTrigger`	`GET /api/poll` until a `Trigger` is returned
`buildPromptForTrigger`	Constructs the user prompt from the trigger
`fetchRelevantMemories`	Enriches context (memory vectors + installed skills)
`buildSystemPrompt`	Composes the full system prompt (see §6)
`spawnProviderProcess`	Calls `adapter.createSession(config)`
`session.onEvent(...)`	Each `ProviderEvent` → API call (progress, logs, cost, context)
`session.waitForCompletion`	Blocks until the adapter resolves
`ensureTaskFinished`	`POST /api/tasks/{id}/finish`
`syncProfileFilesToServer`	Session-end FS → DB sync of the agent's self-editable files (see below). Runs for every `hasLocalEnvironment` harness.

One task → one session. The runner tracks concurrent tasks in state.activeTasks: Map<taskId, RunningTask> up to MAX_CONCURRENT_TASKS. An adapter never needs to multiplex tasks internally — if the worker should handle two at once, the runner spawns two sessions.

Session-end identity / config sync (FS → DB)

When a session finishes, the runner syncs the agent's self-editable files back to the API so edits the agent made during the session persist into its profile (context_versions):

SOUL.md / IDENTITY.md / TOOLS.md / HEARTBEAT.md (bundled identity update)
the CLAUDE.md source — provider-dependent (see below)
the agent-managed section of /workspace/start-up.sh (setupScript)

The sync lives in src/commands/profile-sync.ts (syncProfileFilesToServer) and is called from checkCompletedProcesses in runner.ts. The decision to sync is made per finished session, using the provider / local-env trait snapshotted on RunningTask at spawn time — not the mutable global state.hasLocalEnvironment. The runner lets an in-flight session finish on its original adapter after a live provider swap, so reading the global would skip a session that started local (worker since flipped remote) or sync stale local files after a remote session finished (worker since flipped local). The sync fires when any finished session in the batch ran in a local environment:

Provider	local environment	Session-end sync
`claude`, `pi`, `codex`, `opencode`	yes	Yes — when the finished session ran on this provider
`devin`, `claude-managed`	no	No — these have no `/workspace` FS

CLAUDE.md source is provider-routed (resolveClaudeMdPath). claude edits its personal file at ~/.claude/CLAUDE.md (also synced by the Claude Stop hook); every other local harness (codex/pi/opencode) edits /workspace/CLAUDE.md — the file the runner materializes from the claudeMd DB field at boot and that the base-prompt truncation notice points them to. An all-claude batch syncs the personal file (the runner is a backstop and never overwrites it with the stale workspace copy); any non-claude session in the batch routes to /workspace/CLAUDE.md. This closes the previously-open non-Claude CLAUDE.md FS → DB gap.
Baseline hashes protect lead-side profile edits. At session start the runner records SHA-256 baselines for the identity files it just materialized from the DB. On session-end sync, unchanged files are skipped; only files the agent actually modified sync back. That preserves update-profile edits made by the lead during a running session instead of having the stale local copy blindly overwrite the DB row at shutdown.

Notes:

Harness-agnostic, runner-driven. Because the sync runs in the runner — at the single point where every completed session converges, including crashes — it does not depend on the harness emitting a shutdown event. This is what makes it reliable for codex/opencode (which previously had no sync path) and for pi (whose in-extension session_shutdown sync could silently not-fire). The Claude plugin Stop hook (src/hooks/hook.ts) and the pi extension (src/providers/pi-mono-extension.ts) still run their own sync; the runner-level call is the authoritative backstop.
Idempotent. The profile route only writes a new context_versions row when the content hash changes (updateAgentProfile in src/be/db.ts), so a redundant sync — pi's extension + runner double-POST, or an unchanged file — collapses to a no-op.
Non-fatal but visible. A failed sync never fails the task, but unlike the original copies it checks resp.ok and logs a scrubbed warning instead of silently swallowing the error.
Lead edits now survive unchanged sessions. The previous "Rule 19" reversion problem is resolved for identity-file syncs: if the agent never changed the local file, the session-end sync skips it and preserves the newer DB-side content from update-profile. Files the agent explicitly edits still sync normally, which keeps self-evolution working without clobbering lead-driven profile changes.
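The baseline-hash rule described above amounts to a per-file comparison between the hash recorded at session start and the hash at session end. The sketch below assumes both maps already hold hex digests keyed by file path; filesToSync is a hypothetical helper, and treating a file with no baseline as changed is an assumption of this example.

```typescript
function filesToSync(
  baselineHashes: Map<string, string>, // recorded at session start
  currentHashes: Map<string, string>, // computed at session end
): string[] {
  const changed: string[] = [];
  currentHashes.forEach((hash, path) => {
    // Unchanged files are skipped so newer DB-side content is preserved.
    if (baselineHashes.get(path) !== hash) changed.push(path);
  });
  return changed;
}
```

If the lead updates SOUL.md in the DB mid-session and the agent never touches its local copy, the local hash still equals the baseline, so the file is skipped and the lead's edit survives.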

What fields on the task become `ProviderSessionConfig`

Assembled around runner.ts L1582–1596 and L3093–3133:

`ProviderSessionConfig` field	Comes from
`prompt`	`buildPromptForTrigger(trigger, task, memories, ...)`, optionally prefixed with a bounded follow-up context preamble when `task.parentTaskId` is set
`systemPrompt`	`buildSystemPrompt()` + `task.additionalSystemPrompt` (see §6)
`model`	`task.model` → `MODEL_OVERRIDE` env → `""`
`agentId`, `taskId`, `apiUrl`, `apiKey`	Worker env + task row
`cwd`	`task.dir` → `repoContext.clonePath` → `process.cwd()` (L3050–3072)
`resumeSessionId`	Deprecated — always `undefined`. The runner stopped threading native session resume in the 2026-05-28 plan; follow-up continuity flows entirely through the context preamble prepended to `prompt`.
`iteration`	Retry counter (runner state)
`logFile`	`${logDir}/${timestamp}-${taskIdSlice}.jsonl` (L3102) — see §4
`env`	Merged worker env + task-provided env

API endpoints touched per task

The runner (not the adapter) owns all of these. They are listed here so you know what the adapter's event stream is ultimately driving:

POST /api/tasks/{id}/progress — human-readable progress (L402–410)
POST /api/tasks/{id}/context — on context_usage, compaction, completion (L1777, L1797, L1900)
POST /api/session-logs — on raw_log (L983, see §4)
POST /api/events/batch — tool/session events (L1648)
PUT /api/tasks/{id}/claude-session — on session_init (L1037)
POST /api/session-costs — on result (L1014)
POST /api/active-sessions / DELETE /api/active-sessions/by-task/{id} — L1178, L1199
POST /api/tasks/{id}/pause / resume — L711, L797
POST /api/tasks/{id}/finish — L579
GET /cancelled-tasks?taskId=... — L2909 (also polled by adapter-side hooks)

Resume semantics

Native session resume (the claude --resume <UUID> CLI flag and codex.resumeThread(id) SDK call) was removed in the 2026-05-28 deprecation plan (thoughts/taras/plans/2026-05-28-deprecate-native-resume.md). The reasoning: native resume relied on an on-disk transcript that disappears when the worker container restarts (deploy, OOM, autoscaler reschedule), and the harness then either errored out or silently spawned a context-less session. The bounded context preamble survives any worker restart because it is rebuilt from the parent-task chain held in the API DB.

Parent → child continuity is now single-layered: the runner prepends a bounded preamble (cap ~2000 tokens, see src/commands/context-preamble.ts) for all providers when task.parentTaskId is set, and adapters always spawn a fresh harness session. resolveResumeSession is preserved as an observability shim that logs which session ids would have been used; it never returns a resumeSessionId.
Pause → resume: a paused task restarts with a new session, re-applying task.progress via buildResumePrompt (L809–829); if the task is also part of a parent chain, the context-preamble path is applied before execution resumes.
Adapter behavior: claude / claude-managed / codex each warn and ignore any stray resumeSessionId they receive; the runner no longer sets one. CodexAdapter.canResume() returns false unconditionally.

What your adapter owes the task: emit session_init as early as possible (so the runner can persist the provider session id), emit tool_start/tool_end faithfully (so the UI shows the agent's work), and emit a result with populated CostData before your waitForCompletion() resolves.

4. Raw session logs & the task details page

The logFile field in ProviderSessionConfig is not optional decoration — it is the system of record for what happened inside a run. The task details UI reads from this pipeline.

Path convention

${LOG_DIR:-/logs}/<sessionId>/<timestamp>-<taskId8>.jsonl

Constructed at runner.ts:3102. In Docker workers LOG_DIR points at /workspace/logs, so the effective path there is /workspace/logs/<sessionId>/<...>.jsonl. The runner writes the first line (a metadata record) at L3120 before spawning.
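A minimal sketch of the path convention, assuming taskId8 means the first eight characters of the task id and that an empty LOG_DIR falls back to /logs the same way an unset one does. buildLogPathSketch is a hypothetical helper, not the code at runner.ts:3102.

```typescript
function buildLogPathSketch(
  logDir: string | undefined,
  sessionId: string,
  timestamp: string,
  taskId: string,
): string {
  // Mirrors ${LOG_DIR:-/logs}: unset or empty falls back to /logs.
  const base = logDir && logDir.length > 0 ? logDir : "/logs";
  const taskId8 = taskId.slice(0, 8); // assumed meaning of <taskId8>
  return `${base}/${sessionId}/${timestamp}-${taskId8}.jsonl`;
}
```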

What each adapter writes to `logFile`

The claude, pi, and codex adapters each open the file with Bun.file(config.logFile).writer() and append JSONL lines.

Claude (claude-adapter.ts):
- Raw NDJSON stdout from the Claude CLI is piped through (L284).
- stderr wrapped as {type: "stderr", content, timestamp} (L321–323).
- File handle closed at L329.
Pi-mono (pi-mono-adapter.ts):
- Every normalized ProviderEvent is written as {...event, timestamp} inside emit() (L167–187). Closed in runSession's finally at L327.
Codex (codex-adapter.ts):
- Every ProviderEvent written with timestamp in emit() (L347–374).
- Raw SDK ThreadEvent mirrored as raw_log at L466.
- Closed at L734.

Secret scrubbing is mandatory at every log egress. Import scrubSecrets from src/utils/secret-scrubber.ts and wrap every string before writing. All three reference adapters do this; a new adapter must too (see claude L284, L294, L303, L320, L344; pi L170–173; codex L352–355).

How raw logs reach the task details page

The .jsonl file on disk is the adapter-side dump. The UI does not read the file directly — it reads the DB-backed copy.

adapter emits raw_log  ─▶  runner flushLogBuffer  ─▶  POST /api/session-logs
                                                           │
                                                           ▼
                                                 session_logs (SQLite)
                                                           │
                                                           ▼
              ui    ─ useTaskSessionLogs ─▶ GET /api/tasks/{id}/session-logs
                                                           │
                                                           ▼
                                              <SessionLogViewer />

Runner upload: runner.ts:1814–1833. Only raw_log triggers the remote push; raw_stderr is pretty-printed to worker stdout only (L1834–1836).
API write: src/http/session-data.ts L135–153 → createSessionLogs in src/be/db.
API read: same file L155–166 — GET /api/tasks/{taskId}/session-logs → getSessionLogsByTaskId.
UI hook: useTaskSessionLogs in ui/src/api/hooks/use-tasks.ts, consumed at ui/src/pages/tasks/[id]/page.tsx:42 and rendered by <SessionLogViewer /> at ui/src/components/shared/session-log-viewer.tsx.

What this means for a new provider

Your emit() implementation must do three things for the UI to light up:

Write every event to logFile as JSONL (for offline diagnostics / /workspace/logs/).
Emit a raw_log ProviderEvent for anything the user might want to inspect in the task details page. The runner will upload it.
Run every string through scrubSecrets before emitting or writing.

Tool calls (tool_start/tool_end) are not shown via session-logs — they go to /api/events/batch. You still need to emit them, but they reach the UI through a different channel (the agent's tool-activity timeline).
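The three emit() duties can be sketched with injected dependencies so the shape is visible without real I/O. Everything named here (makeEmit, LoggableEvent, the scrub / appendLine / now parameters) is a stand-in for this example: scrub plays the role of scrubSecrets and appendLine the role of the logFile writer. The sketch scrubs only top-level string fields; a real adapter would also need to cover nested values.

```typescript
type LoggableEvent = { type: string; content?: string; [key: string]: unknown };

interface EmitDeps {
  scrub: (text: string) => string; // stand-in for scrubSecrets
  appendLine: (line: string) => void; // stand-in for the logFile writer
  listeners: Array<(event: LoggableEvent) => void>;
  now: () => string; // timestamp source
}

function makeEmit(deps: EmitDeps): (event: LoggableEvent) => void {
  return function emit(event: LoggableEvent): void {
    // 1. Scrub every top-level string before it leaves the adapter.
    const scrubbed: LoggableEvent = { type: event.type };
    for (const key of Object.keys(event)) {
      const value = event[key];
      scrubbed[key] = typeof value === "string" ? deps.scrub(value) : value;
    }
    // 2. Append the scrubbed event to the JSONL log with a timestamp.
    deps.appendLine(JSON.stringify({ ...scrubbed, timestamp: deps.now() }));
    // 3. Hand the scrubbed event to listeners; raw_log events are what the runner uploads.
    for (const listener of deps.listeners) listener(scrubbed);
  };
}
```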

5. Exposing the swarm MCP to the runtime

This is the single most important integration point after event translation. The swarm MCP server is how the agent actually interacts with the swarm: store progress, offer subtasks, read/write memory, request human input, cancel itself. A provider that runs code but cannot call swarm MCP tools is effectively a read-only model invocation — it will never drive real swarm behavior.

Where the MCP server lives

The swarm API exposes its MCP server at {apiUrl}/mcp. Tools are defined under src/tools/.

How each reference adapter wires it

Provider	Wiring	File
Claude	Discovers an existing `.mcp.json` (walking up from `cwd`), injects `X-Source-Task-Id` into the `agent-swarm` entry, writes a per-session copy to `/tmp/mcp-<taskId>.json`, launches CLI with `--mcp-config <path> --strict-mcp-config`.	`claude-adapter.ts` L117–184, L251
Pi	Constructs `McpHttpClient(apiUrl, apiKey, agentId, taskId)`, calls `listTools()`, wraps each as a pi-mono `ToolDefinition` with prefix `mcp__<name>__`.	`pi-mono-adapter.ts` L410–421, `pi-mono-mcp-client.ts` L53
Codex	`buildCodexConfig` registers `mcp_servers["agent-swarm"] = { url: "{apiUrl}/mcp", http_headers: { Authorization, X-Agent-ID, X-Source-Task-Id }, bearer_token_env_var, startup_timeout_sec }`. Passed to `new Codex({ config })`.	`codex-adapter.ts` L132–243, L857–861

Required headers

Whatever the transport, the adapter MUST set these three headers on the swarm MCP connection:

Authorization: Bearer ${apiKey}
X-Agent-ID: <agentId>
X-Source-Task-Id: <taskId> — so nested tool calls attribute back to the right task
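Whatever transport the new provider uses, the header set can be built in one place. The helper name buildSwarmMcpHeaders is hypothetical; the header names and values follow the list above.

```typescript
function buildSwarmMcpHeaders(apiKey: string, agentId: string, taskId: string): Record<string, string> {
  return {
    Authorization: `Bearer ${apiKey}`,
    "X-Agent-ID": agentId,
    "X-Source-Task-Id": taskId, // attributes nested tool calls to the right task
  };
}
```

The resulting object can then be passed to whichever HTTP or MCP client the runtime expects, for example as the http_headers value in a Codex-style config.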

Key tools a harness will call mid-run

Pretty labels at runner.ts:254–317. Representative list:

store-progress — structured progress updates + memories
offer-task / send-task — delegate to another agent
cancel-task / poll-task — lifecycle
memory-search / memory-get / inject-learning — shared memory
get-task-details, post-message, read-messages — inter-agent coordination
request-human-input — HITL gates
trigger-workflow — start a workflow DAG

Fallback when MCP is unavailable

Each reference adapter fails open (the run continues without MCP tools), but the agent is effectively blind to the swarm:

Pi: try/catch around discovery (pi-mono-adapter.ts L409–424) — on failure, customTools = [].
Codex: adds the agent-swarm entry unconditionally; failures fetching installed servers are non-fatal (codex-adapter.ts L217–229).
Claude: createSessionMcpConfig returns null if nothing found; CLI runs without --mcp-config.

Even if the in-process MCP connection fails, the runner and the adapter's own swarm-event hooks still talk to the swarm HTTP API directly for lifecycle, heartbeat, and cancellation polling — see src/providers/codex-swarm-events.ts and src/providers/pi-mono-extension.ts L1–50. That is the safety net; MCP is what lets the model call tools.

6. System prompt composition & delivery

ProviderSessionConfig.systemPrompt is the full assembled system prompt, not a fragment. Your adapter's job is to hand it to the underlying runtime verbatim — not to add preamble.

How it is built

The prompt is composed by getBasePrompt(args) at src/prompts/base-prompt.ts:55–225 and orchestrated by buildSystemPrompt() at runner.ts:2269–2286. The pieces, in order:

Source	Section	Resolver
Template `system.session.{lead|worker}`	Base role prompt	`resolveTemplateAsync` L61–63
Template `system.agent.worker.slack`	Slack section (workers only)	L66–72
`agentSoulMd` + `agentIdentityMd` + name/description	`## Your Identity`	L75–90
Installed skills list	Skill summary	L93–96
Installed MCP servers list	MCP summary	L99–101
Repo `CLAUDE.md` (from cwd)	`## Repository Context`	L111–118 (read via `readClaudeMd` at `runner.ts:77–86`)
Templates `system.agent.agent_fs`, `...services`, `...artifacts`	Conditional suffixes	L157–188
`agentClaudeMd`	`## Agent Instructions` (truncated)	L197–207
`agentToolsMd`	`## Your Tools & Capabilities` (truncated)	L210–219
`SYSTEM_PROMPT` env / `--system-prompt`	Appended	`runner.ts:2321–2323`
`task.additionalSystemPrompt`	Appended per-task	`runner.ts:3093–3097`

Truncation caps: BOOTSTRAP_MAX_CHARS=20_000 per section, BOOTSTRAP_TOTAL_MAX_CHARS=150_000 total (base-prompt.ts L16–19).
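
A minimal sketch of how the two caps could interact, assuming each section is capped first and a shared total budget is then spent in order. The real logic in base-prompt.ts may differ; the marker text, the `PromptSection` shape, and the drop-when-exhausted behavior are all assumptions made for illustration.

```typescript
// Values quoted above from base-prompt.ts.
const BOOTSTRAP_MAX_CHARS = 20_000;
const BOOTSTRAP_TOTAL_MAX_CHARS = 150_000;

// Hypothetical section shape used only in this sketch.
interface PromptSection {
  title: string;
  body: string;
}

// Assumption: an oversized section is cut and marked; the real marker may differ.
function capSection(body: string, maxChars: number): string {
  const marker = "\n[truncated]";
  if (body.length <= maxChars) return body;
  if (maxChars <= marker.length) return body.slice(0, maxChars);
  return body.slice(0, maxChars - marker.length) + marker;
}

// Apply the per-section cap first, then spend a shared total budget in order.
function capSections(
  sections: PromptSection[],
  perSectionMax: number = BOOTSTRAP_MAX_CHARS,
  totalMax: number = BOOTSTRAP_TOTAL_MAX_CHARS,
): PromptSection[] {
  let remaining = totalMax;
  const capped: PromptSection[] = [];
  for (const section of sections) {
    if (remaining <= 0) break; // budget exhausted: later sections are dropped
    const body = capSection(section.body, Math.min(perSectionMax, remaining));
    capped.push({ title: section.title, body });
    remaining -= body.length;
  }
  return capped;
}
```

Because the adapter receives the already-assembled prompt, it should not need to re-apply these caps; the sketch only illustrates why a very large agentClaudeMd may arrive shortened.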

Identity sources (soulMd, identityMd, claudeMd, toolsMd, heartbeatMd) are fetched from GET /me at runner.ts:2427–2449. If any are missing, defaults come from the template, then from the generators in src/prompts/defaults.ts, and are then pushed back to the server.

If the runner reuses an existing repo clone that has local changes, ensureRepoForTask() auto-stashes that work before refreshing from origin. The resulting swarm-autostash refs are threaded into repoContext.autoStashes and appended to the base prompt so the active session can restore them deliberately instead of silently losing or ignoring dirty work.

Template resolution goes over HTTP (configureHttpResolver(apiUrl, apiKey) at runner.ts:2206) to obey the API/worker DB boundary. Prompt files under src/prompts/ must remain pure — no bun:sqlite or src/be/db imports. Enforced by scripts/check-db-boundary.sh.

How each adapter delivers the prompt

Claude — CLI flag: cmd.push("--append-system-prompt", this.config.systemPrompt) at claude-adapter.ts:245–247.
Codex — written into AGENTS.md in cwd inside a <swarm_system_prompt> block. The Codex SDK has no --append-system-prompt equivalent; this file is the only channel. Entry: writeCodexAgentsMd(config.cwd, config.systemPrompt) at codex-adapter.ts:761; implementation at codex-agents-md.ts:56–119. If an AGENTS.md exists, the block is prepended; otherwise stacked atop any CLAUDE.md. Cleanup in the session finally at codex-adapter.ts:738.
Pi — SDK parameter: new DefaultResourceLoader({ appendSystemPrompt: [config.systemPrompt], ... }) passed via CreateAgentSessionOptions at pi-mono-adapter.ts:508–519.

What this means for a new provider

Pick the delivery path your runtime supports, in order of preference:

A dedicated system-prompt argument (flag, SDK field) — cleanest, no filesystem side effects.
A file-based convention (like Codex's AGENTS.md) — fine, but you must clean up in finally and not clobber user files.
Prepending to the user prompt — last resort; may confuse the model.
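
For the file-based path, the two obligations (clean up afterwards, do not clobber user files) can be sketched as a pair of pure string helpers. This is not the code in codex-agents-md.ts; the function names and the exact block layout are assumptions, and only the `<swarm_system_prompt>` tag comes from the description above.

```typescript
const BLOCK_OPEN = "<swarm_system_prompt>";
const BLOCK_CLOSE = "</swarm_system_prompt>";

// Remove a previously written block, leaving the user's own content untouched.
function stripPromptBlock(fileContent: string): string {
  const start = fileContent.indexOf(BLOCK_OPEN);
  const end = fileContent.indexOf(BLOCK_CLOSE);
  if (start === -1 || end === -1 || end < start) return fileContent;
  const after = fileContent.slice(end + BLOCK_CLOSE.length).replace(/^\n+/, "");
  return fileContent.slice(0, start) + after;
}

// Prepend the block; strip any stale block first so repeated runs do not stack.
function upsertPromptBlock(existing: string | null, systemPrompt: string): string {
  const userContent = existing === null ? "" : stripPromptBlock(existing);
  const block = `${BLOCK_OPEN}\n${systemPrompt}\n${BLOCK_CLOSE}\n`;
  return userContent.length > 0 ? `${block}\n${userContent}` : block;
}
```

An adapter would write the result of upsertPromptBlock before starting the session and write stripPromptBlock of the current file content (or delete the file if that result is empty) in its finally block.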

If your runtime has a distinct prompt shape (e.g. a different preamble format), add a subdirectory under src/prompts/<foo>/ and invoke it from your adapter, but keep the merge logic in base-prompt.ts as the single source of truth.

7. Skills (how the three providers handle them)

Skills are the swarm's portable procedural knowledge — reusable markdown files invoked as slash-commands like /review-pr, /implement-issue, /create-pr. Each provider surfaces them differently; your adapter must implement formatCommand(name) and may need a resolver if your runtime doesn't have native support.

The three patterns

Claude. The CLI already knows about skills installed under ~/.claude/skills/<name>/SKILL.md. The adapter just returns /<name>.

formatCommand(name: string): string { return `/${name}`; }

See claude-adapter.ts:601.

Pi. Pi-mono supports skills but namespaces them:

formatCommand(name: string): string { return `/skill:${name}`; }

See pi-mono-adapter.ts:541. Skills live at ~/.pi/agent/skills/<name>/SKILL.md.

Codex and OpenCode. These SDKs have no SKILL.md mechanism, so the adapter resolves the skill itself before calling the model:

src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the user prompt.
Codex reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md.
OpenCode reads ${OPENCODE_SKILLS_DIR ?? ~/.opencode/skills}/<name>/SKILL.md.
Inlines the SKILL.md content into the prompt before calling the provider SDK.
formatCommand(name) simply returns /${name} so the swarm can still emit the canonical form; the resolver does the work.

This is the template to follow if your provider lacks native skill support.
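
A minimal sketch of the inline-resolver pattern, assuming the skill file is read through an injected function so the logic stays testable. This is not the implementation in codex-skill-resolver.ts: the `SkillReader` type, the `inlineSkill` name, and the "Arguments:" suffix are hypothetical.

```typescript
// Hypothetical reader: returns SKILL.md content, or null when the skill is not installed.
type SkillReader = (skillName: string) => string | null;

// Matches a leading "/<name>" followed by optional arguments.
const LEADING_COMMAND = /^\/([A-Za-z0-9][A-Za-z0-9_-]*)(?:\s+([\s\S]*))?$/;

function inlineSkill(userPrompt: string, readSkill: SkillReader): string {
  const match = LEADING_COMMAND.exec(userPrompt.trim());
  if (!match) return userPrompt; // not a slash-command: pass through unchanged
  const skillName = match[1] ?? "";
  const skillArgs = match[2] ?? "";
  const skillBody = readSkill(skillName);
  if (skillBody === null) return userPrompt; // unknown skill: let the model see the raw text
  const trimmedBody = skillBody.trim();
  // Assumption: trailing arguments are appended after the skill body.
  return skillArgs.length > 0 ? `${trimmedBody}\n\nArguments: ${skillArgs}` : trimmedBody;
}
```

In a real adapter the reader would load `<skills dir>/<name>/SKILL.md` from disk, and inlineSkill would run once on the user prompt before it is handed to the provider SDK.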

Where skill files come from

The swarm seeds skills into each provider's skill dir at container boot in docker-entrypoint.sh L764–803; bundled complex-skill files are also synced from the database during normal skill refreshes. At minimum, every skill installs its SKILL.md into:

~/.claude/skills/<name>/SKILL.md
~/.pi/agent/skills/<name>/SKILL.md
~/.codex/skills/<name>/SKILL.md
~/.opencode/skills/<name>/SKILL.md
~/.agents/skills/<name>/SKILL.md

For DB-backed complex skills, sibling bundled files are mirrored alongside SKILL.md under the same skill directory. Legacy remote/sourceRepo-only complex skills still rely on the entrypoint fallback when the bundle is not present in the database yet.

When adding a new provider, extend both sync paths with ~/.<foo>/skills/<name>/... (or whatever directory layout your runtime expects).

8. Adding a new provider — step by step

Assume you are adding a provider called foo. Follow every step; "optional" is called out where true.

Step 1 — Scaffold the adapter

Create src/providers/foo-adapter.ts:

import type {
  ProviderAdapter,
  ProviderEvent,
  ProviderResult,
  ProviderSession,
  ProviderSessionConfig,
} from "./types";

export class FooAdapter implements ProviderAdapter {
  readonly name = "foo";

  async createSession(config: ProviderSessionConfig): Promise<ProviderSession> {
    return new FooSession(config);
  }

  async canResume(_sessionId: string): Promise<boolean> {
    return false; // or true if your SDK/CLI supports resume
  }

  formatCommand(commandName: string): string {
    return `/${commandName}`; // adjust to your runtime's convention
  }
}

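class FooSession implements ProviderSession {
  sessionId: string | undefined;
  private listeners: Array<(e: ProviderEvent) => void> = [];
  private readonly abortController = new AbortController();
  private readonly completion: Promise<ProviderResult>;

  constructor(private config: ProviderSessionConfig) {
    // Start streaming in the background so createSession() returns immediately.
    this.completion = this.run();
  }

  onEvent(listener: (e: ProviderEvent) => void) { this.listeners.push(listener); }
  private emit(event: ProviderEvent) { for (const l of this.listeners) l(event); }

  // Resolves when the native stream ends.
  waitForCompletion(): Promise<ProviderResult> { return this.completion; }

  // Signal the subprocess / SDK call to stop; run() should observe abortController.signal.
  async abort(): Promise<void> { this.abortController.abort(); }

  private async run(): Promise<ProviderResult> {
    // TODO: spawn the CLI or call the SDK, translate native events to this.emit(...),
    // set this.sessionId, and return the terminal ProviderResult.
    throw new Error("FooSession.run() is not implemented");
  }
}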

Step 2 — Register in the factory

Edit src/providers/index.ts:

import { FooAdapter } from "./foo-adapter";
// ...
case "foo": return new FooAdapter();

Update the error message (Supported: claude, pi, codex, foo) so the unknown-provider error stays accurate.

Step 3 — Translate native events to `ProviderEvent`

This is the heart of the adapter. For each event your SDK / CLI emits, decide which ProviderEvent type to produce:

Session start → session_init { sessionId } (set this.sessionId first, then emit).
Assistant text → message { role: "assistant", content }.
Tool call start/end → tool_start / tool_end with { toolCallId, toolName, args|result }.
Turn/usage stats → context_usage.
Auto-compaction → compaction.
Any provider-specific event with no direct mapping → custom { name, data } (e.g. Codex uses custom for codex.reasoning and codex.todo_list).
Terminal → result { cost, output, isError, errorCategory } then resolve waitForCompletion().
Non-fatal diagnostics → raw_log / raw_stderr.

Reference code:

Claude JSONL branching: src/providers/claude-adapter.ts around L368–470.
Codex ThreadEvent dispatcher: src/providers/codex-adapter.ts around L463–618.
Pi AgentSessionEvent dispatcher: src/providers/pi-mono-adapter.ts around L161+.
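
The mapping above can be sketched as a single pure function, which also makes it easy to unit-test. Both unions below are assumptions: `SketchProviderEvent` is a simplified stand-in for the real ProviderEvent in src/providers/types.ts, and `FooNativeEvent` is an invented event stream for the hypothetical foo runtime.

```typescript
// Simplified stand-in for the real ProviderEvent union (fields taken from the list above).
type SketchProviderEvent =
  | { type: "session_init"; sessionId: string }
  | { type: "message"; role: "assistant"; content: string }
  | { type: "tool_start"; toolCallId: string; toolName: string; args: unknown }
  | { type: "tool_end"; toolCallId: string; toolName: string; result: unknown }
  | { type: "custom"; name: string; data: unknown };

// Hypothetical native events emitted by the "foo" runtime.
type FooNativeEvent =
  | { kind: "started"; id: string }
  | { kind: "text"; text: string }
  | { kind: "tool_call"; callId: string; tool: string; input: unknown }
  | { kind: "tool_result"; callId: string; tool: string; output: unknown }
  | { kind: "thinking"; summary: string };

// One native event in, one provider event out; no I/O, so it is trivial to test.
function translateFooEvent(native: FooNativeEvent): SketchProviderEvent {
  switch (native.kind) {
    case "started":
      return { type: "session_init", sessionId: native.id };
    case "text":
      return { type: "message", role: "assistant", content: native.text };
    case "tool_call":
      return { type: "tool_start", toolCallId: native.callId, toolName: native.tool, args: native.input };
    case "tool_result":
      return { type: "tool_end", toolCallId: native.callId, toolName: native.tool, result: native.output };
    case "thinking":
      // No direct mapping: surface it as a namespaced custom event.
      return { type: "custom", name: "foo.thinking", data: { summary: native.summary } };
  }
}
```

Keeping the translator free of side effects means the session class only has to set sessionId on session_init and call emit with the returned value.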

Step 4 — Emit `CostData`

The result event must carry a populated CostData so the swarm can track spend:

emit({
  type: "result",
  cost: {
    sessionId: this.sessionId!,
    taskId: config.taskId,
    agentId: config.agentId,
    totalCostUsd, inputTokens, outputTokens,
    cacheReadTokens, cacheWriteTokens,
    durationMs, numTurns, model, isError: false,
  },
  isError: false,
});

If your SDK returns tokens but not USD, compute cost from a pricing table (see src/providers/codex-models.ts for the Codex model/pricing resolver pattern).
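
A minimal sketch of a pricing-table lookup, not the resolver in codex-models.ts. The model name and every price below are placeholders, and the sketch assumes cache tokens are reported separately from inputTokens; check how your SDK counts them before copying the arithmetic.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// USD per one million tokens.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
  cacheWritePerMTok: number;
}

// Placeholder table: hypothetical model, made-up prices.
const FOO_PRICING: Record<string, ModelPricing> = {
  "foo-large": { inputPerMTok: 2, outputPerMTok: 8, cacheReadPerMTok: 0.5, cacheWritePerMTok: 2.5 },
};

function computeCostUsd(model: string, usage: TokenUsage): number {
  const pricing = FOO_PRICING[model];
  if (!pricing) return 0; // unknown model: report zero rather than guess
  const perToken = (pricePerMTok: number) => pricePerMTok / 1_000_000;
  return (
    usage.inputTokens * perToken(pricing.inputPerMTok) +
    usage.outputTokens * perToken(pricing.outputPerMTok) +
    usage.cacheReadTokens * perToken(pricing.cacheReadPerMTok) +
    usage.cacheWriteTokens * perToken(pricing.cacheWritePerMTok)
  );
}
```

Returning zero for an unknown model keeps the run alive, but it under-reports spend, so logging the unknown model name alongside is worth considering.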

Step 5 — Model selection

ProviderSessionConfig.model is set by the runner from opts.model || process.env.MODEL_OVERRIDE || "" and may be overridden per task (task.model). Decide:

What is the provider default when model === ""? Read it from a provider-specific env var (CODEX_DEFAULT_MODEL is the existing convention).
Do you accept shortnames ("sonnet", "gpt-5") and expand to full IDs? If so, build a resolver — see resolveCodexModel in src/providers/codex-models.ts and resolveModel in src/providers/pi-mono-adapter.ts.
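
Both decisions can be sketched in one small resolver. The alias table, the model IDs, and the fallback are hypothetical, and this is not resolveCodexModel or resolveModel; it only illustrates the precedence of explicit model, then env default, then built-in fallback.

```typescript
// Hypothetical shortnames and model IDs for the "foo" provider.
const FOO_MODEL_ALIASES: Record<string, string> = {
  large: "foo-large-2025",
  small: "foo-small-2025",
};
const FOO_FALLBACK_MODEL = "foo-large-2025";

// requested: ProviderSessionConfig.model ("" when unset).
// envDefault: value of a provider-specific env var, if any.
function resolveFooModel(requested: string, envDefault: string | undefined): string {
  const explicit = requested.trim();
  const candidate = explicit !== "" ? explicit : (envDefault ?? "").trim();
  if (candidate === "") return FOO_FALLBACK_MODEL;
  // Expand known shortnames; pass unknown values through as full model IDs.
  return FOO_MODEL_ALIASES[candidate.toLowerCase()] ?? candidate;
}
```

A caller would pass something like `resolveFooModel(config.model, process.env.FOO_DEFAULT_MODEL)`, where FOO_DEFAULT_MODEL is an assumed name that follows the CODEX_DEFAULT_MODEL convention.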

Step 6 — Credentials & auth

Three patterns are already in the codebase; pick the one that fits your provider:

Claude-style. Validate at adapter start; throw a clear error if missing. Example: validateClaudeCredentials() in src/providers/claude-adapter.ts.

Codex ChatGPT-style. See §9 (Codex OAuth: the full reference flow) below. This is the most involved path but required for desktop-login-style flows.

Pi-style. Reads ~/.pi/agent/auth.json. If your SDK looks up its own auth file, the adapter may not need to do anything beyond ensuring the file exists at worker boot.

Secret scrubbing: any credential you log must go through the project's scrubber. See src/utils/secret-scrubber.ts and the CLAUDE.md "Secret scrubbing" section.

Step 7 — MCP server injection

Every adapter fetches per-agent MCP servers from the swarm API and wires them into the provider:

GET {apiUrl}/api/agents/{agentId}/mcp-servers?resolveSecrets=true
Authorization: Bearer {apiKey}

Then:

Claude writes /tmp/mcp-<taskId>.json and passes it via --mcp-config (claude-adapter.ts L46–183).
Pi instantiates an McpHttpClient per HTTP/SSE server and registers tools prefixed mcp__<name>__ (pi-mono-adapter.ts L408–493).
Codex builds a structured mcp_servers object for new Codex({ config }) (codex-adapter.ts L132–243).

Always include the swarm's own MCP server with an X-Source-Task-Id header so nested tool calls attribute back correctly.

Step 8 — Swarm event hooks (cancellation, heartbeat, tool-loop detection)

The adapter is responsible for polling swarm-side signals during a run. Pattern files:

Codex: src/providers/codex-swarm-events.ts — throttled fireAndForget fetches for cancel/heartbeat/activity/context-usage, attached via this.listeners.push(...) inside the session.
Pi: src/providers/pi-mono-extension.ts — createSwarmHooksExtension passed into DefaultResourceLoader({ extensionFactories: [swarmExtension] }).
Claude: external hook process reads a task file (/tmp/agent-swarm-task-<pid>.json) written by claude-adapter.ts L31–43; hook logic lives under src/hooks/ (e.g. tool-loop-detection.ts).

Tool-loop detection (src/hooks/tool-loop-detection.ts::checkToolLoop) is reusable — call it from your event translator when you see tool_start.
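
The throttled fire-and-forget pattern used for heartbeat and activity reporting can be sketched as follows. This is an illustration under assumptions, not the code in codex-swarm-events.ts: `createThrottledReporter` is a hypothetical name and the clock is injected so the behavior can be tested without waiting.

```typescript
// Returns a wrapper that runs `task` at most once per interval and never throws.
function createThrottledReporter(
  intervalMs: number,
  task: () => void | Promise<void>,
  now: () => number = Date.now,
): () => boolean {
  let lastRunAt = Number.NEGATIVE_INFINITY;
  return () => {
    const current = now();
    if (current - lastRunAt < intervalMs) return false; // throttled: skip this call
    lastRunAt = current;
    // Fire and forget: a failed heartbeat must not break the model run.
    try {
      Promise.resolve(task()).catch(() => {});
    } catch {
      // Swallow synchronous failures as well.
    }
    return true;
  };
}
```

An adapter would register the returned function as an event listener, so every native event gives the reporter a chance to run while the interval bounds the request rate.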

Step 9 — Skills and slash-commands

Skills are the swarm's portable procedural knowledge (/review-pr, /implement-issue, etc.). Each provider handles them differently:

Claude: native slash-commands, so formatCommand(name) => "/" + name (claude-adapter.ts L601).
Pi: prefixed, formatCommand(name) => "/skill:" + name (pi-mono-adapter.ts L541); resolved from ~/.pi/agent/skills/<name>/SKILL.md.
Codex: no native skills support → src/providers/codex-skill-resolver.ts intercepts a leading /<name> in the prompt, reads ${CODEX_SKILLS_DIR ?? ~/.codex/skills}/<name>/SKILL.md, and inlines it before calling thread.runStreamed. The system prompt is delivered by writing AGENTS.md into cwd (see codex-agents-md.ts).
OpenCode: no native SKILL.md support → uses the same inline resolver, reads ${OPENCODE_SKILLS_DIR ?? ~/.opencode/skills}/<name>/SKILL.md, and inlines it before calling client.session.prompt.

Pick the pattern that matches your runtime; if your provider has no native skill mechanism, follow the Codex inline-resolver pattern.

The swarm syncs skill files into each provider's skill dir at container boot and during per-task refreshes (copies SKILL.md into ~/.claude/skills/, ~/.pi/agent/skills/, ~/.codex/skills/, ~/.opencode/skills/, and ~/.agents/skills/). Add an entry for your provider there.

Step 10 — Worker bootstrap (Docker entrypoint)

Edit docker-entrypoint.sh:

Credential validation branch — mirror the pattern used for pi (L7–12), codex (L13–71), or claude (L72–79).
Binary reachability check — add a block similar to the CODEX_BINARY / CLAUDE_BINARY checks (L87–108).
Skill sync — extend the skill-copy block (L764–803) with your provider's skill directory.
If your provider has a CLI binary, install it in Dockerfile.worker.

Step 11 — OAuth login flow (optional)

If your provider needs a user-interactive OAuth flow (like Codex's ChatGPT login), add a CLI command:

Implement PKCE + local callback server. Reference: src/providers/codex-oauth/flow.ts — createAuthorizationFlow (URL with code_challenge=S256), startLocalOAuthServer (node:http on 127.0.0.1:1455/auth/callback), exchangeAuthorizationCode.
Add storage helpers at src/providers/<foo>-oauth/storage.ts that PUT /api/config with { scope: "global", key: "<foo>_oauth", value: JSON.stringify(creds), isSecret: true }. See storeCodexOAuth and getValidCodexOAuth (with auto-refresh) in src/providers/codex-oauth/storage.ts.
Add the CLI command at src/commands/<foo>-login.ts. Reference: src/commands/codex-login.ts — uses promptHiddenInput for masked API-key entry and attempts to auto-open the browser via open / start / xdg-open by platform.
Register the command in src/cli.tsx (non-UI command: console.log + process.exit(0) style) and update COMMAND_HELP.
In docker-entrypoint.sh, restore credentials at boot by fetching them from /api/config/resolved?includeSecrets=true&key=<foo>_oauth and writing the provider's expected auth-file format (see the codex block, L13–71, for the jq-based reshape).
At adapter session-creation time, re-fetch-and-refresh as a fallback if the token is expired (see codex-adapter.ts L810–844).
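
Three small pieces of the flow above lend themselves to pure helpers: the authorization URL, the PUT /api/config body, and the expiry check behind the refresh fallback. The sketch below is not the code in codex-oauth; the endpoint, client ID, `FooCredentials` shape, and the assumption that `expires` is epoch milliseconds are all hypothetical. Computing the code_challenge from the verifier is left out.

```typescript
interface AuthorizationRequest {
  authorizeEndpoint: string; // hypothetical provider endpoint
  clientId: string;
  redirectUri: string;
  codeChallenge: string; // base64url(SHA-256(code_verifier)), computed elsewhere
  state: string;
  scope: string;
}

// Standard OAuth 2.0 authorization-code request with PKCE (S256).
function buildAuthorizationUrl(req: AuthorizationRequest): string {
  const params: Array<[string, string]> = [
    ["response_type", "code"],
    ["client_id", req.clientId],
    ["redirect_uri", req.redirectUri],
    ["scope", req.scope],
    ["code_challenge", req.codeChallenge],
    ["code_challenge_method", "S256"],
    ["state", req.state],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${req.authorizeEndpoint}?${query}`;
}

// Assumed flat credential shape; `expires` is treated as epoch milliseconds.
interface FooCredentials {
  access: string;
  refresh: string;
  expires: number;
}

// Body for PUT /api/config, following the shape quoted in the list above.
function buildOAuthConfigPayload(creds: FooCredentials) {
  return { scope: "global", key: "foo_oauth", value: JSON.stringify(creds), isSecret: true };
}

// Refresh slightly early so a token does not expire mid-request.
function isTokenExpired(creds: FooCredentials, nowMs: number, skewMs: number = 60_000): boolean {
  return creds.expires - skewMs <= nowMs;
}
```

The adapter-side fallback then reduces to: load the stored credentials, call isTokenExpired, and only hit the refresh endpoint when it returns true.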

Step 12 — Types & enums

Update union-type entries that enumerate providers:

src/types.ts — HarnessProvider union.
templates/schema.ts — template provider enum.
Any migration that stores a provider column — do not modify existing migrations; create a new one under src/be/migrations/ if a schema update is needed.

Step 13 — Prompts

If your provider benefits from a distinct system-prompt shape, add src/prompts/<foo>/ mirroring src/prompts/claude/ and src/prompts/codex/. Wire it into the adapter's createSession. Prompt files must remain pure (no DB imports) — the DB boundary is enforced by scripts/check-db-boundary.sh.

Step 14 — Tests

Add at minimum:

src/tests/<foo>-adapter.test.ts — unit tests for event translation (feed fake native events, assert emitted ProviderEvents).
src/tests/<foo>-oauth.test.ts (if OAuth) — PKCE helpers, storage round-trip, token refresh.

Existing analogs: src/tests/codex-*.test.ts, src/tests/claude-*.test.ts, src/tests/pi-*.test.ts. Tests must use isolated SQLite files and clean up -wal / -shm in afterAll (see the "Unit tests" block in CLAUDE.md).

Step 15 — Documentation

Update CLAUDE.md: add foo to the HARNESS_PROVIDER accepted values list and document any required env vars.
Update this guide's "Reference implementations" table.
Update the README's "Multi-provider" line.
Update Harness Configuration with the new provider's setup instructions.
If the provider needs new HTTP endpoints, regenerate OpenAPI: bun run docs:openapi.

9. Codex OAuth: the full reference flow

This is documented separately because it is the most involved integration.

┌────────────────┐   codex-login CLI      ┌──────────────────┐
│ User's laptop  │ ─── PKCE auth URL ──▶  │ auth.openai.com  │
│                │ ◀── code (state) ───── │ OAuth server     │
│                │                        └──────────────────┘
│ Local callback │         ▲
│ :1455/auth/... │─────────┘
└────────┬───────┘
         │ exchangeAuthorizationCode
         ▼
┌────────────────┐
│  Creds JSON    │
│  (tokens)      │
└────────┬───────┘
         │ PUT /api/config { key:"codex_oauth", isSecret:true }
         ▼
┌────────────────────────────┐
│ swarm_config (encrypted)   │
└────────┬───────────────────┘
         │ docker-entrypoint.sh fetches at boot
         ▼
┌────────────────────────────┐
│ ~/.codex/auth.json (0600)  │
└────────────────────────────┘

Key files: src/providers/codex-oauth/{flow.ts,storage.ts,auth-json.ts,pkce.ts,types.ts}, src/commands/codex-login.ts, docker-entrypoint.sh L13–71, adapter fallback codex-adapter.ts L810–844.

The shape conversion (our flat {access, refresh, expires, accountId} → Codex CLI's {auth_mode: "chatgpt", tokens: {...}}) lives in credentialsToAuthJson() at src/providers/codex-oauth/auth-json.ts L37–49.

10. Pre-PR checklist for a new provider

Run before opening the PR (per CLAUDE.md):

bun run lint:fix
bun run tsc:check
bun test
bash scripts/check-db-boundary.sh
bun run docs:openapi   # only if you added HTTP endpoints

Manual verification:

HARNESS_PROVIDER=foo bun run src/cli.tsx worker starts and connects.
A trivial task ("Say hi") runs to completion and posts progress + cost.
cancel-task via MCP actually aborts the in-flight run.
docker build -f Dockerfile.worker . succeeds.
Full E2E with Docker (see CLAUDE.md "E2E testing with Docker") with -e HARNESS_PROVIDER=foo.
For OAuth providers: run bun run src/cli.tsx <foo>-login end-to-end, then boot a worker in Docker and verify it picks up the stored creds.

11. Files to touch — quick checklist

Concern	File(s)
Adapter implementation	`src/providers/<foo>-adapter.ts`
Factory registration	`src/providers/index.ts`
Types / enums	`src/types.ts`, `templates/schema.ts`
Prompts (optional)	`src/prompts/<foo>/`
OAuth (optional)	`src/providers/<foo>-oauth/*`, `src/commands/<foo>-login.ts`
Setup CLI (optional, e.g. claude-managed)	`src/commands/<foo>-setup.ts`
CLI wiring	`src/cli.tsx` (`COMMAND_HELP`, command routing)
Docker bootstrap	`docker-entrypoint.sh`, `Dockerfile.worker`
Hooks (optional)	`src/hooks/*`, `src/providers/<foo>-swarm-events.ts`
Skills resolver (if provider lacks native support)	`src/providers/<foo>-skill-resolver.ts`
Models / pricing (optional)	`src/providers/<foo>-models.ts`
Tests	`src/tests/<foo>-*.test.ts`
Integrations UI (optional)	`ui/src/lib/integrations-catalog.ts`
Docs	`CLAUDE.md`, `README.md`, this guide, Harness Configuration

claude-managed reference files: src/providers/claude-managed-adapter.ts, src/providers/claude-managed-swarm-events.ts, src/providers/claude-managed-models.ts, src/commands/claude-managed-setup.ts, ui/src/lib/integrations-catalog.ts.

12. Claude Managed Agents — pre-existing Agent + Environment pattern

claude-managed is the first reference adapter where the session runtime executes outside the worker container: the worker only opens an SSE stream against client.beta.sessions.events.stream and relays normalized events to the runner. This forces a few design decisions that are worth calling out — they apply to any future provider with a similar "managed cloud session" shape (Devin's /sessions API is the closest existing analog).

a. We don't `agents.create` at runtime

Anthropic's beta API has a 1:1 Agent ↔ identity model. Calling client.beta.agents.create(...) from each worker on each task would (a) leak agents into the customer's account at the rate of one per task, and (b) make skill / tool inventory non-deterministic per session. Instead, we treat the Agent and Environment as persistent infrastructure, created once during operator onboarding and persisted by ID. The adapter only ever calls client.beta.sessions.create({ agent: MANAGED_AGENT_ID, environment_id: MANAGED_ENVIRONMENT_ID, ... }).

b. The `claude-managed-setup` CLI

bun run src/cli.tsx claude-managed-setup

Implementation: src/commands/claude-managed-setup.ts. Behavior:

Reads ANTHROPIC_API_KEY from .env / env (or prompts).
Creates the Environment (client.beta.environments.create) — the long-lived sandbox configuration (allowed networks, default packages, persistent volumes).
Uploads each plugin/commands/*.md skill via client.beta.skills.create (one-shot per skill content hash; the CLI dedupes against an existing inventory).
Creates the Agent (client.beta.agents.create) and attaches the freshly uploaded skills.
PUT /api/config persists MANAGED_AGENT_ID + MANAGED_ENVIRONMENT_ID into swarm_config so deployed workers (and the integrations UI) can restore them at boot.
Re-run with --force to recreate (rare — only if upstream rotates IDs).

This is the shape every future "managed cloud session" provider should follow: setup-once → IDs live in swarm_config → workers fail-fast at boot if absent.

c. System prompt in the user message + prompt-cache breakpoint

Managed-agents has no system field on sessions.create. The closest analog is client.beta.sessions.events.send({ events: [{ type: "user.message", content: [...] }] }). We compose the swarm's full assembled systemPrompt as the first content block and the per-task prompt as the second:

[
  { type: "text", text: <full system prompt + agent identity + skills>, cache_control: { type: "ephemeral" } },
  { type: "text", text: `User request:\n\n${prompt}` },  // no cache_control
]

The cache_control: { type: "ephemeral" } marker on the first block creates a prompt-cache breakpoint — Anthropic caches everything up to that boundary across sessions for the same agent, so subsequent tasks for the same agent re-use the static prefix at cache-read pricing. The per-task block sits after the breakpoint and is allowed to differ without invalidating the cache.

This is enforced by composeManagedUserMessage in claude-managed-adapter.ts (asserted byte-identical-prefix in src/tests/claude-managed-adapter.test.ts).
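
The two-block layout can be sketched as below. This is an illustration, not the body of composeManagedUserMessage: the `SketchTextBlock` type and the function name are assumptions, and the block fields follow the example shown above.

```typescript
// Simplified text block; cache_control is optional and only set on the static prefix.
interface SketchTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

function composeUserMessageSketch(systemPrompt: string, taskPrompt: string): SketchTextBlock[] {
  return [
    // Static prefix: identical across tasks for the same agent, so it can be cached.
    { type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } },
    // Per-task suffix: sits after the breakpoint and may differ freely.
    { type: "text", text: `User request:\n\n${taskPrompt}` },
  ];
}
```

The property worth testing in any similar provider is that the first block serializes to the same bytes for two different tasks; anything task-specific leaking into it defeats the cache.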

d. `X-Source-Task-Id` is dropped

Section 5 requires every adapter to set X-Source-Task-Id on the swarm MCP connection. claude-managed cannot: the MCP servers are configured server-side on the Anthropic-managed Agent (not per-session), and the SDK doesn't expose a per-session HTTP-header override. We instead pass the task ID via metadata.swarmTaskId on sessions.create, and the swarm MCP tools accept task_id as an explicit tool argument when the header is missing. New providers that hit the same constraint should follow this fallback.
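
On the tool side, the header-or-argument fallback might look like the sketch below. The function name is hypothetical and the lookup assumes header names have already been lowercased by the HTTP layer; the real swarm tools may resolve the task ID differently.

```typescript
// Prefer the X-Source-Task-Id header; fall back to an explicit task_id tool argument.
function resolveSourceTaskId(
  headers: Record<string, string | undefined>,
  toolArgs: { task_id?: unknown },
): string | undefined {
  const fromHeader = headers["x-source-task-id"];
  if (fromHeader) return fromHeader;
  const fromArgs = toolArgs.task_id;
  return typeof fromArgs === "string" && fromArgs !== "" ? fromArgs : undefined;
}
```

Returning undefined rather than throwing leaves the decision to the caller, which can reject tools that need attribution and allow those that do not.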

e. Skill upload via `beta.skills.create`

Skills are not synced into a filesystem path on the worker (the worker doesn't run the model). They're uploaded once during claude-managed-setup via client.beta.skills.create({ content, name, ... }) and referenced by ID on the Agent. The skill content is the same plugin/commands/*.md body that other providers copy into ~/.claude/skills/<name>/SKILL.md — so the source of truth stays in the repo.

f. SDK shape deviations to be aware of

The Anthropic Beta SDK has a few non-obvious surface differences from the conventional Anthropic messages.create API. The header comments in src/providers/claude-managed-adapter.ts (top-of-file block, ~L13–35) document them in detail, but the highlights for anyone reading the code:

Resource type is github_repository, not github_repo. The SDK type is BetaGitHubRepositoryResource and the literal field is type: "github_repository".
events.send takes { events: [...] } — an array, not a single event arg. The naming makes it look like events.send(event) would work; it would not.
Session status enum is 'rescheduling' | 'running' | 'idle' | 'terminated'. "Archived" is not a status — it's signaled by archived_at !== null. canResume() therefore rejects on terminated or non-null archived_at.
cache_control is a runtime-honored field that's NOT in the TS definition for BetaManagedAgentsTextBlock. We attach it via a typed extension and cast on the way out so the runtime payload includes it.
events.stream returns an AsyncIterable, not a Promise of an array — iterate with for await.
events.list is a PagePromise that's also AsyncIterable over historical session events; the resume path uses it to pre-fetch + dedupe against the live stream.
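
The status and archived_at rule above reduces to a small predicate. The types below are simplified stand-ins written for this sketch, not the SDK's own definitions, and the function name is hypothetical.

```typescript
// Status values as listed above; "archived" is signaled separately via archived_at.
type ManagedSessionStatus = "rescheduling" | "running" | "idle" | "terminated";

interface ManagedSessionSnapshot {
  status: ManagedSessionStatus;
  archived_at: string | null;
}

// A session can be resumed only if it is neither terminated nor archived.
function isResumable(session: ManagedSessionSnapshot): boolean {
  return session.status !== "terminated" && session.archived_at === null;
}
```

A canResume implementation along these lines would fetch the session by ID, apply the predicate, and treat a failed lookup as not resumable.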

13. Further reading

Harness Configuration — how to use the existing providers.
Codex OAuth setup — end-user OAuth flow.
CLAUDE.md — project-wide rules, especially "Architecture invariants" (DB boundary) and "Secret scrubbing".
src/providers/types.ts — canonical interface definitions.
src/providers/codex-adapter.ts — the most feature-complete reference (OAuth + skills resolver + hooks + model resolver).
