
Memory System

How agents build compounding knowledge across sessions

Agent Swarm agents aren't stateless. They build compounding knowledge through multiple automatic mechanisms. The memory system uses provider abstractions, vector search, and intelligent reranking to surface the most relevant knowledge at the right time.

The memory system was redesigned in #212 to add TTL-based expiry, reranking with recency and access signals, and swappable provider interfaces.

How Memory Works

Every agent has a searchable memory backed by embeddings and stored in SQLite. The system is built on two provider abstractions:

  • EmbeddingProvider — Converts text to vectors. Default implementation uses OpenAI text-embedding-3-small (512 dimensions). Swappable for other providers.
  • MemoryStore — Persists and retrieves memories. Default implementation uses SQLite with sqlite-vec for KNN vector search. Falls back to brute-force cosine similarity when the extension is unavailable.
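A minimal TypeScript sketch of the two abstractions, together with the brute-force cosine-similarity fallback; the shapes here are illustrative, and the real interfaces in src/be/memory/types.ts may differ:

```typescript
// Illustrative provider shapes (not the actual interface definitions).
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>; // e.g. a 512-dim vector
}

interface MemoryRecord {
  id: string;
  content: string;
  embedding: number[];
}

interface MemoryStore {
  save(memory: MemoryRecord): Promise<void>;
  search(queryEmbedding: number[], limit: number): Promise<MemoryRecord[]>;
}

// Brute-force cosine similarity, used when sqlite-vec is unavailable.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```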

Memory Sources

Memories are automatically created from:

  • Session summaries — At the end of each session, a lightweight model (Haiku) extracts key learnings: mistakes made, patterns discovered, failed approaches, and codebase knowledge. These summaries become searchable memories.
  • Task completions — Every completed (or failed) task's output is indexed. Failed tasks include notes about what went wrong, so the agent avoids repeating the same mistake.
  • File-based notes — Agents write to /workspace/personal/memory/ in their per-agent directory. Files written here are automatically indexed via the PostToolUse hook and can be promoted to swarm scope.
  • Lead-to-worker injection — The lead agent can push specific learnings into any worker's memory using the inject-learning tool, closing the feedback loop.

Memory Scopes

| Scope | Path | Visibility |
| --- | --- | --- |
| Agent (private) | /workspace/personal/memory/ | Only the owning agent |
| Swarm (shared) | Promoted via inject-learning | All agents in the swarm |

Automatic Scope Promotion

The inject-learning tool creates swarm-scoped memories by default, so learnings injected by the lead are available to all workers.

TTL & Expiry

Memories have a time-to-live (TTL) based on their source type. Expired memories are automatically filtered from search results but not proactively deleted from the database.

| Source | TTL | Rationale |
| --- | --- | --- |
| task_completion | 7 days | Task outputs become stale quickly |
| session_summary | 3 days | Session context is ephemeral |
| file_index | 30 days | File contents may change |
| manual | Never expires | Explicitly stored knowledge is permanent |

Expired memories can still be retrieved by ID via memory-get — only search results are filtered. The memory-delete tool provides explicit cleanup when needed.
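Expiry can be sketched as a search-time filter; the TTL values mirror the table above, while the function shape itself is hypothetical:

```typescript
// TTL in days per memory source; null means "never expires".
const TTL_DAYS: Record<string, number | null> = {
  task_completion: 7,
  session_summary: 3,
  file_index: 30,
  manual: null,
};

// Applied when building search results; expired rows stay in the
// database and remain retrievable by ID.
function isExpired(source: string, createdAt: Date, now: Date = new Date()): boolean {
  const ttl = TTL_DAYS[source];
  if (ttl === null || ttl === undefined) return false;
  const ageDays = (now.getTime() - createdAt.getTime()) / 86_400_000;
  return ageDays > ttl;
}
```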

Memory Retrieval & Reranking

Before starting each task, the runner automatically searches for relevant memories and includes them in the agent's context.

Search Process

  1. Task description is used as the search query
  2. An embedding is generated via the EmbeddingProvider
  3. Candidate memories are retrieved via KNN search (sqlite-vec) or brute-force cosine similarity (fallback)
  4. Candidates are reranked using a composite score
  5. Top matches are included in the task context as "Relevant Past Knowledge"
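The steps above can be sketched as a single pipeline (function and parameter names are hypothetical; reranking is treated as a black box here):

```typescript
type Memory = { id: string; content: string; similarity: number };

async function searchMemories(
  taskDescription: string,                               // 1. query = task description
  embed: (text: string) => Promise<number[]>,            // 2. EmbeddingProvider
  knn: (vec: number[], k: number) => Promise<Memory[]>,  // 3. sqlite-vec or fallback
  rerank: (candidates: Memory[]) => Memory[],            // 4. composite score
  limit = 5,
): Promise<Memory[]> {
  const queryVec = await embed(taskDescription);
  const candidates = await knn(queryVec, limit * 3);     // 3x candidate multiplier
  return rerank(candidates).slice(0, limit);             // 5. top matches for context
}
```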

Reranking

Raw vector similarity alone isn't enough — a memory from yesterday about the same topic should rank higher than a similar but month-old memory. The reranker computes:

finalScore = similarity × recencyDecay × accessBoost

| Signal | Formula | Effect |
| --- | --- | --- |
| Recency decay | 2^(-ageDays / 14) | Memories lose half their score every 14 days |
| Access boost | 1 + min(accessCount/10, 0.5) × recencyFactor | Frequently accessed memories get up to 1.5× boost |

The candidate set is fetched at 3× the requested limit, then narrowed after reranking. This ensures that a highly relevant but older memory can still surface if its similarity is strong enough.
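A sketch of the composite score using the documented defaults; the exact recencyFactor applied to the access boost is an assumption here, and the real implementation lives in src/be/memory/reranker.ts:

```typescript
const HALF_LIFE_DAYS = 14;       // MEMORY_RECENCY_HALF_LIFE_DAYS
const ACCESS_RECENCY_HOURS = 48; // MEMORY_ACCESS_RECENCY_HOURS

interface Candidate {
  similarity: number;       // raw vector similarity from KNN search
  ageDays: number;          // days since the memory was created
  accessCount: number;      // times the memory has been retrieved
  hoursSinceAccess: number; // hours since the last retrieval
}

function finalScore(c: Candidate): number {
  // Halves every 14 days: 2^(-ageDays / 14).
  const recencyDecay = Math.pow(2, -c.ageDays / HALF_LIFE_DAYS);
  // Assumed form: accesses within the last 48h count fully, older ones taper.
  const recencyFactor = Math.min(
    1,
    ACCESS_RECENCY_HOURS / Math.max(c.hoursSinceAccess, ACCESS_RECENCY_HOURS),
  );
  // 1 + min(accessCount/10, 0.5) × recencyFactor, capping the boost at 1.5×.
  const accessBoost = 1 + Math.min(c.accessCount / 10, 0.5) * recencyFactor;
  return c.similarity * recencyDecay * accessBoost;
}
```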

Agents can search and manage memories using MCP tools:

  • memory-search — Search with natural language queries. Returns reranked results.
  • memory-get — Retrieve full details of a specific memory by ID (increments access count).
  • memory-delete — Delete a memory. Agents can delete their own; leads can also delete swarm-scoped memories.

Writing Memories

The best practice is to write memories to files immediately when something important is learned:

```
Write("/workspace/personal/memory/auth-header-fix.md",
  "The API requires Bearer prefix on all auth headers.
   Without it, you get a misleading 403 instead of 401.")
```

Files are automatically indexed by the PostToolUse hook — no additional action needed.
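A hypothetical sketch of what that hook does: writes under the memory directory get embedded and stored, and everything else is ignored (handler and parameter names are invented for illustration):

```typescript
const MEMORY_DIR = "/workspace/personal/memory/";

// Returns true when the write was indexed as a memory.
async function onPostToolUse(
  tool: string,
  filePath: string,
  content: string,
  embed: (text: string) => Promise<number[]>,
  save: (m: { path: string; content: string; embedding: number[] }) => Promise<void>,
): Promise<boolean> {
  if (tool !== "Write" || !filePath.startsWith(MEMORY_DIR)) return false;
  await save({ path: filePath, content, embedding: await embed(content) });
  return true;
}
```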

What to Save

  • Solutions to problems you solved
  • Codebase patterns you discovered
  • Mistakes you made and how to avoid them
  • Important configurations
  • Instructions from the lead or user

What Not to Save

  • Session-specific context (temporary state)
  • Unverified conclusions
  • Information that duplicates existing documentation

Memory Categories

When the lead injects learnings via inject-learning, they're categorized:

| Category | Purpose |
| --- | --- |
| mistake-pattern | Common mistakes to avoid |
| best-practice | Preferred approaches |
| codebase-knowledge | Facts about the codebase |
| preference | User or team preferences |

Configuration

Reranking parameters are tunable via environment variables:

| Variable | Default | Description |
| --- | --- | --- |
| MEMORY_RECENCY_HALF_LIFE_DAYS | 14 | Days until recency decay reaches 0.5 |
| MEMORY_ACCESS_BOOST_MAX | 1.5 | Maximum access boost multiplier |
| MEMORY_ACCESS_RECENCY_HOURS | 48 | Hours within which access counts for full boost |
| MEMORY_CANDIDATE_MULTIPLIER | 3 | Candidate set size relative to requested limit |
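For example, to slow the recency decay and widen the candidate pool before reranking (values chosen for illustration):

```shell
export MEMORY_RECENCY_HALF_LIFE_DAYS=30
export MEMORY_CANDIDATE_MULTIPLIER=5
```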

Architecture

```
src/be/memory/
├── types.ts              # EmbeddingProvider + MemoryStore interfaces
├── constants.ts          # TTL defaults + reranking params (env-overridable)
├── reranker.ts           # Scoring: similarity × recency × access
├── index.ts              # Singleton getters
└── providers/
    ├── openai-embedding.ts   # OpenAI text-embedding-3-small
    └── sqlite-store.ts       # SQLite + sqlite-vec KNN search
```

The provider interfaces make it straightforward to add alternative implementations (e.g., a different embedding model or a Postgres-backed store) without changing any consumer code.
