{
  "newsletter_slug": "station-press",
  "section": "press",
  "slug": "unified-agent-architecture",
  "title": "Proposal: Unified Agent Architecture",
  "summary": "I read through the entire stack: Pi&#39;s extensions/skills/system-prompt, COT&#39;s runner/tools/briefing/prompt/maintain-runner, the architecture analysis document, and every major extension. Here&#39;s what I found. The Core Problem: Two Systems, One Mechanism Pi and COT...",
  "published_at": "2026-02-28T00:00:00.000Z",
  "page_html": "<p>I read through the entire stack: Pi&#39;s extensions/skills/system-prompt, COT&#39;s runner/tools/briefing/prompt/maintain-runner, the architecture analysis document, and every major extension. Here&#39;s what I found.</p>\n<h2>The Core Problem: Two Systems, One Mechanism</h2>\n<p>Pi and COT both do the same thing at their core:</p>\n<pre><code>loop:\n  LLM generates → tools execute → context accumulates → repeat\n</code></pre>\n<p>Both use <code>createAgentSession</code> from the Pi SDK. Both call <code>session.prompt()</code>. Both have tool registries, context management, error handling. But they share <strong>zero implementation</strong> above the SDK level:</p>\n<table>\n<thead>\n<tr>\n<th>Concept</th>\n<th>Pi Implementation</th>\n<th>COT Implementation</th>\n</tr>\n</thead>\n<tbody><tr>\n<td>Tool registry</td>\n<td>Extensions (loaded at startup, permanent)</td>\n<td><code>registry.ts</code> (factory pattern, on-demand)</td>\n</tr>\n<tr>\n<td>bash/read/write/edit</td>\n<td>Pi SDK built-in tools</td>\n<td>Separate tool modules (import side-effects)</td>\n</tr>\n<tr>\n<td>WhatsApp</td>\n<td><code>wa/index.ts</code> extension (CLI wrapper)</td>\n<td><code>wa-*.ts</code> tools (DB query wrappers)</td>\n</tr>\n<tr>\n<td>Email</td>\n<td><code>google/index.ts</code> (API wrapper)</td>\n<td><code>email-*.ts</code> tools (DB query wrappers)</td>\n</tr>\n<tr>\n<td>Memory</td>\n<td><code>agent-memory/index.ts</code> extension</td>\n<td><code>cot-memory.ts</code> + <code>search-agent-memory.ts</code></td>\n</tr>\n<tr>\n<td>Search</td>\n<td><code>unified-search/index.ts</code> extension</td>\n<td><code>unified-search.ts</code> tool</td>\n</tr>\n<tr>\n<td>Agent delegation</td>\n<td><code>agents-runtime</code> (child processes)</td>\n<td><code>spawn_sub_agent</code> (in-process sessions)</td>\n</tr>\n<tr>\n<td>Context management</td>\n<td>Smart-read progressive modes</td>\n<td>Sub-agent isolation</td>\n</tr>\n<tr>\n<td>Guidance</td>\n<td>AGENTS.md + 27 
skills</td>\n<td>242-line system prompt + briefing</td>\n</tr>\n</tbody></table>\n<p>Every capability is implemented twice. When you add a new data source, you build it twice. When you fix a bug, you fix it twice. This isn&#39;t DRY — it&#39;s two complete agent systems that happen to share a database.</p>\n<h2>The Five Deeper Problems</h2>\n<p><strong>1. The guidance stack is inverted.</strong></p>\n<p>Pi loads ~10K tokens of system prompt (AGENTS.md) on every turn. It includes environment details, Codex delegation rules, tool routing decision trees, workspace conventions — most irrelevant to most turns. Then skills add another 2-7K tokens each when loaded. The model processes all of this before doing anything.</p>\n<p>The architecture doc proved: 52% of sessions are &lt;5K chars with 1.6 tools. The system is optimized for the 0.5% complex case while every session pays the cost.</p>\n<p><strong>2. Tool loading is all-or-nothing.</strong></p>\n<p>Pi loads 51+ tools permanently. Each tool definition costs tokens in the cached prefix — and more importantly, costs model attention on every turn (the model must consider all tools before choosing). The architecture doc showed 4 tools account for 70% of usage. The other 47 tools are loaded for every session but used rarely.</p>\n<p>COT loads all 65+ tools too, but organized into groups. Only <code>COT_PROCESS_TOOLS</code> + <code>ORCHESTRATOR_READ_TOOLS</code> go to the main session. Sub-agents get scoped subsets. This is better but still manual and static.</p>\n<p><strong>3. Context has no economic model.</strong></p>\n<p>Neither system gives the model information about its own context budget. The <code>read</code> tool consumes 52.2% of all tool result bytes. Smart-read added progressive disclosure modes (peek/head/grep/structure), but the model doesn&#39;t know <em>why</em> it should use them. 
It doesn&#39;t know how much budget remains, what it&#39;s already consumed, or what the cost of a <code>mode: full</code> read will be.</p>\n<p><strong>4. Agent composition is three different systems.</strong></p>\n<ul>\n<li>Pi&#39;s <code>agent_spawn</code> creates child pi processes (full extension loading, DB-backed lifecycle, message passing)</li>\n<li>COT&#39;s <code>spawn_sub_agent</code> creates lightweight in-process sessions (shared tool deps, no DB, synchronous)</li>\n<li>Pi&#39;s Codex extension creates OpenAI Codex sessions (external API, sandbox, context management)</li>\n</ul>\n<p>These can&#39;t interoperate. A Pi child agent can&#39;t use COT&#39;s database tools. A COT sub-agent can&#39;t use Pi&#39;s browser extension. Codex can&#39;t do either. They&#39;re three islands.</p>\n<p><strong>5. The ESA pattern is ceremony, not architecture.</strong></p>\n<p>COT&#39;s system prompt mandates Explore→Synthesize→Act every run. This makes sense when there&#39;s lots of new data to process. But when nothing&#39;s changed? The model still goes through the motions — spawning empty explorers, synthesizing nothing, writing the same scratchpad. 
The pattern is burned into the prompt, not emergent from the situation.</p>\n<hr>\n<h2>The Proposal: Sessions All the Way Down</h2>\n<h3>Core Insight</h3>\n<p>Every &quot;agent&quot; in the current system is the same thing: <strong>an LLM session with tools, a context budget, and a lifecycle policy.</strong> The differences are configuration, not architecture.</p>\n<table>\n<thead>\n<tr>\n<th>Current Thing</th>\n<th>What It Actually Is</th>\n</tr>\n</thead>\n<tbody><tr>\n<td>Pi interactive session</td>\n<td>Session(trigger=human, tools=universal, trust=high, lifecycle=multi-turn)</td>\n</tr>\n<tr>\n<td>COT process run</td>\n<td>Session(trigger=timer, tools=orchestrator, trust=medium, lifecycle=stateful-one-shot)</td>\n</tr>\n<tr>\n<td>COT sub-agent</td>\n<td>Session(trigger=parent, tools=scoped, trust=inherited, lifecycle=one-shot)</td>\n</tr>\n<tr>\n<td>COT maintain worker</td>\n<td>Session(trigger=queue, tools=task-specific, trust=low, lifecycle=one-shot)</td>\n</tr>\n<tr>\n<td>Pi child agent</td>\n<td>Session(trigger=parent, tools=inherited, trust=inherited, lifecycle=multi-turn)</td>\n</tr>\n<tr>\n<td>Codex delegation</td>\n<td>Session(trigger=parent, tools=sandboxed, trust=sandboxed, lifecycle=multi-turn)</td>\n</tr>\n</tbody></table>\n<p>They should all be the same runtime with different configuration.</p>\n<h3>Five Primitives</h3>\n<p>The entire system reduces to five concepts:</p>\n<h4>1. Session (the core loop)</h4>\n<pre><code class=\"language-typescript\">interface Session {\n  id: string;\n  model: Model;\n  tools: Tool[];\n  budget: ContextBudget;\n  lifecycle: Lifecycle;\n  trust: TrustLevel;\n  state: SessionState;\n\n  prompt(input: string): AsyncIterable&lt;Event&gt;;\n  spawn(contract: SessionContract): Session;\n  dispose(): void;\n}\n</code></pre>\n<p>One loop. One implementation. 
Every &quot;agent&quot; type uses it.</p>\n<p>The critical difference from what exists today: <strong><code>budget</code> is a first-class concept</strong>, not an afterthought. The session knows how much context it has consumed, what percentage remains, and can make intelligent decisions (or expose this to the model).</p>\n<h4>2. Capability (tools + guidance as a unit)</h4>\n<pre><code class=\"language-typescript\">interface Capability {\n  name: string;                    // e.g., &quot;email&quot;, &quot;whatsapp&quot;, &quot;code-editing&quot;\n  tools: ToolDefinition[];         // the tools this capability provides\n  guidance: string;                // usage guidance injected with the tools\n  dependencies?: string[];         // other capabilities this requires\n  cost: number;                    // token cost of loading this capability\n}\n</code></pre>\n<p>This replaces three separate concepts:</p>\n<ul>\n<li>Pi extensions (tool providers)</li>\n<li>Pi skills (usage guidance)</li>\n<li>COT tool groups (named tool sets)</li>\n</ul>\n<p>A capability is <strong>tools + the knowledge to use them correctly</strong>, loaded as one unit. When you load &quot;email,&quot; you get <code>email_read</code>, <code>email_search</code>, <code>email_list_threads</code> AND the guidance about using snippets instead of full bodies, two-pass patterns, triage integration.</p>\n<p>This kills the current problem where tools exist in extensions but their usage guidance lives in skills. 
They should never be separated.</p>\n<p><strong>Capability tiers:</strong></p>\n<table>\n<thead>\n<tr>\n<th>Tier</th>\n<th>Loaded When</th>\n<th>Examples</th>\n</tr>\n</thead>\n<tbody><tr>\n<td><strong>Core</strong></td>\n<td>Always</td>\n<td>bash, read, write, edit, memory</td>\n</tr>\n<tr>\n<td><strong>Domain</strong></td>\n<td>When session contract includes them</td>\n<td>email, whatsapp, calendar, web, search</td>\n</tr>\n<tr>\n<td><strong>Specialist</strong></td>\n<td>When explicitly requested</td>\n<td>browser, codex, agents, garmin</td>\n</tr>\n<tr>\n<td><strong>Ephemeral</strong></td>\n<td>Created for specific contexts</td>\n<td>database query wrappers, workflow-specific tools</td>\n</tr>\n</tbody></table>\n<h4>3. Profile (session configuration template)</h4>\n<pre><code class=\"language-typescript\">interface Profile {\n  name: string;\n  capabilities: string[];          // which capabilities to load\n  trust: TrustLevel;               // approval requirements\n  lifecycle: LifecycleConfig;      // how the session starts, runs, ends\n  budget: BudgetConfig;            // context limits, compaction strategy\n  guidance: string;                // profile-specific system prompt\n}\n</code></pre>\n<p>Profiles replace the current fragmented configuration:</p>\n<ul>\n<li>Pi&#39;s AGENTS.md → &quot;interactive&quot; profile guidance</li>\n<li>COT&#39;s system prompt → &quot;orchestrator&quot; profile guidance</li>\n<li>COT&#39;s maintain worker prompt → &quot;worker&quot; profile guidance</li>\n</ul>\n<pre><code class=\"language-typescript\">const PROFILES = {\n  interactive: {\n    capabilities: [&quot;core&quot;, &quot;memory&quot;, &quot;search&quot;],\n    // other capabilities loaded on demand\n    trust: &quot;high&quot;,\n    lifecycle: { type: &quot;multi-turn&quot;, humanInLoop: true },\n    budget: { compactAt: 0.7, defaultReadMode: &quot;peek&quot; },\n    guidance: &quot;...&quot; // minimal — human is watching\n  },\n  \n  orchestrator: {\n    
capabilities: [&quot;core&quot;, &quot;memory&quot;, &quot;delegation&quot;, &quot;orchestrator-tools&quot;],\n    trust: &quot;medium&quot;,\n    lifecycle: { type: &quot;stateful-one-shot&quot;, stateStore: &quot;scratchpad&quot; },\n    budget: { compactAt: 0.6, defaultReadMode: &quot;structure&quot; },\n    guidance: &quot;...&quot; // ESA-like but advisory, not mandatory\n  },\n  \n  worker: {\n    capabilities: [&quot;core&quot;],\n    // additional capabilities specified per-task\n    trust: &quot;low&quot;,\n    lifecycle: { type: &quot;one-shot&quot;, maxTurns: 10 },\n    budget: { compactAt: 0.5 },\n    guidance: &quot;Execute the task. Return structured results. No exploration.&quot;\n  },\n  \n  child: {\n    capabilities: [], // inherited from parent + task-specific\n    trust: &quot;inherited&quot;,\n    lifecycle: { type: &quot;scoped&quot;, maxTurns: 5 },\n    budget: { inherit: &quot;parent-remaining&quot; },\n    guidance: &quot;&quot; // set by parent\n  }\n};\n</code></pre>\n<h4>4. State (persistence across sessions)</h4>\n<pre><code class=\"language-typescript\">interface StateStore {\n  // Ephemeral (within session)\n  context: Message[];              // conversation history\n  \n  // Persistent (across sessions)  \n  memory: MemoryStore;             // long-term knowledge (current system)\n  scratchpad: ScratchpadStore;     // inter-run state (COT&#39;s current approach)\n  queue: QueueStore;               // background task queue (maintain)\n}\n</code></pre>\n<p>The current memory system works. Keep it. The scratchpad pattern (inter-run state for autonomous sessions) works. Keep it. The maintain queue works. Keep it.</p>\n<p>What changes: these are all accessible through the same state interface, not through separate tool implementations per system.</p>\n<h4>5. 
Budget (context economics)</h4>\n<pre><code class=\"language-typescript\">interface ContextBudget {\n  total: number;                   // model&#39;s context window\n  used: number;                    // current consumption\n  remaining: number;               // total - used\n  percentUsed: number;             // 0-100\n  \n  // Exposed to the model via tool results\n  report(): BudgetReport;\n  \n  // Automatic policies\n  compactAt: number;               // fraction of total, 0-1 (e.g., 0.7)\n  readDefault: ReadMode;           // default mode for read tool\n  \n  // Adaptive behavior\n  suggestReadMode(fileSize: number): ReadMode;\n  shouldDelegate(estimatedCost: number): boolean;\n}\n</code></pre>\n<p>This is the genuinely new thing. Neither system has it today.</p>\n<p>Every tool result includes a budget footer:</p>\n<pre><code>[Context: 47% used | 106K remaining | Read mode: peek recommended]\n</code></pre>\n<p>The model can then make informed decisions without explicit instructions. It doesn&#39;t need 500 tokens of AGENTS.md telling it when to use peek mode; it can see the budget and reason about it.</p>\n<hr>\n<h2>How Current Systems Map to the New Model</h2>\n<h3>Pi Interactive → Profile(&quot;interactive&quot;)</h3>\n<pre><code>Before: AGENTS.md (10K tokens) + 51 permanent tools + 27 skills\nAfter:  Profile guidance (2K tokens) + 5 core tools + capabilities loaded on demand\n</code></pre>\n<p>AGENTS.md gets decomposed:</p>\n<ul>\n<li>Environment info → loaded only when relevant (SSH, cross-machine tasks)</li>\n<li>Codex rules → part of the &quot;codex&quot; capability, loaded only when delegating</li>\n<li>Tool routing → eliminated (the model sees only tools it needs)</li>\n<li>Workspace conventions → part of the &quot;code-editing&quot; capability</li>\n<li>Memory instructions → part of the &quot;memory&quot; capability (already loaded as core)</li>\n</ul>\n<p>Skills get merged into capabilities:</p>\n<ul>\n<li><code>research</code> skill + web_search/web_fetch 
tools → &quot;research&quot; capability</li>\n<li><code>whatsapp</code> skill + wa tools → &quot;whatsapp&quot; capability</li>\n<li><code>github</code> skill + git tools → &quot;github&quot; capability</li>\n</ul>\n<p>The model starts with 5 tools and ~2K guidance. It asks for more when it needs them (or the system auto-loads based on the conversation).</p>\n<h3>COT Process → Profile(&quot;orchestrator&quot;)</h3>\n<pre><code>Before: 242-line system prompt + 65+ tools + hardcoded ESA + briefing builder\nAfter:  Profile guidance (advisory ESA) + orchestrator capabilities + briefing as input\n</code></pre>\n<p>The ESA pattern becomes advisory, not mandatory:</p>\n<pre><code>You receive a briefing with the current state of Aaron&#39;s digital life.\nYour goal: surface what needs attention, maintain state, queue background work.\n\nWhen data is large or complex, delegate reads to child sessions to \npreserve your context for synthesis and action. When the situation is \nsimple, act directly.\n</code></pre>\n<p>Same tools, but loaded as capabilities:</p>\n<ul>\n<li>Core: bash, read, write, edit, memory</li>\n<li>Orchestration: scratchpad, priorities, telegram, queue</li>\n<li>Data reading: email, whatsapp, calendar, garmin (loaded when briefing shows data)</li>\n</ul>\n<p>The briefing builder stays. It&#39;s good. But it now also includes budget information.</p>\n<h3>COT Sub-agents → Session.spawn(contract)</h3>\n<pre><code>Before: spawn_sub_agent with manual tool list and 5 params\nAfter:  session.spawn({ capabilities: [&quot;email&quot;], prompt: &quot;...&quot;, maxTurns: 3 })\n</code></pre>\n<p>Same mechanism, cleaner contract. The sub-agent gets the email capability (tools + guidance) instead of a raw tool list. 
Trust and budget are inherited.</p>\n<h3>COT Maintain Workers → Profile(&quot;worker&quot;)</h3>\n<pre><code>Before: maintain_runner.ts with task claiming, model resolution, pipeline execution\nAfter:  Queue trigger → Session(profile=&quot;worker&quot;, capabilities=task.capabilities)\n</code></pre>\n<p>The maintain runner&#39;s lifecycle management (claiming, locking, heartbeats, retries) wraps around a standard session. The session itself is identical to any other.</p>\n<h3>Pi Child Agents → Session.spawn(contract)</h3>\n<pre><code>Before: agent_spawn with pi process launch, DB lifecycle, message passing\nAfter:  session.spawn({ capabilities: [...], prompt: &quot;...&quot;, lifecycle: &quot;multi-turn&quot; })\n</code></pre>\n<p>The agents-runtime&#39;s DB-backed lifecycle and message passing become the implementation of <code>session.spawn</code> for the multi-turn case. Lightweight spawns (one-shot) don&#39;t need DB tracking.</p>\n<h3>Codex → Session.spawn(contract) with external provider</h3>\n<pre><code>Before: codex_start/codex_turn/codex_stop with 14 extension tools\nAfter:  session.spawn({ provider: &quot;openai/codex&quot;, capabilities: [&quot;code-editing&quot;], sandbox: true })\n</code></pre>\n<p>Codex becomes just another session with a different model provider and sandbox constraints. The context management rules (the codex skill&#39;s hard-won empirical data) become part of the Codex capability&#39;s guidance.</p>\n<hr>\n<h2>The Capability Registry (Concrete Design)</h2>\n<p>This is the most important new abstraction. 
It replaces extensions + skills + COT tool groups.</p>\n<h3>Structure</h3>\n<pre><code>capabilities/\n  core/                     # Always loaded\n    index.ts                # bash, read, write, edit\n    guidance.md             # &quot;Use read with mode:peek for large files...&quot;\n    \n  memory/                   # Always loaded\n    index.ts                # memory_search, memory_read, memory_write\n    guidance.md             # &quot;Search before writing. Concrete examples...&quot;\n    \n  email/                    # Loaded when needed\n    index.ts                # email_read, email_search, email_list_threads\n    guidance.md             # &quot;Use snippets, not full bodies. Two-pass pattern...&quot;\n    pi-adapter.ts           # Gmail API implementation (for Pi)\n    cot-adapter.ts          # DB query implementation (for COT)\n    \n  whatsapp/\n    index.ts\n    guidance.md\n    pi-adapter.ts           # wa CLI wrapper\n    cot-adapter.ts          # DB query wrapper\n    \n  orchestration/            # COT-specific\n    index.ts                # scratchpad, priorities, telegram, queue\n    guidance.md\n    \n  code-editing/             # Pi-specific\n    index.ts                # enhanced read modes, search, find\n    guidance.md             # workspace conventions, file patterns\n    \n  research/\n    index.ts                # web_search, web_fetch\n    guidance.md             # &quot;Verify docs before relying...&quot;\n    \n  agents/\n    index.ts                # session.spawn\n    guidance.md             # delegation patterns, when to parallelize\n</code></pre>\n<h3>Key Design Decision: Adapter Pattern for Data Access</h3>\n<p>The biggest duplication today is data access. Pi reads email via Gmail API (google extension). COT reads email via PostgreSQL queries (tool-builder). 
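Same data, different access patterns.</p>\n<p>The capability system uses adapters:</p>\n<pre><code class=\"language-typescript\">// Backend-agnostic data access; one implementation per context\ninterface EmailAdapter {\n  readThread(params: unknown): Promise&lt;unknown&gt;;\n  search(params: unknown): Promise&lt;unknown&gt;;\n  list(params: unknown): Promise&lt;unknown&gt;;\n}\n\n// The email capability&#39;s tools, built from whichever adapter the runtime injects\nconst emailTools = (adapter: EmailAdapter) =&gt; [\n  { name: &quot;email_read&quot;, execute: (params: unknown) =&gt; adapter.readThread(params) },\n  { name: &quot;email_search&quot;, execute: (params: unknown) =&gt; adapter.search(params) },\n  { name: &quot;email_list_threads&quot;, execute: (params: unknown) =&gt; adapter.list(params) },\n];\n\n// Pi context: use Gmail API directly\nclass GmailAdapter implements EmailAdapter {\n  async readThread(params: unknown) { /* Gmail API call */ }\n  async search(params: unknown) { /* Gmail API call */ }\n  async list(params: unknown) { /* Gmail API call */ }\n}\n\n// COT context: use synced DB (faster, no API quota)\nclass CotEmailAdapter implements EmailAdapter {\n  async readThread(params: unknown) { /* SELECT FROM cot.emails */ }\n  async search(params: unknown) { /* query the synced DB */ }\n  async list(params: unknown) { /* query the synced DB */ }\n}\n</code></pre>\n<p>The tools and guidance are identical. Only the data access layer changes. This removes the duplicated tool definitions and guidance; what remains per data source is one thin adapter per backend.</p>\n<hr>\n<h2>Context Budget: The Real Innovation</h2>\n<p>The architecture doc identified <code>read</code> as the 52.2% context killer. Smart-read was a good patch. 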
But the real fix is making the model budget-aware.</p>\n<h3>How It Works</h3>\n<pre><code class=\"language-typescript\">class ContextBudget {\n  private consumed = 0;\n\n  // total = the model&#39;s context window, in tokens\n  constructor(private readonly total: number) {}\n  \n  addToolResult(result: string): string {\n    const cost = estimateTokens(result);\n    this.consumed += cost;\n    \n    // Append budget report to every tool result\n    const report = this.formatReport();\n    return result + &quot;\\n\\n&quot; + report;\n  }\n  \n  formatReport(): string {\n    const pct = Math.round(this.consumed / this.total * 100);\n    const remaining = this.total - this.consumed;\n    \n    if (pct &lt; 50) return `[Budget: ${pct}% used | ${remaining} tokens remaining]`;\n    if (pct &lt; 70) return `[⚠️ Budget: ${pct}% used | Consider using peek/grep modes]`;\n    if (pct &lt; 85) return `[🔴 Budget: ${pct}% used | Delegate large reads to child sessions]`;\n    return `[🚨 Budget: ${pct}% used | Compact or finish soon]`;\n  }\n  \n  suggestReadMode(fileSize: number): ReadMode {\n    const costEstimate = fileSize / 4; // rough token estimate\n    const remainingBudget = this.total - this.consumed;\n    \n    if (costEstimate &lt; remainingBudget * 0.05) return &quot;full&quot;;   // &lt;5% of remaining\n    if (costEstimate &lt; remainingBudget * 0.15) return &quot;head&quot;;   // &lt;15% of remaining  \n    return &quot;peek&quot;;                                                // large relative to budget\n  }\n}\n</code></pre>\n<p>The model doesn&#39;t need instructions about when to use peek mode. It sees:</p>\n<pre><code>[🔴 Budget: 73% used | Delegate large reads to child sessions]\n</code></pre>\n<p>And it reasons about it naturally. 
This is cheaper and more reliable than 500 tokens of guidance in AGENTS.md.</p>\n<h3>Budget-Aware Read Tool</h3>\n<pre><code class=\"language-typescript\">const readTool = {\n  name: &quot;read&quot;,\n  execute: async (params, { budget }) =&gt; {\n    const stats = await fs.stat(params.path);\n    const suggestedMode = budget.suggestReadMode(stats.size);\n    \n    // If user specified full but budget suggests otherwise, warn\n    if (params.mode === &quot;full&quot; &amp;&amp; suggestedMode !== &quot;full&quot;) {\n      // Still honor the request, but include warning\n      const content = await readFull(params.path);\n      return budget.addToolResult(\n        content + `\\n\\n[Note: This file consumed ~${estimateTokens(content)} tokens. ` +\n        `Suggested mode was &#39;${suggestedMode}&#39; given current budget.]`\n      );\n    }\n    \n    const mode = params.mode ?? suggestedMode;\n    const content = await readWithMode(params.path, mode, params);\n    return budget.addToolResult(content);\n  }\n};\n</code></pre>\n<hr>\n<h2>Guidance Hierarchy (Replacing AGENTS.md + Skills)</h2>\n<p>The current guidance stack:</p>\n<pre><code>AGENTS.md (10K tokens, always loaded)\n  → Extension tool descriptions (2K tokens, always loaded)\n    → Skills (2-7K tokens, loaded on demand)\n      → Memory (retrieved on demand)\n</code></pre>\n<p>The proposed stack:</p>\n<pre><code>Profile guidance (1-2K tokens, always loaded)\n  → Core capability guidance (500 tokens, always loaded)\n    → Domain capability guidance (loaded with capability)\n      → Memory (retrieved on demand)\n        → Budget signals (appended to tool results)\n</code></pre>\n<h3>What Goes Where</h3>\n<p><strong>Profile guidance (always loaded, &lt;2K tokens):</strong></p>\n<ul>\n<li>Identity (who am I, what&#39;s my role)</li>\n<li>Core behavior (be direct, verify before asserting)</li>\n<li>Trust boundaries (what needs approval)</li>\n<li>Budget awareness (how to read budget 
signals)</li>\n</ul>\n<p><strong>Core capability guidance (always loaded, &lt;500 tokens):</strong></p>\n<ul>\n<li>Read: progressive disclosure modes exist, budget suggests the right one</li>\n<li>Memory: search before writing, concrete examples are better than abstract rules</li>\n</ul>\n<p><strong>Domain capability guidance (loaded with capability, 500-2K each):</strong></p>\n<ul>\n<li>Email: use snippets not bodies, two-pass pattern, triage integration</li>\n<li>WhatsApp: message format, chat lookup patterns</li>\n<li>Research: verify docs, temporal markers, cross-reference</li>\n<li>Code editing: workspace conventions, file naming, version management</li>\n</ul>\n<p><strong>NOT in guidance anymore (moved to memory or eliminated):</strong></p>\n<ul>\n<li>Environment details (detect at runtime, or query memory when needed)</li>\n<li>Codex delegation rules (loaded only when using Codex capability)</li>\n<li>Cross-machine coordination (loaded only when SSH is relevant)</li>\n<li>Detailed tool routing trees (the model sees only relevant tools)</li>\n</ul>\n<hr>\n<h2>What About Delegation?</h2>\n<p>The current system has three delegation mechanisms. 
The proposal has one: <code>session.spawn(contract)</code>.</p>\n<h3>The Spawn Contract</h3>\n<pre><code class=\"language-typescript\">interface SpawnContract {\n  // What it gets\n  capabilities: string[];        // which capabilities to load\n  prompt: string;                // the task\n  \n  // How it runs\n  model?: string;                // default: inherited or tier-appropriate\n  maxTurns?: number;             // default: 5\n  trust?: TrustLevel;            // default: inherited\n  budget?: BudgetAllocation;     // default: split from parent\n  \n  // How it returns\n  maxOutput?: number;            // truncate result for parent&#39;s context\n  structured?: boolean;          // expect JSON output\n  \n  // Lifecycle\n  sync?: boolean;                // wait for result (default: true)\n  persist?: boolean;             // DB-backed lifecycle (default: false)\n}\n</code></pre>\n<p>This replaces:</p>\n<ul>\n<li><code>spawn_sub_agent({ model, tools, prompt, max_turns, max_output })</code></li>\n<li><code>agent_spawn({ task, tools, model, thinking, timeout, ... })</code></li>\n<li><code>codex_start({ cwd, instructions, model, sandbox })</code></li>\n<li><code>run_workflow(name, context)</code></li>\n<li><code>maintain_write({ task_type, tools, prompt, model, ... 
})</code></li>\n</ul>\n<p>All five become different configurations of <code>session.spawn()</code>:</p>\n<pre><code class=\"language-typescript\">// COT sub-agent: lightweight, synchronous, scoped\nsession.spawn({\n  capabilities: [&quot;email&quot;],\n  prompt: &quot;Summarize unread threads&quot;,\n  model: &quot;sonnet&quot;,\n  maxTurns: 3,\n  sync: true\n});\n\n// Pi child agent: multi-turn, persistent, full capabilities\nsession.spawn({\n  capabilities: [&quot;core&quot;, &quot;search&quot;, &quot;web&quot;],\n  prompt: &quot;Research X and write a report&quot;,\n  maxTurns: 50,\n  persist: true,  // DB-backed lifecycle\n  sync: false     // parent continues\n});\n\n// Codex delegation: sandboxed, external provider\nsession.spawn({\n  capabilities: [&quot;code-editing&quot;],\n  prompt: &quot;Create these 5 files...&quot;,\n  model: &quot;openai/codex&quot;,\n  trust: &quot;sandboxed&quot;,\n  maxTurns: 20\n});\n\n// Background worker: queued, one-shot, isolated\nsession.spawn({\n  capabilities: [&quot;research&quot;, &quot;memory&quot;],\n  prompt: &quot;Find papers on topic X&quot;,\n  model: &quot;opus&quot;,\n  sync: false,\n  persist: true,  // survives parent session\n  scheduled: &quot;next-maintain-cycle&quot;\n});\n</code></pre>\n<h3>Implementation Strategy</h3>\n<p>Under the hood, <code>spawn</code> routes to the right implementation:</p>\n<ul>\n<li><code>sync: true, persist: false</code> → in-process session (current spawn_sub_agent approach)</li>\n<li><code>sync: false, persist: false</code> → child process (current agents-runtime approach)</li>\n<li><code>sync: false, persist: true</code> → queue-backed (current maintain approach)</li>\n<li><code>model: &quot;openai/codex&quot;</code> → Codex bridge (current codex extension approach)</li>\n</ul>\n<p>The implementations exist. They just need a unified interface.</p>\n<hr>\n<h2>What&#39;s Actually New (vs. What the Architecture Doc Proposed)</h2>\n<p>The architecture doc proposed 7 innovations. 
This proposal agrees with some, diverges on others:</p>\n<table>\n<thead>\n<tr>\n<th>Architecture Doc</th>\n<th>This Proposal</th>\n<th>Difference</th>\n</tr>\n</thead>\n<tbody><tr>\n<td>Progressive read modes</td>\n<td>✅ Keep (smart-read exists)</td>\n<td>+ Budget-aware auto-selection</td>\n</tr>\n<tr>\n<td>Token-aware tool loading</td>\n<td>✅ Capability system</td>\n<td>Capabilities = tools + guidance, not just tools</td>\n</tr>\n<tr>\n<td>Context % compaction</td>\n<td>✅ Budget-driven</td>\n<td>+ Model sees its own budget in real-time</td>\n</tr>\n<tr>\n<td>Skill triggers</td>\n<td>❌ Replace with capabilities</td>\n<td>Capabilities auto-load when session contract specifies domain</td>\n</tr>\n<tr>\n<td>Memory auto-proposal</td>\n<td>✅ Keep as post-session hook</td>\n<td>No change needed</td>\n</tr>\n<tr>\n<td>Trust gradient</td>\n<td>✅ Profile system</td>\n<td>Profiles are richer than just trust</td>\n</tr>\n<tr>\n<td>Read tool analytics</td>\n<td>✅ Budget tracking provides this</td>\n<td>Natural telemetry from budget system</td>\n</tr>\n</tbody></table>\n<h3>What&#39;s Genuinely New Here</h3>\n<ol>\n<li><p><strong>Capabilities as tools + guidance (not separate).</strong> This is the biggest architectural change. Currently tools live in extensions and guidance lives in skills. They must always be loaded together, never separated.</p>\n</li>\n<li><p><strong>Budget as a runtime concept the model can see.</strong> Not just token counting for compaction. The model receives budget signals in every tool result and can reason about its own resource constraints.</p>\n</li>\n<li><p><strong>Adapter pattern for data access.</strong> Same capability interface, different backends (API vs DB). Eliminates the entire Pi-vs-COT tool duplication.</p>\n</li>\n<li><p><strong>Unified spawn contract.</strong> One interface for all delegation patterns. 
The implementation varies, but the model only sees one tool.</p>\n</li>\n<li><p><strong>Profile-driven session configuration.</strong> Not two separate systems with different code, but one system with different profiles.</p>\n</li>\n</ol>\n<hr>\n<h2>Migration Path</h2>\n<p>This isn&#39;t a rewrite. It&#39;s a convergence. Each step delivers value independently.</p>\n<h3>Phase 1: Capability Extraction (Week 1-2)</h3>\n<p>Extract the first capability from the overlap zone: <strong>memory</strong>.</p>\n<pre><code>capabilities/memory/\n  tools.ts      → memory_search, memory_read, memory_write (from agent-memory extension)\n  guidance.md   → extracted from memory-architect skill + AGENTS.md memory section\n  adapter.ts    → PostgreSQL (shared by Pi and COT)\n</code></pre>\n<p>Both Pi and COT load the same capability. Pi via extension wrapper, COT via tool registration. Same tools, same guidance, one implementation.</p>\n<p>Then: email, whatsapp, search. Each extraction eliminates one duplication.</p>\n<h3>Phase 2: Budget System (Week 2-3)</h3>\n<p>Add <code>ContextBudget</code> to the session. This doesn&#39;t require any architectural change; it&#39;s a wrapper around token counting that appends budget reports to tool results.</p>\n<p>Measurable impact: reduced context blowup from the <code>read</code> tool (the 52.2% killer).</p>\n<h3>Phase 3: Profile System (Week 3-4)</h3>\n<p>Create profiles for <code>interactive</code> and <code>orchestrator</code>. Extract guidance from AGENTS.md and COT&#39;s system prompt into profiles + capabilities.</p>\n<p>AGENTS.md shrinks from ~10K tokens to ~2K. COT&#39;s prompt shrinks similarly. The rest moves into capability guidance loaded on demand.</p>\n<h3>Phase 4: Unified Spawn (Week 4-5)</h3>\n<p>Create the <code>spawn</code> contract interface. 
Implement it as a thin adapter over existing mechanisms:</p>\n<ul>\n<li>In-process → current <code>spawn_sub_agent</code> code</li>\n<li>Child process → current <code>agents-runtime</code> code</li>\n<li>Queue-backed → current <code>maintain_write</code> code</li>\n<li>Codex → current <code>codex_start/codex_turn</code> code</li>\n</ul>\n<p>The model sees one tool. The implementation dispatches to the right backend.</p>\n<h3>Phase 5: Adapter Pattern (Week 5-6)</h3>\n<p>For capabilities with dual implementations (email, whatsapp, calendar), create the adapter interface. Pi uses API adapters, COT uses DB adapters. Same tools, same guidance, different data access.</p>\n<h3>Rollback</h3>\n<p>Every phase is independently reversible:</p>\n<ul>\n<li>Phase 1: Keep old extension/tool alongside capability</li>\n<li>Phase 2: Budget appending can be toggled off</li>\n<li>Phase 3: Profiles are additive (old prompts still work)</li>\n<li>Phase 4: Spawn wrapper delegates to existing code</li>\n<li>Phase 5: Adapters are behind the same tool interface</li>\n</ul>\n<hr>\n<h2>What NOT to Change</h2>\n<p>Some things in the current system work well. Don&#39;t touch them.</p>\n<ol>\n<li><strong>PostgreSQL as the shared data store.</strong> Works. Don&#39;t add anything.</li>\n<li><strong>Memory search/read/write pattern.</strong> 20% adoption. The API is right.</li>\n<li><strong>COT&#39;s briefing builder.</strong> 14 parallel queries → structured text. Good engineering.</li>\n<li><strong>Smart-read&#39;s progressive modes.</strong> Keep all of them. Add budget awareness on top.</li>\n<li><strong>COT&#39;s advisory lock + heartbeat + zombie cleanup.</strong> Production-grade lifecycle management.</li>\n<li><strong>The maintain queue FSM.</strong> DB-enforced state machine. Don&#39;t reinvent.</li>\n<li><strong>Process event logging.</strong> Structured telemetry. 
Keep exactly as-is.</li>\n</ol>\n<hr>\n<h2>What This Enables (That&#39;s Currently Impossible)</h2>\n<ol>\n<li><strong>Cross-system capability sharing.</strong> Build a new data source once, both Pi and COT use it.</li>\n<li><strong>Dynamic capability loading in COT.</strong> Currently COT loads all tools upfront. With capabilities, it loads what the briefing indicates it needs.</li>\n<li><strong>Budget-aware model behavior.</strong> The model adapts its read strategy to remaining context without explicit instructions.</li>\n<li><strong>Unified delegation.</strong> &quot;Spawn a worker&quot; means the same thing whether called from Pi or COT.</li>\n<li><strong>Incremental system prompt.</strong> Instead of 10K tokens always, 2K base + capabilities loaded on demand. Interactive sessions that only do file editing never pay for email/whatsapp/research guidance.</li>\n<li><strong>Profile switching.</strong> Same runtime can serve interactive and autonomous modes. Test autonomous behavior interactively. Debug COT patterns in Pi.</li>\n</ol>\n<hr>\n<h2>The Honest Assessment</h2>\n<h3>This proposal is right about:</h3>\n<ul>\n<li>Capability = tools + guidance (the separation is the root cause of duplication)</li>\n<li>Budget as a first-class concept (the model should know its own resource constraints)</li>\n<li>Adapter pattern for data access (eliminates the Pi/COT tool duplication)</li>\n<li>Profiles over separate systems (same mechanism, different configuration)</li>\n</ul>\n<h3>This proposal might be wrong about:</h3>\n<ul>\n<li><strong>Unified spawn may add complexity without value.</strong> The four delegation mechanisms serve genuinely different needs. Wrapping them in one interface could obscure important differences (sync vs async, isolated vs shared state, sandboxed vs trusted).</li>\n<li><strong>Capability auto-loading might be premature.</strong> The current explicit skill loading works. 
Auto-detection could be unreliable or load unnecessary capabilities.</li>\n<li><strong>Budget signals might be noise.</strong> If they are appended to every tool result, the model might learn to ignore them. The signal-to-noise ratio matters.</li>\n<li><strong>The migration might not converge.</strong> Incremental convergence sounds clean but could result in a system that&#39;s neither old nor new, just two systems with a compatibility layer.</li>\n</ul>\n<h3>The risk:</h3>\n<p>The biggest risk is building the compatibility layer and never completing the convergence. You end up with three systems instead of two: Pi, COT, and &quot;unified capabilities&quot; that neither fully uses. Each migration phase must deliver standalone value; otherwise the migration should stop after Phase 2 (the budget system).</p>\n",
  "body_markdown": "I read through the entire stack: Pi's extensions/skills/system-prompt, COT's runner/tools/briefing/prompt/maintain-runner, the architecture analysis document, and every major extension. Here's what I found.\n\n## The Core Problem: Two Systems, One Mechanism\n\nPi and COT both do the same thing at their core:\n\n```\nloop:\n  LLM generates → tools execute → context accumulates → repeat\n```\n\nBoth use `createAgentSession` from the Pi SDK. Both call `session.prompt()`. Both have tool registries, context management, error handling. But they share **zero implementation** above the SDK level:\n\n| Concept | Pi Implementation | COT Implementation |\n|---------|-------------------|---------------------|\n| Tool registry | Extensions (loaded at startup, permanent) | `registry.ts` (factory pattern, on-demand) |\n| bash/read/write/edit | Pi SDK built-in tools | Separate tool modules (import side-effects) |\n| WhatsApp | `wa/index.ts` extension (CLI wrapper) | `wa-*.ts` tools (DB query wrappers) |\n| Email | `google/index.ts` (API wrapper) | `email-*.ts` tools (DB query wrappers) |\n| Memory | `agent-memory/index.ts` extension | `cot-memory.ts` + `search-agent-memory.ts` |\n| Search | `unified-search/index.ts` extension | `unified-search.ts` tool |\n| Agent delegation | `agents-runtime` (child processes) | `spawn_sub_agent` (in-process sessions) |\n| Context management | Smart-read progressive modes | Sub-agent isolation |\n| Guidance | AGENTS.md + 27 skills | 242-line system prompt + briefing |\n\nEvery capability is implemented twice. When you add a new data source, you build it twice. When you fix a bug, you fix it twice. This isn't DRY — it's two complete agent systems that happen to share a database.\n\n## The Five Deeper Problems\n\n**1. The guidance stack is inverted.**\n\nPi loads ~10K tokens of system prompt (AGENTS.md) on every turn. 
It includes environment details, Codex delegation rules, tool routing decision trees, workspace conventions — most irrelevant to most turns. Then skills add another 2-7K tokens each when loaded. The model processes all of this before doing anything.\n\nThe architecture doc proved: 52% of sessions are <5K chars with 1.6 tools. The system is optimized for the 0.5% complex case while every session pays the cost.\n\n**2. Tool loading is all-or-nothing.**\n\nPi loads 51+ tools permanently. Each tool definition costs tokens in the cached prefix — and more importantly, costs model attention on every turn (the model must consider all tools before choosing). The architecture doc showed 4 tools account for 70% of usage. The other 47 tools are loaded for every session but used rarely.\n\nCOT loads all 65+ tools too, but organized into groups. Only `COT_PROCESS_TOOLS` + `ORCHESTRATOR_READ_TOOLS` go to the main session. Sub-agents get scoped subsets. This is better but still manual and static.\n\n**3. Context has no economic model.**\n\nNeither system gives the model information about its own context budget. The `read` tool consumes 52.2% of all tool result bytes. Smart-read added progressive disclosure modes (peek/head/grep/structure), but the model doesn't know *why* it should use them. It doesn't know how much budget remains, what it's already consumed, or what the cost of a `mode: full` read will be.\n\n**4. Agent composition is three different systems.**\n\n- Pi's `agent_spawn` creates child pi processes (full extension loading, DB-backed lifecycle, message passing)\n- COT's `spawn_sub_agent` creates lightweight in-process sessions (shared tool deps, no DB, synchronous)\n- Pi's Codex extension creates OpenAI Codex sessions (external API, sandbox, context management)\n\nThese can't interoperate. A Pi child agent can't use COT's database tools. A COT sub-agent can't use Pi's browser extension. Codex can't do either. They're three islands.\n\n**5. 
The ESA pattern is ceremony, not architecture.**\n\nCOT's system prompt mandates Explore→Synthesize→Act every run. This makes sense when there's lots of new data to process. But when nothing's changed? The model still goes through the motions — spawning empty explorers, synthesizing nothing, writing the same scratchpad. The pattern is burned into the prompt, not emergent from the situation.\n\n---\n\n## The Proposal: Sessions All the Way Down\n\n### Core Insight\n\nEvery \"agent\" in the current system is the same thing: **an LLM session with tools, a context budget, and a lifecycle policy.** The differences are configuration, not architecture.\n\n| Current Thing | What It Actually Is |\n|---------------|---------------------|\n| Pi interactive session | Session(trigger=human, tools=universal, trust=high, lifecycle=multi-turn) |\n| COT process run | Session(trigger=timer, tools=orchestrator, trust=medium, lifecycle=stateful-one-shot) |\n| COT sub-agent | Session(trigger=parent, tools=scoped, trust=inherited, lifecycle=one-shot) |\n| COT maintain worker | Session(trigger=queue, tools=task-specific, trust=low, lifecycle=one-shot) |\n| Pi child agent | Session(trigger=parent, tools=inherited, trust=inherited, lifecycle=multi-turn) |\n| Codex delegation | Session(trigger=parent, tools=sandboxed, trust=sandboxed, lifecycle=multi-turn) |\n\nThey should all be the same runtime with different configuration.\n\n### Five Primitives\n\nThe entire system reduces to five concepts:\n\n#### 1. Session (the core loop)\n\n```typescript\ninterface Session {\n  id: string;\n  model: Model;\n  tools: Tool[];\n  budget: ContextBudget;\n  lifecycle: Lifecycle;\n  trust: TrustLevel;\n  state: SessionState;\n\n  prompt(input: string): AsyncIterable<Event>;\n  spawn(contract: SessionContract): Session;\n  dispose(): void;\n}\n```\n\nOne loop. One implementation. 
Every \"agent\" type uses it.\n\nThe critical difference from what exists today: **`budget` is a first-class concept**, not an afterthought. The session knows how much context it has consumed, what percentage remains, and can make intelligent decisions (or expose this to the model).\n\n#### 2. Capability (tools + guidance as a unit)\n\n```typescript\ninterface Capability {\n  name: string;                    // e.g., \"email\", \"whatsapp\", \"code-editing\"\n  tools: ToolDefinition[];         // the tools this capability provides\n  guidance: string;                // usage guidance injected with the tools\n  dependencies?: string[];         // other capabilities this requires\n  cost: number;                    // token cost of loading this capability\n}\n```\n\nThis replaces three separate concepts:\n- Pi extensions (tool providers)\n- Pi skills (usage guidance)\n- COT tool groups (named tool sets)\n\nA capability is **tools + the knowledge to use them correctly**, loaded as one unit. When you load \"email,\" you get `email_read`, `email_search`, `email_list_threads` AND the guidance about using snippets instead of full bodies, two-pass patterns, triage integration.\n\nThis kills the current problem where tools exist in extensions but their usage guidance lives in skills. They should never be separated.\n\n**Capability tiers:**\n\n| Tier | Loaded When | Examples |\n|------|-------------|----------|\n| **Core** | Always | bash, read, write, edit, memory |\n| **Domain** | When session contract includes them | email, whatsapp, calendar, web, search |\n| **Specialist** | When explicitly requested | browser, codex, agents, garmin |\n| **Ephemeral** | Created for specific contexts | database query wrappers, workflow-specific tools |\n\n#### 3. 
Profile (session configuration template)\n\n```typescript\ninterface Profile {\n  name: string;\n  capabilities: string[];          // which capabilities to load\n  trust: TrustLevel;               // approval requirements\n  lifecycle: LifecycleConfig;      // how the session starts, runs, ends\n  budget: BudgetConfig;            // context limits, compaction strategy\n  guidance: string;                // profile-specific system prompt\n}\n```\n\nProfiles replace the current fragmented configuration:\n- Pi's AGENTS.md → \"interactive\" profile guidance\n- COT's system prompt → \"orchestrator\" profile guidance\n- COT's maintain worker prompt → \"worker\" profile guidance\n\n```typescript\nconst PROFILES = {\n  interactive: {\n    capabilities: [\"core\", \"memory\", \"search\"],\n    // other capabilities loaded on demand\n    trust: \"high\",\n    lifecycle: { type: \"multi-turn\", humanInLoop: true },\n    budget: { compactAt: 0.7, defaultReadMode: \"peek\" },\n    guidance: \"...\" // minimal — human is watching\n  },\n  \n  orchestrator: {\n    capabilities: [\"core\", \"memory\", \"delegation\", \"orchestrator-tools\"],\n    trust: \"medium\",\n    lifecycle: { type: \"stateful-one-shot\", stateStore: \"scratchpad\" },\n    budget: { compactAt: 0.6, defaultReadMode: \"structure\" },\n    guidance: \"...\" // ESA-like but advisory, not mandatory\n  },\n  \n  worker: {\n    capabilities: [\"core\"],\n    // additional capabilities specified per-task\n    trust: \"low\",\n    lifecycle: { type: \"one-shot\", maxTurns: 10 },\n    budget: { compactAt: 0.5 },\n    guidance: \"Execute the task. Return structured results. No exploration.\"\n  },\n  \n  child: {\n    capabilities: [], // inherited from parent + task-specific\n    trust: \"inherited\",\n    lifecycle: { type: \"scoped\", maxTurns: 5 },\n    budget: { inherit: \"parent-remaining\" },\n    guidance: \"\" // set by parent\n  }\n};\n```\n\n#### 4. 
State (persistence across sessions)\n\n```typescript\ninterface StateStore {\n  // Ephemeral (within session)\n  context: Message[];              // conversation history\n  \n  // Persistent (across sessions)  \n  memory: MemoryStore;             // long-term knowledge (current system)\n  scratchpad: ScratchpadStore;     // inter-run state (COT's current approach)\n  queue: QueueStore;               // background task queue (maintain)\n}\n```\n\nThe current memory system works. Keep it. The scratchpad pattern (inter-run state for autonomous sessions) works. Keep it. The maintain queue works. Keep it.\n\nWhat changes: these are all accessible through the same state interface, not through separate tool implementations per system.\n\n#### 5. Budget (context economics)\n\n```typescript\ninterface ContextBudget {\n  total: number;                   // model's context window\n  used: number;                    // current consumption\n  remaining: number;               // total - used\n  percentUsed: number;             // 0-100\n  \n  // Exposed to the model via tool results\n  report(): BudgetReport;\n  \n  // Automatic policies\n  compactAt: number;               // percentage threshold\n  readDefault: ReadMode;           // default mode for read tool\n  \n  // Adaptive behavior\n  suggestReadMode(fileSize: number): ReadMode;\n  shouldDelegate(estimatedCost: number): boolean;\n}\n```\n\nThis is the genuinely new thing. Neither system has it today.\n\nEvery tool result includes a budget footer:\n```\n[Context: 47% used | 106K remaining | Read mode: peek recommended]\n```\n\nThe model can then make informed decisions without explicit instructions. 
It doesn't need 500 tokens of AGENTS.md telling it when to use peek mode — it can see the budget and reason about it.\n\n---\n\n## How Current Systems Map to the New Model\n\n### Pi Interactive → Profile(\"interactive\")\n\n```\nBefore: AGENTS.md (10K tokens) + 51 permanent tools + 27 skills\nAfter:  Profile guidance (2K tokens) + 5 core tools + capabilities loaded on demand\n```\n\nAGENTS.md gets decomposed:\n- Environment info → loaded only when relevant (SSH, cross-machine tasks)\n- Codex rules → part of the \"codex\" capability, loaded only when delegating\n- Tool routing → eliminated (the model sees only tools it needs)\n- Workspace conventions → part of the \"code-editing\" capability\n- Memory instructions → part of the \"memory\" capability (already loaded as core)\n\nSkills get merged into capabilities:\n- `research` skill + web_search/web_fetch tools → \"research\" capability\n- `whatsapp` skill + wa tools → \"whatsapp\" capability\n- `github` skill + git tools → \"github\" capability\n\nThe model starts with 5 tools and ~2K guidance. It asks for more when it needs them (or the system auto-loads based on the conversation).\n\n### COT Process → Profile(\"orchestrator\")\n\n```\nBefore: 242-line system prompt + 65+ tools + hardcoded ESA + briefing builder\nAfter:  Profile guidance (advisory ESA) + orchestrator capabilities + briefing as input\n```\n\nThe ESA pattern becomes advisory, not mandatory:\n```\nYou receive a briefing with the current state of Aaron's digital life.\nYour goal: surface what needs attention, maintain state, queue background work.\n\nWhen data is large or complex, delegate reads to child sessions to \npreserve your context for synthesis and action. 
When the situation is \nsimple, act directly.\n```\n\nSame tools, but loaded as capabilities:\n- Core: bash, read, write, edit, memory\n- Orchestration: scratchpad, priorities, telegram, queue\n- Data reading: email, whatsapp, calendar, garmin (loaded when briefing shows data)\n\nThe briefing builder stays. It's good. But it now also includes budget information.\n\n### COT Sub-agents → Session.spawn(contract)\n\n```\nBefore: spawn_sub_agent with manual tool list and 5 params\nAfter:  session.spawn({ capabilities: [\"email\"], prompt: \"...\", maxTurns: 3 })\n```\n\nSame mechanism, cleaner contract. The sub-agent gets the email capability (tools + guidance) instead of a raw tool list. Trust and budget are inherited.\n\n### COT Maintain Workers → Profile(\"worker\")\n\n```\nBefore: maintain_runner.ts with task claiming, model resolution, pipeline execution\nAfter:  Queue trigger → Session(profile=\"worker\", capabilities=task.capabilities)\n```\n\nThe maintain runner's lifecycle management (claiming, locking, heartbeats, retries) wraps around a standard session. The session itself is identical to any other.\n\n### Pi Child Agents → Session.spawn(contract)\n\n```\nBefore: agent_spawn with pi process launch, DB lifecycle, message passing\nAfter:  session.spawn({ capabilities: [...], prompt: \"...\", lifecycle: \"multi-turn\" })\n```\n\nThe agents-runtime's DB-backed lifecycle and message passing become the implementation of `session.spawn` for the multi-turn case. Lightweight spawns (one-shot) don't need DB tracking.\n\n### Codex → Session.spawn(contract) with external provider\n\n```\nBefore: codex_start/codex_turn/codex_stop with 14 extension tools\nAfter:  session.spawn({ provider: \"openai/codex\", capabilities: [\"code-editing\"], sandbox: true })\n```\n\nCodex becomes just another session with a different model provider and sandbox constraints. 
The context management rules (the codex skill's hard-won empirical data) become part of the Codex capability's guidance.\n\n---\n\n## The Capability Registry (Concrete Design)\n\nThis is the most important new abstraction. It replaces extensions + skills + COT tool groups.\n\n### Structure\n\n```\ncapabilities/\n  core/                     # Always loaded\n    index.ts                # bash, read, write, edit\n    guidance.md             # \"Use read with mode:peek for large files...\"\n    \n  memory/                   # Always loaded\n    index.ts                # memory_search, memory_read, memory_write\n    guidance.md             # \"Search before writing. Concrete examples...\"\n    \n  email/                    # Loaded when needed\n    index.ts                # email_read, email_search, email_list_threads\n    guidance.md             # \"Use snippets, not full bodies. Two-pass pattern...\"\n    pi-adapter.ts           # Gmail API implementation (for Pi)\n    cot-adapter.ts          # DB query implementation (for COT)\n    \n  whatsapp/\n    index.ts\n    guidance.md\n    pi-adapter.ts           # wa CLI wrapper\n    cot-adapter.ts          # DB query wrapper\n    \n  orchestration/            # COT-specific\n    index.ts                # scratchpad, priorities, telegram, queue\n    guidance.md\n    \n  code-editing/             # Pi-specific\n    index.ts                # enhanced read modes, search, find\n    guidance.md             # workspace conventions, file patterns\n    \n  research/\n    index.ts                # web_search, web_fetch\n    guidance.md             # \"Verify docs before relying...\"\n    \n  agents/\n    index.ts                # session.spawn\n    guidance.md             # delegation patterns, when to parallelize\n```\n\n### Key Design Decision: Adapter Pattern for Data Access\n\nThe biggest duplication today is data access. Pi reads email via Gmail API (google extension). COT reads email via PostgreSQL queries (tool-builder). 
Same data, different access patterns.\n\nThe capability system uses adapters:\n\n```typescript\ninterface EmailCapability extends Capability {\n  tools: [\n    { name: \"email_read\", execute: (params) => adapter.readThread(params) },\n    { name: \"email_search\", execute: (params) => adapter.search(params) },\n    { name: \"email_list_threads\", execute: (params) => adapter.list(params) },\n  ];\n}\n\n// Pi context: use Gmail API directly\nclass GmailAdapter implements EmailAdapter {\n  async readThread(params) { /* Gmail API call */ }\n}\n\n// COT context: use synced DB (faster, no API quota)\nclass CotEmailAdapter implements EmailAdapter {\n  async readThread(params) { /* SELECT FROM cot.emails */ }\n}\n```\n\nThe tools and guidance are identical. Only the data access layer changes. This eliminates the entire duplication problem.\n\n---\n\n## Context Budget: The Real Innovation\n\nThe architecture doc identified `read` as the 52.2% context killer. Smart-read was a good patch. But the real fix is making the model budget-aware.\n\n### How It Works\n\n```typescript\nclass ContextBudget {\n  private consumed: number = 0;\n\n  // total: the model's context window, in tokens\n  constructor(private readonly total: number) {}\n  \n  addToolResult(result: string): string {\n    const cost = estimateTokens(result);\n    this.consumed += cost;\n    \n    // Append budget report to every tool result\n    const report = this.formatReport();\n    return result + \"\\n\\n\" + report;\n  }\n  \n  formatReport(): string {\n    const pct = Math.round(this.consumed / this.total * 100);\n    const remaining = this.total - this.consumed;\n    \n    if (pct < 50) return `[Budget: ${pct}% used | ${remaining} tokens remaining]`;\n    if (pct < 70) return `[⚠️ Budget: ${pct}% used | Consider using peek/grep modes]`;\n    if (pct < 85) return `[🔴 Budget: ${pct}% used | Delegate large reads to child sessions]`;\n    return `[🚨 Budget: ${pct}% used | Compact or finish soon]`;\n  }\n  \n  suggestReadMode(fileSize: number): ReadMode {\n    const costEstimate = fileSize / 4; 
// rough token estimate\n    const remainingBudget = this.total - this.consumed;\n    \n    if (costEstimate < remainingBudget * 0.05) return \"full\";   // <5% of remaining\n    if (costEstimate < remainingBudget * 0.15) return \"head\";   // <15% of remaining  \n    return \"peek\";                                                // large relative to budget\n  }\n}\n```\n\nThe model doesn't need instructions about when to use peek mode. It sees:\n```\n[🔴 Budget: 73% used | Delegate large reads to child sessions]\n```\n\nAnd it reasons about it naturally. This is cheaper and more reliable than 500 tokens of guidance in AGENTS.md.\n\n### Budget-Aware Read Tool\n\n```typescript\nconst readTool = {\n  name: \"read\",\n  execute: async (params, { budget }) => {\n    const stats = await fs.stat(params.path);\n    const suggestedMode = budget.suggestReadMode(stats.size);\n    \n    // If user specified full but budget suggests otherwise, warn\n    if (params.mode === \"full\" && suggestedMode !== \"full\") {\n      // Still honor the request, but include warning\n      const content = await readFull(params.path);\n      return budget.addToolResult(\n        content + `\\n\\n[Note: This file consumed ~${estimateTokens(content)} tokens. ` +\n        `Suggested mode was '${suggestedMode}' given current budget.]`\n      );\n    }\n    \n    const mode = params.mode ?? 
suggestedMode;\n    const content = await readWithMode(params.path, mode, params);\n    return budget.addToolResult(content);\n  }\n};\n```\n\n---\n\n## Guidance Hierarchy (Replacing AGENTS.md + Skills)\n\nThe current guidance stack:\n```\nAGENTS.md (10K tokens, always loaded)\n  → Extension tool descriptions (2K tokens, always loaded)\n    → Skills (2-7K tokens, loaded on demand)\n      → Memory (retrieved on demand)\n```\n\nThe proposed stack:\n```\nProfile guidance (1-2K tokens, always loaded)\n  → Core capability guidance (500 tokens, always loaded)\n    → Domain capability guidance (loaded with capability)\n      → Memory (retrieved on demand)\n        → Budget signals (appended to tool results)\n```\n\n### What Goes Where\n\n**Profile guidance (always loaded, <2K tokens):**\n- Identity (who am I, what's my role)\n- Core behavior (be direct, verify before asserting)\n- Trust boundaries (what needs approval)\n- Budget awareness (how to read budget signals)\n\n**Core capability guidance (always loaded, <500 tokens):**\n- Read: progressive disclosure modes exist, budget suggests the right one\n- Memory: search before writing, concrete examples are better than abstract rules\n\n**Domain capability guidance (loaded with capability, 500-2K each):**\n- Email: use snippets not bodies, two-pass pattern, triage integration\n- WhatsApp: message format, chat lookup patterns\n- Research: verify docs, temporal markers, cross-reference\n- Code editing: workspace conventions, file naming, version management\n\n**NOT in guidance anymore (moved to memory or eliminated):**\n- Environment details (detect at runtime, or query memory when needed)\n- Codex delegation rules (loaded only when using Codex capability)\n- Cross-machine coordination (loaded only when SSH is relevant)\n- Detailed tool routing trees (the model sees only relevant tools)\n\n---\n\n## What About Delegation?\n\nThe current system has three delegation mechanisms. 
The proposal has one: `session.spawn(contract)`.\n\n### The Spawn Contract\n\n```typescript\ninterface SpawnContract {\n  // What it gets\n  capabilities: string[];        // which capabilities to load\n  prompt: string;                // the task\n  \n  // How it runs\n  model?: string;                // default: inherited or tier-appropriate\n  maxTurns?: number;             // default: 5\n  trust?: TrustLevel;            // default: inherited\n  budget?: BudgetAllocation;     // default: split from parent\n  \n  // How it returns\n  maxOutput?: number;            // truncate result for parent's context\n  structured?: boolean;          // expect JSON output\n  \n  // Lifecycle\n  sync?: boolean;                // wait for result (default: true)\n  persist?: boolean;             // DB-backed lifecycle (default: false)\n}\n```\n\nThis replaces:\n- `spawn_sub_agent({ model, tools, prompt, max_turns, max_output })`\n- `agent_spawn({ task, tools, model, thinking, timeout, ... })`\n- `codex_start({ cwd, instructions, model, sandbox })`\n- `run_workflow(name, context)`\n- `maintain_write({ task_type, tools, prompt, model, ... 
})`\n\nAll five become different configurations of `session.spawn()`:\n\n```typescript\n// COT sub-agent: lightweight, synchronous, scoped\nsession.spawn({\n  capabilities: [\"email\"],\n  prompt: \"Summarize unread threads\",\n  model: \"sonnet\",\n  maxTurns: 3,\n  sync: true\n});\n\n// Pi child agent: multi-turn, persistent, full capabilities\nsession.spawn({\n  capabilities: [\"core\", \"search\", \"web\"],\n  prompt: \"Research X and write a report\",\n  maxTurns: 50,\n  persist: true,  // DB-backed lifecycle\n  sync: false     // parent continues\n});\n\n// Codex delegation: sandboxed, external provider\nsession.spawn({\n  capabilities: [\"code-editing\"],\n  prompt: \"Create these 5 files...\",\n  model: \"openai/codex\",\n  trust: \"sandboxed\",\n  maxTurns: 20\n});\n\n// Background worker: queued, one-shot, isolated\nsession.spawn({\n  capabilities: [\"research\", \"memory\"],\n  prompt: \"Find papers on topic X\",\n  model: \"opus\",\n  sync: false,\n  persist: true,  // survives parent session\n  scheduled: \"next-maintain-cycle\"\n});\n```\n\n### Implementation Strategy\n\nUnder the hood, `spawn` routes to the right implementation:\n- `sync: true, persist: false` → in-process session (current spawn_sub_agent approach)\n- `sync: false, persist: false` → child process (current agents-runtime approach)\n- `sync: false, persist: true` → queue-backed (current maintain approach)\n- `model: \"openai/codex\"` → Codex bridge (current codex extension approach)\n\nThe implementations exist. They just need a unified interface.\n\n---\n\n## What's Actually New (vs. What the Architecture Doc Proposed)\n\nThe architecture doc proposed 7 innovations. 
This proposal agrees with some, diverges on others:\n\n| Architecture Doc | This Proposal | Difference |\n|-----------------|---------------|------------|\n| Progressive read modes | ✅ Keep (smart-read exists) | + Budget-aware auto-selection |\n| Token-aware tool loading | ✅ Capability system | Capabilities = tools + guidance, not just tools |\n| Context % compaction | ✅ Budget-driven | + Model sees its own budget in real-time |\n| Skill triggers | ❌ Replace with capabilities | Capabilities auto-load when session contract specifies domain |\n| Memory auto-proposal | ✅ Keep as post-session hook | No change needed |\n| Trust gradient | ✅ Profile system | Profiles are richer than just trust |\n| Read tool analytics | ✅ Budget tracking provides this | Natural telemetry from budget system |\n\n### What's Genuinely New Here\n\n1. **Capabilities as tools + guidance (not separate).** This is the biggest architectural change. Currently tools live in extensions and guidance lives in skills. They must always be loaded together, never separated.\n\n2. **Budget as a runtime concept the model can see.** Not just token counting for compaction. The model receives budget signals in every tool result and can reason about its own resource constraints.\n\n3. **Adapter pattern for data access.** Same capability interface, different backends (API vs DB). Eliminates the entire Pi-vs-COT tool duplication.\n\n4. **Unified spawn contract.** One interface for all delegation patterns. The implementation varies, but the model only sees one tool.\n\n5. **Profile-driven session configuration.** Not two separate systems with different code, but one system with different profiles.\n\n---\n\n## Migration Path\n\nThis isn't a rewrite. It's a convergence. 
Each step delivers value independently.\n\n### Phase 1: Capability Extraction (Week 1-2)\n\nExtract the first capability from the overlap zone: **memory**.\n\n```\ncapabilities/memory/\n  tools.ts      → memory_search, memory_read, memory_write (from agent-memory extension)\n  guidance.md   → extracted from memory-architect skill + AGENTS.md memory section\n  adapter.ts    → PostgreSQL (shared by Pi and COT)\n```\n\nBoth Pi and COT load the same capability. Pi via extension wrapper, COT via tool registration. Same tools, same guidance, one implementation.\n\nThen: email, whatsapp, search. Each extraction eliminates one duplication.\n\n### Phase 2: Budget System (Week 2-3)\n\nAdd `ContextBudget` to the session. This requires no architectural change: it's a wrapper around token counting that appends budget reports to tool results.\n\nMeasurable impact: less context bloat from the `read` tool, which today accounts for 52.2% of tool-result bytes.\n\n### Phase 3: Profile System (Week 3-4)\n\nCreate profiles for `interactive` and `orchestrator`. Extract guidance from AGENTS.md and COT's system prompt into profiles + capabilities.\n\nAGENTS.md shrinks from ~10K tokens to ~2K. COT's prompt shrinks similarly. The rest moves into capability guidance loaded on demand.\n\n### Phase 4: Unified Spawn (Week 4-5)\n\nCreate the `spawn` contract interface. Implement it as a thin adapter over existing mechanisms:\n- In-process → current `spawn_sub_agent` code\n- Child process → current `agents-runtime` code\n- Queue-backed → current `maintain_write` code\n- Codex → current `codex_start/codex_turn` code\n\nThe model sees one tool. The implementation dispatches to the right backend.\n\n### Phase 5: Adapter Pattern (Week 5-6)\n\nFor capabilities with dual implementations (email, whatsapp, calendar), create the adapter interface. Pi uses API adapters, COT uses DB adapters. 
Same tools, same guidance, different data access.\n\n### Rollback\n\nEvery phase is independently reversible:\n- Phase 1: Keep old extension/tool alongside capability\n- Phase 2: Budget appending can be toggled off\n- Phase 3: Profiles are additive (old prompts still work)\n- Phase 4: Spawn wrapper delegates to existing code\n- Phase 5: Adapters are behind the same tool interface\n\n---\n\n## What NOT to Change\n\nSome things in the current system work well. Don't touch them.\n\n1. **PostgreSQL as the shared data store.** Works. Don't add anything.\n2. **Memory search/read/write pattern.** 20% adoption. The API is right.\n3. **COT's briefing builder.** 14 parallel queries → structured text. Good engineering.\n4. **Smart-read's progressive modes.** Keep all of them. Add budget awareness on top.\n5. **COT's advisory lock + heartbeat + zombie cleanup.** Production-grade lifecycle management.\n6. **The maintain queue FSM.** DB-enforced state machine. Don't reinvent.\n7. **Process event logging.** Structured telemetry. Keep exactly as-is.\n\n---\n\n## What This Enables (That's Currently Impossible)\n\n1. **Cross-system capability sharing.** Build a new data source once, both Pi and COT use it.\n2. **Dynamic capability loading in COT.** Currently COT loads all tools upfront. With capabilities, it loads what the briefing indicates it needs.\n3. **Budget-aware model behavior.** The model adapts its read strategy to remaining context without explicit instructions.\n4. **Unified delegation.** \"Spawn a worker\" means the same thing whether called from Pi or COT.\n5. **Incremental system prompt.** Instead of 10K tokens always, 2K base + capabilities loaded on demand. Interactive sessions that only do file editing never pay for email/whatsapp/research guidance.\n6. **Profile switching.** Same runtime can serve interactive and autonomous modes. Test autonomous behavior interactively. 
Debug COT patterns in Pi.\n\n---\n\n## The Honest Assessment\n\n### This proposal is right about:\n- Capability = tools + guidance (the separation is the root cause of duplication)\n- Budget as a first-class concept (the model should know its own resource constraints)\n- Adapter pattern for data access (eliminates the Pi/COT tool duplication)\n- Profiles over separate systems (same mechanism, different configuration)\n\n### This proposal might be wrong about:\n- **Unified spawn may add complexity without value.** The four delegation mechanisms serve genuinely different needs. Wrapping them in one interface could obscure important differences (sync vs async, isolated vs shared state, sandboxed vs trusted).\n- **Capability auto-loading might be premature.** The current explicit skill loading works. Auto-detection could be unreliable or load unnecessary capabilities.\n- **Budget signals might be noise.** If appended to every tool result, the model might learn to ignore them. The signal-to-noise ratio matters.\n- **The migration might not converge.** Incremental convergence sounds clean but could result in a system that's neither old nor new — just two systems with a compatibility layer.\n\n### The risk:\nThe biggest risk is building the compatibility layer and never completing the convergence. You end up with three systems instead of two: Pi, COT, and \"unified capabilities\" that neither fully uses. The migration phases must each deliver standalone value, or stop after phase 2 (budget system).",
  "sources": [
    {
      "label": "Legacy public URL",
      "url": "https://05802.github.io/unified-agent-architecture/"
    },
    {
      "label": "Legacy source markdown",
      "url": "https://raw.githubusercontent.com/05802/05802.github.io/master/_posts/2026-02-28-unified-agent-architecture.md"
    }
  ],
  "content_prefix": "entries/press/station-press/2026/02/unified-agent-architecture/",
  "assets_prefix": "entries/press/station-press/2026/02/unified-agent-architecture/assets/",
  "assets_base_url": "https://stations.work/content/entries/press/station-press/2026/02/unified-agent-architecture/assets/",
  "canonical_url": "https://stations.work/press/unified-agent-architecture"
}