Context will fill up; you need a way to make room

What You’ll Learn

  • Why unbounded context growth breaks agents
  • How to implement a three-layer compaction strategy
  • How to preserve decision rationale while discarding verbatim output

The Problem

After 100 tool calls, the messages array holds thousands of tokens of stale bash output. Every request re-sends that entire history, so the model runs slower, hits context limits, and loses the thread.

The Solution

Three-layer compaction: summarize long tool outputs, archive old turns as structured notes, and keep the most recent turns verbatim.

Layer 1 (recent):     Last N turns, verbatim        --> model can see
Layer 2 (compressed): Old turns as JSON summaries   --> injected as context
Layer 3 (archived):   Everything                    --> written to disk
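
One way to hold the three layers in code — as a minimal sketch only; ContextLayers and its field names are illustrative, not taken from the repo:

from dataclasses import dataclass, field

@dataclass
class ContextLayers:
    recent: list = field(default_factory=list)      # Layer 1: verbatim turns
    compressed: list = field(default_factory=list)  # Layer 2: structured summaries
    archive_path: str = "context_archive.jsonl"     # Layer 3: full history on disk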

How It Works

  1. Track token usage. When it exceeds a threshold, trigger compaction (sketched right after this list).

  2. Summarize old tool results: replace 5000 lines of test output with "tests/ passed 42/42" (second sketch below).

  3. Inject the compressed context as a system-like message; the compact function below does exactly this.
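
A minimal sketch of step 1, assuming a character-count proxy for tokens (real agents usually read the provider's reported usage or run a tokenizer). The threshold and the count_tokens/maybe_compact names are illustrative; compact is the function shown after step 3:

COMPACTION_THRESHOLD = 80_000  # illustrative token budget; tune per model

def count_tokens(messages):
    # Rough proxy: ~4 characters per token of English text.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def maybe_compact(messages, keep_recent=10):
    # Step 1: once the estimate crosses the threshold, split history
    # into old turns (to compress) and recent turns (kept verbatim).
    if count_tokens(messages) > COMPACTION_THRESHOLD:
        return compact(messages[:-keep_recent], messages[-keep_recent:])
    return messages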

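Step 2 is sketched below as simple truncation into role-tagged one-line notes; a production agent might instead ask the model itself to write the summary. This summarize_turns body is an assumption — the section only shows it being called:

def summarize_turns(messages):
    # Step 2: condense old turns into structured one-line notes --
    # e.g. 5000 lines of test output becomes "tests/ passed 42/42".
    notes = []
    for m in messages:
        content = str(m.get("content", ""))
        first_line = content.splitlines()[0] if content else "(empty)"
        notes.append(f"- {m.get('role', '?')}: {first_line[:120]}")
    return "\n".join(notes)

Step 3 re-injects the result as a single message at the head of the history:
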
def compact(old_messages, new_messages):
    # Condense the archived turns into one structured note (Layer 2)
    # and re-inject it ahead of the verbatim recent turns (Layer 1).
    # Writing old_messages to disk (Layer 3) would also happen here.
    summary = summarize_turns(old_messages)
    return [
        {"role": "user",
         "content": f"[COMPACTED CONTEXT]\n{summary}"},
        *new_messages,
    ]
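
A hypothetical end-to-end run, wiring the sketches together (message shapes follow the role/content dicts used above):

history = []
for _ in range(200):                # simulate a long run of noisy tool calls
    history.append({"role": "tool", "content": "FAILED test_x\n" * 500})

history = maybe_compact(history, keep_recent=10)
# history is now one [COMPACTED CONTEXT] message plus the last 10 turns.

Because summarize_turns keeps role-tagged notes rather than raw output, decision rationale survives compaction even though the verbatim text does not.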

What Changed From s05

Component        Before (s05)       After (s06)
---------------  -----------------  -----------------------
Context          Grows unbounded    Three-layer compaction
Token tracking   None               Threshold-based trigger
Archiving        None               Structured summaries

Try It

cd learn-claude-code
python agents/s06_context_compact.py

Try prompts that generate enough output to trigger compaction:

  1. Read every Python file in this project and tell me what's wrong
  2. Run the test suite 10 times and summarize the results
  3. Explore the entire codebase and create an architectural overview

Key Takeaway

Compaction isn’t deleting history — it’s relocating detail to make room for the agent’s next thought.