Context will fill up; you need a way to make room
## What You’ll Learn
- Why unbounded context growth breaks agents
- How to implement a three-layer compaction strategy
- How to preserve decision rationale while discarding verbatim output
## The Problem
After 100 tool calls, the messages array holds thousands of tokens of stale bash output. The model runs slower, hits context limits, and loses the thread.
## The Solution
Three-layer compaction: summarize long tool outputs, archive old turns as structured notes, and keep the most recent turns verbatim.
- Layer 1 (recent): Last N turns, verbatim --> model can see
- Layer 2 (compressed): Old turns as JSON summaries --> injected as context
- Layer 3 (archived): Everything --> written to disk
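In code, the three layers might be held in a structure like this minimal sketch; the `LayeredContext` name and its fields are illustrative assumptions, not the lesson's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class LayeredContext:
    # Layer 1: the last N turns, kept word-for-word.
    recent: list = field(default_factory=list)
    # Layer 2: older turns collapsed into JSON summaries.
    compressed: list = field(default_factory=list)
    # Layer 3: path to a JSONL file holding every turn in full.
    archive_path: str = "context_archive.jsonl"
```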
## How It Works

- Track token usage. When it exceeds a threshold, trigger compaction.
- Summarize old tool results: replace 5,000 lines of test output with "tests/ passed 42/42".
- Inject the compressed context as a system-like message.
```python
def compact(old_messages, new_messages):
    # Compress the old turns into a summary, then prepend it to the
    # recent turns as a single synthetic user message.
    summary = summarize_turns(old_messages)
    return [
        {"role": "user",
         "content": f"[COMPACTED CONTEXT]\n{summary}"},
        *new_messages,
    ]
```
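One hedged way to wire the threshold trigger around `compact()` is sketched below. `count_tokens`, `summarize_turns`, `TOKEN_BUDGET`, and `KEEP_RECENT` are all illustrative stand-ins, not the lesson's API; a real agent would use the provider's tokenizer and an LLM call for summarization:

```python
import json

TOKEN_BUDGET = 50_000  # assumed threshold; tune per model
KEEP_RECENT = 10       # Layer 1: how many turns stay verbatim

def count_tokens(messages):
    # Crude stand-in: roughly 4 characters per token. A real agent
    # would use the provider's tokenizer or reported usage stats.
    return sum(len(json.dumps(m)) for m in messages) // 4

def summarize_turns(messages):
    # Stand-in summarizer: role plus truncated content per turn.
    # A real agent would ask the model for a structured summary.
    return "\n".join(
        f"{m['role']}: {str(m.get('content', ''))[:120]}" for m in messages
    )

def maybe_compact(messages):
    # Under budget: leave the history untouched.
    if count_tokens(messages) <= TOKEN_BUDGET:
        return messages
    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    # Layer 3: archive the full detail to disk before compressing.
    with open("context_archive.jsonl", "a") as f:
        for m in old:
            f.write(json.dumps(m) + "\n")
    # Layers 1-2: summary message followed by verbatim recent turns.
    return compact(old, recent)
```

Calling `maybe_compact(messages)` before each model request keeps the array under budget while the full transcript survives on disk.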
## What Changed From s05
| Component | Before (s05) | After (s06) |
|---|---|---|
| Context | Grows unbounded | Three-layer compaction |
| Token tracking | None | Threshold-based trigger |
| Archiving | None | Structured summaries |
## Try It
```bash
cd learn-claude-code
python agents/s06_context_compact.py
```
Then try prompts that flood the context with tool output:

- Read every Python file in this project and tell me what's wrong
- Run the test suite 10 times and summarize the results
- Explore the entire codebase and create an architectural overview
## Key Takeaway
Compaction isn’t deleting history — it’s relocating detail to make room for the agent’s next thought.