Load knowledge when you need it, not upfront

What You’ll Learn

  • Why loading all knowledge into the system prompt is wasteful
  • How to implement a two-layer skill system: names (cheap) + bodies (on demand)
  • How to discover skills dynamically

The Problem

Putting every skill's full instructions in the system prompt spends tokens on skills the current task never uses. Ten skills at 2,000 tokens each add up to 20,000 tokens, most of which are irrelevant to any given task.
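To make the arithmetic concrete, here is a rough sketch comparing the two approaches. The 10-skill and 2,000-token figures come from the text. The ~100 tokens per summary entry is the estimate shown in the Layer 1 diagram. The function names are hypothetical and exist only for illustration.

```python
# Figures from the text: 10 skills, ~2000 tokens per full skill body.
NUM_SKILLS = 10
BODY_TOKENS = 2000
# Estimate taken from the Layer 1 diagram: ~100 tokens per name + description.
SUMMARY_TOKENS = 100


def upfront_cost(num_skills: int, body_tokens: int) -> int:
    """Tokens spent when every full skill body sits in the system prompt."""
    return num_skills * body_tokens


def two_layer_cost(num_skills: int, summary_tokens: int,
                   body_tokens: int, skills_loaded: int) -> int:
    """Tokens spent when only summaries are always present and
    full bodies are loaded on demand for `skills_loaded` skills."""
    return num_skills * summary_tokens + skills_loaded * body_tokens


print(upfront_cost(NUM_SKILLS, BODY_TOKENS))                         # 20000
print(two_layer_cost(NUM_SKILLS, SUMMARY_TOKENS, BODY_TOKENS, 1))    # 3000
```

Under these assumed numbers, a task that needs one skill pays for ten summaries plus one body instead of ten bodies.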

The Solution

System prompt (Layer 1 -- always present):
+--------------------------------------+
| Skills available:                    |
|   - git: Git workflow helpers        |  ~100 tokens/skill
|   - test: Testing best practices     |
+--------------------------------------+

When model calls load_skill("git"):
+--------------------------------------+
| tool_result (Layer 2 -- on demand):  |
| <skill name="git">                   |
|   Full git workflow instructions...  |  ~2000 tokens
| </skill>                             |
+--------------------------------------+

Layer 1: skill names and one-line descriptions (cheap). Layer 2: full body via tool_result (on demand).
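The two layers in the diagram can be sketched as two small rendering functions. This is a minimal illustration, not the lesson's actual code: `render_skill_index` and `render_skill_body` are hypothetical names, and the exact wording of the index is an assumption modeled on the diagram.

```python
def render_skill_index(skills: dict[str, str]) -> str:
    """Layer 1: one short line per skill, meant for the system prompt.

    `skills` maps a skill name to its one-line description.
    """
    lines = ["Skills available:"]
    for name, description in skills.items():
        lines.append(f"  - {name}: {description}")
    return "\n".join(lines)


def render_skill_body(name: str, body: str) -> str:
    """Layer 2: the full skill text, wrapped in a <skill> tag so the
    model can tell where the injected instructions start and end."""
    return f'<skill name="{name}">\n{body}\n</skill>'


if __name__ == "__main__":
    index = render_skill_index({
        "git": "Git workflow helpers",
        "test": "Testing best practices",
    })
    print(index)
    print(render_skill_body("git", "Full git workflow instructions..."))
```

The index string would be appended to the system prompt once, while the wrapped body would be returned as the result of a load_skill call.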

How It Works

  1. Each skill is a directory containing a SKILL.md with YAML frontmatter.

skills/
  pdf/SKILL.md
  code-review/SKILL.md

  2. A SkillLoader scans for SKILL.md files and extracts names and descriptions.

  3. Layer 1 goes into the system prompt. Layer 2 is just another tool handler:

TOOL_HANDLERS = {
    # ...base tools...
    "load_skill": lambda **kw: SKILL_LOADER.get_content(kw["name"]),
}
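The steps above could be sketched roughly as follows. This is an assumption-laden illustration rather than the lesson's real SkillLoader: only `get_content` appears in the original. The `name` and `description` frontmatter keys, the `get_descriptions` method, and the `parse_frontmatter` helper are hypothetical. The parser handles only flat `key: value` lines; a real implementation would more likely use a YAML library.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Split a SKILL.md into (metadata, body).

    Simplified: supports only flat `key: value` lines between two
    `---` markers, not full YAML.
    """
    if not text.startswith("---"):
        return {}, text.strip()
    parts = text.split("---", 2)  # ["", frontmatter, body]
    if len(parts) < 3:
        return {}, text.strip()
    meta = {}
    for line in parts[1].strip().splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    return meta, parts[2].strip()


class SkillLoader:
    def __init__(self, skills_dir: str):
        self.skills = {}
        # One directory per skill, each holding a SKILL.md.
        for path in sorted(Path(skills_dir).glob("*/SKILL.md")):
            meta, body = parse_frontmatter(path.read_text(encoding="utf-8"))
            # Assumed convention: fall back to the directory name.
            name = meta.get("name", path.parent.name)
            self.skills[name] = {
                "description": meta.get("description", ""),
                "body": body,
            }

    def get_descriptions(self) -> str:
        """Layer 1: short lines for the system prompt (hypothetical method)."""
        return "\n".join(
            f"  - {name}: {skill['description']}"
            for name, skill in self.skills.items()
        )

    def get_content(self, name: str) -> str:
        """Layer 2: the full body, returned as the tool result."""
        skill = self.skills.get(name)
        if skill is None:
            available = ", ".join(self.skills)
            return f"Error: unknown skill '{name}'. Available: {available}"
        return f'<skill name="{name}">\n{skill["body"]}\n</skill>'
```

Returning an error string for an unknown name, rather than raising, is a design choice in this sketch: the model sees the list of valid names in the tool result and can retry.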

What Changed From s04

Component      Before (s04)       After (s05)
Tools          5 (base + task)    5 (base + load_skill)
System prompt  Static string      + skill descriptions
Knowledge      None               skills/*/SKILL.md files
Injection      None               Two-layer (system + result)

Try It

cd learn-claude-code
python agents/s05_skill_loading.py

Prompts to try:

  1. What skills are available?
  2. Load the agent-builder skill and follow its instructions
  3. I need to do a code review -- load the relevant skill first

Key Takeaway

Discover cheaply, load deeply. Keep skill names in the system prompt and inject the full body via tool_result only when needed.