Building AI Agents is hard. So I built a 12-step visual guide to make it easy
Hey fellow devs! π
We've all seen the hype around AI Agents (Claude Code, Cursor, OpenClaw, OpenHands, etc.), but when you actually try to build one from scratch, the documentation is scattered and the logic flow between planning, memory, and tool execution can be deeply frustrating.
I spent the last few weeks breaking this down into 12 progressive sessions β from a single while-loop to a fully autonomous multi-agent system. Here's the complete roadmap:
π§© The 12 Sessions
The core pattern is simpler than you think. Every agent starts with this loop:
while True:
response = client.messages.create(messages=messages, tools=tools)
if response.stop_reason != "tool_use":
break
for tool_call in response.content:
result = execute_tool(tool_call.name, tool_call.input)
messages.append(result)
Then you layer complexity on top, one concept at a time:
| # | Session | The Core Idea |
|---|---|---|
| 01 | The Agent Loop | The minimal kernel: a while-loop + one tool |
| 02 | Tools | New tools register into a dispatch map; the loop never changes |
| 03 | TodoWrite | An agent without a plan drifts β list steps first, then execute |
| 04 | Subagents | Isolated messages[] per subtask keeps the main context clean |
| 05 | Skills | Inject knowledge via tool_result on-demand, not upfront in system prompt |
| 06 | Compact | Context will fill up β a 3-layer compression strategy enables infinite sessions |
| 07 | Tasks | File-based task graph with ordering, parallelism, and dependencies |
| 08 | Background Tasks | Run slow operations async; the agent keeps thinking ahead |
| 09 | Agent Teams | When one agent can't finish, delegate to persistent teammates via mailboxes |
| 10 | Team Protocols | One request-response FSM pattern drives all team negotiation |
| 11 | Autonomous Agents | Teammates scan the board and claim tasks themselves β no manual assignment |
| 12 | Worktree + Task Isolation | Each agent works in its own directory; goals and directories bound by task ID |
π What makes this different from other tutorials?
Most guides either stay too simple (basic API calls) or jump straight to LangChain abstractions. This takes a build-from-scratch approach:
- β Full working Python code for every session (not snippets)
- β Interactive simulators β watch the agent loop execute step-by-step
- β Diff view between sessions β see exactly what changed and why
- β Based on real patterns from production systems like Claude Code
The stack is intentionally minimal: Python + Anthropic API + standard library. No framework magic hiding the important parts. Completely free and MIT licensed.
π Full guide: HowToAgent.net
What's the hardest architectural decision you've hit when building agents? For me it was context compression (Session 06) β would love to hear what tripped others up.