Building AI Agents is hard. So I built a 12-step visual g...

Hey fellow devs! 👋

We've all seen the hype around AI Agents (Claude Code, Cursor, OpenClaw, OpenHands, etc.), but when you actually try to build one from scratch, the documentation is scattered and the logic flow between planning, memory, and tool execution can be deeply frustrating.

I spent the last few weeks breaking this down into 12 progressive sessions — from a single while-loop to a fully autonomous multi-agent system. Here's the complete roadmap:

🧩 The 12 Sessions

The core pattern is simpler than you think. Every agent starts with this loop:

while True:
response = client.messages.create(messages=messages, tools=tools)
if response.stop_reason != "tool_use":
break
for tool_call in response.content:
result = execute_tool(tool_call.name, tool_call.input)
messages.append(result)

Then you layer complexity on top, one concept at a time:

#	Session	The Core Idea
01	The Agent Loop	The minimal kernel: a while-loop + one tool
02	Tools	New tools register into a dispatch map; the loop never changes
03	TodoWrite	An agent without a plan drifts — list steps first, then execute
04	Subagents	Isolated `messages[]` per subtask keeps the main context clean
05	Skills	Inject knowledge via `tool_result` on-demand, not upfront in system prompt
06	Compact	Context will fill up — a 3-layer compression strategy enables infinite sessions
07	Tasks	File-based task graph with ordering, parallelism, and dependencies
08	Background Tasks	Run slow operations async; the agent keeps thinking ahead
09	Agent Teams	When one agent can't finish, delegate to persistent teammates via mailboxes
10	Team Protocols	One request-response FSM pattern drives all team negotiation
11	Autonomous Agents	Teammates scan the board and claim tasks themselves — no manual assignment
12	Worktree + Task Isolation	Each agent works in its own directory; goals and directories bound by task ID

🛠 What makes this different from other tutorials?

Most guides either stay too simple (basic API calls) or jump straight to LangChain abstractions. This takes a build-from-scratch approach:

✅ Full working Python code for every session (not snippets)
✅ Interactive simulators — watch the agent loop execute step-by-step
✅ Diff view between sessions — see exactly what changed and why
✅ Based on real patterns from production systems like Claude Code

The stack is intentionally minimal: Python + Anthropic API + standard library. No framework magic hiding the important parts. Completely free and MIT licensed.

👉 Full guide: HowToAgent.net

What's the hardest architectural decision you've hit when building agents? For me it was context compression (Session 06) — would love to hear what tripped others up.