I want to walk through my actual Claude Code setup: not what Claude Code can do in theory, but what mine does in practice, what every file is for, and what I’d do differently if I were starting today.
This is a specific setup built on a specific pattern. The pattern was originated by Shubham Saboo (@Saboo_Shubham_ on X), whose Monica agent, a Chief of Staff that manages a team of specialized AI agents, is the conceptual source of what I built. Monica runs on OpenClaw, Saboo’s platform. My version runs on Claude Code. The architecture is his. I want to be clear about that.
The shape of the thing
The repo is called chief_of_staff. At the root: CLAUDE.md (instructions for the main session), AGENTS.md (shared rules every agent follows), MEMORY.md (Jim’s curated long-term memory), HEARTBEAT.md (system health).
Inside .claude/agents/: the subagent definition files, one each for Jim, Dwight, Karen, Kelly, Erin, Rachel, and Ross.
Each agent file is a markdown document with a YAML frontmatter block and a system prompt. The frontmatter specifies: name, description, which tools the agent can use, which model it runs on, whether it has MCP server access.
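As a rough illustration, here is what one definition file could look like, along with a small Python helper that separates the frontmatter from the system prompt. The field values are assumptions made up for this sketch, not the actual contents of the repo, and the parser only understands flat `key: value` lines, so anything nested would need a real YAML library.

```python
# Hypothetical agent definition file: YAML frontmatter, then the system prompt.
# Every value here is an illustrative assumption.
AGENT_FILE = """\
---
name: dwight
description: Research agent. Use for sourcing and fact-checking.
tools: Read, Grep, WebSearch
model: sonnet
---
You are Dwight, the research agent. Flag gaps instead of guessing.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a definition file into (frontmatter fields, system prompt).

    Handles only flat `key: value` lines; nested YAML is out of scope.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    try:
        end = lines.index("---", 1)  # closing delimiter of the frontmatter
    except ValueError:
        raise ValueError("frontmatter block is never closed") from None
    fields = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    prompt = "\n".join(lines[end + 1:]).strip()
    return fields, prompt


fields, prompt = split_frontmatter(AGENT_FILE)
tools = [t.strip() for t in fields["tools"].split(",")]
print(fields["name"], fields["model"], tools)
```

The point of the split is that the frontmatter is configuration (what the agent may use) while everything after it is the prompt (how the agent behaves), and the two can be reviewed separately.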
That’s the architecture Paweł Huryn described in April 2026: “Sub-agents (.claude/agents/). Each folder is a self-contained agent. Its own CLAUDE.md.” One sentence that captures the whole pattern. I’d add: and its own tool list, its own memory, its own identity.
The three files every agent has
Each agent gets three supporting files beyond the definition file. This is Saboo’s pattern, and I’ve found reasons to keep all three.
SOUL.md — identity, principles, persona. Who is this agent. What does it care about. How does it talk. For my research agent Dwight, SOUL.md describes someone intense and thorough. For Karen the copywriter, it’s sharp and direct. These files almost never change. When they do, it’s because a performance review revealed that an agent’s identity was producing the wrong outputs.
AGENTS.md — operational instructions. What this agent does, how it does it, what it should never do. This file changes all the time. Every time something goes wrong with an agent’s output (it drifts off-task, it uses a format I didn’t want, it estimates when it should have flagged uncertainty), I update AGENTS.md. The line I added after Karen started speculating on data gaps: “Flag research gaps explicitly rather than estimating. Coordinate with Dwight before finalizing data-dependent claims.” That’s a harness decision born from a specific failure.
memory/MEMORY.md — curated long-term memory. I edit this. The agent reads it. It captures things the agent needs to know about its role, the brand, past decisions, lessons from past tasks. Not everything, just what actually changes future behavior.
The SOUL.md / AGENTS.md split is the piece I’d explain most to someone starting out. SOUL.md is who the agent is. AGENTS.md is how it works. One is identity, the other is procedure. When something goes wrong with output quality, it’s almost always an AGENTS.md issue. When something goes wrong with tone or voice, it might be SOUL.md. Keeping them separate means I can tune one without touching the other.
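A minimal scaffolding sketch for the three supporting files, in Python. The directory layout (everything under `agents/<name>/`) is an assumption inferred from the `agents/{name}/memory/` path used for daily logs, and the starter text in each file is a placeholder rather than real content.

```python
from pathlib import Path
import tempfile

# The three supporting files. Placing them under agents/<name>/ is an
# assumption; the starter text is placeholder content only.
SUPPORT_FILES = {
    "SOUL.md": "Identity: who this agent is, what it cares about, how it talks.\n",
    "AGENTS.md": "Procedure: what it does, how, and what it must never do.\n",
    "memory/MEMORY.md": "Curated notes that should change future behavior.\n",
}


def scaffold_agent(root: Path, name: str) -> list[Path]:
    """Create any missing supporting files for one agent; never overwrite."""
    created = []
    for rel_path, starter in SUPPORT_FILES.items():
        path = root / "agents" / name / rel_path
        if path.exists():
            continue  # keep hand-edited content intact
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(starter, encoding="utf-8")
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    first = scaffold_agent(Path(tmp), "karen")
    second = scaffold_agent(Path(tmp), "karen")  # second run creates nothing
    print(len(first), len(second))
```

Refusing to overwrite matters here because AGENTS.md is edited constantly; a scaffold script that resets it would erase exactly the lessons the file exists to hold.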
The daily log and why it exists
Every agent writes a daily log after completing a task. The format is loose: what did I do, what decisions did I make, what would I want to remember next time.
These files go to agents/{name}/memory/YYYY-MM-DD.md. They accumulate. During the weekly performance review, Jim (my chief of staff agent) reads them, grades each agent’s output, and writes up a review. I read the review, provide feedback, and Jim updates the agent’s SOUL.md and AGENTS.md based on my notes.
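In practice the agent writes its own log, but the path convention and the three questions can be pinned down in a few lines of Python. This is a sketch under assumptions: the entry labels and the `append_daily_log` helper are hypothetical, and only the `agents/{name}/memory/YYYY-MM-DD.md` path comes from the setup described here.

```python
from datetime import date
from pathlib import Path
from typing import Optional
import tempfile


def append_daily_log(root: Path, agent: str, did: str, decisions: str,
                     remember: str, day: Optional[date] = None) -> Path:
    """Append one task entry to agents/<agent>/memory/YYYY-MM-DD.md."""
    day = day or date.today()
    log_path = root / "agents" / agent / "memory" / f"{day.isoformat()}.md"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Labels mirror the three loose questions; the wording is an assumption.
    entry = (
        f"What I did: {did}\n"
        f"Decisions I made: {decisions}\n"
        f"Remember next time: {remember}\n\n"
    )
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)  # append, so several tasks share one day's file
    return log_path


with tempfile.TemporaryDirectory() as tmp:
    path = append_daily_log(
        Path(tmp), "dwight",
        did="Collected sources for the draft",
        decisions="Dropped two sources that could not be verified",
        remember="Check publication dates before citing",
        day=date(2026, 1, 5),
    )
    log_name = path.name
    log_text = path.read_text(encoding="utf-8")
print(log_name)
```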
That loop — task → daily log → performance review → updated instructions — is what Saboo means when he talks about managing agents like employees. His explicit advice: don’t hand them the keys to everything on day one. Give them a workspace. Give them scoped access. Review their work. Update their instructions. Saboo put it this way: “Treat your agents like new hires, not tools. Give them just enough context. Then get out of their way.”
The daily log is how I keep the review from being based on my memory of what happened. It’s the agent’s own record.
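The review step needs the week's logs as input. A small Python sketch of gathering them, assuming a seven-day window (the text only says the review is weekly) and a hypothetical `logs_for_review` helper:

```python
from datetime import date, timedelta
from pathlib import Path
import tempfile


def logs_for_review(root: Path, agent: str, today: date, days: int = 7) -> list[Path]:
    """Return the agent's daily logs from the last `days` days, oldest first."""
    memory_dir = root / "agents" / agent / "memory"
    cutoff = today - timedelta(days=days)
    selected = []
    for path in memory_dir.glob("*.md"):
        try:
            logged_on = date.fromisoformat(path.stem)
        except ValueError:
            continue  # skips MEMORY.md and anything not named YYYY-MM-DD
        if cutoff < logged_on <= today:
            selected.append(path)
    return sorted(selected)  # ISO dates sort chronologically as strings


with tempfile.TemporaryDirectory() as tmp:
    memory = Path(tmp) / "agents" / "karen" / "memory"
    memory.mkdir(parents=True)
    for filename in ("2026-01-01.md", "2026-01-06.md", "2026-01-09.md", "MEMORY.md"):
        (memory / filename).write_text("", encoding="utf-8")
    names = [p.name for p in logs_for_review(Path(tmp), "karen", today=date(2026, 1, 9))]
print(names)
```

Because the curated MEMORY.md lives in the same folder as the dated logs, the filename check doubles as the filter that keeps it out of the review input.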
Slash commands
Slash commands live in .claude/commands/. Each one is a markdown file. When I type /review in a Claude Code session, it reads review.md and runs the performance review workflow.
I have three: /review (weekly review for all agents or a named one), /status (dashboard of current priorities and system health), /feedback {agent} {notes} (apply feedback to a specific agent’s files).
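To show the shape of those three files, here is a Python sketch that writes them into `.claude/commands/`. The prompt bodies are hypothetical stand-ins, since the real review.md, status.md, and feedback.md are not reproduced here. `$ARGUMENTS` is the placeholder Claude Code replaces with whatever is typed after the command name.

```python
from pathlib import Path
import tempfile

# Hypothetical prompt bodies, written only to illustrate the file layout.
COMMANDS = {
    "review": (
        "Run the weekly performance review for: $ARGUMENTS\n"
        "If no agent is named, review every agent.\n"
    ),
    "status": "Summarize current priorities and system health.\n",
    "feedback": "Apply this feedback to the named agent's files: $ARGUMENTS\n",
}


def write_commands(repo_root: Path) -> list[str]:
    """Write one markdown file per slash command under .claude/commands/."""
    commands_dir = repo_root / ".claude" / "commands"
    commands_dir.mkdir(parents=True, exist_ok=True)
    for name, prompt in COMMANDS.items():
        (commands_dir / f"{name}.md").write_text(prompt, encoding="utf-8")
    return sorted(p.name for p in commands_dir.glob("*.md"))


with tempfile.TemporaryDirectory() as tmp:
    written = write_commands(Path(tmp))
print(written)
```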
The review one is the most important. Before it existed, I would periodically look at agent output, notice something was off, and maybe update the instructions. No real cadence. Once /review existed as a slash command, meaning there was a defined process I could trigger in one step, the review actually happened weekly. The command didn’t create discipline. It lowered the friction enough that discipline could form.
MCP
Model Context Protocol is how Claude Code connects to external systems. In my setup, the main integrations are Notion (the publishing pipeline: completed drafts go here before they’re considered done), Telegram (mobile access: I can dispatch tasks from my phone and get updates in a channel), and Gmail/Calendar (context that helps agents understand my schedule and priorities).
Each agent’s frontmatter can specify which MCP servers it has access to. Karen has Notion access because she needs to publish drafts. Dwight has web search access because research requires it. Not every agent needs every integration. The scoping is intentional. Post 7 in this series goes deeper into how MCP changed the way I think about workflows.
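The scoping idea reduces to a deny-by-default lookup. The Python sketch below is a standalone model of that idea, not how Claude Code enforces access. The Karen and Dwight rows come from the paragraph above, Jim's row is an assumption added for illustration, and the server names are made-up labels.

```python
# Which integrations each agent may use. Karen/Notion and Dwight/web search
# come from the text; Jim's row and the server labels are assumptions.
MCP_ACCESS: dict[str, set[str]] = {
    "karen": {"notion"},
    "dwight": {"web_search"},
    "jim": {"notion", "telegram", "gmail", "calendar"},
}


def can_use(agent: str, server: str) -> bool:
    """Deny by default: unknown agents and unlisted servers get no access."""
    return server in MCP_ACCESS.get(agent, set())


print(can_use("karen", "notion"), can_use("karen", "telegram"))
```

An agent that is missing from the table gets nothing, which is the safer failure mode when a new agent is added before anyone has thought about its access.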
What I’d do differently
Two things.
First: I’d write the task boundary paragraphs earlier. For each agent: this agent handles X. It succeeds when Y. It fails when Z. It should never touch W. I didn’t write those until I had specific failures to respond to. I could have written them before the failures if I’d thought harder about scope upfront.
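One way to force that thinking upfront is to make the four parts required fields. A small Python sketch, where the `TaskBoundary` class is hypothetical and the example values for Karen are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskBoundary:
    """The four parts of a task boundary paragraph; all are required."""
    agent: str
    handles: str
    succeeds_when: str
    fails_when: str
    never_touch: str

    def render(self) -> str:
        """Render the boundary as a paragraph ready to paste into AGENTS.md."""
        return (
            f"{self.agent} handles {self.handles}. "
            f"It succeeds when {self.succeeds_when}. "
            f"It fails when {self.fails_when}. "
            f"It should never touch {self.never_touch}."
        )


# Example values are assumptions, loosely based on the Karen failure above.
karen = TaskBoundary(
    agent="Karen",
    handles="copy drafts",
    succeeds_when="every data-dependent claim is sourced or flagged as a gap",
    fails_when="it estimates instead of flagging uncertainty",
    never_touch="research files owned by Dwight",
)
paragraph = karen.render()
print(paragraph)
```

Leaving any field out raises a `TypeError` at construction, so the boundary cannot be half-written.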
Second: I’d start with one agent, not six. Saboo’s warning about this is explicit: “Do not try to build six agents on day one.” I more or less followed it: I built Jim first, then Dwight, then the rest. But even that felt fast. The agents that work best are the ones I’ve run through more review cycles. The ones I haven’t stress-tested enough still surprise me.
The file that explains everything
If you want to understand how the system works, read CLAUDE.md at the repo root. It’s the instruction file for the Jim session, the main orchestration layer. It describes my team, my principles for dispatching work, how I conduct reviews, how I handle feedback. It’s about 180 lines.
HumanLayer’s guidance on CLAUDE.md suggests staying under 300 lines, on the grounds that frontier models reliably follow about 150-200 instructions before compliance degrades. I stay well under that, and I trim aggressively when something falls out of use.
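A line budget like this is easy to check mechanically. A minimal Python sketch, assuming a hypothetical `check_line_budget` helper and counting raw lines rather than individual instructions:

```python
from pathlib import Path
import tempfile

LINE_BUDGET = 300  # the ceiling from the guidance cited above


def check_line_budget(path: Path, budget: int = LINE_BUDGET) -> tuple[int, bool]:
    """Return (line count, whether the file is under the budget)."""
    count = len(path.read_text(encoding="utf-8").splitlines())
    return count, count < budget


with tempfile.TemporaryDirectory() as tmp:
    claude_md = Path(tmp) / "CLAUDE.md"
    # Stand-in content: 180 placeholder lines, matching the size mentioned above.
    claude_md.write_text("placeholder instruction\n" * 180, encoding="utf-8")
    result = check_line_budget(claude_md)
print(result)
```

Line count is only a proxy for instruction count, so a file can pass this check and still ask too much of the model.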
The whole setup is built on the observation from Post 1: the harness matters more than the model. Every file in this repo is part of the harness.
