People ask me which tool they should use for AI coding. The honest answer is that I use all three, and the question I ask isn’t “which one is better?” but “which problem am I trying to solve right now?”
Here’s how they actually divide in my workflow.
Cursor: where I write code
Cursor is my IDE. I’ve been using it since Cursor 3 shipped in early 2026, and the main thing I use it for is code editing: inline suggestions, multi-file edits, refactoring within a project.
The interaction model that matters most for my work is Agent mode (Cmd+I on Mac). This is Cursor’s autonomous mode: it reads files, edits multiple files in a single pass, and runs terminal commands. It’s an agent in the sense that it makes decisions, not just completions.
But Cursor’s agent is one agent. It’s the IDE’s agent, shaped by rules you define. Mine lives in .cursor/rules/ — project-level .mdc files that specify how it should behave in this repo. When I’m working in a new project, my first step is usually writing the rules file: what’s the stack, what patterns do I want it to follow, what should it avoid.
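As a sketch of what such a rules file looks like: Cursor project rules are `.mdc` files with a short YAML frontmatter (`description`, `globs`, `alwaysApply`) followed by plain instructions. The stack and rules below are a hypothetical example, not my actual file:

```markdown
---
description: Conventions for this repo
globs: ["src/**/*.ts"]
alwaysApply: false
---

- Stack: TypeScript, Node 20, Fastify. Do not introduce Express.
- Prefer small, pure functions; match the existing module style.
- Never edit generated files under src/gen/.
```

The `globs` field scopes the rule to matching files, so the agent only loads it when it’s actually relevant.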
I use Plan Mode for anything more complex than a small edit. Before touching code, the agent researches the files, asks clarifying questions, writes a detailed plan. I review the plan. Then it executes. That discipline, research before action, saves me a lot of cleanup.
Where Cursor shines: when the task is well-scoped and the context is the current project. Code completion, refactoring, writing tests for something I just built. It’s fast, it’s IDE-native, and $20/month is a reasonable entry point.
Where I don’t use it: for system-level agent work. Cursor’s agent isn’t a persona. There’s no SOUL.md for Cursor’s agent, and it doesn’t carry its own memory between sessions the way Claude Code subagents do. For the kind of orchestration work I do with the Chief of Staff system (dispatching tasks to named agents, running performance reviews, managing the content pipeline), Claude Code is the right tool.
Claude Code: where I build systems
Claude Code is where my actual operating system lives. The repo I’ve described in earlier posts in this series (Jim, Dwight, Karen, Kelly, and the rest) all runs in Claude Code.
What makes Claude Code different from Cursor for this use case is the subagent architecture. Each agent in .claude/agents/ is a separate entity with its own model specification, tool access, and system prompt. When Jim dispatches a task to Dwight, Dwight spins up in his own context window with his own tool access. He doesn’t know what Karen is doing. He doesn’t have access to Jim’s full memory. The scoping is the point.
This architecture (a named agent with a defined identity, specific tools, and scoped access) doesn’t have a direct equivalent in Cursor. Cursor’s rules system gets you some of the way there, but it’s one agent shaped by context, not multiple agents with defined boundaries.
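To make the subagent format concrete: each file in `.claude/agents/` is Markdown with YAML frontmatter declaring `name`, `description`, `tools`, and `model`, followed by the agent’s system prompt. The agent below is a hypothetical stand-in, not my actual Dwight file:

```markdown
---
name: dwight
description: Background research and fact-checking tasks.
tools: Read, Grep, WebSearch
model: sonnet
---

You are Dwight, the research agent. You receive one scoped task,
gather sources, and return a summary with citations. You do not
edit files, and you never see other agents' conversations.
```

The `tools` line is the enforcement mechanism for the scoping described above: an agent without `Edit` in its tool list can’t touch files, no matter what its prompt says.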
Claude Code is also where I interact with the system on a daily basis. I open a session, Jim is already loaded from CLAUDE.md, I type a request, and the Chief of Staff decides how to route it. That experience, where the AI already knows the context because the context is written into the repo, is the practical payoff of everything I described in Posts 1 through 3.
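The “already loaded” part comes from CLAUDE.md, which Claude Code reads automatically at the start of every session. A hypothetical sketch of what a routing section might look like (the file paths and rules here are illustrative, not my actual setup):

```markdown
# CLAUDE.md

You are Jim, Chief of Staff. On every request:

1. Decide whether to handle it yourself or dispatch a subagent.
2. Research goes to dwight; content pipeline work goes to kelly.
3. Log each dispatch with date and outcome in logs/dispatch.md.
```

Because this lives in the repo, every session starts with the same context; nothing has to be re-explained.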
The trade-off: Claude Code requires Claude Max ($100/month as of March 2026). That’s a meaningful cost difference from Cursor’s $20. If I were starting fresh today with a limited budget and a single project, I’d probably start with Cursor. If I were running an agent team for an ongoing operation, Claude Code is where I’d be.
Codex: for the work I don’t watch
Codex is OpenAI’s cloud-based coding agent. The thing that makes it different from Claude Code and Cursor is the execution model: it runs in the cloud, not on your laptop, triggered by a schedule, an API call, or a GitHub webhook.
I use Codex for tasks I want to run asynchronously: an agent-generated analysis of a code project, research that takes 30-40 minutes and doesn’t need my attention, parallel tasks where I want multiple agents working at the same time.
The practical constraint before cloud execution was that any agent task required my laptop to be open and a session to be running. That’s a significant constraint on what you’re willing to ask agents to do. If it takes 45 minutes and requires me to be at my computer, I might decide it’s not worth it. If it runs while I’m doing something else and I review the output later, the calculus changes.
Claude Code Routines (launched April 14, 2026, still in research preview) solves the same problem from the Anthropic side: scheduled runs, API triggers, GitHub webhook support. I’ve started using those too — more on this in Post 5, which is specifically about Cowork and what changed when agents stopped needing my laptop to be on.
Where I still use OpenAI in my stack: YouTube Transcript, one of my earlier projects, uses OpenAI Whisper for audio transcription, and Whisper is still the best tool I’ve found for that specific job. I’m not religious about staying on one vendor. I use what works.
The honest comparison
The benchmark-scoreboard version of this comparison is: Claude Opus 4.7 leads on code review and architecture work (SWE-Bench Verified 87.6%), GPT-5.5 leads on terminal-based and long-running agentic tasks (Terminal-Bench 82.7%). Cursor is primarily a coding assistant, not a benchmark-optimized research subject.
That framing matters for people choosing at the API level. For most day-to-day work, both frontier models are good enough that the harness and the task design matter more than the model.
The more useful comparison for my actual use: each tool has a native habitat.
Cursor’s native habitat is the code editor, file-level context, fast iterations on existing code. Claude Code’s native habitat is agent orchestration, persona-driven systems, multi-agent pipelines. Codex’s native habitat is cloud-executed async tasks, parallel work, scheduled automation.
None of them does the other two’s jobs as well. That’s why I use all three.
A note on frameworks
I deliberately left LangChain, CrewAI, and the agent frameworks out of this post. The developer series has a separate piece on the framework decision (spoiler: for most use cases, the right answer is no framework at all). Claude Code, Cursor, and Codex are product surfaces, not framework choices. They’re worth understanding on their own terms.
