If you look at my GitHub history, there’s an arc.
The earliest projects use OpenAI. AI Resume Improver — one of my first AI-integrated projects — runs on GPT-4: the OpenAI SDK, an /api/analysis/analyze route, a multi-phase pipeline. Classic early-2025 pattern.
The middle-period projects switch to Gemini. AI Newsy, Personal Newsletter, the content-gen project — all Gemini. gemini-2.0-flash for summarization, Google Generative AI library, GitHub Actions on a cron schedule.
The current projects are almost all Claude. IndieOS, PillarPost, SocialButter — all using @anthropic-ai/sdk. The Chief of Staff system is pure Claude. RSS Feeds documents its Claude Code workflow in the README.
I didn’t make these switches based on benchmarks. I made them based on what kept working better for the specific problem in front of me. That’s a different thing.
These repos are all internal projects, so you can’t check my GitHub account for them. If you have questions, shoot me a message. Happy to go into any detail or share anything about these projects.
Why I started with OpenAI
In 2025, GPT-4 was the default for anyone building AI-integrated products. The API was stable, the documentation was good, the examples were everywhere. If you wanted to build something and ship it, OpenAI was the obvious starting point.
That’s not a criticism. The defaults exist for reasons. I built AI Resume Improver because I wanted to learn how to integrate an AI API into a web product. GPT-4 was the right choice for that: well-documented, mature SDK, lots of examples. I wasn’t trying to pick the best model for production at scale. I was trying to learn.
Why I moved to Gemini
AI Newsy was a different kind of project. I wanted something that ran on a schedule: automated news digestion, AI-generated summaries, delivery by email. GitHub Actions as the orchestration layer, Python as the execution layer.
At the time, Gemini Flash was faster and cheaper for the summarization tasks I was doing, and Google’s infrastructure for scheduled Python jobs integrated well with the GitHub Actions setup I was using. It wasn’t a philosophical switch; it was a performance-and-cost decision for a specific workload.
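The summarization step in that kind of scheduled job is small. Here’s a minimal sketch using the google-generativeai library the projects mention; the prompt wording and the build_digest_prompt helper are illustrative, not the actual AI Newsy code.

```python
# Sketch of a scheduled summarization step in the AI Newsy style.
# The prompt wording and build_digest_prompt helper are illustrative
# placeholders, not the project's actual code.

def build_digest_prompt(articles: list[dict]) -> str:
    """Flatten fetched articles into a single summarization prompt."""
    blocks = [f"Title: {a['title']}\n{a['body']}" for a in articles]
    return (
        "Summarize each article below in two sentences for an email digest.\n\n"
        + "\n\n---\n\n".join(blocks)
    )

def summarize(articles: list[dict]) -> str:
    # Requires `pip install google-generativeai` and a GEMINI_API_KEY.
    import os
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    model = genai.GenerativeModel("gemini-2.0-flash")
    return model.generate_content(build_digest_prompt(articles)).text
```

In a GitHub Actions setup, a script like this runs on a cron trigger and hands its output to whatever sends the email.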
I used Gemini for Personal Newsletter too, mostly out of inertia from AI Newsy. They were similar architectures solving similar problems. Same stack made sense.
Why I landed on Claude
The shift started with content work. When I built PillarPost, a tool that takes raw ideas and generates LinkedIn post variations, I needed something that could hold a nuanced content framework, understand subtle tone differences between post types, and produce output that actually sounded like me rather than like an AI.
I’d been following Shubham Saboo’s work (@Saboo_Shubham_), who originated the Chief of Staff agent pattern I’ve built on. His stack runs on Claude. That gave me a practical reason to test it seriously.
The difference I noticed, and this is personal experience rather than benchmark data: Claude handled ambiguous or open-ended writing tasks better than what I’d seen from GPT-4 at the time. The outputs felt more considered. When I gave it a content framework and asked it to generate variations, it actually used the framework rather than pattern-matching to a generic LinkedIn post format.
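What “actually using the framework” looks like in practice is mostly prompt structure: the framework rides along as the system prompt, and the raw idea goes in the user turn. A minimal sketch with the Anthropic Python SDK; the framework text, the model pin, and the helper names are all illustrative, not PillarPost’s actual prompts.

```python
# Sketch: passing a content framework as the system prompt so the model
# generates variations against it. Framework text and model name are
# illustrative placeholders, not PillarPost's actual prompts.

FRAMEWORK = """Post types:
- story: open with a concrete moment, end with one takeaway
- contrarian: state the common view, then the disagreement, then evidence
Voice: first person, no hashtags, no emoji."""

def build_messages(raw_idea: str, post_type: str) -> list[dict]:
    return [{
        "role": "user",
        "content": f"Post type: {post_type}\nRaw idea: {raw_idea}\n"
                   "Write three variations that follow the framework.",
    }]

def generate_variations(raw_idea: str, post_type: str) -> str:
    # Requires `pip install anthropic` and an ANTHROPIC_API_KEY.
    import anthropic
    client = anthropic.Anthropic()
    resp = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative pin
        max_tokens=1024,
        system=FRAMEWORK,  # the framework rides along on every call
        messages=build_messages(raw_idea, post_type),
    )
    return resp.content[0].text
```

Keeping the framework in the system slot rather than pasted into each user turn is what makes it a constraint instead of context the model can drift away from.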
For the Chief of Staff system, I wasn’t making a model choice; I was choosing where to build the harness. Claude Code is a native environment for Claude models. The agent primitives (subagents, SOUL.md, AGENTS.md, slash commands) are Claude-native. Using Claude models in a Claude Code harness isn’t a coincidence; it’s the intended pattern.
Where OpenAI still wins for me
This isn’t a conversion story. I still use OpenAI for two things.
YouTube Transcript uses OpenAI Whisper for audio transcription. Whisper is still the best tool I’ve found for turning YouTube audio into clean, punctuated transcripts. I tested Claude’s audio capabilities for this workload. Whisper produced better results. I’m not switching for the sake of consistency.
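For reference, the Whisper call itself is tiny. A sketch using OpenAI’s hosted transcription endpoint; the paragraphing helper is my own illustrative addition for readability, not part of the project.

```python
# Sketch: transcribing an audio file with OpenAI's hosted Whisper model,
# then breaking the flat transcript into readable paragraphs. The
# paragraphs() helper is an illustrative addition, not project code.
import re

def paragraphs(text: str, sentences_per_para: int = 4) -> str:
    """Group sentences into paragraphs for readability."""
    sents = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = [" ".join(sents[i:i + sentences_per_para])
              for i in range(0, len(sents), sentences_per_para)]
    return "\n\n".join(chunks)

def transcribe(path: str) -> str:
    # Requires `pip install openai` and an OPENAI_API_KEY.
    from openai import OpenAI
    client = OpenAI()
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return paragraphs(result.text)
```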
For long-running terminal tasks and agentic shell loops, GPT-5.5 (released April 23, 2026) is genuinely stronger: Terminal-Bench 82.7% versus Claude Opus 4.7’s 69.4%. If I have a task that involves extended shell execution, unattended browsing, or multi-step command-line work, I’ll route it there. The benchmark gap at that specific job is real.
The point is picking the right tool for the job, not picking a team and staying loyal. I was defaulting to OpenAI because that’s what I knew. Now I’m defaulting to Claude for the orchestration and content work because that’s what keeps working better. Both of those defaults are provisional.
The trust angle
There’s a dimension to this that goes beyond benchmarks. Anthropic’s April 2026 pricing split tests (Claude Code was removed from the $20 Pro plan before being partly rolled back) were a reminder that trust in a vendor involves more than whether their model performs. It involves whether they communicate well, whether they grandfather existing customers when plans change, whether they treat their users like adults.
The April 23 postmortem (three Claude Code regressions, acknowledged honestly after weeks of community frustration) was, ironically, a trust-builder. A company that publishes a thorough postmortem and resets subscriber usage limits is doing something many vendors don’t. It doesn’t erase the regression. But it’s a different response than denial.
I wrote separately about what I took from that incident. For this series, the short version: the model I use most matters less than whether I can build on the vendor’s platform with reasonable confidence that the platform is being maintained honestly.
What this means if you’re just starting
If you’re building your first AI-integrated project today, OpenAI is still a defensible default. The API is mature, the documentation is excellent, the SDKs are well-maintained.
If you’re building agent systems (orchestrated, multi-turn, persona-driven), Claude Code is where I’d start. The agent primitives feel native there in a way they don’t elsewhere.
If you’re doing audio transcription, Whisper is still the best option I’ve tested.
None of that is permanent. The model landscape shifts fast enough that “best for this job right now” is probably the only honest framing. Build the harness with portability in mind (LiteLLM as the model-routing layer, MCP for tool portability) so that swapping models doesn’t require rewriting your system.
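What that portability looks like in practice: one function maps a task category to a model string, and LiteLLM’s unified completion() call means swapping a model is a one-line config change rather than a rewrite. The task names and model identifiers below are illustrative, not a recommendation.

```python
# Sketch of a portable model-routing layer on top of LiteLLM.
# Task categories and model identifiers are illustrative; changing a
# default means editing this dict, not rewriting call sites.
ROUTES = {
    "content":   "anthropic/claude-sonnet-4-20250514",
    "terminal":  "openai/gpt-4o",
    "summarize": "gemini/gemini-2.0-flash",
}
DEFAULT = "anthropic/claude-sonnet-4-20250514"

def pick_model(task: str) -> str:
    """Route a task category to a provider/model string."""
    return ROUTES.get(task, DEFAULT)

def run(task: str, prompt: str) -> str:
    # Requires `pip install litellm` plus the relevant provider API key.
    from litellm import completion
    resp = completion(
        model=pick_model(task),
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

Because LiteLLM normalizes every provider to the OpenAI-style response shape, the call sites never know which vendor answered; the routing table is the only place a provider name appears.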
The model I started with eight months ago is not the model I run my most important work on today. That’s fine. It’s supposed to change.