Short answer. Sonnet 4.6 wins for everyday production code. Opus 4.6 wins for architecture, hard debugging, and anything where reasoning matters more than speed. Haiku 4.5 wins for mechanical edits, file ops, and high-volume small tasks. If you're using Opus for everything, you're paying up to 5x more than you need to. The trick isn't picking one model. It's routing each task to the cheapest one that can do the job. At Formaum, I route Opus for reasoning, Sonnet for production code, and Haiku for mechanical tasks, and the system spends maybe 20% of what running Opus-only would.
The honest answer (it depends; here's the matrix)
There is no single best model. There is a best model per task. The current Claude lineup splits cleanly across three jobs.
| Model | SWE-bench Verified | Price (in / out per 1M tokens) | Best for |
|---|---|---|---|
| Opus 4.6 | 80.8% | $5 / $25 | Architecture, hard debugging, multi-file refactors |
| Sonnet 4.6 | 79.6% | $3 / $15 | Everyday feature code, bug fixes, code review |
| Haiku 4.5 | ~40% | $1 / $5 | Mechanical edits, file ops, formatting, commit messages |
Sonnet sits 1.2 points behind Opus on SWE-bench Verified and costs 40% less to run. That gap is the whole story. For most of the work I ship (CRM migrations, API integrations, full-stack features), Sonnet matches Opus output and saves me money on every prompt.
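The price arithmetic behind that comparison is easy to check. The sketch below uses only the list prices from the table; the `PRICES` dictionary, the helper names, and the example call size are introduced here for illustration, and it assumes a task consumes the same number of tokens whichever model runs it.

```python
# Prices copied from the table above: (input, output) in USD per 1M tokens.
PRICES = {
    "opus": (5.0, 25.0),
    "sonnet": (3.0, 15.0),
    "haiku": (1.0, 5.0),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one call at the table's list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def relative_saving(cheaper: str, pricier: str,
                    input_tokens: int, output_tokens: int) -> float:
    """Fraction saved by running the same token volume on the cheaper model."""
    cheap = task_cost(cheaper, input_tokens, output_tokens)
    costly = task_cost(pricier, input_tokens, output_tokens)
    return 1 - cheap / costly


if __name__ == "__main__":
    # Hypothetical call size: 20k tokens in, 2k tokens out.
    for cheaper, pricier in [("sonnet", "opus"), ("haiku", "opus"), ("haiku", "sonnet")]:
        saving = relative_saving(cheaper, pricier, 20_000, 2_000)
        print(f"{cheaper} vs {pricier}: {saving:.0%} cheaper")
```

Because input and output prices scale by the same ratio between tiers, the result does not depend on the input/output mix: at these prices Sonnet costs 60% of what Opus does (40% less), and Haiku costs a fifth of Opus and a third of Sonnet.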
When Opus 4.6 wins
Opus is the one I reach for when reasoning is the bottleneck, not typing speed.
- Architecture decisions. Picking a stack. Designing a data model. Choosing between two patterns that both look fine on paper. Opus catches second-order effects Sonnet misses.
- Multi-file refactors. When a change touches 10+ files and the dependencies aren't obvious, Opus holds the whole graph in its head better.
- Hard debugging. Race conditions, weird async behaviour, state bugs that only show up in production. The kind where you've been staring at the trace for an hour and you need a second brain that actually reasons.
- Unfamiliar codebases. Walking into a 3-year-old legacy repo and figuring out where to make a change. Opus reads context better.
- Factual accuracy review. When the output has to be checked against source data, claim by claim, Opus is the one that actually verifies instead of hallucinating confirmation.
If a task takes more than 10 minutes to reverse when wrong, I use Opus. The extra cost is cheap compared to fixing a bad architectural call three weeks later.
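That rule of thumb is simple enough to write down. This is a minimal sketch, assuming you can estimate how long a wrong answer would take to undo; the function name and the `is_mechanical` flag are hypothetical, and the 10-minute threshold is the one stated above.

```python
OPUS_REVERSAL_THRESHOLD_MIN = 10  # threshold taken from the rule above


def model_for_risk(minutes_to_reverse: float, is_mechanical: bool = False) -> str:
    """Pick a model tier from how costly a wrong answer is to undo."""
    if minutes_to_reverse > OPUS_REVERSAL_THRESHOLD_MIN:
        return "opus"    # expensive to undo: pay for reasoning up front
    if is_mechanical:
        return "haiku"   # cheap to undo and no reasoning needed
    return "sonnet"      # cheap to undo, but still real coding
```

For example, `model_for_risk(240)` picks Opus for a schema decision that would take half a day to unwind, while `model_for_risk(2, is_mechanical=True)` picks Haiku for a rename.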
When Sonnet 4.6 wins
Sonnet is the daily driver. It's what runs when I'm shipping client work and don't want to think about model selection.
- Feature development. Building a new endpoint, a new component, a new workflow node. Standard CRUD. Standard integrations.
- Bug fixes inside a single file. The bug is bounded. The fix is bounded. Sonnet is fast enough and smart enough.
- Code review on a PR. Sonnet catches the same issues a senior engineer would. It misses the same ones too.
- Test generation. Writing unit tests around existing logic. Sonnet handles the surface area without spending Opus money on test scaffolding.
- Iterative refinement. The first pass is right 80% of the time. The next two passes clean it up. Sonnet's speed compounds across iterations.
In Anthropic's own testing, developers preferred Sonnet 4.6 over the previous Opus generation 59% of the time inside Claude Code. That number tracks with what I see.
When Haiku 4.5 wins
Haiku is the one most people sleep on. It's the difference between a $50 day and a $300 day on the API.
- Mechanical edits. Find-and-replace across files. Renaming a variable. Swapping a function signature. Haiku does this perfectly.
- File verification. Confirming an edit landed correctly. Reading a section of a file to check state. No reasoning needed.
- HTML edits and copy swaps. Updating a marketing page. Replacing a price. Changing a heading. Template application.
- Bash and git operations. Running commands, committing, pushing. Execution, not thinking.
- Commit messages and changelogs. Pattern matching on a diff. Haiku writes a fine commit message in a fraction of a second.
- Linting and formatting. Anything a static analyser could almost do.
Haiku struggles on real coding (SWE-bench around 40%). But the tasks above aren't real coding. They're mechanical. Using Sonnet or Opus on them is paying a senior engineer rate to format a file.
Cost-per-task routing (my actual setup)
Here's the routing rule I run in production. It lives in my Claude Code config and applies to every session.
| Task | Model | Why |
|---|---|---|
| Copy swaps, HTML edits, file edits | Haiku | Mechanical. No reasoning. |
| File verification after an edit | Haiku | Just confirming text matches. |
| Bash, git operations | Haiku | Execution. |
| Initial HTML generation from spec | Haiku | Template application. |
| Code review against a rules checklist | Haiku | Checklist, not judgment. |
| Everyday feature code, bug fixes | Sonnet | Daily driver. Best ratio. |
| Test generation, refactors inside a module | Sonnet | Bounded reasoning. |
| Architecture, system design | Opus | Reasoning matters. |
| Hard debugging, race conditions | Opus | Needs depth. |
| Factual accuracy verification | Opus | Needs to verify, not invent. |
The result: a typical day routes 60% of calls to Haiku, 30% to Sonnet, 10% to Opus. The 10% is where the value gets created. The 60% is where the budget would have leaked if I'd run Opus on everything.
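The table translates directly into a lookup. The sketch below is one way to encode it, not the contents of my actual config: the task-type keys are hypothetical labels, the fallback to Sonnet for unknown task types is an assumption, and `blended_cost_ratio` assumes every call uses the same number of tokens.

```python
# Task type -> model, transcribed from the routing table above.
# The keys are hypothetical labels chosen for this sketch.
ROUTES = {
    "copy_swap": "haiku",
    "html_edit": "haiku",
    "file_edit": "haiku",
    "file_verification": "haiku",
    "bash": "haiku",
    "git": "haiku",
    "html_from_spec": "haiku",
    "checklist_review": "haiku",
    "feature_code": "sonnet",
    "bug_fix": "sonnet",
    "test_generation": "sonnet",
    "module_refactor": "sonnet",
    "architecture": "opus",
    "system_design": "opus",
    "hard_debugging": "opus",
    "fact_verification": "opus",
}

# Input price in USD per 1M tokens, from the pricing table.
INPUT_PRICE = {"haiku": 1.0, "sonnet": 3.0, "opus": 5.0}


def route(task_type: str) -> str:
    """Return the model for a task type; unknown types fall back to Sonnet (assumption)."""
    return ROUTES.get(task_type, "sonnet")


def blended_cost_ratio(mix: dict) -> float:
    """Cost of a routed workload relative to Opus-only, assuming equal tokens per call."""
    blended = sum(share * INPUT_PRICE[model] for model, share in mix.items())
    return blended / INPUT_PRICE["opus"]


if __name__ == "__main__":
    print(route("git"), route("architecture"))
    print(blended_cost_ratio({"haiku": 0.6, "sonnet": 0.3, "opus": 0.1}))
```

With the 60/30/10 split and equal-sized calls, the ratio comes out at roughly 0.4, a 60% saving against Opus-only. The real figure depends on how many tokens each tier actually processes, so treat this as a rough estimate rather than a bill forecast.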
Tool integration: when Claude Code picks for you, and when to override
If you use Claude Code, the harness routes tasks across models automatically. It defaults to Sonnet for the main conversation, fires Haiku for cheap sub-agent calls (Bash, file reads, mechanical edits), and reserves Opus for explicit deep-reasoning work.
Override the default when:
- You're starting an architectural conversation. Switch to Opus up front. Don't let it draft a plan on Sonnet and then re-do it.
- You're doing high-volume mechanical work: bulk renames, large copy swaps, batch file ops. Force Haiku. The default may otherwise spend Sonnet money on every edit.
- You're verifying a deliverable against source data. Use Opus for the verification pass even if the rest of the session is on Sonnet.
The harness is smart. It's not psychic. If you know the task type, override.
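If you build the same behaviour into your own tooling, the key property is that an explicit override always beats the automatic choice. The sketch below is not Claude Code's internal logic; `choose_model`, the task labels, and the default mapping are all hypothetical stand-ins.

```python
from typing import Optional

VALID_MODELS = {"haiku", "sonnet", "opus"}

# Hypothetical defaults standing in for whatever the harness would pick.
DEFAULT_FOR_TASK = {
    "main_conversation": "sonnet",
    "sub_agent": "haiku",
    "deep_reasoning": "opus",
}


def choose_model(task: str, override: Optional[str] = None) -> str:
    """Return the explicit override when given, else the default for the task."""
    if override is not None:
        if override not in VALID_MODELS:
            raise ValueError(f"unknown model: {override!r}")
        return override
    return DEFAULT_FOR_TASK.get(task, "sonnet")
```

An architectural session would call `choose_model("main_conversation", override="opus")`, and a bulk rename would call `choose_model("main_conversation", override="haiku")`. Without an override, the default stands.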
Common mistakes (the ones that burn budget)
- Running Opus on everything. The single biggest spend leak. If your bill jumped 4x this month, this is why.
- Using Sonnet for file verification. Reading a file back to confirm an edit landed. Haiku does it in a tenth of a second for a third of the cost.
- Re-reading the whole file after a targeted edit. You already know what changed. Trust the diff.
- Spawning multi-agent fan-outs for trivial decisions. If the decision takes under 10 minutes to reverse, one writer is enough. Save the fan-out for architecture.
- Iterating on HTML in HTML instead of markdown. Iterate in markdown on Sonnet. Generate HTML once on Haiku at the end. Don't rebuild every round.
- Skipping the model choice entirely. Default behaviour costs you money you didn't budget for. Five minutes setting routing rules pays for itself the same day.
The bottom line
The best Claude model for coding isn't a model. It's a routing strategy. Sonnet for most of the work. Opus when reasoning is the constraint. Haiku for everything mechanical. The teams running this hybrid setup report 60-80% cost savings against running Opus straight through, with no quality drop on the work that matters.
Pick the cheapest model that can do the job. Override the default when you know the task type. Iterate in the cheapest format that holds the meaning. That's the whole game.