AI Tools 6 min read

Claude Sonnet vs Opus vs Haiku: Which Model to Use When

A practical guide to Anthropic's three Claude model tiers. What each one is good at, what they cost, and how to architect a production system that uses all three to cut AI spend by 60-80%.

Anthropic ships three main Claude models. Most teams use one of them for everything. That's the wrong answer in production. Here's the practical breakdown of when to use which, and how to architect a system that uses all three.


The three models, at a glance

Claude Opus. The flagship. Highest reasoning depth, strongest at complex multi-step tasks, most capable at handling ambiguity. Slowest and most expensive.

Claude Sonnet. The middle tier. Roughly 80% of Opus's reasoning quality at a fraction of the price and latency. The default for most production work.

Claude Haiku. The fast, cheap option. Strong at narrow, well-defined tasks at very low cost and very low latency. The right tool for high-volume, low-risk classification and routing.

Anthropic ships new versions of each tier regularly, and the gaps between them keep shifting. Check Anthropic's current model documentation for the latest specs. The principles below stay the same regardless of which version is current.


What Opus is good at

Use Opus when being right matters more than being fast or cheap. Specifically:

Don't use Opus for high-volume, repeatable tasks. The cost adds up fast and most of those tasks don't need the reasoning depth.


What Sonnet is good at

Sonnet is the right default for the bulk of production work. Use Sonnet when:

If you're not sure which model to use, start with Sonnet. It handles 80%+ of real production tasks competently, and you can upgrade specific calls to Opus if you find Sonnet falling short on a particular task.


What Haiku is good at

Haiku shines on narrow, well-defined, high-volume tasks where speed and cost matter more than reasoning depth:

Don't use Haiku for tasks that require nuance, long context, or complex reasoning. It will work, but the quality drop is real. Use it where the task fits its strengths.


The architecture that wins: tiered routing

The smartest production systems don't pick one model. They route each task to the right tier automatically. The pattern looks like this:

Tiered routing pattern Inbound task │ ▼ ┌──────────────────────────────┐ │ Triage (Haiku) │ ← cheap, fast classification └──────────────────────────────┘ │ ▼ Confidence high? ┌────┴────┐ YES NO │ │ ▼ ▼ Final ┌──────────────────────┐ response │ Standard work │ │ (Sonnet) │ └──────────────────────┘ │ ▼ Edge case detected? ┌────┴────┐ NO YES │ │ ▼ ▼ Final ┌────────────────┐ response │ Hard reasoning │ │ (Opus) │ └────────────────┘

The triage layer (Haiku) handles the easy cases cheaply. The bulk of work runs on Sonnet. Only the genuinely hard or ambiguous cases escalate to Opus. On real workloads, this pattern cuts AI cost roughly 60-80% versus running everything on the flagship.

🎯

The principle: Match the model to the task, not the task to the model. Most teams overpay because they're sending easy tasks to expensive models. Tiered routing fixes that with a few hundred lines of code.


What about prompt caching?

The other big lever for cutting cost is prompt caching. If you're sending the same context repeatedly (a long system prompt, a knowledge base, a set of examples), prompt caching keeps that context warm and charges you only for the new portion.

For production agents that run many times a day with the same setup, prompt caching can cut input costs by another 60-90% on top of tiered routing. Combine the two and AI infrastructure becomes a small line item rather than a budget concern.


The bigger picture

Most teams pay too much for AI not because the models are expensive but because they're using one model for everything. Tiered routing and prompt caching together turn a system that costs thousands a month into one that costs tens or hundreds. That's not a small optimisation. It's the difference between AI being a viable production layer and being a feature you ration.

Pick the right model for the task. Cache what you can. Then the cost stops being the conversation.


Frequently Asked Questions

What's the difference between Claude Sonnet and Claude Opus?
Sonnet is the middle tier. It has roughly 80% of Opus's reasoning quality at a fraction of the price and latency. Opus is the flagship, used when reasoning depth and accuracy matter more than speed or cost. For most production work, Sonnet is the right default.
When should I use Claude Haiku?
Haiku is the right tool for high-volume, narrow, well-defined tasks where speed and cost matter more than reasoning depth. Language detection, simple classification, routing, content moderation, the cheap first pass in a multi-stage pipeline. Don't use Haiku for tasks requiring nuance or long context.
Which Claude model is cheapest?
Haiku is cheapest by a significant margin, then Sonnet, then Opus. Pricing varies and updates, so check Anthropic's current pricing page. The bigger lever for cost is architecture (tiered routing and prompt caching) rather than picking the cheapest model for everything.
Can I use multiple Claude models in the same system?
Yes. The smartest production systems use all three with tiered routing. Haiku for the cheap first pass, Sonnet for the bulk of work, Opus only for the genuinely hard cases. This pattern typically cuts AI costs 60-80% versus running everything on a single flagship model.
Are Claude models available in Canada, the US, and the UK?
Yes. All three Claude tiers (Opus, Sonnet, Haiku) are available globally through the Anthropic API and Claude Team and Enterprise plans. There are no regional restrictions.

Continue reading
AI Tools
What is Claude Code →
Buyer's Guide
How to Hire a Claude Code Expert →
AI Tools
What Are AI Agents →
Selected Work
See the Case Studies →

Running on a stack that grew by accident?

Tools added one at a time, never architected together. That's the problem I solve. Book 45 minutes and I'll map what moves, what stays, and what makes sense for your operation.

Book a Discovery Call
GC

Genevieve Claire

Operations strategist. Previously EA Sports FIFA — $100M productions, $7B franchise. Now I build operations infrastructure for multi-location businesses. LinkedIn →