Claude Opus is Anthropic's most capable model for reasoning, complex code, and high-stakes analysis. Sonnet is the balanced default for everyday production workloads. Haiku is the fastest and cheapest, built for high-volume mechanical tasks. Architecting a system that routes each task to the right tier cuts AI spend 60–80% in production. At Formaum, every multi-location AI build uses all three Claude models with explicit task-to-model routing.


The three models, at a glance

Claude Opus. The flagship. Highest reasoning depth, strongest at complex multi-step tasks, most capable at handling ambiguity. Slowest and most expensive.

Claude Sonnet. The middle tier. Roughly 80% of Opus's reasoning quality at a fraction of the price and latency. The default for most production work.

Claude Haiku. The fast, cheap option. Strong at narrow, well-defined tasks at very low cost and very low latency. The right tool for high-volume, low-risk classification and routing.

Anthropic ships new versions of each tier regularly, and the gaps between them keep shifting. Check Anthropic's current model documentation for the latest specs. The principles below stay the same regardless of which version is current.


What Opus is good at

Use Opus when being right matters more than being fast or cheap. Specifically:

Don't use Opus for high-volume, repeatable tasks. The cost adds up fast and most of those tasks don't need the reasoning depth.


What Sonnet is good at

Sonnet is the right default for the bulk of production work. Use Sonnet when:

If you're not sure which model to use, start with Sonnet. It handles 80%+ of real production tasks competently, and you can upgrade specific calls to Opus if you find Sonnet falling short on a particular task.


What Haiku is good at

Haiku shines on narrow, well-defined, high-volume tasks where speed and cost matter more than reasoning depth:

Don't use Haiku for tasks that require nuance, long context, or complex reasoning. It will work, but the quality drop is real. Use it where the task fits its strengths.


The architecture that wins: tiered routing

The smartest production systems don't pick one model. They route each task to the right tier automatically. The pattern looks like this:

Tiered routing pattern Inbound task │ ▼ ┌──────────────────────────────┐ │ Triage (Haiku) │ ← cheap, fast classification └──────────────────────────────┘ │ ▼ Confidence high? ┌────┴────┐ YES NO │ │ ▼ ▼ Final ┌──────────────────────┐ response │ Standard work │ │ (Sonnet) │ └──────────────────────┘ │ ▼ Edge case detected? ┌────┴────┐ NO YES │ │ ▼ ▼ Final ┌────────────────┐ response │ Hard reasoning │ │ (Opus) │ └────────────────┘

The triage layer (Haiku) handles the easy cases cheaply. The bulk of work runs on Sonnet. Only the genuinely hard or ambiguous cases escalate to Opus. On real workloads, this pattern cuts AI cost roughly 60-80% versus running everything on the flagship.

🎯

The principle: Match the model to the task, not the task to the model. Most teams overpay because they're sending easy tasks to expensive models. Tiered routing fixes that with a few hundred lines of code.


What about prompt caching?

The other big lever for cutting cost is prompt caching. If you're sending the same context repeatedly (a long system prompt, a knowledge base, a set of examples), prompt caching keeps that context warm and charges you only for the new portion.

For production agents that run many times a day with the same setup, prompt caching can cut input costs by another 60-90% on top of tiered routing. Combine the two and AI infrastructure becomes a small line item rather than a budget concern.


The bigger picture

Most teams pay too much for AI not because the models are expensive but because they're using one model for everything. Tiered routing and prompt caching together turn a system that costs thousands a month into one that costs tens or hundreds. That's not a small optimisation. It's the difference between AI being a viable production layer and being a feature you ration.

Pick the right model for the task. Cache what you can. Then the cost stops being the conversation.

Run on a stack that's holding you back?

Book a 45-minute discovery call. I'll map what moves, what stays, and what makes sense for your operation.

Book a call

Frequently Asked Questions

What's the difference between Claude Sonnet and Claude Opus?
Sonnet is the middle tier. It has roughly 80% of Opus's reasoning quality at a fraction of the price and latency. Opus is the flagship, used when reasoning depth and accuracy matter more than speed or cost. For most production work, Sonnet is the right default.
When should I use Claude Haiku?
Haiku is the right tool for high-volume, narrow, well-defined tasks where speed and cost matter more than reasoning depth. Language detection, simple classification, routing, content moderation, the cheap first pass in a multi-stage pipeline. Don't use Haiku for tasks requiring nuance or long context.
Which Claude model is cheapest?
Haiku is cheapest by a significant margin, then Sonnet, then Opus. Pricing varies and updates, so check Anthropic's current pricing page. The bigger lever for cost is architecture (tiered routing and prompt caching) rather than picking the cheapest model for everything.
Can I use multiple Claude models in the same system?
Yes. The smartest production systems use all three with tiered routing. Haiku for the cheap first pass, Sonnet for the bulk of work, Opus only for the genuinely hard cases. This pattern typically cuts AI costs 60-80% versus running everything on a single flagship model.
Are Claude models available in Canada, the US, and the UK?
Yes. All three Claude tiers (Opus, Sonnet, Haiku) are available globally through the Anthropic API and Claude Team and Enterprise plans. There are no regional restrictions.
Genevieve Claire
Genevieve Claire
Founder, Formaum — Claude Code Expert & Full-Stack AI Engineer

Builds bespoke AI automation systems for multi-location operations. Previously EA Sports FIFA ($7B franchise) and Film/TV VFX on Skyfall, Avengers, Game of Thrones. Based in Vancouver, BC.