Anthropic ships three main Claude models. Most teams use one of them for everything. That's the wrong answer in production. Here's the practical breakdown of when to use which, and how to architect a system that uses all three.
The three models, at a glance
Claude Opus. The flagship. Highest reasoning depth, strongest at complex multi-step tasks, most capable at handling ambiguity. Slowest and most expensive.
Claude Sonnet. The middle tier. Roughly 80% of Opus's reasoning quality at a fraction of the price and latency. The default for most production work.
Claude Haiku. The fast, cheap option. Strong at narrow, well-defined tasks at very low cost and very low latency. The right tool for high-volume, low-risk classification and routing.
Anthropic ships new versions of each tier regularly, and the gaps between them keep shifting. Check Anthropic's current model documentation for the latest specs. The principles below stay the same regardless of which version is current.
What Opus is good at
Use Opus when being right matters more than being fast or cheap. Specifically:
- Strategic analysis and judgment-heavy tasks where the cost of being wrong is high
- Complex multi-step reasoning where each step depends on the previous one
- Long-context tasks involving lots of cross-referencing
- Generating content that needs to nail voice, tone, or technical accuracy on the first pass
- Final review or critique passes where catching subtle errors matters
Don't use Opus for high-volume, repeatable tasks. The cost adds up fast and most of those tasks don't need the reasoning depth.
What Sonnet is good at
Sonnet is the right default for the bulk of production work. Use Sonnet when:
- Lead classification, intent detection, segmentation
- Drafting emails, messages, follow-ups in a defined voice
- Structured data extraction from unstructured documents
- Summarising calls, meetings, transcripts into action items
- Most agentic tool-use workflows
- Most code generation tasks (Claude Code defaults to Sonnet for good reason)
If you're not sure which model to use, start with Sonnet. It handles 80%+ of real production tasks competently, and you can upgrade specific calls to Opus if you find Sonnet falling short on a particular task.
What Haiku is good at
Haiku shines on narrow, well-defined, high-volume tasks where speed and cost matter more than reasoning depth:
- Language detection (what language is this message in?)
- Simple intent classification (lead / support / spam)
- Routing decisions (which queue should this go to?)
- Content moderation flags (does this contain X?)
- Real-time interactive features where every millisecond of latency matters
- The cheap first pass in a multi-stage pipeline that escalates to Sonnet or Opus only when needed
Don't use Haiku for tasks that require nuance, long context, or complex reasoning. It will work, but the quality drop is real. Use it where the task fits its strengths.
The architecture that wins: tiered routing
The smartest production systems don't pick one model. They route each task to the right tier automatically. The pattern looks like this:
The triage layer (Haiku) handles the easy cases cheaply. The bulk of work runs on Sonnet. Only the genuinely hard or ambiguous cases escalate to Opus. On real workloads, this pattern cuts AI cost roughly 60-80% versus running everything on the flagship.
The principle: Match the model to the task, not the task to the model. Most teams overpay because they're sending easy tasks to expensive models. Tiered routing fixes that with a few hundred lines of code.
What about prompt caching?
The other big lever for cutting cost is prompt caching. If you're sending the same context repeatedly (a long system prompt, a knowledge base, a set of examples), prompt caching keeps that context warm and charges you only for the new portion.
For production agents that run many times a day with the same setup, prompt caching can cut input costs by another 60-90% on top of tiered routing. Combine the two and AI infrastructure becomes a small line item rather than a budget concern.
The bigger picture
Most teams pay too much for AI not because the models are expensive but because they're using one model for everything. Tiered routing and prompt caching together turn a system that costs thousands a month into one that costs tens or hundreds. That's not a small optimisation. It's the difference between AI being a viable production layer and being a feature you ration.
Pick the right model for the task. Cache what you can. Then the cost stops being the conversation.
Frequently Asked Questions
Running on a stack that grew by accident?
Tools added one at a time, never architected together. That's the problem I solve. Book 45 minutes and I'll map what moves, what stays, and what makes sense for your operation.
Book a Discovery Call