Same workload. Half the spend. No capability lost.

If your Claude Cowork bill is climbing and you cannot point at the cause, you are almost certainly hitting some combination of these five things. They compound. Fix one and you save tokens. Fix all five and the curve bends.

Here they are in order of impact, with before/after examples from production client engagements.

One framing note before the list. The five fixes are mechanical. Anyone with the right context can apply them. The judgment calls underneath, which skills to prune without breaking a workflow, which subagents to fan out for a high-reversibility decision, when /compact will cost you a thread you actually need, are not mechanical. That is the part that pays for itself, and it is the part that does not show up in a fixes-in-five-minutes post.

60-80%Active Context Reduction From /compact
18K+Tokens Per Turn Per Bloated MCP
~2%Context Budget Shared By Skills

Fix 1: /compact Every 30 to 45 Minutes On Long Sessions

What it is. /compact summarises the current conversation and resets the active context to the summary plus the most recent turns. The full history is preserved in the session log. The model is no longer paying input tokens to re-read it on every turn.

Why it works. Message 201 costs as much in input as messages 1-200 combined. Long sessions are exponentially expensive on the right side of the curve. /compact cuts active context by 60 to 80% in one command. The single most leveraged habit you can adopt.

Before

4-hour build session, no compaction. By hour three each message is resubmitting the full architectural discussion, every read file, every tool result. Input tokens per turn climbing past 80K. Cost per turn climbing in lockstep.

After

Same session, /compact at the 45-minute mark and again at the 2-hour mark. Active context resets to a summary plus the last few turns. Input tokens per turn hold steady around 20-30K. Cost per turn flat across the full session.

Apply it: Set a 45-minute timer. When it fires, /compact unless you are mid-debug on something that needs the full thread. Then reset the timer.


Fix 2: Sonnet 4.6 As Default, Opus Only For Hard Work

What it is. Stop defaulting to Opus for everything. Set Sonnet 4.6 as the daily driver and switch to Opus only for tasks where first-pass accuracy is expensive to get wrong: architectural decisions, deep debugging, high-stakes deliverables, hard creative copy.

Why it works. Most overspend on Cowork is wrong-model-default, not over-use. Opus is roughly five times the cost of Sonnet per token. For volume editing, file operations, mechanical edits, status updates, and most day-to-day execution, the quality delta does not justify the cost delta. The savings here are immediate and large.

Before

Opus running every task. Renaming files, summarising a meeting note, drafting a follow-up email. All on the most expensive model in the lineup. Bill scales with token volume, not task complexity.

After

Sonnet handles the volume. Opus reserved for architecture, hard debugging, and high-stakes copy. Bill drops sharply with no perceptible quality change on the work that used to default to Opus.

Apply it: Make Sonnet your default model in settings. Flag your top three or four task types that genuinely need Opus and only switch for those. Review monthly.


Fix 3: MCP Bloat Audit

What it is. Every MCP server you connect injects its tool schemas into every turn. A single bloated server can cost 18,000+ tokens per turn. Five connected servers and you are paying close to 90K input tokens before your prompt even runs. Most operators do not know which server is the culprit because the cost is invisible until you instrument it.

Why it works. Killing one bloated server you barely use frees enough context per turn that everything downstream gets cheaper. The savings compound on every message of every session.

Before

Eight MCP servers connected because they seemed useful at install. Two are used daily, two are used monthly, four are forgotten. Every turn pays the schema tax for all eight. Average input tokens per turn inflated by 40K from unused servers alone.

After

Quarterly MCP audit. Daily-use servers stay. Monthly-use servers move to on-demand activation. Forgotten servers get uninstalled. Per-turn input drops by tens of thousands of tokens across every session.

Apply it: List every MCP server you have connected. For each, ask: did I use this in the last 30 days? If no, uninstall. If rarely, move it to manual activation.


Fix 4: Skill Pruning Past 20 Loaded Skills

What it is. Skills share a context budget of roughly 2% of the window. Past 20 loaded skills you get diminishing returns and active harm. The model starts to forget skills exist, mis-trigger them, or burn tokens scanning a directory it does not need.

Why it works. Smaller, sharper skill libraries trigger more reliably and cost less. The skills you keep should each pull their weight. Skills you wrote six months ago for a one-off client and never used again are silent tax.

Before

34 skills loaded. Half are stale, written for a client engagement that ended, or duplicated by a newer skill. Trigger descriptions overlap. Model occasionally invokes the wrong one. Token tax paid on every turn for skills the model has effectively forgotten.

After

14 skills loaded. Each one earns its place. Triggers are non-overlapping and progressive disclosure keeps frontmatter lean. Wrong-skill invocations drop to near zero. Per-turn cost falls and skill behaviour gets more reliable at the same time.

Apply it: Audit your skill library monthly. Anything not invoked in the last 60 days gets archived. Anything triggered wrong twice gets its trigger description tightened or the skill gets killed.


Fix 5: Subagent Fan-Out Discipline

What it is. Subagents are powerful and expensive. Each one gets its own context, model call, and tool invocations. Fan out 20 subagents on a small task and you pay for 20 full agent runs. The fix is a single rule: fan out only when reversibility is above 10 minutes.

Why it works. Most fan-outs are overkill for the decision at hand. A 200-word edit does not need an 11-agent ensemble. A $15K client deliverable does. Matching research depth to decision reversibility is the cleanest discipline you can install, and it pays back immediately on the next tempting fan-out.

Before

Default to fan-out for almost any non-trivial task. Each fan-out spawns 5 to 11 subagents. Wasted spend concentrated in low-reversibility decisions where one direct draft plus a voice check would have been faster, cheaper, and equally accurate.

After

Self-check before every fan-out: if wrong, how long to fix? Under 10 minutes, no fan-out. One to two hours, two or three agents. Six months of commitment, full pipeline. Subagent spend drops by half or more with no quality loss on the work that mattered.

Apply it: Write the decision-depth rule into your operating instructions so the model checks itself before spawning. Review fan-out usage weekly until the discipline sticks.


The Compounding Effect

Each fix alone saves a meaningful share of your bill. Stacked, they cut spend in half without you losing a single capability. The reason is structural. Most Cowork overspend is architecture, not behaviour. Sessions are too long because nobody designed the session boundaries. Models are too big for the task because nobody mapped task complexity to model tier. MCP schemas are too heavy because nobody audited the connector contract. Skill libraries are too cluttered because nobody pruned against a usage signal. Subagent fan-out is too eager because nobody wrote the decision-reversibility rule into the operating instructions.

The cost drops. The rigor underneath does not. Audit, schema design, failure-mode planning, pilot validation, and multi-location edge-case handling cost the same after these fixes as before. The fixes are leverage on a tuned system, not a substitute for tuning it.

Most of these fixes I learned by watching client builds overspend first. You can pay that tuition or skip it.

⚠️

The order matters. /compact and model default give you the fastest, biggest wins. MCP and skills are the medium-impact compounding layer. Subagent discipline is the longest behavioural change. Run them in that order and you will see the curve bend within a week.


Want this running on your ops? Book a free 45-min ops mapping call. We'll audit your stack, find the bottlenecks, and show you where Cowork moves the needle. cal.com/formaum/45

Run on a stack that's holding you back?

Book a 45-minute discovery call. I'll map what moves, what stays, and what makes sense for your operation.

Book a call

Frequently Asked Questions

How do I actually cut my Claude Cowork bill without losing capability?
Five compounding fixes in order of impact: /compact every 30-45 minutes on long sessions, default to Sonnet 4.6 with Opus reserved for hard tasks, audit and prune MCP servers quarterly, keep skill libraries under 20 loaded skills, and apply subagent fan-out discipline based on decision reversibility. Stacked, these cut spend in half.
What does /compact actually do?
/compact summarises the current conversation and resets active context to that summary plus the most recent turns. The full session log is preserved. On long sessions it reduces active context by 60 to 80%, which collapses per-turn input cost and prevents the cost curve from running away after the 200-message mark.
Why is defaulting to Opus expensive?
Opus is roughly five times the cost of Sonnet per token. For volume editing, file operations, mechanical work, and most execution tasks, the quality delta does not justify the cost delta. Most Cowork overspend is wrong-model-default, not over-use. Reserve Opus for architecture, hard debugging, and high-stakes first-pass accuracy.
How much do MCP servers cost in tokens?
A single bloated MCP server can inject 18,000+ tokens per turn through its tool schemas. Five connected servers can push input cost close to 90K tokens before your prompt even runs. Audit quarterly: uninstall anything you have not used in 30 days, move rarely-used servers to manual activation.
How many skills is too many in Claude Cowork?
Skills share a context budget of roughly 2% of the window. Past 20 loaded skills you hit diminishing returns and active harm: the model forgets skills, mis-triggers them, and burns tokens scanning a directory it does not need. Audit monthly, archive anything not invoked in 60 days.
GC
Genevieve Claire
Founder, Formaum — Claude Code Expert & Full-Stack AI Engineer

Builds bespoke AI automation systems for multi-location operations. Previously EA Sports FIFA ($7B franchise) and Film/TV VFX on Skyfall, Avengers, Game of Thrones. Based in Vancouver, BC.