Same workload. Half the spend. No capability lost.
If your Claude Cowork bill is climbing and you cannot point at the cause, you are almost certainly hitting some combination of these five things. They compound. Fix one and you save tokens. Fix all five and the curve bends.
Here they are in order of impact, with before/after examples from production client engagements.
One framing note before the list. The five fixes are mechanical. Anyone with the right context can apply them. The judgment calls underneath, which skills to prune without breaking a workflow, which subagents to fan out for a high-reversibility decision, when /compact will cost you a thread you actually need, are not mechanical. That is the part that pays for itself, and it is the part that does not show up in a fixes-in-five-minutes post.
Fix 1: /compact Every 30 to 45 Minutes On Long Sessions
What it is. /compact summarises the current conversation and resets the active context to the summary plus the most recent turns. The full history is preserved in the session log. The model is no longer paying input tokens to re-read it on every turn.
Why it works. Message 201 costs as much in input as messages 1-200 combined. Long sessions are exponentially expensive on the right side of the curve. /compact cuts active context by 60 to 80% in one command. The single most leveraged habit you can adopt.
Before
4-hour build session, no compaction. By hour three each message is resubmitting the full architectural discussion, every read file, every tool result. Input tokens per turn climbing past 80K. Cost per turn climbing in lockstep.
After
Same session, /compact at the 45-minute mark and again at the 2-hour mark. Active context resets to a summary plus the last few turns. Input tokens per turn hold steady around 20-30K. Cost per turn flat across the full session.
Apply it: Set a 45-minute timer. When it fires, /compact unless you are mid-debug on something that needs the full thread. Then reset the timer.
Fix 2: Sonnet 4.6 As Default, Opus Only For Hard Work
What it is. Stop defaulting to Opus for everything. Set Sonnet 4.6 as the daily driver and switch to Opus only for tasks where first-pass accuracy is expensive to get wrong: architectural decisions, deep debugging, high-stakes deliverables, hard creative copy.
Why it works. Most overspend on Cowork is wrong-model-default, not over-use. Opus is roughly five times the cost of Sonnet per token. For volume editing, file operations, mechanical edits, status updates, and most day-to-day execution, the quality delta does not justify the cost delta. The savings here are immediate and large.
Before
Opus running every task. Renaming files, summarising a meeting note, drafting a follow-up email. All on the most expensive model in the lineup. Bill scales with token volume, not task complexity.
After
Sonnet handles the volume. Opus reserved for architecture, hard debugging, and high-stakes copy. Bill drops sharply with no perceptible quality change on the work that used to default to Opus.
Apply it: Make Sonnet your default model in settings. Flag your top three or four task types that genuinely need Opus and only switch for those. Review monthly.
Fix 3: MCP Bloat Audit
What it is. Every MCP server you connect injects its tool schemas into every turn. A single bloated server can cost 18,000+ tokens per turn. Five connected servers and you are paying close to 90K input tokens before your prompt even runs. Most operators do not know which server is the culprit because the cost is invisible until you instrument it.
Why it works. Killing one bloated server you barely use frees enough context per turn that everything downstream gets cheaper. The savings compound on every message of every session.
Before
Eight MCP servers connected because they seemed useful at install. Two are used daily, two are used monthly, four are forgotten. Every turn pays the schema tax for all eight. Average input tokens per turn inflated by 40K from unused servers alone.
After
Quarterly MCP audit. Daily-use servers stay. Monthly-use servers move to on-demand activation. Forgotten servers get uninstalled. Per-turn input drops by tens of thousands of tokens across every session.
Apply it: List every MCP server you have connected. For each, ask: did I use this in the last 30 days? If no, uninstall. If rarely, move it to manual activation.
Fix 4: Skill Pruning Past 20 Loaded Skills
What it is. Skills share a context budget of roughly 2% of the window. Past 20 loaded skills you get diminishing returns and active harm. The model starts to forget skills exist, mis-trigger them, or burn tokens scanning a directory it does not need.
Why it works. Smaller, sharper skill libraries trigger more reliably and cost less. The skills you keep should each pull their weight. Skills you wrote six months ago for a one-off client and never used again are silent tax.
Before
34 skills loaded. Half are stale, written for a client engagement that ended, or duplicated by a newer skill. Trigger descriptions overlap. Model occasionally invokes the wrong one. Token tax paid on every turn for skills the model has effectively forgotten.
After
14 skills loaded. Each one earns its place. Triggers are non-overlapping and progressive disclosure keeps frontmatter lean. Wrong-skill invocations drop to near zero. Per-turn cost falls and skill behaviour gets more reliable at the same time.
Apply it: Audit your skill library monthly. Anything not invoked in the last 60 days gets archived. Anything triggered wrong twice gets its trigger description tightened or the skill gets killed.
Fix 5: Subagent Fan-Out Discipline
What it is. Subagents are powerful and expensive. Each one gets its own context, model call, and tool invocations. Fan out 20 subagents on a small task and you pay for 20 full agent runs. The fix is a single rule: fan out only when reversibility is above 10 minutes.
Why it works. Most fan-outs are overkill for the decision at hand. A 200-word edit does not need an 11-agent ensemble. A $15K client deliverable does. Matching research depth to decision reversibility is the cleanest discipline you can install, and it pays back immediately on the next tempting fan-out.
Before
Default to fan-out for almost any non-trivial task. Each fan-out spawns 5 to 11 subagents. Wasted spend concentrated in low-reversibility decisions where one direct draft plus a voice check would have been faster, cheaper, and equally accurate.
After
Self-check before every fan-out: if wrong, how long to fix? Under 10 minutes, no fan-out. One to two hours, two or three agents. Six months of commitment, full pipeline. Subagent spend drops by half or more with no quality loss on the work that mattered.
Apply it: Write the decision-depth rule into your operating instructions so the model checks itself before spawning. Review fan-out usage weekly until the discipline sticks.
The Compounding Effect
Each fix alone saves a meaningful share of your bill. Stacked, they cut spend in half without you losing a single capability. The reason is structural. Most Cowork overspend is architecture, not behaviour. Sessions are too long because nobody designed the session boundaries. Models are too big for the task because nobody mapped task complexity to model tier. MCP schemas are too heavy because nobody audited the connector contract. Skill libraries are too cluttered because nobody pruned against a usage signal. Subagent fan-out is too eager because nobody wrote the decision-reversibility rule into the operating instructions.
The cost drops. The rigor underneath does not. Audit, schema design, failure-mode planning, pilot validation, and multi-location edge-case handling cost the same after these fixes as before. The fixes are leverage on a tuned system, not a substitute for tuning it.
Most of these fixes I learned by watching client builds overspend first. You can pay that tuition or skip it.
The order matters. /compact and model default give you the fastest, biggest wins. MCP and skills are the medium-impact compounding layer. Subagent discipline is the longest behavioural change. Run them in that order and you will see the curve bend within a week.
Want this running on your ops? Book a free 45-min ops mapping call. We'll audit your stack, find the bottlenecks, and show you where Cowork moves the needle. cal.com/formaum/45
Run on a stack that's holding you back?
Book a 45-minute discovery call. I'll map what moves, what stays, and what makes sense for your operation.
Book a call