The Cowork Maintenance Loop: What Nobody Documents

Message 201 costs as much in input tokens as messages 1-200 combined. That single line reframes the entire economics of running Cowork at scale, and almost nobody who sells Cowork rollouts mentions it.

The reason most Cowork stacks drift into expensive nonsense is that nobody maintains them. A skill written in month 1 fires on triggers that no longer match the workflow. A plugin installed in month 2 carries 14 sub-skills the operator has never used. An MCP server eats 18,000 tokens per turn doing nothing useful. The bill climbs. The output quality slips. The operator blames the tool.

This is the maintenance cadence I install on every client retainer that includes ongoing Cowork support. Weekly, monthly, quarterly. None of it is glamorous. All of it is the difference between a stack that compounds and a stack that rots.

Why Cowork Drifts (And Why Nobody Flags It)

Cowork drift is structural, not accidental. Four mechanisms cause it, all of them invisible to the operator using the system day to day.

Skills accumulate, attention budgets don't. Skills share roughly a 2% slice of the context window for their trigger descriptions. Past 20 installed skills, the model starts forgetting some exist. The operator notices output quality slipping and assumes the model got worse. The model is fine. The library is bloated.

MCP servers expand silently. Each connected MCP server burns context per turn. A noisy server can eat 18,000+ tokens before the operator types a single character. Most teams add servers and never remove them. The bill climbs and nobody can point to which connector is responsible.

Long sessions cascade. Cowork's autocompact behaviour is a feature, not a bug, but on a 4-hour session without manual /compact discipline, the same context gets resubmitted across dozens of turns. Disciplined /compact every 30-45 minutes cuts active context by 60-80%. Most operators never do it.

Plugins ship with defaults nobody tunes. The single highest-leverage move on any installed plugin is the /customize pass. It's the step almost everyone skips at install, and the one nobody comes back to in month 3.

The reason nobody flags any of this is that there's no error message. The system keeps working. It just gets slower, dumber, and more expensive over time. Maintenance is what catches the drift before the bill does.

Maintenance looks like overhead until you have seen what an untuned Cowork stack costs at month four. By the time you notice the bill, the drift is already six weeks deep, the skill library has compounded its own confusion, and the operator using the stack day to day has stopped trusting the output. None of that shows up in a usage dashboard.

Weekly Maintenance: 30 Minutes

Friday afternoon, end of the working week. Block the calendar. Run the loop.

Skill usage scan. Open the Cowork usage view. For every skill in the library, ask one question: did I fire this in the last 7 days? If yes, it stays. If no, flag it for the monthly prune pass. Don't delete on a Friday. Friday-you is tired and makes overconfident calls. Just flag.

MCP bloat check. Look at the connected MCP servers. Count them. If any server was added this week, confirm it's actually being used. Servers added in week 2 of a project and forgotten by week 5 are the single largest source of silent context burn.

/compact discipline review. Look back at the week's long sessions. Any session over 90 minutes where /compact wasn't run? Note it. The next time that session pattern starts, set a 30-minute timer and compact at the bell. This is operator-discipline maintenance, not tool maintenance, but it lives in the same loop.

Prune the obviously dead. Plugins installed during a try-it-out moment, skills written for a one-off task, MCP servers from a vendor evaluation that didn't go anywhere. If it hasn't been touched in two consecutive weekly scans, delete it on the second scan. Two-strike rule. Keeps the library honest without making Friday-you destructive.

~2%Context budget skills share

60-80%Active context cut by /compact discipline

18K+Tokens a noisy MCP server burns per turn

Monthly: 2 Hours

First Monday of the month, before the week's client work starts. Two hours, deep maintenance pass.

Cost review against industry benchmarks. Pull the previous month's Cowork spend. Per operator, per active day. Industry average for engaged operator use lands around $13/day, and 90% of users stay under $30/day. If your number is above $30/day per operator, the cause is almost always one of four things: subagent fan-out without budget, autocompact cascades on long sessions, MCP bloat, or wrong-model defaults. Diagnose specifically. Don't generalise.

Plugin /customize pass. For every plugin installed during the month, run /customize. Prune sub-skills you haven't used, tighten trigger descriptions on the ones you have, raise the model floor on skills that require depth and lower it on skills that don't. This is the highest-ROI hour of the month. Nobody does it. Do it.

Skill anti-pattern audit. Walk the skill library and look for the four most common failure modes:

Skills written as documentation instead of triggers. SKILL.md should lead with a tight trigger condition, not a paragraph of explanation.
Skills with overlapping triggers. Two skills firing on the same condition produce inconsistent output. Pick the keeper, kill the duplicate.
Skills with bloated trigger descriptions. A trigger description over 200 words is almost always over-explained. Compress.
Skills using Opus for tasks Sonnet handles cleanly. Wrong-model default is the most common single line item on a bloated bill.

Execute the flagged-for-deletion list from the weekly scans. Anything that survived four weekly flag passes without being un-flagged gets deleted on the monthly. Library stays clean. Context budget stays honest.

Quarterly: Half A Day

Once a quarter, four hours of deep work. The maintenance pass that's strategic, not tactical.

Re-platform check. Cowork ships new capabilities monthly. Connector list expands, plugin marketplace adds vertical bundles, scheduled task primitives evolve. A skill written in month 1 against a missing capability may now be redundant because a native connector covers the use case. Walk the library. Anything you'd architect differently today, redesign now while it's quarterly maintenance, not in month 11 when it's an emergency.

Vertical bundle evaluation. Anthropic shipped Legal in May 2026 and Marketing Ops shortly after. If your work overlaps a bundle vertical, install the bundle in a sandbox and run /customize end to end. Either it absorbs three of your custom skills and you delete them, or it doesn't fit your workflow and you confirm the custom build is the right call. Either outcome is useful.

Sub-agent retune. Sub-agents written three months ago against the model's three-months-ago behaviour are almost always sharper than they need to be now. Walk each sub-agent, check the compression instructions, check the brief, check the return format. Tighten what got loose. Loosen what got over-constrained.

Cost trend. Pull the quarter's spend. Trend up or down? Per-active-day, per-operator. Is the trend explained by workload, by drift, or by a specific decision? If you can't point to a specific cause for the direction of the trend, the answer is drift, and you have a month to fix it before the next quarterly check.

~$13Per active day, industry average

90%Of users stay under $30/day

4Hours/quarter for full strategic maintenance

What This Looks Like As A Retained Service

Most ops teams don't have the discipline to run this loop themselves. That's not a criticism. It's a sequencing reality. The operator who is qualified to do the maintenance is the same operator using the stack day to day, and they're heads-down on the work the stack is supposed to support. Friday at 4pm, the maintenance loop loses to one more client deliverable. Every time.

You can write your own maintenance loop. You can also retain someone whose ops practice has burned through every credit anti-pattern and skill-library failure mode already, on their own stack and across client engagements. The loop is the retainer in a different costume.

This is where continuous ops support earns its keep. The retained version of the loop above looks like:

Weekly: a 30-minute audit run from outside the team. Skill usage scan, MCP bloat check, /compact discipline review, two-strike prune list. Delivered as a tight written brief, not a meeting.

Monthly: a 2-hour deep pass with the operator. Cost review, /customize on every active plugin, skill anti-pattern audit, execute the flagged deletions. One working call, one summary.

Quarterly: a half-day strategic review. Re-platform check, vertical bundle evaluation, sub-agent retune, cost trend with a written read on direction and cause.

The framing for the client is straightforward: the Cowork stack is infrastructure now. Infrastructure needs maintenance. The choice is not whether maintenance happens. The choice is whether it happens on a cadence, or whether it happens in an emergency in month 8 when the bill triples and the output quality drops and nobody can point to the cause.

That's the whole loop. Weekly is hygiene. Monthly is tuning. Quarterly is strategy. Run all three on cadence and the stack compounds. Skip any of the three and you're paying for drift on a schedule.

→

Want this running on your ops? Book a free 45-min ops mapping call. We'll audit your stack, find the bottlenecks, and show you where Cowork moves the needle. cal.com/formaum/45

Run on a stack that's holding you back?

Book a 45-minute discovery call. I'll map what moves, what stays, and what makes sense for your operation.

Book a call

Frequently Asked Questions

How often should I actually run /compact on a long Cowork session?

Every 30-45 minutes on any session over 90 minutes. Disciplined /compact cadence cuts active context by 60-80% on long sessions, which is the single largest controllable lever on per-day cost. Set a timer if the muscle isn't there yet. After two weeks of the timer, it becomes automatic.

How do I know which MCP server is burning my context budget?

Audit the connected server list and check each one's token footprint per turn. A noisy server can eat 18,000+ tokens before you type a character. Disconnect any server you haven't used in the last 7 days. If costs drop after disconnect, you found the leak. If costs don't drop, the leak is elsewhere and you've still simplified the stack.

Is $30/day per operator a hard ceiling for Cowork costs?

It's a benchmark, not a ceiling. Industry average lands around $13/day per active operator and 90% of users stay under $30/day. Above $30/day consistently, the cause is almost always one of four things: sub-agent fan-out, autocompact cascades on long sessions, MCP bloat, or wrong-model defaults. Diagnose specifically before raising the budget.

Should I rewrite skills every quarter or just tune them?

Tune monthly, rewrite quarterly only if Cowork shipped a primitive that makes the original architecture obsolete. Most skills survive 6-12 months of tuning before they need a real rewrite. The quarterly re-platform check is where you catch the ones that don't, not a default rewrite cadence.

What's the single most common Cowork maintenance failure?

Skipping the /customize pass on installed plugins. Out of the box, plugins ship with bloated trigger descriptions, sub-skills nobody uses, and wrong-model defaults on at least one skill. The /customize pass takes 15 minutes per plugin and pays for itself inside a week. Almost nobody runs it.

Genevieve Claire

Founder, Formaum — Claude Code Expert & Full-Stack AI Engineer

Builds bespoke AI automation systems for multi-location operations. Previously EA Sports FIFA ($7B franchise) and Film/TV VFX on Skyfall, Avengers, Game of Thrones. Based in Vancouver, BC.

Why Cowork Drifts (And Why Nobody Flags It)

Weekly Maintenance: 30 Minutes

Monthly: 2 Hours

Quarterly: Half A Day

What This Looks Like As A Retained Service

Run on a stack that's holding you back?

Frequently Asked Questions

Genevieve Claire

Related field notes

Six Months Running Formaum on Claude Cowork

The 30-Day Cowork Rollout For a Small Ops Team

Five Cowork Skills Every Owner-Operator Should Write First