Train Your AI The Way You'd Onboard A Junior

Train your AI the way you'd onboard a junior. Same week one. Same week one mistakes.

HBR opened this framing in March. The article was sharp on the parallel and silent on the playbook. This is the playbook. Four weeks, the pattern I use to train Claude skills inside client engagements running in production.

The Premise

If you hired a junior on Monday, you would not hand them a 200-page SOP binder and walk away. You would tell them the rules, watch them work for a week, give them a real task, and correct mistakes as they happen. You would not expect mastery on day three. You would expect competence on day thirty.

AI agents are the same shape. The Anthropic skill system, the open Agent Skills standard published December 2025, even the way Cowork plugins are structured, all of it rewards the same onboarding cadence a good manager already knows. Rules first, shadow second, autonomy third, feedback loop forever.

The mistake people make is treating a skill as documentation. Documentation is what you write when you do not know if anyone will read it. A skill is an instruction set with a trigger condition. Different artifact, different rules.

Week 1. Tell It The Rules

This is your CLAUDE.md, your system prompt, your employee handbook. The thing you would tell a new hire on day one. How we talk to clients, what tone we use, what we never say, how we handle money conversations, what tools we have, what the daily ritual looks like.

My own CLAUDE.md is thirty-something locked rules and a global voice guide. Every locked rule was earned. Something broke, I wrote the rule, the same thing has not broken again. The Anthropic guidance is the same. Start with the rules of the house before you give the agent any task.

What goes in week one. Identity and positioning, so the agent sounds like the operation. Voice rules, so its output passes the in-house edit. Operational defaults, so it knows what to do without being asked every time. Banned phrases and patterns, so it stops making the same mistakes a junior would make on day three.

What does not go in week one. Detailed task instructions for specific workflows. That comes later. Week one is the constitution, not the procedure manual.

→

The test for a week-one rule: if a human hire violated it, would you correct them on the spot? If yes, it belongs in the handbook. If no, it belongs in a skill body later.

Week 2. Shadow Then Handoff

Now you give the agent its first real job. You do not write a perfect skill. You scaffold one, run the work manually a few times, then promote the procedure into a skill.

The on-ramp Anthropic recommends is the skill-creator skill. It interviews you about the workflow, generates the SKILL.md, tests the trigger conditions, iterates. That is the right entry point. You do not need to memorise the YAML frontmatter spec to ship your first skill. You need to describe the job out loud and let the scaffolder do the structural work.

What a skill actually is. A directory with a SKILL.md inside it. The SKILL.md has YAML frontmatter at the top, markdown body below. Frontmatter answers how it runs. Permissions, model, triggers, name. Body answers what to do. Steps, examples, edge cases.

Week two discipline. Run the workflow manually three times with the agent assisting. Note what you correct. Capture the corrections as rules inside the skill body. Only then promote to a triggered skill. The shadow period is where you learn what the skill actually has to know. Skip it and you ship a skill that only works in the one example you had in mind.

The trigger condition is the part most people get wrong. A skill is invoked by the trigger, not by your hope that the agent picks the right tool. The trigger has to be specific enough that the agent reaches for the skill at the right moment and specific enough that it does not reach for it at the wrong moment. Both directions matter.

What gets missed when nobody on the team has onboarded skills into production before. A junior needs a mentor who has done the job. Same for the agent. The mentor surface here is everything a non-engineer cannot recognise as missing until production breaks: the data contracts between skills, the rollback triggers when a write fails halfway, the observability layer that tells you which skills are actually firing versus which are quietly being skipped, the integration contracts that survive a connector outage, the schema decisions that make the vault legible to every future skill. Those are the architectural choices the agent inherits. Get them wrong and every skill written on top of them is fragile by inheritance.

Week 3. Progressive Disclosure

This is the most important architectural concept in the entire skill system and the one most home-grown skills get wrong.

Progressive disclosure means the skill shows enough at the top to trigger correctly and reveals the detail only when needed. The trigger description is short, the body unfolds on use, the deep documentation lives in linked files that load only on demand. The agent does not pre-load every word of every skill into context at session start. It pre-loads the trigger conditions and reads the body when it actually invokes the skill.

Why this matters. Skill libraries share a context budget around two percent of the window. Load twenty bloated skills with no progressive disclosure and the agent forgets they exist because the index they sit in is too noisy to read. You wrote the skills and the agent does not reach for them. Classic failure.

The fix. Trigger description sized for recognition, not explanation. Body sized for the actual work. Reference docs in separate files that the skill can load on demand. The HBR junior parallel still holds. You do not put the full standard operating procedure on the employee's first day desk. You hand them a one-page rules card and tell them where the manual lives.

What this looks like in practice. A trigger that says when to fire, three lines. A body that walks the steps, half a screen. A linked reference doc that holds the deep detail, loaded only if the agent needs to consult it. Three layers, each smaller than the one below. That is the shape.

~2%Context Budget Shared Across All Loaded Skills

20+Loaded Skills Where Forget-It-Exists Behaviour Kicks In

Dec 2025Agent Skills Open Standard Published

Week 4. Feedback Loop

By week four the agent is doing real work. The job now is to build the loop that turns mistakes into rules and rules into compressed instinct.

I run this as lab notes. Something goes wrong, I write a note. Same thing goes wrong twice, the note gets promoted into a locked rule. Same rule violated three times, the rule gets stronger or moves higher in the hierarchy. Lab notes older than thirty days and never repeated, prune. It is a small ritual and it is the difference between a stack that gets sharper every month and a stack that drifts.

What gets corrected in week four. Voice drift, tool drift, rule violations, blind spots in trigger conditions. The pattern matters more than any single fix. You are not fixing a bug, you are training the system to catch the pattern next time. Same way you would coach a junior. The point is not the one mistake. The point is the class of mistake.

Promotion threshold. One occurrence, lab note. Two occurrences, rule. Three violations of the same rule, escalate. That cadence keeps the rule set tight enough to actually be loaded and respected, and aggressive enough that real patterns get codified before they become invisible.

What Changes Versus Human Onboarding

The parallel holds for most of the work. Three differences matter.

One. No memory across sessions without explicit hooks. A human junior remembers Tuesday's correction on Wednesday. An agent does not, unless you write the correction into the handbook or the skill body. Every lesson you teach has to land in a file or it is gone at the next session start. This is the biggest single difference and the reason CLAUDE.md and skill bodies have to be treated as the actual training surface. The chat window is not training. The file is training.

Two. No peer learning. A junior on a team picks things up by osmosis. They overhear standups, watch how seniors handle clients, internalise the vibe. An agent learns only from what you put in front of it. The whole context has to be explicit. That cuts both ways. You lose the ambient pickup, but you gain a system where the standards are auditable instead of cultural.

Three. Hallucination cost. A junior who does not know something asks. An agent who does not know something can invent. The cost of a confidently wrong output in front of a client is real, and the mitigation has to be designed in. Source discipline rules in the handbook, guardrails in critical skills, human-in-the-loop on anything client-bound that has not been stress-tested. A junior gets the benefit of the doubt while they learn. An agent does not, because the failure mode looks identical to success until someone catches it.

The Onboarding Map

Week one, the handbook. CLAUDE.md, voice rules, identity, banned patterns. Constitution before procedure.

Week two, shadow then handoff. Use the skill-creator skill, run the work three times, capture corrections, promote to triggered skill. Trigger condition is the load-bearing part.

Week three, progressive disclosure. Trigger description small, body sized to the work, deep reference docs linked and loaded on demand. Respect the two percent context budget.

Week four, feedback loop. Lab note on first occurrence, rule on second, escalation on third, prune at thirty days. Train the pattern, not the bug.

Forever, the loop runs. The cadence does not stop at week four. It is the operating mode. A stack that has been onboarded well looks like a junior who has been with you a year. Quiet, fast, gets the obvious stuff right, escalates the interesting stuff. That is the bar.

→

Want this running on your ops? Book a free 45-min ops mapping call. We'll audit your stack, find the bottlenecks, and show you where Cowork moves the needle. cal.com/formaum/45

Run on a stack that's holding you back?

Book a 45-minute discovery call. I'll map what moves, what stays, and what makes sense for your operation.

Book a call

Frequently Asked Questions

What is a Claude skill, technically?

A directory containing a SKILL.md file. The SKILL.md has YAML frontmatter at the top defining how the skill runs (name, triggers, permissions, model) and a markdown body defining what to do. Optional reference files can sit alongside and be loaded on demand. The Agent Skills open standard was published in December 2025 at agentskills.io.

What is progressive disclosure and why does it matter?

Progressive disclosure means the skill shows just enough at the top to be triggered correctly and reveals detail only when invoked. Trigger description small, body sized to the work, deep reference docs linked and loaded on demand. It matters because all loaded skills share a context budget around two percent of the window. Bloat the triggers and the agent stops reaching for skills it should be using.

Where do I start if I have never built a skill?

Use the skill-creator skill that Anthropic ships. It interviews you about the workflow you want to encode, generates the SKILL.md, helps you test the trigger conditions, and iterates. You do not need to memorise the YAML spec to ship your first skill. Describe the job out loud and let the scaffolder do the structural work.

How is training an AI agent different from onboarding a junior?

Three differences. No memory across sessions unless you write lessons into the handbook or skill bodies. No peer learning, so the whole context has to be explicit instead of absorbed. And hallucination cost, because an agent that does not know something can invent confidently rather than ask. The mitigations are source discipline rules, guardrails, and human review on client-bound output.

When does a lab note become a locked rule?

On the second occurrence of the same pattern. One occurrence is a note. Two is a rule. Three violations of the same rule means the rule needs to be stronger or moved higher in the hierarchy. Lab notes older than thirty days that never repeated get pruned. This keeps the rule set tight enough to actually be loaded and aggressive enough that real patterns get codified before they become invisible.

Genevieve Claire

Founder, Formaum — Claude Code Expert & Full-Stack AI Engineer

Builds bespoke AI automation systems for multi-location operations. Previously EA Sports FIFA ($7B franchise) and Film/TV VFX on Skyfall, Avengers, Game of Thrones. Based in Vancouver, BC.

The Premise

Week 1. Tell It The Rules

Week 2. Shadow Then Handoff

Week 3. Progressive Disclosure

Week 4. Feedback Loop

What Changes Versus Human Onboarding

The Onboarding Map

Run on a stack that's holding you back?

Frequently Asked Questions

Genevieve Claire

Related field notes

Six Months Running Formaum on Claude Cowork

The 30-Day Cowork Rollout For a Small Ops Team

Five Cowork Skills Every Owner-Operator Should Write First