I Ran My Entire Content Operation on Claude Code's New Agent Teams Feature. Here Is the Full Setup.

What happens when you stop using AI as a tool and start running it as a team

Mar 26, 2026

This is not a guide on how to get started with Claude Code.

If you want that, Daria Cupareanu wrote a solid beginner walkthrough last week. Read it. It covers the basics well.

This is something different. This is what happens when you take Agent Teams — Anthropic’s new research preview that lets multiple Claude Code instances work on a shared project simultaneously — and run it against a real content production workflow under deadline.

I had a window to test it. Anthropic doubled Claude Code usage quotas through March 27. Off-peak hours. All plans. Unused GPU capacity. The practical result: roughly twice the usage you normally get during overnight sessions.

So I ran the experiment. I set up three agents on a single content sprint. Here is the architecture, the config, what broke, and what I would change.

Why Agent Teams matters for operators

Most Claude Code workflows follow the same pattern: one session, one task, one operator in the terminal managing the handoffs.

You write a prompt. Claude executes. You review the output, adjust, write another prompt. It works. It scales surprisingly well for solo operators. But there is a ceiling, and it is the context ceiling.

A single Claude Code session can only hold so much. A competitive research pass, a full post draft, a QA review against a brand voice file — running all three in one session means each task gets a fraction of the available context. The research is thin. The draft is generic. The QA misses things.

Agent Teams changes the architecture. Each agent gets its own session with its own full context window. They work on the same project. They can pass results to each other directly rather than only reporting back to a central orchestration prompt.

For a content operation, the implication is direct: the research agent knows the competitive set. The draft agent knows the brand voice file. The QA agent knows the pre-publish checklist. None of them are competing for context with each other.

That is the shift. Not a feature upgrade — an architectural one.

How to enable it and trigger your first sprint

Before anything else: Agent Teams is disabled by default. One setting enables it.

Open your Claude Code settings.json and add:

{
  "env": {
    "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1"
  }
}

You need Claude Code v2.1.32 or later and a Pro or Max plan. If you are on the free tier, this feature is not available. Full documentation is at code.claude.com/docs/en/agent-teams.
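
If your settings.json already holds other keys, the flag has to be merged in rather than pasted over them. A minimal Python sketch of that merge follows. The function name enable_agent_teams is hypothetical, and the file location you pass in is an assumption: point it at wherever your Claude Code settings.json actually lives.

```python
import json
from pathlib import Path


def enable_agent_teams(settings_path: Path) -> dict:
    """Merge the Agent Teams flag into settings.json, keeping existing keys.

    Hypothetical helper: it only edits a JSON file, it does not call Claude Code.
    """
    settings = {}
    if settings_path.exists():
        text = settings_path.read_text(encoding="utf-8").strip()
        if text:
            settings = json.loads(text)

    # Create the "env" block if it is missing, then set the single flag.
    env = settings.setdefault("env", {})
    env["CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS"] = "1"

    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

Calling enable_agent_teams(Path("path/to/settings.json")) leaves existing entries alone and sets only the one environment variable from the snippet above.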

Once enabled, triggering a team is plain language. You describe the team you want and the lead agent handles spawning. No agent definition file. No workflow schema. You tell Claude what roles you need and it builds the team.

For a content sprint, the trigger prompt I used:

Create an agent team for a content production sprint.
Spawn three teammates:
- Research agent: scan the competitive publications listed in CLAUDE.md and return a collision brief
- Draft agent: write a full post draft using the brand voice file and post brief in CLAUDE.md
- QA agent: review the draft against the pre-publish checklist and banned word list in CLAUDE.md

Research agent completes first. I will review and approve the brief before it goes to Draft agent.
Draft completes second. I will review before it goes to QA agent.
QA agent returns a flagged issue list only — no rewrites.

A few things worth knowing before you run it.

Teammates start blank. They load your CLAUDE.md automatically, but they do not inherit the lead’s conversation history. Everything each agent needs to do its job must be in the spawn prompt or in CLAUDE.md. Thin spawn prompts produce thin output.

The task list is your control panel. Hit Ctrl+T to see what every teammate is working on and what is queued. If an agent goes off-track, this is where you catch it early.

Switching between teammates. In-process mode (the default) runs all agents in one terminal. Shift+Up/Down cycles through active teammates. Press Enter to open a teammate’s session, Escape to interrupt their turn. If you have tmux or iTerm2, you can run split-pane mode and watch each agent in its own panel. That is the setup worth having for anything longer than a quick sprint.

Token cost scales with team size. Each teammate has its own context window. Three agents on a content sprint cost roughly three times the tokens of a single session. For research and review work, the tradeoff is worth it. For a single short draft, it is not.

The setup

Three agents. One content sprint. Here is what each one was running.

Agent 1: Research

Scope: Scan the six monitored publications in the competitive set. Return a structured brief covering: (1) what each published in the last seven days, (2) collision risks against the queued post topics, (3) any green opportunity windows.

Context loaded: Competitive intelligence file. Publication list. Collision classification logic (Red / Yellow / Green). Owned gaps list.

Output format: Structured markdown brief with collision flags, trend signals, and a recommended search list for the next sprint.

Human checkpoint: I reviewed the brief before Agent 2 received it. This is not optional. Agent 1 occasionally flags something as a Red collision when it is a Yellow — the topic is covered, but the angle is different. That call requires human judgment.

Agent 2: Draft

Scope: Take the approved research brief and write a full post draft. Target word count: 1,200 to 1,800 words. Apply the brand voice file and FSA post structure rules.

Context loaded: Brand voice file with banned words, writing rules, and section format specs. The confirmed post brief (angle, format, CTA, owned gap alignment). The competitive brief from Agent 1, post-review.

Output format: Full draft in markdown. Subtitle. Tags. LinkedIn companion post.

Human checkpoint: I reviewed the draft before Agent 3 received it. I made two structural changes — the original hook was hedged, and the ending drifted toward a motivational close. Cut both.

Agent 3: QA

Scope: Review the draft against the pre-publish checklist. Flag every banned word. Flag AI content tells. Flag structural issues: predictable wrap-up summaries, rule of three patterns, significance inflation, hollow affirmations. Return a specific list of issues with line references.

Context loaded: Pre-publish checklist. Banned word list. Anti-AI pattern catalog.

Output format: Issue list with line references and a suggested fix for each flag. No redraft. The agent flags; it does not edit the text.

Human checkpoint: I reviewed the QA report and made the edits myself. Agent 3 is not authorized to rewrite. It finds problems. I fix them.
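
The banned word part of that pass is mechanical enough to run as a deterministic pre-check before the QA agent ever sees the draft. A minimal sketch, assuming a plain-text draft and a simple list of banned terms. The Issue type, the function name, and the example terms are all hypothetical; none of them come from Claude Code or from the checklist described here.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    line: int      # 1-based line number in the draft
    term: str      # the banned term that matched
    excerpt: str   # the offending line, trimmed


def flag_banned_words(draft: str, banned: list[str]) -> list[Issue]:
    """Return one Issue per (line, banned term) hit, matching whole words only."""
    patterns = [
        (term, re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE))
        for term in banned
    ]
    issues = []
    for number, line in enumerate(draft.splitlines(), start=1):
        for term, pattern in patterns:
            if pattern.search(line):
                issues.append(Issue(number, term, line.strip()))
    return issues


# Example with made-up banned terms:
draft = "We delve into the setup.\nNothing to flag here.\nThis is a real Game-Changer."
for issue in flag_banned_words(draft, ["delve", "game-changer"]):
    print(f"line {issue.line}: '{issue.term}' in: {issue.excerpt}")
```

A check like this only covers exact terms. The pattern-level tells (wrap-up summaries, significance inflation) still need the agent or a human.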

What the quota window made possible

Normally, running three sessions this size back-to-back during peak hours compresses each one. The research pass gets shorter. The draft prompt gets more constrained. The QA run skips edge cases.

During the doubled-quota window, I ran all three sessions overnight. Each agent had full context. The research brief came back with six publications checked, collision logic applied, and a trend signal section I would normally have skimmed or skipped. The draft came back closer to final than usual. The QA pass caught four banned words and one classic AI tell I had missed in my own review.

Total human time in the session: about 20 minutes. Three checkpoints plus the final edit pass.

That is the number worth sitting with. Not the agent count. Not the feature name. Twenty minutes of human time to run a content sprint that would have taken three to four hours without the system.

What broke

Two things.

Agent 2 does not know what it does not know. The draft agent works from the brief I approve and the brand voice file it has loaded. It has no way to know whether a specific claim is accurate unless that claim is in its context. One draft came back with a stat I could not verify. The QA agent did not catch it because QA is checking style and structure, not factual accuracy.

Fact-checking stays human. Full stop.

Agent handoffs require explicit format specs. The first time Agent 2 received the research brief from Agent 1, the format was loose enough that Agent 2 treated the collision flags as context rather than constraints. The draft went in a direction that was technically a Yellow collision topic. I caught it in the human checkpoint, but it added a round of revision.

Fixed by adding an explicit instruction to Agent 2: “Treat any Yellow collision flag as a constraint on the angle, not just context. If the flag says differentiate via operator depth, the draft must establish that distinction in the first two paragraphs.”

The fix worked. The next sprint came back clean.
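
One way to make that rule hard to ignore is to convert the brief's flags into explicit constraint lines before they reach the draft agent. A minimal sketch follows. The CollisionFlag type, the draft_constraints function, and the treatment of Red as "do not draft" are assumptions for illustration. Only the Red / Yellow / Green labels and the Yellow rule come from the setup described here.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"


@dataclass(frozen=True)
class CollisionFlag:
    topic: str
    level: Level
    differentiator: str = ""  # required for Yellow, e.g. "operator depth"


def draft_constraints(flags: list[CollisionFlag]) -> list[str]:
    """Turn collision flags into constraint lines for the draft agent's prompt."""
    lines = []
    for flag in flags:
        if flag.level is Level.RED:
            # Assumption: a Red collision blocks the topic outright.
            lines.append(f"CONSTRAINT: do not draft on '{flag.topic}' (Red collision).")
        elif flag.level is Level.YELLOW:
            if not flag.differentiator:
                raise ValueError(f"Yellow flag on '{flag.topic}' needs a differentiator")
            lines.append(
                f"CONSTRAINT: '{flag.topic}' is a Yellow collision. "
                f"Differentiate via {flag.differentiator} and establish that "
                f"distinction in the first two paragraphs."
            )
        # Green flags add no constraint.
    return lines
```

Raising on a Yellow flag with no differentiator pushes the ambiguity back to the research brief, where it can be fixed at the human checkpoint instead of surfacing in the draft.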

What the handoff protocol actually looks like

The piece nobody writes about with Agent Teams is the handoff. The capability is impressive. The protocol is where it breaks.

Here is the protocol I am running now:

Agent 1 completes research. Returns brief.
Human reviews brief. Approves or flags revisions.
Agent 1 returns revised brief if needed. Human approves.
Brief goes to Agent 2 with explicit constraints included — not just the competitive context, but the behavioral rules that context implies.
Agent 2 returns draft.
Human reviews draft. Structural changes made by human. Style changes flagged for Agent 3.
Draft goes to Agent 3 with the issue log from human review pre-populated so Agent 3 focuses its pass on the remaining checks, not the ones already addressed.
Agent 3 returns QA report.
Human makes final edits. Ships.

Four human touchpoints. None of them are optional. The system breaks if you remove any of them.

That is not a criticism of Agent Teams. That is the architecture. The agents handle the speed and scale. The human handles the judgment. Removing the judgment layer does not make the system faster. It makes the output wrong faster.
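
The protocol above is effectively a small state machine with one loop, at the brief review. A minimal sketch, with hypothetical stage names and a hypothetical advance function. Nothing here calls Claude Code; it only models the order of handoffs and which stages wait on a person.

```python
from enum import Enum, auto


class Stage(Enum):
    RESEARCH = auto()      # Agent 1 writes or revises the brief
    BRIEF_REVIEW = auto()  # human approves or sends the brief back
    DRAFT = auto()         # Agent 2 writes the draft
    DRAFT_REVIEW = auto()  # human makes structural changes
    QA = auto()            # Agent 3 returns the issue list
    FINAL_EDIT = auto()    # human applies the fixes
    SHIPPED = auto()


# Stages that block until a person acts.
HUMAN_STAGES = frozenset({Stage.BRIEF_REVIEW, Stage.DRAFT_REVIEW, Stage.FINAL_EDIT})


def advance(stage: Stage, approved: bool = True) -> Stage:
    """Return the next stage. A rejected brief loops back to research."""
    if stage is Stage.SHIPPED:
        raise ValueError("sprint already shipped")
    if stage is Stage.BRIEF_REVIEW and not approved:
        return Stage.RESEARCH
    order = list(Stage)  # Enum members iterate in definition order
    return order[order.index(stage) + 1]
```

Writing it down this way makes the point of the section concrete: there is no path from RESEARCH to SHIPPED that skips a human stage.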

Who this setup is for

This is not a beginner workflow. If you are still figuring out how to write a CLAUDE.md file or structure your first session, build that foundation first.

If you are already running Claude Code regularly and hitting the context ceiling — research competing with drafting competing with QA in a single session — Agent Teams is the upgrade worth testing.

The quota window closes March 27. Use it.

Run the three-agent setup on your next content sprint. Send me what broke.

Pawel Jozefiak

Apr 1

Ran a similar experiment but from the single-agent direction - kept trying to squeeze more out of one context window before splitting. Your parallel research+draft+QA split is much cleaner than how I approached it. The 20-minute active involvement number is where I got stuck too, though for a different reason: review queues compound fast when running multiple projects. Ended up thinking hard about what 'agentic' actually means versus what we're calling it.

My take after testing Claude's new computer use alongside this kind of workflow: https://thoughts.jock.pl/p/claude-cowork-dispatch-computer-use-honest-agent-review-2026 - the distinction between assisted and autonomous matters more than the demos suggest.

Full Stack Agents
