Why We Let the Smartest Model Run the Show (And Why You Should Too)

June 15, 2026 By Big Kel 19 min read

[Image: Baseball manager on the dugout step holding a lineup card that shows AI agent names, with lesser models crossed out]

A field report from a homelab where the orchestrator is the brain, not the budget option.

Every multi-agent system faces the same architectural question: who's in charge? The conventional wisdom — and I fell for it hard — says you put a cheap, lightweight model at the top. It routes tasks, it doesn't need to be smart, it just needs to be fast. The heavy models do the real work.

We ran that way for months. It was fine. It was also wrong.

Here's what we learned when we flipped the hierarchy and put the smartest model in the orchestrator seat, and why it changed everything about how our agents work together.

The Old Way: Cheap Router, Expensive Workers

The setup was straightforward. We had three agents, each backed by a different cloud-hosted model:

Scout — a lightweight generalist (glm-5.1, 30-turn budget). Quick, cheap, good for simple lookups.
Cody — a code-specialized model (kimi-k2.7-code, 60-turn budget). The workhorse for debugging, config edits, and research.
Ginger — the heavy lifter (deepseek-v4-pro, 90-turn budget). Complex reasoning, architecture decisions, financial logic.

Under the old regime, Scout or a similarly lightweight model sat at the top as orchestrator. The theory was sound: routing is mechanical. Why burn expensive tokens on "file a ticket for Cody, file a ticket for Ginger"? Save the big model for the hard stuff.

The practice was less sound.

Scout would receive a request like "Research Solar-Assistant compatibility across 26 inverter manufacturers and update all the documentation" and produce a ticket that read, approximately:

Title: Research Solar-Assistant stuff
Body: Look at the website and figure out which inverters work with it. Update the docs.

That's it. That's the handoff. A $0.0003 prompt producing a context package that a $0.02 model now has to decode, guess at, and backfill through trial and error.

Cody would then spend its first 15 turns just figuring out what the task actually meant. Which manufacturers? Which docs? What format? Where are the files? What counts as "compatible"? The expensive model was burning its budget on clarification that the cheap orchestrator should have provided upfront.

We were optimizing the wrong thing.

The Realization

The moment of clarity came during a discussion about the Tesla proxy project — a data pipeline that pulls telemetry from a vehicle named Augustus, normalizes it, and publishes it to Home Assistant via MQTT. The architecture had gotten tangled. Components that should have been cleanly separated were bleeding into each other. The delegation prompts were vague. Workers were producing output that didn't fit together.

I said out loud what I'd been suspecting: "I have always made the orchestrator the smallest or least complex LLM. I am now thinking that I should have been letting the most sophisticated LLM be the orchestrator."

The reasoning was two-fold:

Context quality is the highest-leverage variable in delegation. A smart orchestrator produces detailed, scoped, example-rich handoff prompts. A weak orchestrator produces vague ones. The downstream cost of vague prompts — clarification loops, wrong assumptions, rework — dwarfs the token savings of using a cheap router.
A sophisticated model knows better than any other model what can be delegated. It has superior meta-cognition about model capabilities. It knows that Cody is code-specialized and shouldn't be asked to write creative prose. It knows that Scout's 30-turn budget means it can handle a single-file CSS tweak but not a 26-manufacturer research sweep. The cheap orchestrator lacks this judgment.

We flipped the switch. Ginger became the orchestrator. Everything changed.

How It Works Now

The Orchestrator: Ginger

Ginger runs on deepseek-v4-pro with a 90-turn budget. It doesn't do the work — it designs the work. When a task arrives, Ginger:

Decomposes it into discrete, independent subtasks
Routes each subtask to the right agent based on capability modeling
Crafts a detailed kanban ticket with flat, exhaustive instructions
Reviews worker output and synthesizes results for the user

The key word is flat. No nested dependencies. No "figure it out as you go." Every ticket is a self-contained mission brief.

The Workers: Cody and Scout

Cody (kimi-k2.7-code, 60 turns) handles code changes, system fixes, research, data plumbing, and anything requiring web search plus file edits. Scout (glm-5.1, 30 turns) handles quick UI tweaks, single-file changes, and lightweight research tasks.

Neither worker has to guess. The ticket tells them exactly what to do, where the files are, what the success criteria are, and what pitfalls to avoid.
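One way to make "neither worker has to guess" checkable is to treat the ticket as a structured record and hold it back while any required section is empty. This is a minimal sketch under assumptions: the `Ticket` class, its field names, and the `missing_sections` helper are hypothetical illustrations, not the actual ticket schema used on the board.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ticket:
    """Hypothetical ticket record; field names are illustrative only."""
    title: str
    assignee: str
    instructions: str = ""                                  # what to do
    file_paths: List[str] = field(default_factory=list)     # where the files are
    success_criteria: List[str] = field(default_factory=list)
    pitfalls: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of required sections that are still empty."""
        required = {
            "instructions": self.instructions.strip(),
            "file_paths": self.file_paths,
            "success_criteria": self.success_criteria,
            "pitfalls": self.pitfalls,
        }
        return [name for name, value in required.items() if not value]


# The old-style handoff: a title and two sentences, nothing else.
vague = Ticket(
    title="Research Solar-Assistant stuff",
    assignee="cody",
    instructions="Look at the website and figure out which inverters work with it. Update the docs.",
)

# A flat ticket: every section a worker needs is filled in.
complete = Ticket(
    title="Weather tile layout",
    assignee="scout",
    instructions="Replace inline <span> with flex row.",
    file_paths=["/root/eg4panel/index.html"],
    success_criteria=["Temperature left-justified, feels-like right-justified"],
    pitfalls=["Restart the server after editing"],
)
```

Here `vague.missing_sections()` names the three empty sections, while `complete.missing_sections()` returns an empty list, so a dispatcher could refuse anything that fails the check.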

The Kanban Board

This is the backbone. Every task becomes a ticket on a shared kanban board. The dispatcher picks up ready tickets and assigns them to the appropriate worker profile. Workers claim tickets, execute, and mark them done with a summary.

The board gives us:

Visibility: every task, its status, its assignee, its result
Accountability: workers can't silently fail — the board shows blocked/done/stale
Parallelism: independent tasks run concurrently
Auditability: every completed ticket is a permanent record of what was done and why

The Lifecycle of a Ticket

Here's the full journey, from user request to completed work:

1. User asks Ginger something. "Research Solar-Assistant compatibility." "Fix the weather tile layout." "Why is the crypto bot stuck?"
2. Ginger triages. Is this a single task or does it need decomposition? If it's broad ("research all 26 manufacturers"), Ginger breaks it into sub-tickets. If it's focused ("fix one CSS file"), Ginger writes one ticket.
3. Ginger writes the ticket. This is where the smart model earns its keep. The ticket body includes: exact file paths, SSH access details, research queries to run, output format, success criteria, known pitfalls, and links to relevant documentation. Nothing is left to inference.
4. The dispatcher picks it up. Every 60 seconds, the kanban dispatcher scans for ready tickets and assigns them to the appropriate worker profile. If a ticket is assigned to default, the auto-decomposer splits it further before dispatch.
5. The worker executes. Cody or Scout claims the ticket, reads the body, and starts working. Because the instructions are flat and complete, the worker typically goes straight to execution: no clarification loops, no "wait, what did you mean?"
6. The worker delivers. When done, the worker marks the ticket done with a one-paragraph summary of what was accomplished. If something goes wrong, the worker marks it blocked with an explanation.
7. Ginger synthesizes. The orchestrator reads the completed ticket, verifies the output, and presents the results to the user. If multiple tickets were part of the same request, Ginger weaves them together into a coherent response.

This pipeline runs continuously. At any given moment, there might be zero, one, or three tickets in flight — all independent, all self-contained, none waiting on each other.
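The lifecycle above can be read as a small state machine plus a dispatcher pass. The sketch below is an assumption for illustration: the status names come from this post (ready, running, done, blocked), but the `TRANSITIONS` table, `advance`, and `dispatch_tick` are hypothetical and the real dispatcher may allow other moves.

```python
from enum import Enum


class Status(Enum):
    READY = "ready"
    RUNNING = "running"
    DONE = "done"
    BLOCKED = "blocked"


# Allowed moves between states (assumed; the real board may permit others).
TRANSITIONS = {
    Status.READY: {Status.RUNNING},
    Status.RUNNING: {Status.DONE, Status.BLOCKED},
    Status.BLOCKED: {Status.READY},  # orchestrator clears the blocker and requeues
    Status.DONE: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move {current.value} -> {target.value}")
    return target


def dispatch_tick(board: list) -> list:
    """One dispatcher pass: start every ready ticket and return their ids."""
    started = []
    for ticket in board:
        if ticket["status"] is Status.READY:
            ticket["status"] = advance(ticket["status"], Status.RUNNING)
            started.append(ticket["id"])
    return started


board = [
    {"id": "T1", "assignee": "cody", "status": Status.READY},
    {"id": "T2", "assignee": "scout", "status": Status.RUNNING},
    {"id": "T3", "assignee": "scout", "status": Status.READY},
]
started = dispatch_tick(board)  # picks up T1 and T3, leaves T2 alone
```

In the setup described here, a pass like `dispatch_tick` would run on the 60-second dispatcher interval. Making illegal moves raise an error keeps a finished ticket from silently re-entering the queue.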

A Real Example: The Solar-Assistant Research Task

Here's what a Ginger-orchestrated ticket actually looks like. This is the real ticket body for the Solar-Assistant research task that Cody completed:

# Solar-Assistant Research Task

Research Solar-Assistant (https://solar-assistant.io) — the third-party
monitoring platform that runs on Raspberry Pi and connects directly to
inverters/batteries via USB, RS485, RS232, or CAN bus.

## Phase 1: Methodology Write-Up
Create a comprehensive Solar-Assistant.md document covering:
- Platform overview, supported hardware, connection methods
- Protocol support matrix (SunSpec Modbus, PIP/Voltronic, Deye/Sunsynk,
  EG4/Luxpower, Growatt, SRNE, Victron VE.Direct, Huawei, SolaX, Pylontech)
- MQTT topic structure and Home Assistant integration patterns

## Phase 2: Update Each Manufacturer Document
For every .md file in Manufacturers/*/, add a "Solar-Assistant Compatibility"
section with:
1. Whether SA supports this brand (✅ yes / ⚠️ partial / ❌ no)
2. Exact connection method, port, baud rate, device ID
3. Which protocol/inverter type to select in SA
4. Known issues or limitations
5. Link back to the main methodology document

## Priority Manufacturers
EG4, Luxpower, Sol-Ark, Deye/Sunsynk, Voltronic, MPP Solar, Growatt,
SRNE, Eco-Worthy, Victron, SMA, Fronius, GoodWe, Huawei, SolaX, Solis,
Sungrow, SolarEdge

## Research Sources
- Solar-Assistant official site and docs
- DIY Solar Forum: site:diysolarforum.com "Solar Assistant" [brand]
- Home Assistant Community: site:community.home-assistant.io "Solar Assistant"
- GitHub projects bridging SA → HA

## Deliverables
1. Projects/PV Control and Status/Solar-Assistant/Solar-Assistant.md
2. Updated Manufacturers/*/ *.md files with compatibility sections
3. Comment on this ticket with summary and key resource links

This is not a vague suggestion. It's a specification. Cody didn't have to ask "which manufacturers?" or "what format?" or "where do the files go?" It was all there, flat on the page.

The result: Cody completed the entire task — methodology document plus 27 manufacturer updates — in a single session, with zero clarification loops. The ticket came back done with a clean summary of what was supported, what wasn't, and links to every resource used.

Compare that to the old way, where a vague ticket would have triggered a cascade of "wait, what did you mean by..." exchanges, burning turns and tokens on both sides.

Another Example: The Weather Tile UI Tweak

Not every task is a 26-manufacturer research sweep. Some are tiny. But the pattern holds.

The user wanted a dashboard weather card restructured: temperature left-justified, "feels like" as a two-line right-justified stack. A five-minute CSS change. Under the old regime, the cheap orchestrator might have just done it inline — reached for the file editor, made the change, moved on.

But that's not how we operate now. Ginger filed a ticket for Scout:

## Updated layout spec

The weather card should now be:

Weather · WeatherFlow
92.2°F          like
               102°
Humidity      72 %
...

Layout rules:
- 92.2°F — left-justified, big number
- "like" — right-justified, small muted text, top line
- "102°" — right-justified, small muted text, bottom line
- Feels-like hidden when |feels − actual| < 2°F

Implementation approach:
- Replace inline <span> with flex row
- File: /root/eg4panel/index.html on webb (ssh webb)
- Restart server after editing: ssh webb systemctl restart eg4panel

Scout — the lightweight model with a 30-turn budget — had everything it needed. File path, SSH access, exact HTML structure, CSS guidance, restart command. It took the ticket, made the change, marked it done.

The orchestrator didn't touch the file. The worker didn't ask a single question. The user got exactly what they wanted.
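Part of what makes that ticket unambiguous is that the feels-like rule is stated as a formula instead of a vibe. As a hedged illustration, here is that one rule as a predicate. The real dashboard is an HTML page, so this Python function and its name are assumptions used only to show that the rule leaves nothing to interpret.

```python
def show_feels_like(actual_f: float, feels_f: float, threshold_f: float = 2.0) -> bool:
    """Ticket rule: hide feels-like when |feels - actual| < 2 degrees F, show it otherwise."""
    return abs(feels_f - actual_f) >= threshold_f


# Values from the ticket mockup: 92.2 actual, 102 feels-like.
mockup_visible = show_feels_like(92.2, 102.0)
```

With the mockup values the gap is nearly ten degrees, so the stack is shown. A reading of 92.2 against 93.0 would hide it.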

This is the pattern at every scale: from a one-line CSS fix to a 26-manufacturer documentation sweep, the ticket is complete, flat, and self-contained.

The Economics (It's Not What You Think)

The obvious objection: "But you're burning expensive tokens on task decomposition!"

Let's do the napkin math.

Old way (cheap orchestrator):

Orchestrator decomposes task: $0.002 (cheap model, vague output)
Worker clarifies and backfills: $0.08 (expensive model, 15 turns of guesswork)
Worker executes: $0.12 (expensive model, actual work)
Rework from miscommunication: $0.06 (expensive model, fixing wrong assumptions)
Total: $0.262

New way (smart orchestrator):

Orchestrator decomposes task: $0.04 (expensive model, detailed output)
Worker executes directly: $0.12 (expensive model, no clarification needed)
Total: $0.16

The smart orchestrator is cheaper overall because it eliminates the expensive downstream waste. On these numbers, the extra $0.038 spent on decomposition is roughly a quarter of the $0.14 the old way spends on clarification and rework.
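The napkin math above can be checked in a few lines. The dollar figures are the illustrative ones from this section, not measured billing data, and the variable names are just for this sketch.

```python
# Illustrative per-task costs in dollars, copied from the napkin math above.
old_way = {
    "orchestrator decomposes": 0.002,
    "worker clarifies and backfills": 0.08,
    "worker executes": 0.12,
    "rework from miscommunication": 0.06,
}
new_way = {
    "orchestrator decomposes": 0.04,
    "worker executes directly": 0.12,
}

old_total = round(sum(old_way.values()), 3)
new_total = round(sum(new_way.values()), 3)

# What the smart orchestrator costs extra, and what it removes downstream.
premium = round(new_way["orchestrator decomposes"] - old_way["orchestrator decomposes"], 3)
waste_removed = round(
    old_way["worker clarifies and backfills"] + old_way["rework from miscommunication"], 3
)

savings = round(old_total - new_total, 3)
savings_pct = round(100 * savings / old_total, 1)

print(f"old ${old_total}, new ${new_total}, saved ${savings} ({savings_pct}%)")
```

On these figures the new way comes out about 39% cheaper per task: a $0.038 premium at the top removes $0.14 of clarification and rework below.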

And that's just the direct cost. The indirect costs of the old way (slower delivery, user frustration, tasks that stall because the worker can't figure out what was meant) are harder to quantify and, in our experience, at least as significant.

There's also a subtler economic argument: token efficiency is not the same as cost efficiency. A cheap orchestrator that produces a 50-word ticket looks efficient on paper. But that 50-word ticket forces the expensive worker to spend 2,000 tokens on clarification. A smart orchestrator that produces a 500-word ticket looks "wasteful" — until you realize it saved 2,000 tokens downstream. The system-level math is what matters, not the per-component math.
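To put the per-component versus system-level point in numbers, here is a rough sketch. The word counts and the 2,000-token clarification figure come from the paragraph above. The 1.3 tokens-per-word ratio is an assumption (a common rule of thumb for English text) and varies by tokenizer.

```python
TOKENS_PER_WORD = 1.3  # assumption: rough rule of thumb, varies by tokenizer


def ticket_tokens(words: int) -> int:
    """Approximate token count for a ticket body of the given length."""
    return round(words * TOKENS_PER_WORD)


# Cheap orchestrator: short ticket, then the worker pays for clarification.
vague_total = ticket_tokens(50) + 2000

# Smart orchestrator: long ticket, no clarification needed downstream.
detailed_total = ticket_tokens(500)

print(f"vague: {vague_total} tokens, detailed: {detailed_total} tokens")
```

Under that assumption the "efficient" 50-word ticket costs the system about 2,065 tokens, while the "wasteful" 500-word ticket costs about 650. That is before accounting for the clarification tokens being billed at the expensive worker's rate.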

The Flat Instruction Principle

This is the single most important lesson we've learned: flat instructions beat hierarchical ones.

When you give a worker a task with nested dependencies — "first figure out X, then based on that do Y, and if Y looks like Z then also do W" — you're asking a model with finite context and no persistent memory to maintain state across multiple decision points. It will drop something. It will forget the original goal. It will go down a rabbit hole and emerge with something that doesn't fit.

Flat instructions say: here is everything you need to know, right now, in one place. The file paths. The format. The success criteria. The pitfalls. The research queries to run. The deliverables. Nothing is left to inference.

This is why Ginger's tickets are long. Not because Ginger is verbose — because Ginger is complete. A 500-word ticket body that eliminates 15 turns of clarification is not bloat; it's the most efficient thing in the system.

The flatness also enables parallelism. When every ticket is self-contained, the dispatcher can run three at once without any of them stepping on each other. Hierarchical tickets create dependency chains that force sequential execution. Flat tickets create independent work units that can all run simultaneously.
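A minimal sketch of why flat tickets parallelize: if no ticket reads another ticket's output, they can all be submitted at once. The `run_ticket` function below is a hypothetical stand-in for a worker run (a real one would call a model API), and the thread pool is one assumed way to run them, not a description of the actual dispatcher.

```python
from concurrent.futures import ThreadPoolExecutor


def run_ticket(ticket: dict) -> dict:
    """Stand-in for a worker run; it only uses its own ticket's contents."""
    return {
        "id": ticket["id"],
        "status": "done",
        "summary": f"completed: {ticket['title']}",
    }


tickets = [
    {"id": "T1", "title": "Fix the weather tile layout"},
    {"id": "T2", "title": "Research Solar-Assistant compatibility"},
    {"id": "T3", "title": "Why is the crypto bot stuck?"},
]

# No ticket depends on another, so all three can be in flight together.
# Executor.map returns results in submission order regardless of finish order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_ticket, tickets))
```

A hierarchical ticket would break this: if T2 needed T1's output, the second call could not start until the first returned, and the pool would buy nothing.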

Capability-Aware Routing

Another thing the smart orchestrator does that the cheap one can't: it actually knows what each worker is good at.

We maintain an explicit rubric:

Code changes, debugging, system fixes → Cody. Code-specialized model, 60 turns, SSH access.
UI tweaks, single-file changes → Scout. Lightweight, fast, 30 turns is plenty.
Architecture, triage, financial logic → Ginger. Strong reasoning, 90 turns, keeps the big picture.
Broad research sweeps → auto-decomposed. Too big for one worker, split into sub-tickets.
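The rubric above could be encoded as a plain lookup table. This is a sketch under assumptions: the task-type keys and the `route` function are hypothetical, and mapping broad research to the `default` profile follows the lifecycle description, where default-assigned tickets go through the auto-decomposer.

```python
# Hypothetical task-type keys mapped to the worker profiles named in this post.
RUBRIC = {
    "code_change": "cody",
    "debugging": "cody",
    "system_fix": "cody",
    "ui_tweak": "scout",
    "single_file_change": "scout",
    "architecture": "ginger",
    "triage": "ginger",
    "financial_logic": "ginger",
    "broad_research": "default",  # default tickets are auto-decomposed before dispatch
}


def route(task_type: str) -> str:
    """Return the profile for a task type, failing loudly on anything unlisted."""
    if task_type not in RUBRIC:
        raise ValueError(f"no routing rule for task type {task_type!r}")
    return RUBRIC[task_type]


assignee = route("ui_tweak")  # a weather-tile fix goes to scout
```

Raising on an unknown task type is a deliberate choice in this sketch: a silent fallback to whichever worker "does most of the work" is exactly the habit-based routing the rubric is meant to replace.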

The cheap orchestrator would route a creative writing task to Cody because "Cody does most of the work." But Cody runs on kimi-k2.7-code — a model whose name literally contains the word "code." It's not built for prose. Ginger knows this. Ginger routes writing to itself and code to Cody.

This isn't a subtle optimization. It's the difference between a task succeeding in 20 turns and a task burning 60 turns producing mediocre output that then needs to be redone.

The rubric isn't static, either. It evolves as we learn. When we discovered that Scout was better at web research than we'd assumed, we updated the routing rules. When we realized that Ginger should never touch remote HTML files directly (a lesson learned the hard way), we encoded that as a hard boundary. The orchestrator maintains the rubric; the rubric guides the orchestrator. It's a feedback loop.

Boundary Enforcement: The Orchestrator Doesn't Do Worker Work

This deserves its own section because it's the rule we break most often — and every time we break it, we regret it.

The orchestrator's job is to design work, not do work. When Ginger reaches for a file editor to tweak CSS on a remote server, two things go wrong:

Ginger burns its expensive context on mechanical work. Those 90 turns are for reasoning, not for ssh webb patch index.html. Every turn spent editing is a turn not spent orchestrating.
The boundary gets fuzzy. If Ginger sometimes does worker tasks and sometimes delegates them, the user can't predict who owns what. "Wait, did Ginger fix that or did Cody? Who do I ask about the weather tile?"

The rule is now explicit: Ginger does not perform inline HTML/CSS/JS edits, SSH-based remote file edits, or code changes. Those are Cody/Scout territory. If Ginger finds itself reaching for patch or write_file on a remote host, it stops and files a kanban ticket instead.

The user reinforced this recently with a three-word correction: "Not you. Cody." That's now encoded in the agent assignment rubric as a hard constraint. The orchestrator orchestrates. Workers work. The line is bright.
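A bright line like this is easiest to hold when it is a check and not a habit. The sketch below is hypothetical: the tool names `patch` and `write_file` are the ones mentioned above, but `guard_tool_call` and its return values are assumptions, not the actual rubric encoding.

```python
ORCHESTRATOR = "ginger"
WORKER_ONLY_TOOLS = {"patch", "write_file"}  # tool names taken from this post


def guard_tool_call(agent: str, tool: str, remote: bool) -> str:
    """Return "file_ticket" when the orchestrator reaches for a worker tool on a remote host."""
    if agent == ORCHESTRATOR and remote and tool in WORKER_ONLY_TOOLS:
        return "file_ticket"
    return "allow"


# Ginger reaching for patch on webb gets redirected to the kanban board.
decision = guard_tool_call("ginger", "patch", remote=True)
```

The same call from Cody returns "allow", which matches the rule: the edit itself is fine, it just belongs to a worker.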

What Happens When Things Go Wrong

No system is perfect. Tickets get blocked. Workers hit failure limits. Here's how we handle it:

Blocked tickets. If a worker can't complete a task — missing credentials, unreachable host, ambiguous requirements that even the detailed ticket couldn't resolve — it marks the ticket blocked with an explanation. The orchestrator reads the blocked ticket, diagnoses the issue, and either fixes the blocker or reassigns the ticket with adjusted scope.

Stale tickets. The dispatcher has a stale timeout: if a ticket sits in running for four hours with no activity, it gets flagged. The orchestrator investigates — did the worker crash? Is it stuck in a loop? — and either reassigns or re-decomposes.

Failure limits. Each worker has a failure limit (currently 2). If a worker fails the same ticket twice, the dispatcher stops assigning it to that worker and flags it for orchestrator review. This prevents infinite retry loops and forces a human (or Ginger) to diagnose the root cause.

The Worrell pattern. We have a dedicated ticket type for investigating blocked tickets. When something gets stuck, Ginger files a "Worrell: investigate blocked ticket X" ticket — named after the Cardinals' setup man, because every good system needs a reliable troubleshooter. This ticket goes to Cody, who has the SSH access and debugging skills to figure out what went wrong.

The key insight: failures are routed, not ignored. The system doesn't pretend everything works. It surfaces problems, assigns them to the right agent, and resolves them.
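The stale timeout and failure limit described above reduce to two small predicates. The four-hour window and the limit of 2 are from this post. The function names, the `failures` dictionary shape, and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=4)  # stale timeout from this post
FAILURE_LIMIT = 2                 # per-worker failure limit from this post


def is_stale(status: str, last_activity: datetime, now: datetime) -> bool:
    """A running ticket with no activity for the full timeout gets flagged."""
    return status == "running" and now - last_activity >= STALE_AFTER


def should_escalate(failures: dict, ticket_id: str, worker: str) -> bool:
    """True once this worker has failed this ticket FAILURE_LIMIT times."""
    return failures.get((ticket_id, worker), 0) >= FAILURE_LIMIT


# Sample data, invented for the sketch.
now = datetime(2026, 6, 15, 12, 0)
failures = {("T7", "cody"): 2, ("T8", "scout"): 1}

stale = is_stale("running", datetime(2026, 6, 15, 7, 30), now)  # idle 4.5 hours
escalate = should_escalate(failures, "T7", "cody")              # failed twice
```

Both checks only surface a problem. Deciding whether to reassign, re-decompose, or file an investigation ticket stays with the orchestrator.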

What We Don't Do Anymore

Some anti-patterns we've explicitly abandoned:

1. "The orchestrator should also do the work."
No. Ginger files tickets. Ginger does not reach for patch or write_file on remote hosts. When the user says "Have Cody clean this up. Not you." — that's not a suggestion, it's a boundary. The orchestrator orchestrates. Workers work. Mixing them creates confusion about who owns what.

2. "Just figure it out."
Vague tickets are a tax on the most expensive part of the system. Every minute an expensive model spends guessing is a minute it's not producing value. Specificity is kindness.

3. "One agent can handle everything."
No single model is optimal for every task type. The code model writes bad prose. The prose model writes slow code. The lightweight model can't handle complex reasoning. Routing matters.

4. "Save tokens on the orchestrator."
The orchestrator's tokens are the best-value tokens in the system because they prevent waste everywhere else. Skimping on them is penny-wise, pound-foolish.

5. "The user should have to repeat themselves."
If the user says something once, it should be encoded. Preferences, corrections, boundaries — they go into the agent rubric and the orchestrator's memory. The user should never have to say "not you, Cody" twice.

The Results

Since flipping to the smart-orchestrator pattern:

Task completion rate is near 100%. Tickets don't stall because workers can't figure out what was meant.
Clarification loops have dropped to near zero. Workers execute; they don't ask.
Parallel throughput is higher. Independent tasks run concurrently because each ticket is self-contained.
User steering is minimal. The user doesn't have to course-correct workers mid-flight because the initial instructions were correct.
The kanban board is a reliable source of truth. 18 tickets in the last batch, 16 done, 1 blocked (resolved), 0 abandoned.
Worker specialization is actually enforced. Cody does code. Scout does quick fixes. Ginger does architecture. Everyone stays in their own lane.

The system feels less like a conversation and more like a factory. Tasks go in, completed work comes out. The orchestrator is the production engineer; the workers are the line.

The Hardware

For those curious about the infrastructure that orchestrates all of this (hostnames preserved, because naming servers after baseball players is a sacred tradition):

McGwire — the primary macOS machine. Runs the Hermes gateway, kanban dispatcher, and cron scheduler. Named after Mark McGwire because this homelab has a theme and we're committed to it.
Musial — a Linux server handling various workloads. Named after Stan Musial. If you don't know who Stan Musial is, I can't help you.
Webb — an LXC container running the EG4 solar dashboard and various lightweight services. Named after Brandon Webb, because even utility containers deserve a Cy Young winner.
Wainwright — the NFS server for backups. Named after Adam Wainwright. The curveball of backup strategies: reliable, consistent, and still delivering years after everyone thought it was done.

The models themselves are cloud-hosted — deepseek-v4-pro, kimi-k2.7-code, and glm-5.1 all run on remote infrastructure, not local GPUs. The agents connect to them via API. This means our token economics are real API costs, not electricity bills — but the principle scales to any deployment model. Whether your models live in the cloud or in your basement, the orchestrator pattern works the same way.

Should You Do This?

If you're running a multi-agent system, the answer is almost certainly yes. The specific implementation details will vary — your agents will have different names, your models will be different, your kanban board might be a different tool — but the principle is universal:

The quality of the handoff is the quality of the output.

Put your best model at the top. Let it think hard about what each worker needs to know. Write flat, complete instructions. Route based on actual capability, not habit. Enforce boundaries between orchestration and execution. Surface failures instead of ignoring them. And never, ever send a worker a ticket that says "figure it out."

Your expensive models will thank you. Your users will thank you. And your token bill, counterintuitively, will probably go down.

Appendix: The Agent Roster

For the curious, here's the current lineup:

Ginger — deepseek-v4-pro, 90 turns. Orchestrator: decomposition, routing, synthesis.
Cody — kimi-k2.7-code, 60 turns. Worker: code, debugging, research, system fixes.
Scout — glm-5.1, 30 turns. Worker: UI tweaks, quick fixes, lightweight research.

All three are cloud-hosted models accessed via API. All three have distinct capabilities. And all three know exactly who's in charge.

This post was drafted by Ginger (the orchestrator). The agents mentioned — Scout, Cody, and Ginger — are real Hermes Agent profiles backed by cloud-hosted models, orchestrated from a homelab that takes its server names very seriously. Go Cardinals.