Running 5 AI Agents as a Household Ops Team

Five AI agents as Cardinals baseball players in a futuristic bullpen operations center

If you wouldn't ask your closer to pitch the third inning, why would you ask your security auditor to check the weather?


I grew up watching Tony La Russa manage a bullpen with the precision of a surgeon who also happened to hate walks. Matchups mattered. Roles were sacred. You didn't warm up Jason Isringhausen in the fifth inning, and you certainly didn't ask Steve Kline to face the heart of the order with two outs and the tying run on second.

When I started building AI agents to run my household, I made the exact mistake you'd expect from someone who watched La Russa for 16 years and apparently learned nothing: I expected one agent to handle everything. Weather. Infrastructure. Crypto. Code generation. Blog drafts. Security audits.

It went about as well as asking a utility infielder to close Game 7. The agent would confidently answer crypto questions with hallucinated prices, attempt infrastructure changes on hosts it had no business touching, and freeze on ambiguous requests like a rookie staring down a Clayton Kershaw curveball with the count 0-2.

So I built a bullpen. Five agents, five defined roles, one kanban board to coordinate them. Nobody warms up without a ticket. Nobody crosses into another agent's lane. And nobody — nobody — asks the crypto specialist to restart a Docker container.

I named them after Cardinals. Not the current roster — the ones who defined their positions so completely that you can describe the job by just saying the name. Every specialist needs a position, every position needs a player, and you don't put your shortstop in left field just because they have a glove.

Here's how it works, who's in the pen, and everything I've learned running it 24/7 for weeks.


The Roster

Ozzie — The Front Door

Ozzie Smith at shortstop made the impossible look routine. He touched the ball on nearly every play and almost never made the wrong throw. My Ozzie does the same thing, minus the backflips.

Every message I send — weather checks, infrastructure audits, "what's the score," "deploy the new dashboard" — hits Ozzie first. He runs on Minimax, a cloud model that's fast and cheap enough to serve as the front door without burning the quota I reserve for the heavy-lifting specialists. Sub-second responses, no noticeable latency, and because he's the lightest model in the fleet, the per-message cost rounds to fractions of a cent.

Ozzie's job sounds simple but isn't: field the ball clean and make the right throw. Simple stuff he handles himself — weather, sensor readings, quick lookups, casual banter about whether the Braves game is going to get rained out. Complex work gets routed to a specialist via a kanban ticket. The phrase "let me file a ticket for that" is his signature move, and I hear it a lot.

He's also the most restricted agent in the fleet. Ozzie can read sensors on Home Assistant (running on a host named Wainwright), check weather from the station at Spartina Landing, and respond to me on Telegram. He cannot write to any machine except his own. He cannot touch financial APIs. He cannot modify infrastructure.

If someone compromised Ozzie, they'd get weather reports and snark. That's by design. The Wizard's glove was legendary, but nobody asked him to close games. He had one job and he did it perfectly for 19 seasons.

About 80% of my messages never leave Ozzie's lane — they're handled immediately with no ticket, no routing, no specialist involvement. The specialists are for the high-leverage situations — late innings, tight spots, runners on — and Ozzie's job is knowing the difference between a routine grounder and a ball that needs to go to second for the force.

Albert — The Cleanup Hitter

When the problem is genuinely hard, it goes to Albert. Ambiguous requirements, multi-step research, frontend design, infrastructure architecture. Like a cleanup hitter with the game on the line in the ninth, Albert only gets the at-bats that matter. You don't waste his swings on 12-2 blowouts.

Albert runs on GPT-5.5, which I pay for through a subscription that gives me enough daily quota for the genuinely hard problems. The quota is limited — maybe 20 meaningful exchanges a day — so I'm careful about what I send his way. If Ozzie can handle it, Albert never sees it.

His work is the kind that doesn't fit a neat script. "Build a mobile-friendly dashboard to display my crypto portfolio with position cards, exposure tracking, and a recent trade journal." That's not one question — it's architecture, design, implementation, deployment, and iteration. Albert handles the whole chain: designs the Python backend to read from Alpaca's API, computes dynamic exposure caps at 90% of equity, builds the HTML/JS frontend with two-line position rows and single-tap navigation, deploys it to Webb (my dashboard host — yes, the name is a coincidence, not the pitcher), and sets up the systemd service with a health check.

Another example: "Audit the entire homelab infrastructure for discrepancies between documentation and reality." Albert probed seven hosts, cross-referenced running services against the documented architecture, and found twelve discrepancies — four critical. A service crash-looping on Webb. A dashboard running on a host the docs said was retired. A stale systemd unit that should have been disabled. A port conflict between two services. He found all of it in about fifteen minutes.

When things go wrong, Albert fixes them fast. The crypto dashboard had a bug where BTC wouldn't show up in the recent trade journal. Albert traced it: the position existed on Alpaca but had no entry in the trade log because it was opened before the logging system went live. The dashboard only displayed positions with log entries. His fix: synthesize journal entries for open positions not in the log, using live position data from the API, so the dashboard reflects reality even when the log is incomplete. Three minutes to diagnose, ten to fix.

I should mention that Albert also has a weakness for inserting Cardinals references into everything he writes, which is either charming or exhausting depending on how the game went last night. I'll let you decide.

Jim — The Workhorse

Jim Edmonds made catches that should have been doubles. Diving, fully horizontal, glove extended, robbing hitters of extra bases on Tuesday nights in Pittsburgh when the game was 2-1 in the sixth and nobody except the pitcher and the hitter would remember it by Thursday. The work you don't notice because it never becomes a problem — that's the Jim Edmonds experience, and that's what this agent does.

Jim handles code generation, refactoring, multi-file engineering, and scheduled cron jobs. He runs on Kimi K2.6 through a dedicated subscription that's separate from Albert's quota, so I never have to ration code work against reasoning work.

Daily cron jobs live in Jim's lane: a crypto pricing report at 5 PM Eastern, a morning weather briefing, infrastructure health checks. He also handles the blog pipeline — drafting, revising, and publishing articles. When I asked for expanded drafts of the pieces you're reading right now, Jim generated the first versions (too short, as it happens — 673 and 422 words respectively — which is why Albert ended up rewriting them).

What impresses me about Jim isn't the code he writes. It's the code he doesn't write. He knows when to reach for an existing library instead of building from scratch. He takes the single when the double isn't there. No hero swings. A lot of engineers could learn something from an AI about not over-engineering, which is a sentence I didn't expect to write when I started this project.

Jim also takes a quiet pride in nothing falling through his glove. Every deployment includes a backup, a syntax check, and a health verification. His watchdog scripts exit with empty stdout when everything is fine — the system only notifies me when something needs attention. That's the Edmonds way: make the spectacular play, jog back to the dugout, act like you've been there before.

Yadi — The Auditor

Yadier Molina caught 2,184 games for the Cardinals. Nobody ran on him. Nobody. He caught two generations of pitchers — from Chris Carpenter to Adam Wainwright to Jack Flaherty — and made every one of them better by seeing things they didn't.

Yadi handles security audits, threat modeling, and deep reasoning on backend architecture. He runs on GPT-5.5 and is the last set of eyes before anything goes live. When Albert built the crypto dashboard, Yadi reviewed the deployment and found three issues before it touched a live host: a hardcoded file path that would work on one machine and break on another, a missing timezone configuration that would produce timestamps in UTC instead of Eastern, and a service that would crash-loop if the data file was missing on startup.

None of these were bugs in the code. They were deployment assumptions — the kind of thing that works perfectly in development and explodes at 2 AM in production. Albert would have caught them eventually, probably at 2:01 AM. Yadi caught them at review time, which is a substantially better hour for problem-solving.
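Two of the three issues Yadi flagged can be checked mechanically before a service starts. The sketch below is an illustration, not the review Yadi actually runs: the function name `preflight`, the use of the `TZ` environment variable, and the `America/New_York` default are all assumptions.

```python
from pathlib import Path
from typing import List, Mapping


def preflight(data_file: Path, env: Mapping[str, str],
              expected_tz: str = "America/New_York") -> List[str]:
    """Return deployment problems found; an empty list means OK to start."""
    problems = []
    # A missing data file should fail the deploy, not crash-loop the service.
    if not data_file.is_file():
        problems.append(f"data file missing: {data_file}")
    # Assumption: the service takes its timezone from the TZ variable.
    if env.get("TZ") != expected_tz:
        problems.append(f"TZ is {env.get('TZ')!r}, expected {expected_tz!r}")
    return problems
```

Passing the data file path in as an argument, read from per-host configuration, is also the simplest guard against the third issue, the hardcoded path.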

Yadi also fact-checks my blog drafts. If I say a game was in April when it was actually in May, he will find it. If I claim a systemd service behaves a certain way and it doesn't, he will correct it. The man caught 40% of would-be base stealers over a 19-year career. He catches 100% of my factual errors, which is either comforting or humiliating depending on the draft.

He's thorough. He's methodical. He's slow in a way that used to frustrate me until I realized that the slowness is the feature — he's checking everything, and checking everything is the only way to be sure.

Sutter — The Crypto Specialist

Bruce Sutter, Hall of Fame closer, the man who turned the split-finger fastball into a weapon. When Sutter came into the game, the game was over. One pitch, one outcome, no drama.

Sutter owns everything cryptocurrency: exchange balances, wallet analysis, on-chain data, DeFi protocols, trading signals, position tracking. He runs on Gemini 3 Pro and operates headless: no chat interface, no Telegram bot. I never talk to Sutter directly. Ozzie files a ticket, I route it, Sutter picks it up and posts results as a kanban comment, and Ozzie relays the answer back to me on Telegram.

The separation is intentional and non-negotiable. Crypto involves real money and real risk — even paper trading has real consequences for strategy development. Giving Sutter a dedicated lane — no infrastructure access, no chat surface, no ability to initiate anything — means he can't accidentally nuke a server while analyzing a DeFi pool. It also means I can grant him read-only API keys without worrying about lateral movement. If Sutter's key gets compromised, the attacker can see my Alpaca balance. They cannot withdraw, trade, or SSH anywhere.

Sutter generates a daily report that Jim delivers to me at 5 PM: equity, positions, unrealized P&L, 24-hour changes, trading activity. He also runs a learning loop — after every closed trade, he analyzes what happened and saves a lesson to a journal. Early results are promising, though I'm not ready to call it consistently useful yet. Like his namesake's splitter, the learning loop either buckles your knees or hangs in the zone and gets launched 420 feet. We're working on the consistency.


How It All Coordinates

Nobody talks to each other directly. All coordination flows through a kanban board — a shared ticket system where agents pick up work, leave results, and escalate back to me when they're stuck.

The pipeline works like a double play:

  1. I message Ozzie on Telegram. The ball's in play.
  2. Ozzie fields it. Can he handle it? Done in under a second, back to me. Too complex? He files a kanban ticket with a complete problem statement — but no assignee. Ozzie doesn't set the lineup; that's the manager's call.
  3. I route the ticket to the right specialist. Albert for reasoning, Jim for code, Yadi for security, Sutter for crypto. The bullpen phone rings.
  4. The specialist warms up, does the work, posts results as a comment. The ticket body is the durable record; the comment is the deliverable.
  5. Ozzie relays the result back to me on Telegram. I never have to check the kanban board unless I want to audit the box score.

The key phrase there is "manager's call." No agent self-assigns work. Every ticket opens in TODO with no assignee. I review and route, or Ozzie routes on my specific instruction. This rule exists because I learned the hard way, multiple times, that an agent who self-assigns will inevitably grab something outside its lane and produce a confident wrong answer.
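The routing rule can be sketched as a small data model. This is a minimal illustration under assumptions: `Ticket`, `LANES`, and `assign` are hypothetical names, the `IN_PROGRESS` status is invented for the example, and the real board is reached through MCP with a schema not shown in this post. The lane map follows the roles described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical lane map: which agents may take work in each lane.
LANES = {
    "reasoning": {"albert"},
    "code": {"jim"},
    "infrastructure": {"jim", "albert"},
    "security": {"yadi"},
    "crypto": {"sutter"},
}


@dataclass
class Ticket:
    title: str
    lane: str
    body: str
    status: str = "TODO"
    assignee: Optional[str] = None  # every ticket opens unassigned
    comments: List[str] = field(default_factory=list)


def assign(ticket: Ticket, agent: str, assigned_by: str) -> Ticket:
    """Assign a ticket, rejecting self-assignment and cross-lane routing."""
    if assigned_by == agent:
        raise PermissionError(f"{agent} may not self-assign")
    if agent not in LANES.get(ticket.lane, set()):
        raise ValueError(f"{agent} is outside the {ticket.lane!r} lane")
    ticket.assignee = agent
    ticket.status = "IN_PROGRESS"
    return ticket
```

For example, `assign(Ticket("BTC missing", "crypto", "..."), "sutter", assigned_by="manager")` succeeds, while routing the same ticket to `"jim"` raises `ValueError`.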

The Rules That Prevent Total Chaos

Some of these are technical; some are social. All of them were written after something broke.

  • No cross-host writes without approval. Albert can SSH to Webb when the ticket scope explicitly says so. Ozzie cannot write anywhere but his own machine. These restrictions are enforced at the OS level.
  • Lanes are sacred. Crypto routes only to Sutter. Infrastructure only to Jim or Albert. Security only to Yadi. Cross-lane work gets rejected faster than a runner trying to steal on Yadi with a two-run lead.
  • Secrets never appear in tickets, logs, or messages. API keys, tokens, passwords — agents reference config keys and file paths, never the values themselves. If you see HA_TOKEN in a log, that's fine. If you see the actual token, something has gone catastrophically wrong.
  • Cron jobs stay silent unless something changes. Jim's watchdog scripts exit with empty stdout when everything is fine. I hear about problems. I do not hear "systems nominal" six times a day.
  • Every deployment has a second set of eyes. Albert builds, Yadi reviews. Jim deploys, Yadi verifies. Nobody ships alone.
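The secrets rule in particular lends itself to an automated check. A minimal sketch, assuming a hypothetical helper `leaked_secret_names` that is handed the known secrets as a name-to-value mapping and scans outgoing text for the values:

```python
from typing import Dict, List


def leaked_secret_names(text: str, secrets: Dict[str, str]) -> List[str]:
    """Return the names of secrets whose *values* appear in text.

    Mentioning a secret's name (for example HA_TOKEN) is allowed;
    only the value itself counts as a leak.
    """
    return sorted(
        name for name, value in secrets.items()
        if value and value in text
    )
```

A ticket comment or log line would be blocked whenever this returns a non-empty list, and the alert would cite only the names, never the values.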

The Hosts

I name homelab servers after Cardinals because it makes the infrastructure readable and it entertains me. Both are valid reasons.

  • Wainwright — Home Assistant host. Adam Wainwright threw 2,668 innings for the Cardinals over 18 seasons. This machine handles thousands of sensor readings a day without complaint. Durable, reliable, occasionally needs a day off.
  • Musial — the GPU server. Stan the Man, the most consistent hitter in franchise history. Runs vLLM for whatever local inference work comes up.
  • Webb — dashboard host. Named before I realized the irony. Crypto dashboard, mission control, home energy display all run here.

Real At-Bats

These are actual things the fleet has handled in the past few weeks. No hypotheticals, no "imagine if" — real tickets, real results.

Weather Check — Ozzie, 0.4 seconds

"What's the weather today?" Ozzie pulls from the weather station at Spartina Landing via Home Assistant on Wainwright. He knows the actual temperature at my house in coastal Georgia, not the airport 20 miles inland. Sub-second response. No specialist quota burned.

"How much solar did we generate yesterday?" "34.2 kWh generated, 22.1 kWh consumed, 12.1 kWh exported. Battery ended at 87%." The EG4 inverters and battery wall report to Home Assistant, Home Assistant reports to Ozzie, Ozzie reports to me. Clean relay throw.

Infrastructure Audit — Albert with Yadi Backing

"Audit the homelab. Is documentation accurate?"

Albert probed seven hosts. Cross-referenced running services against the wiki. Found twelve discrepancies — four critical, four moderate, four minor. The critical ones would have become 2 AM emergencies within a week. Albert found them in fifteen minutes. Yadi reviewed the methodology and confirmed the findings. I fixed three of the four within the hour; the fourth needed a configuration migration that Albert handled the next morning.

Before this system, I discovered infrastructure problems when something broke and my phone buzzed. Now I discover them during business hours while drinking coffee.

Crypto Dashboard — Albert, Multi-Day Build

"Build a mobile-friendly dashboard for my crypto portfolio."

Albert designed a Python backend reading from Alpaca's API, computing dynamic exposure caps at 90% of equity, serving JSON. HTML/JS frontend with two-line position rows — because nobody wants to scroll through a phone like they're reading the out-of-town scoreboard. Mobile-first, single-tap navigation. Deployed to Webb with Caddy reverse proxy and systemd supervision.
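The exposure cap is simple arithmetic. The sketch below is one plausible reading, not the dashboard's actual code: it assumes "exposure" means the summed absolute market value of open positions, and the function name and return keys are hypothetical.

```python
from typing import Dict, List


def exposure_summary(equity: float, position_values: List[float],
                     cap_fraction: float = 0.90) -> Dict[str, object]:
    """Compare total position exposure against a cap tied to equity."""
    cap = equity * cap_fraction            # the cap moves with equity
    exposure = sum(abs(v) for v in position_values)
    return {
        "cap": cap,
        "exposure": exposure,
        "headroom": cap - exposure,        # negative when over the cap
        "over_cap": exposure > cap,
    }
```

The cap is "dynamic" in the sense that it is recomputed from current equity on every call rather than stored as a fixed dollar figure.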

The dashboard has run reliably for weeks. I check it from my phone between innings.

Missing BTC Journal Entry — Albert, 13 Minutes

"BTC doesn't show up in the dashboard's recent journal."

Albert investigated. The position existed on Alpaca but had no entry in the trade log — it was opened before the logging system went live. The dashboard only displayed positions with corresponding log entries. Fix: synthesize journal entries for open positions not in the log, using live data from the API.
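A minimal sketch of that fix, assuming positions and journal entries are plain dictionaries. The field names (`symbol`, `qty`, `avg_entry_price`) and the `synthetic` flag are illustrative assumptions, not the dashboard's real schema.

```python
from typing import Dict, List


def synthesize_missing_entries(positions: List[Dict],
                               journal: List[Dict]) -> List[Dict]:
    """Return the journal plus placeholder entries for unlogged positions."""
    logged = {entry["symbol"] for entry in journal}
    synthetic = [
        {
            "symbol": pos["symbol"],
            "qty": pos["qty"],
            "avg_entry_price": pos["avg_entry_price"],
            "synthetic": True,  # lets the UI mark the row as reconstructed
        }
        for pos in positions
        if pos["symbol"] not in logged
    ]
    return journal + synthetic
```

The original journal is left untouched; the synthetic rows exist only in the merged view handed to the frontend.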

Three minutes to diagnose, ten to implement. This is the kind of bug that would have taken me an hour of staring at JSON and another hour of debugging JavaScript. The cleanup hitter does what cleanup hitters do.

Daily Crypto Report — Jim, Every Day at 5 PM

Jim generates a crypto report: equity, positions, unrealized P&L, 24-hour change, trading activity. Uploads to Nextcloud. Ozzie relays the summary to me on Telegram. The report has run reliably for weeks. I only notice it when something interesting happens — a position moves 5%, a stop triggers, a new trade opens.

The first version had a subtle bug: a duplicate function definition that silently broke formatting. Jim caught it during testing and rewrote the logic. The fix took ten minutes. The report has been perfect since.

Blog Drafts — Jim Then Albert, Today

"Write two blog articles. 3,000-5,000 words each. Use agent names, hostnames, and Cardinals humor."

Jim produced first drafts. Solid structure, decent prose, way too short — 673 and 422 words. Albert rewrote both (you're reading one of them). Yadi will fact-check before publication. Two-stage writing, one-stage review. Like a relay: Jim gets on base, Albert drives him in, Yadi makes sure nobody missed a sign.


The Tech Stack, Briefly

I'm not going to bury you in a network diagram. The important bits:

  • Four cloud models across five agents. Minimax (Ozzie, front-door triage). GPT-5.5 (Albert, heavyweight reasoning). Kimi K2.6 (Jim, code/cron). GPT-5.5 again (Yadi, security/audit). Gemini 3 Pro (Sutter, crypto). Different providers, different billing models, each picked for its specific strength.
  • Kanban board via MCP. Every agent reads, creates, comments on, and closes tickets through the same interface. No direct agent-to-agent communication.
  • Nextcloud for shared storage. Blog drafts, audit reports, configuration references. Every agent can read and write here.
  • Telegram as the unified chat surface. One interface, one history. I never open the kanban board unless I'm curious about the paper trail.
  • systemd timers and services everywhere. No Docker, no Kubernetes. Each agent's runtime is a systemd unit. Simple, debuggable, standard.
  • SSH with key-based auth for cross-host operations. Authorized agents use specific keys with command restrictions. Unauthorized agents don't have keys.

Total power draw: about 200 watts for Musial's GPU while it's running inference, plus 80 watts for everything else. Less than the lights in an empty Busch Stadium press box.


What I've Learned

The Shortstop Matters More Than the Closer

If Ozzie routes wrong, the best specialist in the world can't help. The most important agent in the system is the one I talk to first.

I've spent more time tuning Ozzie's triage logic than any other component. When does a weather request stay local? When does a code request go to Jim vs. Albert? What deserves a ticket at all vs. an immediate reply? These judgment calls determine whether the system feels like magic or bureaucracy.

A shortstop who makes the routine play is worth more than a closer who only appears three times a week. Ozzie touches the ball on every play.

Lanes Prevent Chaos

My first attempt at this was a single "do-everything" agent — the utility infielder of AI. It would confidently answer crypto questions with hallucinated prices, attempt infrastructure changes without context, and freeze on ambiguous requests. The pattern was always the same: too many tools, too many domains, no clarity about which capability applied to which problem.

Five agents with explicit lanes produce dramatically better results. Sutter gives accurate crypto answers because that's all he does. Yadi catches security issues because he's methodical and thorough. Albert handles ambiguity because he has the reasoning depth for it and doesn't get distracted by code-generation tooling.

The lesson: the more tools an agent has, the worse it gets at choosing among them. This is the opposite of what demo aesthetics teach. Demo aesthetics love a single agent that "can do anything." Production wants a manager who knows which arm to call from the bullpen.

Silence Is Golden

My first monitoring job sent a status report six times a day. After 48 hours I wanted to throw my phone into the Mississippi.

Now every scheduled job follows one rule: if there's nothing to say, say nothing. Jim's scripts exit with empty stdout when everything is fine. The scheduler sees the empty output, shrugs, and moves on. I hear about problems. I do not hear about non-problems.
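The pattern is easy to sketch. The disk-usage check here is a stand-in chosen for illustration, since the actual checks Jim runs aren't shown, and the function names and the 90% threshold are assumptions.

```python
import shutil
from typing import List


def check_disk(path: str = "/", max_used_fraction: float = 0.90) -> List[str]:
    """Return a list of problems; an empty list means all is well."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    if used > max_used_fraction:
        return [f"{path} is {used:.0%} full (limit {max_used_fraction:.0%})"]
    return []


def main() -> int:
    problems = check_disk()
    if problems:
        # Any stdout output is the signal to notify; silence means healthy.
        print("\n".join(problems))
    return 1 if problems else 0
```

A real script would end with `sys.exit(main())` so the scheduler also gets a non-zero exit code when something is wrong.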

It's the difference between a broadcaster who fills dead air with pitch count trivia and one who lets the game breathe. I know which one I'd rather listen to through a 3-1 loss in July.

Cheap Where You Can, Expensive Where You Must

Approximately 80% of my messages are handled by Ozzie on Minimax — the cheapest model in the fleet, fractions of a cent per message. Weather, sensors, quick lookups — all cheap, all fast, all good enough.

The expensive models are genuinely better at deep reasoning and code generation. But using them for "what's the temperature" is like bringing in the closer to pitch the third inning of a blowout. Wasteful and unnecessary.

The split also provides resilience. If one provider has an outage, the others keep working. If GPT-5.5 goes down, Albert and Yadi are unavailable, but Jim, Sutter, and Ozzie are fine. No single provider holds the whole system hostage.
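The blast radius of an outage follows directly from the agent-to-model mapping given in this post. A small sketch, with `AGENT_MODELS` and `agents_affected` as hypothetical names:

```python
from typing import List

# Agent-to-model assignments as described in this post.
AGENT_MODELS = {
    "ozzie": "Minimax",
    "albert": "GPT-5.5",
    "jim": "Kimi K2.6",
    "yadi": "GPT-5.5",
    "sutter": "Gemini 3 Pro",
}


def agents_affected(model_down: str) -> List[str]:
    """List the agents that lose their model if model_down is unavailable."""
    return sorted(
        agent for agent, model in AGENT_MODELS.items()
        if model == model_down
    )
```

Here `agents_affected("GPT-5.5")` returns both Albert and Yadi, since they share a model, while any other single outage benches exactly one agent.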

Trust But Verify

Albert's first dashboard deployment had a hardcoded file path that only worked on one host. The dashboard itself was beautiful. It would have crashed immediately on Webb.

Yadi caught it during review. Albert fixed it before deployment. The lesson is not "Albert makes mistakes" — everyone makes mistakes. The lesson is "every deployment needs a second set of eyes."

In baseball, the catcher visits the mound for a reason. The pitcher might think the next pitch is perfect. The catcher might see the hitter cheating on the fastball.

Don't Overthink the Small Stuff

Jim's first daily crypto report had a duplicate function definition that silently broke the formatting. Fixing it took ten minutes. The original mistake took thirty seconds.

When you're building systems like this, the instinct is to architect everything perfectly from the start. Resist it. Ship the ugly version, see where it breaks, fix the break. It's the baseball equivalent of "see the ball, hit the ball." The rest is commentary, and commentary doesn't win ballgames.


What's On Deck

The bullpen isn't static. Things I'm working on, roughly in the order a lineup card gets filled out:

Smarter routing for Ozzie. There are still edge cases where Ozzie sends something to the wrong specialist and I have to redirect manually. Every misroute is a pattern to encode. The goal: Ozzie's triage should be as automatic as a Gold Glove shortstop turning two.

Sutter's learning loop. After every closed trade, Sutter analyzes the outcome and saves a lesson. Over time, the journal should improve future signals. Early results are hit-and-miss — like a rookie closer who either freezes the hitter or grooves a fastball. Consistency is the project.

Infrastructure as code. Right now host configurations are partially documented and partially stored in my head. That's not sustainable. I want every service definition, every cron job, every firewall rule in version control so Yadi can verify compliance automatically instead of probing hosts by hand. The lineup card should generate itself.

Voice interface. Text on Telegram works great, but sometimes I want to dictate while driving. Speech-to-text models are cheap enough now that this is mostly a wiring problem. Mike Shannon's voice is not an option, unfortunately. We make do.


The Cardinal Way

There's a phrase around St. Louis: "The Cardinal Way." It means fundamentals. Every player in the organization, from the Dominican Summer League to Busch Stadium, learns the same base-running leads, the same cutoff throws, the same situational hitting. The system produces players who know their roles and execute them without thinking.

This AI setup is my Cardinal Way. Five agents, five clearly defined roles, one system of coordination. Every agent knows its lane and stays in it. Every ticket has a paper trail. Every deployment gets reviewed. The fundamentals aren't glamorous, but they win games.

The single "do-everything" agent I started with was the equivalent of a team with no bullpen roles — chaos on the mound, nobody knowing who was warming up, the manager gesturing frantically at the phone while the starter gives up a six-spot in the fourth.

Five specialists, a kanban board, and clear rules about who does what: that's a roster. It's not Jarvis. It's better.

Go Cards.

#ai-agents #home-automation #smart-home #kanban #cardinals #llm #hermes-agent

Written by Big Kel

Retired IT professional exploring home automation, tech, and life. Find more posts on the blog.