There is a category of decision in software engineering that almost everyone gets wrong on the first attempt, and the mistake costs months. The decision is: when a popular framework is roughly aimed at your problem but doesn't quite fit, do you adapt the framework, or do you build the thing you actually need?
The default move, especially when you're early in a project and surrounded by tutorials and Reddit threads telling you which framework everyone else uses, is to adapt. Take CrewAI in hierarchical mode and bend it. Take Hermes Agent and graft a new orchestration layer on top. Take a generic OpenClaw subagent harness and try to teach it about your specific tools. The framework is 80% there, you tell yourself; surely it's faster to fix the missing 20% than to start over.
This is the framework-adaptation trap, and I've run into it three times in the last two years building Brock, my home-built CrewAI assistant. I want to walk through how each of those attempts went, what specifically broke, and the rule I now follow when I see myself reaching for someone else's harness instead of writing one. The short version: when a framework solves 80% of your problem, use it and forget I said anything; when it solves 20% of your problem, build. The middle case — the dangerous one — looks like 80% but turns out to be 20% once you're three months in. The post is mostly about how to recognize that case before you sink the three months.
A baseball analogy did the most work for me when I finally understood the pattern: you wouldn't try to turn a left fielder into a pitcher. The left fielder might be a tremendous athlete. He might have a strong arm. He might love the idea. But pitching is a different position, with different mechanics, different conditioning, and a different role in the game. If you need a pitcher, you draft a pitcher. You don't spend a season retraining an outfielder and tell yourself the marginal cost is low. Frameworks are positions. If you need something the framework wasn't drafted for, draft what you need.
Adaptation attempt one: Hermes plus OpenClaw on Discord
Before Brock, I had a setup I called Hermes. It was a Discord-bot-driven orchestration layer that received messages, classified them, and routed them to a separate execution layer called OpenClaw, which was a collection of subagents — each handling a specific domain like solar telemetry, Tesla data, or Obsidian notes. Hermes was the gateway. OpenClaw was the workforce. They talked to each other through a bespoke protocol I'd designed.
This was a defensible architecture for the use case Hermes was built for: lightweight Discord-mediated automation with a small number of specialized backends. It worked for about a year. The problem started when I wanted Hermes to do things Hermes hadn't been drafted for.
I wanted memory — persistent conversation state across sessions. Hermes had session-scoped state but no concept of an operator-spanning memory layer. I bolted one on. It mostly worked.
I wanted slash commands — deterministic skills that bypassed the LLM entirely. Hermes routed everything through its classifier first. I added a pre-classifier shortcut. It mostly worked.
I wanted to migrate off Discord onto Telegram, because my actual usage was on Telegram and Discord had become a friction point. Hermes was so deeply tied to Discord's message model that I ended up writing a Telegram-to-Discord-shaped adapter that re-shaped Telegram messages into something Hermes could process. It mostly worked.
I wanted multi-step reasoning — give the assistant the ability to call several OpenClaw subagents in sequence to assemble a complex answer. Hermes wasn't designed for chained calls; it assumed a single classify-and-dispatch flow. Implementing chains required teaching Hermes about state I'd been trying to keep out of its scope. It... sort of worked.
By the end of that year, the cumulative weight of "mostly works" and "sort of works" had a name, and the name was technical debt. Hermes was a left fielder I was trying to turn into a pitcher, an outfield coach, and a starter all at once. Each individual adaptation was small. The combined cost — the friction every time I wanted to extend the system in a new direction — was enormous.
I rebuilt from scratch. The replacement, eventually, became Brock. The first version of the new architecture was simpler than the Hermes adapter layer I'd been maintaining. It directly solved the problem Hermes hadn't been drafted for, "operator-driven multi-modal assistance with persistent memory and Telegram-first delivery," and it carried none of the assumptions Hermes had been built around.
That was attempt one. Lesson: when the underlying assumptions of the framework don't match the assumptions of your use case, no amount of glue code will bridge the gap. Every glue layer is another place where future you will pay the tax.
Adaptation attempt two: CrewAI hierarchical mode
The second attempt was, in hindsight, the same mistake at a smaller scale.
When I started Brock, I picked CrewAI as the framework. The basic CrewAI Process.sequential mode was a great fit — define agents, define tasks, give them tools, and run them in order. For my early needs, sequential was perfect. CrewAI was 80% right and the remaining 20% was things I could write around it without much cost. Good framework choice for that phase.
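For anyone who hasn't used it, the sequential shape really is that simple. Here's a minimal sketch, written from memory of CrewAI's documented API, with a placeholder agent and task standing in for Brock's real ones (real agents also attach tools via the Agent's tools parameter, omitted here):

```python
from crewai import Agent, Task, Crew, Process

# Placeholder specialist; Brock's real agents carry domain tools
# (solar telemetry, MLB Stats, SQLite) via the tools= parameter.
analyst = Agent(
    role="Solar analyst",
    goal="Summarize yesterday's grid import and export numbers",
    backstory="You read home energy telemetry and report it plainly.",
)

summary = Task(
    description="Write a one-paragraph summary of yesterday's solar production.",
    expected_output="A short plain-text summary.",
    agent=analyst,
)

crew = Crew(
    agents=[analyst],
    tasks=[summary],
    process=Process.sequential,  # tasks run in order; no manager agent
)

print(crew.kickoff())
```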
Then I wanted multi-agent delegation — a supervisor that decided which specialist should handle a query — and reached for CrewAI's Process.hierarchical mode. This is where the trap got me again.
Process.hierarchical in CrewAI is a manager-worker pattern: a manager agent receives the task, decides which worker agents to delegate to, and assembles their outputs into a final answer. Sounded perfect. Looked perfect in the docs. Was not actually right for my use case.
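On paper the switch from sequential looks tiny, which is part of the appeal. A rough sketch of the hierarchical shape, again from memory of the documented API and with placeholder agents; treat the details as an approximation, not a verified config:

```python
from crewai import Agent, Task, Crew, Process

grid_agent = Agent(role="Grid specialist", goal="Answer solar and grid questions",
                   backstory="Knows the home energy telemetry.")
ball_agent = Agent(role="Baseball specialist", goal="Answer Cardinals questions",
                   backstory="Knows the MLB Stats API.")

# In hierarchical mode the task isn't pinned to an agent; the manager
# decides who handles it and assembles the final answer.
answer = Task(
    description="Answer the user's question: {question}",
    expected_output="A direct answer.",
)

crew = Crew(
    agents=[grid_agent, ball_agent],
    tasks=[answer],
    process=Process.hierarchical,
    manager_llm="gpt-4o",  # the manager agent gets its own model
)

result = crew.kickoff(inputs={"question": "How much did the panels produce yesterday?"})
```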
Two things broke. First, the cost. As Towards Data Science documented at length: "Each agent in a crew makes its own LLM calls, so a 5-agent crew processing a single task generates at least 5 separate API calls, often more when delegation or memory retrieval is involved. For high-volume workflows, this can lead to API bills that are 5 to 10 times higher than a single-agent approach." That tracked exactly with what I saw. A single chat message would spawn the manager call, then sequential delegation calls, then synthesis calls — twelve to fifteen LLM invocations for a query that should have been one routing decision plus one specialist invocation.
Second, and more important: the manager logic didn't actually do what I wanted. The same write-up notes that hierarchical orchestration "executes all tasks sequentially, causing incorrect agent invocation, overwritten outputs, and inflated latency/token usage. The framework's internal routing doesn't enforce conditional branching or true delegation, so the final response is effectively determined by whichever task happens to run last." I needed a router that picked one specialist and stopped. CrewAI's hierarchical mode was structurally a fan-out-and-merge, not a select-and-dispatch.
I tried to bend it. I wrote a custom manager agent system prompt that aggressively discouraged calling multiple workers. I built an early-exit pattern. I worked around the framework's internal task assembly with intercepted output handlers. After about six weeks of "it mostly works," I realized I had spent six weeks trying to teach a fan-out architecture to behave like a select architecture, and I was still going to have to maintain whatever bent version emerged from that effort indefinitely.
I rewrote that whole layer in about two days. The replacement is something I call lite_supervisor. It is a small Python module — under 250 lines, including comments — that does exactly two things: a single LLM routing decision (which specialist?), and a single specialist invocation (the actual answer). It does not delegate to multiple specialists in parallel. It does not assemble outputs from multiple sources. It does not have a fan-out phase. It does the thing I needed and nothing else. Two LLM calls per chat, total. Audit trails are linear and inspectable. The code that replaced the CrewAI hierarchical config is shorter than the config it replaced.
This is a pattern worth naming directly. When you find yourself writing custom manager prompts to make a framework behave differently from how it's designed to behave, that's the framework telling you it isn't drafted for your position. Listen to it.
Adaptation attempt three: the OpenClaw-style subagent harness
The third attempt is one I almost made but caught myself.
OpenClaw — the subagent harness from my Hermes era, the one I'd already torn down once — has cousins. Anyone who's spent time in the agent-framework space has probably encountered some flavor of "give me a directory of skills, dispatch to them by name, return the result." Whether it's Garry Tan's gbrain, one of the early OpenClaw forks, AutoGen's subagent patterns, or the various "tool registry plus classifier" libraries that have shipped in the past year, the basic shape is similar: a registry of capabilities, a classifier that picks one, a dispatcher that runs it.
When I was designing Brock's slash command surface — the deterministic skills like /grid and /cardinals and /portfolio — I caught myself reaching for a generic skill-registry harness. There's a Claude Code skills pattern that does roughly this, and Garry Tan's GStack package puts twenty-plus skills into a unified registry with shared lifecycle. It's a coherent design. I almost adopted it.
Then I asked the question that had bitten me twice already: what specifically am I going to need that this harness doesn't do, and what will adapting it cost?
What I needed for Brock's slash commands: a Telegram bridge that mapped a /foo message to a Python handler with type-hinted arguments. Approximately fifteen lines of code per slash command, including the route registration, argument parsing, and response formatting. Half of those fifteen lines are the actual function body — the API call to MLB Stats, the SQLite query, the rate calculation. The dispatch surface is essentially trivial.
What an OpenClaw-style harness would have given me: a beautiful skill registry, lifecycle hooks, a discovery mechanism, declarative argument schemas, namespacing, conflict detection. All elegant. All sized for someone running a hundred skills, where the registry plumbing is a meaningful fraction of the value. I had nine slash commands. The plumbing would have been more code than the slash commands themselves.
I wrote a fifty-line dispatcher in the Telegram bridge. It does what I need. It will do what I need next month. When my use case grows, I'll re-evaluate. If it ever passes the threshold where a real registry would help — call it forty or fifty skills — I'll adopt one then. Until then, the framework's not solving a problem I have.
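To make "fifty-line dispatcher" concrete, here's roughly the shape of mine. All names are stand-ins and the Telegram wiring is left out; the point is that the registry is a dict and dispatch is a lookup:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SlashCommand:
    name: str
    handler: Callable[[list[str]], str]  # args -> formatted reply text
    help_text: str

COMMANDS: Dict[str, SlashCommand] = {}

def slash(name: str, help_text: str = ""):
    """Register a handler for /name. Replaces lifecycle hooks, discovery,
    and declarative schemas with a decorator and a dict."""
    def register(fn: Callable[[list[str]], str]) -> Callable[[list[str]], str]:
        COMMANDS[name] = SlashCommand(name, fn, help_text)
        return fn
    return register

def dispatch(text: str) -> str:
    """Take the raw '/foo arg1 arg2' text from the Telegram bridge and run it."""
    parts = text.strip().split()
    name = parts[0].lstrip("/")
    cmd = COMMANDS.get(name)
    if cmd is None:
        return f"Unknown command: /{name}"
    return cmd.handler(parts[1:])

# Hypothetical example command; the real ones call MLB Stats, SQLite, etc.
@slash("echo", help_text="Repeat the arguments back.")
def echo(args: list[str]) -> str:
    return " ".join(args) or "(nothing to echo)"
```

The bridge calls dispatch() with the raw message text and sends back whatever string comes out; there's no lifecycle, discovery, or schema layer to maintain.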
This was the first time I caught the trap before falling into it. The previous two times I had to learn it the expensive way.
What "drafting a pitcher" actually looks like in code
I want to make the alternative concrete, because it's easy to read this and conclude that I'm advocating for everyone to write everything from scratch. That's not the lesson.
The lite_supervisor that replaced CrewAI hierarchical is a good example of what I mean by "draft a pitcher." It's small. It does one thing well. Here's the shape:
- A Python module with a single entry point: run_lite(message, conversation_id) -> ChatResponse
- That entry point does three things: build a prompt for the router LLM with a list of available specialists; call the router; hand off to the routed specialist's existing CrewAI invocation
- It threads through deterministic state pre-injection (DSPI) for specialists that benefit from it
- It runs the output verifier on the response before returning
- It logs the trace as a single linear audit entry
That's it. There's no manager prompt to tune. There's no fan-out logic to fight. There's no token bloat from multi-agent assembly. Two LLM calls per chat, one of which is small and one of which is a specialist whose configuration I already had from my CrewAI Process.sequential days.
The CrewAI machinery is still there underneath, doing the part it was drafted for: defining agents, attaching tools, running an LLM call with the agent's system prompt. lite_supervisor sits on top of CrewAI's primitives, not on top of CrewAI's hierarchical orchestration. I drafted a pitcher; I didn't tear down the rest of the team.
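Here's a sketch of that skeleton. Every helper in it is a stub standing in for Brock's real pieces (router prompt, DSPI loader, verifier, specialist crew wrappers); the shape is what matters:

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ChatResponse:
    specialist: str
    text: str

# Hypothetical registry: each value wraps an existing CrewAI
# Process.sequential crew for that specialist and returns its final text.
SPECIALISTS: Dict[str, Callable[[str, str], str]] = {
    "grid": lambda msg, ctx: f"[grid crew answer to: {msg}]",
    "cardinals": lambda msg, ctx: f"[cardinals crew answer to: {msg}]",
}

def call_router_llm(prompt: str) -> str:
    """LLM call 1: a small, cheap routing decision. Stubbed here; in the
    real module this is a single completion that returns one name."""
    return "grid"

def load_deterministic_state(specialist: str, conversation_id: str) -> str:
    """Deterministic state pre-injection (DSPI): fetch whatever facts the
    specialist should see up front. Stubbed here."""
    return ""

def verify_output(specialist: str, answer: str) -> str:
    """Output verifier pass before returning. Stubbed here."""
    return answer

def run_lite(message: str, conversation_id: str) -> ChatResponse:
    prompt = (
        "Pick exactly one specialist for this message. "
        f"Specialists: {', '.join(SPECIALISTS)}. Message: {message}. "
        "Answer with the name only."
    )
    choice = call_router_llm(prompt).strip().lower()
    specialist = SPECIALISTS.get(choice) or next(iter(SPECIALISTS.values()))
    context = load_deterministic_state(choice, conversation_id)
    answer = specialist(message, context)       # LLM call 2: the specialist crew
    answer = verify_output(choice, answer)
    # One linear audit entry: conversation, route, input, output.
    print(f"[trace] {conversation_id} -> {choice}: {message!r} -> {answer!r}")
    return ChatResponse(specialist=choice, text=answer)
```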
This is what "build the thing you actually need" looks like when you're honest about it. You don't rewrite the world. You write the small piece that nobody else's framework is drafted for, on top of the framework primitives that fit. The skill is recognizing which piece is which.
When SHOULD you adapt instead of building?
I don't want to be ideological. There are real cases where adapting beats building, and pretending otherwise is just NIH syndrome with extra steps.
The honest test is: how much of the framework's design is load-bearing for your use case, and how much is dead weight?
If the framework's core abstractions are load-bearing — if you're using it the way it was drafted, taking advantage of the patterns it's optimized for — adapt it. CrewAI's Process.sequential is load-bearing for me; I use it as designed, and it does its job well. I have no plans to replace the CrewAI Agent or CrewAI Task primitives.
If the framework's core abstractions are dead weight — if you're spending most of your time working around them, building bypass layers, writing custom prompts to make it behave differently — build. CrewAI's Process.hierarchical was dead weight for me. The fan-out architecture was actively against my use case. Every line of adaptation was paying tribute to a design choice I didn't need.
Joel Spolsky made the related argument in 2001: "If it's a core business function — do it yourself, no matter what." His framing was about depending on someone else for the feature that defines your product. The agent-framework version of this is: depend on someone else's framework only for the parts that aren't your differentiation. If your supervisor logic is what makes your assistant work, write the supervisor logic. Don't rent it.
The common counter-argument is "but what about reuse, what about ecosystem, what about the community of people who've solved problems you don't even know you'll have." Those are real benefits when the framework fits. They are illusions when it doesn't. A framework's ecosystem solves problems for people running its design as drafted; it does very little for people running it sideways.
The cheap way to know
There's a quick diagnostic that's worked well for me. Once you find yourself doing any of the following, you're probably in adaptation-trap territory and should consider building:
- Writing a custom system prompt to override the framework's default behavior
- Patching framework internals to expose state the API doesn't surface
- Maintaining a fork of the framework
- Wrapping the framework in your own facade so callers don't see the framework's API
- Working around the framework's logging or observability because it doesn't fit yours
- Spending more time reading framework source than reading your own
Each of those things, done occasionally and in isolation, is fine. Two or three of them at once is a smell. Four or more is the universe telling you the left fielder isn't going to learn to throw a curveball.
When that's where you are, the fastest path forward is usually not "adapt harder." It's to write down — on a piece of paper, no IDE, no docs — what your system actually needs to do. Just the verbs and nouns. Then look at how much of what you wrote down is supported natively by the framework you're fighting, versus how much is a pattern-match against the framework's existing abstractions. If most of it is pattern-match, that's a hint that the framework was drafted for a related but different problem, and you'll keep paying the difference forever.
Closing
In professional baseball, draft strategy is built around the idea that you scout the position you need. You don't draft the best athlete and decide later what to do with them. You decide what role is missing, then find the player drafted for it. The same logic applies to the architecture under your agent: figure out what role you actually need filled, then find or build the thing drafted for it. Don't try to retrain a left fielder into your closer in the eighth inning of a one-run game.
I've torn down two framework adaptations to build the version I actually needed, and caught myself before doing it a third time. Each rebuild was less code than the bent version it replaced. Each one made the system smaller, faster, and more obviously correct. None of those were rewrites for the sake of rewriting. They were drafts.
If you're three months into bending a framework that was drafted for someone else's problem, the rebuild is probably one weekend away, and you're not going to believe how much smaller it ends up.
Draft the pitcher.