
The org chart was designed for humans

Engraph · 9 min read

Earlier this year, senior engineering practitioners from some of the largest tech companies sat in a room for multiple days trying to answer a question that kept resurfacing in every breakout session. If agents handle the code, where does the engineering go? The answers diverged wildly but the urgency didn't.

I've been thinking about that retreat since it happened, partly because the findings confirmed something I'd been watching across teams of all sizes, and partly because the most interesting thread isn't in the published summary. It's in what the summary doesn't resolve.

The experiments nobody's reporting on

The organizations I've been talking to are doing something quiet. They're pulling small groups together into tight, co-located, cross-functional teams. One-pizza teams. Engineers and product people and designers sitting next to each other, sometimes a data scientist. In the same room, not async, because it turns out the collaboration that matters now isn't code production.

The work these teams do looks different from what I expected. It's not about writing code faster. It's about making decisions together. The agents handle the implementation. The humans handle the judgment calls that the agents can't make - which tradeoffs to accept and what "good enough" means for this specific context. Proximity turns out to matter again, not because anyone needs to pair-program, but because when agents are pushing you through decisions at a pace you're not used to, having another human who can say "wait, that doesn't feel right" before you've committed to the wrong tradeoff is genuinely valuable. Same room is best, but same timezone works - what kills you is a twelve-hour lag between the gut check and the decision. The pace wears you down otherwise.

The org chart was designed for humans. The humans are no longer the only ones doing the work.

Centralization isn't the villain here

There's a version of this article where I argue that centralized control is holding organizations back. That's the easy take, and I think it's wrong.

The teams pulling people into tight experimental groups are centralizing, and they're doing the right thing. You don't decentralize something you don't understand yet. Decentralizing prematurely is how you end up with ten teams all solving the same organizational problem differently and none of them solving it well.

The retreat backs this up. Practitioners described teams that got AI tools, cleared their backlogs in days, and then ran headfirst into cross-team dependencies and human-speed decision-making. The bottleneck didn't disappear; it just moved upstream. Decisions became the constraint, not engineering capacity. Middle managers who previously served as coordination points became approval bottlenecks.

That speed mismatch makes centralization the honest first response. If you don't know what your governance model should look like when agents produce work faster than anyone can evaluate it, the worst thing you can do is pretend you've figured it out and fan out across the organization. Better to centralize, learn, and then figure out what fanning out even means.

I should complicate this, though, because the previous paragraph makes centralization sound too tidy. What I'm seeing is that centralized experimental teams produce insights that are hard to transfer. Team A figures out a workflow that works brilliantly for their domain. Team B tries to adopt it and it falls apart because the constraints are different. The knowledge lives in the team's shared context, in the gut-check conversations that happened in real time, not in whatever playbook they write up afterward.

Where the experiments hit boundaries

The centralize-first instinct runs into a problem at the boundaries between teams. The retreat gave it a name: "agent topologies" - Conway's Law applied to agents. If organizations design systems that mirror their communication structures, what happens when agents become participants? Agents can be duplicated across teams instantly. A specialized database agent can exist everywhere without the bottleneck of a single human specialist.

Agents that learn from their context diverge, though. The database agent on the e-commerce team accumulates different patterns than the one on the ERP system, even from identical starting configurations. Human teams have this problem too - team-specific norms, local conventions - but agents do it faster. And when multiple agents try to fix the same issue, they create feedback loops. One agent's fix triggers another agent's correction. The retreat cited an example where an agent given a linter rule about file length responded by making individual lines longer, technically satisfying the rule while violating the principle behind it.

The retreat also identified something it called "the middle loop" - a new category of supervisory engineering work between writing code and managing delivery. Directing agents, recognizing when they're producing plausible-looking wrong answers. This work requires skills that experienced engineers often have but nobody explicitly trains for and no career ladder recognizes.

This gets harder at cross-project boundaries. When one team is figuring out how to work with agents, the lessons are local and manageable. When N teams across your organization are all pushing agents into different domains simultaneously, the problem changes shape. Some of what they learn will be domain-specific. Some will apply to a single folder or service. Some will be organizational knowledge about how agents interact with your governance and review processes and deployment pipeline - stuff that has nothing to do with a specific codebase. The challenge isn't just enabling the people pushing in this direction; it's making sure what each team learns reaches the teams that come after.

The full shared context can't travel - the same proximity that makes the decisions good makes the lessons harder to extract. You can't bottle what happened in those rooms, the shared instinct, the real-time feel for when something's off. But the specific rules those rooms produce travel fine. The file-length linter story from the retreat is one: agents interpret rules literally, so encode the intent, not just the metric. That's a sentence. It cost someone weeks of confused debugging to earn. The next team won't have the instinct that produced it. They'll have the rule itself, which is the difference between hitting the same wall and skipping it.
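To make that concrete, here's a minimal sketch of what "encode the intent, not just the metric" could look like as a constraint handed to an agent. The names and structure are hypothetical - this isn't the retreat's format or Engraph's - but the shape is the point: the mechanical rule, the principle behind it, and the known way the rule gets gamed, delivered together.

```python
from dataclasses import dataclass

# Purely illustrative: a constraint that carries its intent alongside the
# metric, so an agent (or a reviewer) sees the why, not just the number.
# The field names and the render format are hypothetical.

@dataclass
class Constraint:
    metric: str          # the mechanical rule a linter can enforce
    intent: str          # the principle the rule is standing in for
    counterexample: str  # a known way the metric gets satisfied while the intent is violated

file_length = Constraint(
    metric="no source file longer than 400 lines",
    intent="keep modules small enough to review and reason about in one sitting",
    counterexample="packing logic onto fewer, much longer lines passes the check "
                   "while making the file harder to review, not easier",
)

def render_for_agent(c: Constraint) -> str:
    """Format the constraint as text an agent receives before it touches the code."""
    return (
        f"Rule: {c.metric}\n"
        f"Why it exists: {c.intent}\n"
        f"Do not satisfy it by: {c.counterexample}"
    )

if __name__ == "__main__":
    print(render_for_agent(file_length))
```

The third field is the part that took someone weeks to earn. The first team writes it down once; every later team gets the guardrail without the debugging session that produced it.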

And none of this depends on how big your organization is. It depends on how much work you're trying to route through agents. A fifty-person company running agents across four projects with shared infrastructure hits these boundary problems the same way a five-thousand-person company does. A three-person startup I talked to hit the same wall the moment they pointed two agents at the same shared service. The variable is agent adoption depth, not headcount.

My stake in this argument

Conflict of interest

I build Engraph, a system for managing organizational constraints that get delivered to AI agents. If you buy the argument that engineering organizations need enablement infrastructure, my company benefits. I sell infrastructure for a problem I'm telling you is urgent. The retreat is independently published and doesn't mention Engraph. The experiments I described are happening at organizations that have never heard of us. But I'm not going to pretend the framing is neutral. "This problem requires tooling" is a claim that serves my interests. I think it's also true, but I'd rather you weigh the argument knowing both things than discover the conflict of interest later. And to be specific about the limits: Engraph captures the part that can be written down. The shared instinct, the real-time feel for when something's off - that still requires proximity. I think the written-down part is more valuable than most people assume, but that's a bet, not a fact.

What I'm not sure about

The argument I've been making assumes that organizational culture change is necessary, that you can't just let agents get smarter and have the problem solve itself. Model capability improves every few months. If agents eventually navigate organizational context without being taught, the whole reorganization question becomes less urgent. I don't think that's where it lands - the knowledge that matters is specific to your company, your regulatory environment, your team's scar tissue from things that broke - but it's worth naming the possibility honestly.

There's a bias I should name, though. I'm drawn to the centralized experiments because they're legible - you can point at a team and say "they're figuring it out." Some useful organizational learning probably happens messily, in friction between teams that don't share a room or a whiteboard. I don't think that invalidates centralizing first. But if I'm only watching the legible experiments, my read on all of this is probably skewed toward what's easy to observe.

What does organizing mean now?

I don't know. The retreat didn't either. The question itself might be more useful than any answer anyone produces in the next year or two, because it forces you to be specific about what you're actually trying to coordinate. Not "how do we use AI" - that question has a thousand vendors with a thousand answers. The harder question: what decisions need human judgment, what can agents handle alone, and where's the boundary between the two?

The engineers running the experiments, making the gut calls, hitting the boundary problems - they're generating knowledge that matters. Enabling them is half the job. Making sure the next team inherits their rules, not just their code, is the other half.

Related reading: "Why Engraph" describes the governance worldview underneath the product. "Code review was never about the code" looks at the informal knowledge pipeline that used to transfer institutional context between engineers - and why it's breaking. And "The context engineering blind spot" examines the specific failure mode of authorised rules going stale in agent context windows.