
Code review was never about the code

Engraph · 8 min read

Ankur Jain recently argued that code review is a historical approval gate that no longer matches the shape of the work. He's right. Teams with high AI adoption merge 98% more pull requests while review time increases 91%. And the velocity gains come with a cost: research on Cursor adoption in open-source projects found a statistically significant but transient increase in development speed alongside a substantial and persistent increase in code complexity and static analysis warnings. The approval gate is overwhelmed, and what's getting through is rougher than before.

Jain proposes layers of trust to replace it: deterministic guardrails, behaviour-driven specifications, scoped permissions, adversarial verification. These are good ideas and some of them will work. But they're solving for code review as a quality checkpoint. There's a second function that gets less attention.

Code review was the last informal pipeline for tribal knowledge. The senior engineer who writes "we tried that caching strategy in Q3 and it caused a cascade failure" in a PR comment - that's organisational intelligence being transmitted. Not the style nits, not the null check flags, not the test coverage reminders. The one comment in seventeen that carries the reason the architecture looks the way it does.

That pipeline is what's breaking, and most of the proposals for replacing code review don't address it.

The hidden function

Code review was never designed to transfer institutional knowledge. It was designed to catch bugs and enforce consistency. The knowledge transfer happened as a side effect - a senior engineer happened to be reviewing, happened to have the context, and happened to write it down in a PR comment that nobody would ever read again.

A comparative study of knowledge transfer in human-human versus human-AI pair programming found that while AI copilots facilitate a similar frequency of knowledge transfer episodes, developers accept AI suggestions with less scrutiny - averaging 9.4 utterances per episode compared to 14.4 in human-human sessions. The depth is different. Code review had the same asymmetry: the transfer happened, but with less depth than it would have in a dedicated knowledge-sharing session.

This worked, in the way that a lot of accidental infrastructure works: unreliably, unevenly, and with no backup when the person holding the knowledge leaves. The bus factor on most teams' tribal knowledge is one or two people. Knowledge silos between teams only broke open during cross-team reviews, which most organisations avoid because they're slow. Rubber-stamp approvals - and every honest team admits they happen - transferred no knowledge at all.

So when I say "code review was never about the code," I'm not saying the quality gate didn't matter. It did. I'm saying the knowledge pipeline mattered more, and we didn't notice because it was invisible. We measured review in terms of approval time, comment density, defect escape rate. We never measured how much institutional context moved from one engineer's head to the codebase's shared understanding through the review process. We optimised what we measured - faster approvals, fewer required reviewers, automated style checks - and accidentally optimised away the part that was hardest to replace.

What's actually changing

AI is accelerating a problem that already existed. The tribal knowledge pipeline was fragile before agents arrived. But two things are making it worse.

First, code velocity is outpacing review capacity. More PRs mean less time per review, which means the one-in-seventeen comment that carries real organisational context is even less likely to get written. Reviewers triage. The first thing cut is the explanatory comment that would have saved someone else a week.

Second, AI code review tools can check correctness but not organisational context. They can flag a missing null check, an unused import, a test gap. They cannot say "that approach caused an outage last quarter - here's why" because that knowledge lives in post-mortems, Slack threads, and the heads of the people who were on-call that night. Having access to the git history isn't the same as understanding why the code looks the way it does. Recent research on AI-generated "silent" pull requests - PRs merged without any review comments - found they introduce substantial increases in complexity and quality issues. The knowledge pipeline isn't just degraded. For a growing share of PRs, it's absent entirely.

I should be careful not to overstate this. People ARE trying to encode organisational knowledge for agents. Rules files, CLAUDE.md, system prompts, architectural decision records - these are real attempts, and they work. Context windows are growing, RAG is improving, and a well-maintained CLAUDE.md with folder-level scoping gives agents meaningful project context. The problem isn't that "nobody is encoding WHY." It's that the encoding is manual, it's write-once in practice, and it doesn't capture the knowledge that only surfaces when someone makes a mistake in a context nobody anticipated.
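
To make that concrete: folder-level scoping usually means a nested rules file that applies only to the code beneath it. A sketch of what one might contain - the file contents here are invented for illustration:

```markdown
<!-- payments/CLAUDE.md: picked up when the agent works under payments/ -->
# Payments conventions
- Represent all monetary amounts with a dedicated Money type, never raw floats.
- The token service caches auth tokens in memory; treat it as stateful.
- Link the ADR or post-mortem behind each rule where one exists.
```

Even in this sketch the write-once problem is visible: nothing in the file records when a rule was added, why, or when it should be re-checked.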

Corrections as the primary signal

That last point is worth dwelling on. The most valuable organisational knowledge doesn't start as documentation. It starts as a correction. "Not like that - use the Money type for all amounts in the payments domain." "That service looks stateless but it caches auth tokens in memory, so you can't scale it horizontally without a session store."
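
To ground the first of those corrections, here's a minimal TypeScript sketch of the kind of Money type it points at - the names are illustrative, not taken from any particular codebase:

```typescript
// Illustrative sketch only. A dedicated Money type makes the correction
// enforceable: arithmetic on mismatched currencies fails loudly.
class Money {
  constructor(
    readonly amountInMinorUnits: number, // e.g. cents; never fractional
    readonly currency: string,           // ISO 4217 code, e.g. "USD"
  ) {
    if (!Number.isInteger(amountInMinorUnits)) {
      throw new Error("Money amounts must be integer minor units");
    }
  }

  add(other: Money): Money {
    if (other.currency !== this.currency) {
      throw new Error(`Cannot add ${other.currency} to ${this.currency}`);
    }
    return new Money(
      this.amountInMinorUnits + other.amountInMinorUnits,
      this.currency,
    );
  }
}

// The corrected pattern: new Money(1999, "USD") rather than a bare 19.99.
```

The type itself is trivial. What matters is that the rule behind it only became a rule after someone shipped the naive version and got corrected.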

These aren't the kind of rules anyone writes down before the mistake happens. They emerge from experience - from the developer who's corrected the same pattern enough times that it's become muscle memory. Specs capture what you know upfront. Corrections capture what you learn along the way. Both matter. But corrections are the ones that evaporate.

Right now, a correction happens in a session and disappears when the window closes. The same mistake, the same explanation, repeated in the next session, by the next agent, for the next developer. The knowledge exists - it's just not persisting past the conversation where it was first articulated.

Research on context infrastructure for AI agents has documented this pattern across hundreds of development sessions: corrections that encode real organisational decisions, delivered once and then lost. The paper found that capturing these corrections with structure - provenance, scoping, lifecycle - reduced repeated mistakes across sessions. Whether that reduction justifies the overhead of a dedicated system is a separate question, and one I'll get to.
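
The paper's own schema isn't reproduced here, but the structure it names - provenance, scoping, lifecycle - is easy to sketch. A hypothetical shape, with every field name invented:

```typescript
// Hypothetical record for a captured correction. The field names are
// invented; only the provenance/scoping/lifecycle framing comes from above.
interface CapturedCorrection {
  rule: string;            // "use the Money type for all payment amounts"
  rationale: string;       // the "why" that normally dies with the session
  provenance: {
    author: string;        // who issued the correction
    sessionId: string;     // the conversation it was extracted from
    capturedAt: string;    // ISO 8601 timestamp
  };
  scope: string[];         // where it applies, e.g. ["payments/**"]
  lifecycle: {
    status: "active" | "stale" | "retired";
    reviewBy?: string;     // when to re-check that it still holds
  };
}
```

The lifecycle field is the part that separates this from a write-once rules file: a correction that can go stale is a correction someone will eventually re-examine.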

The doubts

Maybe the tribal knowledge pipeline wasn't that valuable. Most software teams ship working products without formal knowledge management. Senior engineers leave and teams adjust. Documentation rots and teams compensate. The informal pipeline through code review was leaky and unreliable, and teams built around its limitations. Maybe the loss of that pipeline, accelerated by AI, is less dramatic than this article implies.

Maybe better agents solve this without dedicated infrastructure. I acknowledged above that context windows and RAG are already improving. Take that trajectory further: a sufficiently capable agent that can reason across git history, PR comments, incident reports, and Slack threads might reconstruct the organisational context on its own, without anyone having to capture it first. If models get good enough at inferring the decisions behind the architecture - not just what the code does but what it's protecting against - the correction pipeline becomes unnecessary. I don't think we're there yet, but "not yet" is a weak foundation for building infrastructure.

Maybe Jain's five layers handle this too. If behaviour-driven specifications are detailed enough to encode the "why" behind architectural decisions, and if adversarial verification is sophisticated enough to catch violations of institutional conventions, then the approval gate replacement also replaces the knowledge pipeline. The two functions might not need separate solutions. Jain gestures at this when he writes "it's our job to encode in the constraints." If teams do that encoding well enough through specs and guardrails, a separate correction-capture system is redundant.

And there's a more basic objection: maybe formal knowledge transfer is worse than informal knowledge transfer. The PR comment that says "don't do this - it took down the billing service for three hours" carries context, tone, and judgment in a way that a structured constraint might not. Formalising knowledge strips nuance. The senior engineer's comment includes implicit signals about confidence level, scope of applicability, and how seriously to take the warning. A codified rule flattens all of that into a severity label and a rationale field. Something is lost in the translation.

I don't have a confident answer to any of these. What I can say is that the correction pattern is real - the same mistakes do get repeated across sessions - and that informal transfer mechanisms don't survive the transition to agent-heavy workflows where review happens less frequently and the reviewer may itself be an AI. Whether formalising the correction pipeline is worth the overhead depends on how many agents you're running, how fast your codebase is evolving, and how much institutional knowledge exists only in people's heads. For some teams, the answer is clearly no.

What comes next

The industry is focused on replacing code review as an approval gate. That work is important. Jain's layers of trust, automated testing, behaviour-driven specs - these will make agent-generated code more reliable.

But the knowledge pipeline needs its own answer. Not necessarily a separate product or a new category of infrastructure. Maybe it's better tooling around existing rules files - automated staleness detection, correction capture built into the editor, lifecycle tracking. Maybe it's something more structured. The MentorScripts concept from recent research on agentic software engineering points in a similar direction: machine-readable rulebooks that evolve with the codebase.

We're building one answer to this - a system that captures corrections as scoped, lifecycled constraints and serves them back to agents in future sessions. We think the correction-to-constraint pipeline is the right abstraction, but we're early enough that we might be wrong about the shape of the solution even if we're right about the problem.

What I'm more confident about is the problem itself. Code review did two things. The industry is replacing one of them. The other one is still mostly unaddressed, and it matters more as teams scale up their agent usage and the informal channels that used to carry institutional knowledge carry less of it.

If this resonates, Why Engraph lays out the governance worldview underneath the product. The context engineering blind spot examines a specific failure mode - authorised rules going stale - that's closely related to the knowledge pipeline problem described here. And The org chart was designed for humans looks at what happens when organisations try to reorganise around these challenges.

