
The context engineering blind spot

Engraph · 10 min read

Rules files go stale. Everyone knows this in the same way everyone knows they should write more tests. But the failure mode deserves closer attention than it gets, because an authorised rule that's wrong for the current context is harder to catch than a prompt injection attack. At least an attack looks suspicious.

The industry has put real resources into injection defence. Simon Willison has catalogued dozens of variants. Anthropic built constitutional AI partly to address it. Google published safety layers for Gemini. The threat model is clear: unauthorised text enters an agent's context and steers it wrong. The defences are getting better fast.

The other direction - authorised text that steers agents wrong - gets handled manually. Code review, team discussions, documentation updates. These work, up to a point. But they don't scale the same way, and there's no automated equivalent to what the injection side has built. The text was trusted, the process was followed, and the agent did what it was told in a context where what it was told no longer applied.

The ops world has understood this failure mode for years. They call it runbook rot - procedures written for one architecture that persist past their relevance and get followed during an incident they weren't designed for. GitLab's 2017 database incident is the canonical example: recovery procedures that hadn't been maintained against current infrastructure, backup methods that hadn't been tested, documented steps that didn't work when it mattered. That was nearly a decade ago. Ops teams have since built practices around runbook testing and expiry. Agent rules files have the same rot problem, but in a context where the rules execute faster, more frequently, and with less human judgment between the instruction and the outcome - and so far, with none of the same practices.

The retry rule

Here's an example from a project we worked on before Engraph existed. I should say upfront: this doesn't prove the problem is widespread. It proves the mechanism exists.

A rules file tells agents to retry failed background jobs with exponential backoff. Standard resilience pattern. When the team wrote it, the background jobs were data syncs and cache warming. Retries made those robust.

Months later, the platform adds payment processing. An agent builds a payment webhook handler, follows the retry rule, and the code review looks clean - exponential backoff on a background job is what you'd expect. The job ships.

Then a payment processor times out. The job retries. The original request also completes on the processor's side, and the customer is charged twice.

The fix is documented in Stripe's integration guides: payment webhooks need idempotency keys before retries are safe. But that knowledge lived in Stripe's docs and in a conversation between two engineers who did the original integration. It never made it into the rules file, because the rules file was written before payments existed.
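To make the mechanism concrete, here's a minimal sketch of what the safe version looks like. The names are illustrative - charge_fn stands in for any processor call that accepts an idempotency key, as Stripe's does - but the structural point is real: the key is generated once, outside the retry loop.

```python
import time
import uuid

def charge_with_retries(charge_fn, amount_cents, max_attempts=5):
    # Generate the idempotency key ONCE, outside the retry loop.
    # Every retry reuses it, so the processor can deduplicate.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return charge_fn(amount_cents, idempotency_key=idempotency_key)
        except TimeoutError:
            # A timeout doesn't mean the charge failed - it may have
            # succeeded on the processor's side. The shared key is the
            # only thing that makes retrying here safe.
            time.sleep(2 ** attempt)  # exponential backoff, as the rule said
    raise RuntimeError("charge not confirmed after retries")

```

The original retry rule covered everything in that function except the first line inside it - and that line is the whole fix.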

Notice what happened here. The rule didn't go stale in the usual sense - it didn't become incorrect over time. It was always correct for its original scope. What changed was the codebase grew into a domain where the rule needed a qualifier that nobody had thought to add. That's not staleness. It's incompleteness. And it's harder to catch, because the rule still looks right everywhere except the one place where it matters.

A rule that's correct in the wrong context is harder to catch than a prompt injection attack, because at least an attack looks suspicious.

A constraint system doesn't prevent this the first time it happens. The first team to hit the double-charge learns the hard way. What a constraint system does is capture the lesson with enough structure that the second team, working six months later in a different session, gets the warning before repeating the mistake. The value starts at the second occurrence.

Why the injection comparison helps and where it misleads

Both authorised context and prompt injection share a mechanical property: text enters a context window and shapes what comes out. That observation is true but less useful than it sounds.

Models don't treat all text the same. Anthropic trains its models to weight system prompts more heavily than user messages through RLHF and constitutional AI training. OpenAI and Google do similar work. A system prompt and a hidden instruction in a webpage genuinely receive different treatment from the model, and those differences matter for security. Calling them "the same mechanism" dismisses real engineering that makes agents safer.

Where the comparison does hold: the model's trust boundaries distinguish authorised input from unauthorised input. They don't distinguish accurate input from inaccurate input. A rules file with full system-prompt authority gets the same model trust whether its contents are current or eighteen months past their relevance. The security infrastructure protects the channel. Nothing protects the content.

That's the gap. Not that authorised context and injection are "the same" - they're not, and framing them that way alienates the security audience whose work matters. But the defences built for unauthorised input don't address the failures of authorised input. One side has a mature, growing response. The other side doesn't.

What governance for authorised context needs

If you wanted to catch these failures systematically, what would you need to track? We've been building a system for this and have landed on five properties. Five is where we are now, not necessarily where the answer is.

Authorisation - did someone with authority approve this for agent consumption? Rules files handle this already. It's necessary but nowhere near sufficient, because the retry rule was fully authorised.

Provenance - where did it come from, and is that trail still meaningful? A constraint that was proposed by an agent after an incident, reviewed by an engineer who understood the subsystem, and accepted into the system has a real trail. A rules file committed eighteen months ago has authorship but no lifecycle context. These two properties blur when the same person proposes and reviews.

Scoping - is the context targeted to what the agent is doing? Rules files can be scoped to directories - CLAUDE.md in a subdirectory, folder-level .cursorrules - so this isn't binary. But the scoping is manual, there's no enforcement that it's correct, and it fragments across tools. This assumes the boundaries are right. They usually aren't at first, which means scoping is only as good as the map underneath it.

Lifecycle - can the context strengthen when validated, deprecate when contradicted, flag itself when it hasn't been tested against the current codebase in months? This is what rules files lack most completely.

Auditability - after the agent acts, can you trace which context influenced the result? Honest answer: nobody can do this reliably, us included. We track what was served. We can't prove a specific constraint caused a specific output.

These overlap in ways that resist clean separation. Scoping affects lifecycle - a constraint scoped to a rarely-touched subsystem ages differently than one firing every session. A sophisticated supply chain attack could score partial on several axes. The properties describe governance quality on a spectrum, not a binary between engineering and attack.
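One way to see how the properties fit together is as fields on a single record. This is a sketch of one possible shape, not our schema - every field name here is illustrative:

```python
from dataclasses import dataclass, field
from enum import Enum

class Lifecycle(Enum):
    PROPOSED = "proposed"      # suggested, not yet reviewed
    ACTIVE = "active"          # served to agents
    DEPRECATED = "deprecated"  # contradicted or superseded
    ABSORBED = "absorbed"      # the codebase now embodies it

@dataclass
class Constraint:
    text: str                  # the instruction served to agents
    severity: int              # used to break ties between rules
    approved_by: str           # authorisation: who accepted it
    origin: str                # provenance: incident, review, migration
    scope: list[str]           # scoping: paths or subsystems it applies to
    lifecycle: Lifecycle = Lifecycle.PROPOSED
    last_validated: str | None = None  # lifecycle: last checked against code
    served_in: list[str] = field(default_factory=list)  # auditability: sessions

```

Auditability is the weakest field in that sketch: served_in records what entered the context window, which is not the same as what influenced the output.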

| Governance | Prompt injection | RAG with metadata | Rules file | Managed constraints |
|---|---|---|---|---|
| Authorisation | No | Yes | Yes | Partial |
| Provenance | No | Partial | Partial | Yes |
| Scoping | No | Partial | Partial | Yes |
| Lifecycle | No | No | Partial | Yes |
| Auditability | No | Partial | Partial | Partial |

"Yes" means systematically enforced without manual effort; "partial" means the property exists but depends on context or discipline. A rules file has provenance because it lives in git, but nobody tracks whether that provenance is still meaningful. For managed constraints, authorisation is partial because review quality depends on who does the review, and auditability is partial because tracking what was served isn't the same as proving what influenced the output.

The gap between partial and yes is "possible if someone bothers" versus "guaranteed by the system." That guarantee has real costs - infrastructure, review burden, token overhead on every session. For a solo developer or a small team with a couple of agents, rules files are likely the right tradeoff. The governance gap matters where the cost of misalignment across agents exceeds the cost of managing it.

The doubts

The entire value proposition rests on a compounding hypothesis: if corrections are captured as constraints with provenance and lifecycle, the correction load should decrease over time. We've seen this in our own development. Correction rates have dropped as the constraint set has grown. But that's one team, on one codebase, measuring our own product. The framework-to-evidence ratio in this article is high and I'd rather acknowledge that than let the structure imply otherwise.

The curve might be logarithmic - steep early, flat later. It might plateau at a level that doesn't justify the infrastructure. We're measuring it across more teams and will publish what we find.

There's a cold start problem, though it's not the one you'd expect. You can migrate existing rules into the system on day one - the content isn't what's missing. What's missing is lifecycle signal. Migrated rules haven't been challenged by agents, validated against current code, or scoped to the subsystems where they actually apply. Some of them are already absorbed - the codebase embodies the pattern so thoroughly that no agent needs to be told. Those don't need to enter the lifecycle at all. Others are wrong and should be challenged immediately. The first real work of migration is triage, not import. The bet is that running the surviving rules through lifecycle management - where they can be challenged, deprecated, strengthened, or absorbed if the code eventually makes them self-evident - surfaces problems that a flat file wouldn't. If it doesn't, you've added infrastructure for nothing.
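A sketch of that triage step, reusing the Lifecycle states from earlier. The two predicates are the actual hard part and are passed in as placeholders - in practice each one is a judgment call or an agent-assisted check:

```python
def triage_migrated_rule(rule: Constraint, is_absorbed, is_contradicted) -> Lifecycle:
    if is_absorbed(rule):
        return Lifecycle.ABSORBED  # the code already teaches it; skip the lifecycle
    if is_contradicted(rule):
        return Lifecycle.PROPOSED  # back into review: challenge it immediately
    return Lifecycle.ACTIVE        # survives, now subject to lifecycle checks

```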

There's also a failure mode that mirrors the one we're trying to solve. A senior engineer proposes a constraint based on a misunderstanding, it gets reviewed and accepted, and now every agent follows confident, authoritative, wrong advice. The compounding story works in reverse here - a bad constraint compounds bad decisions across every session it touches. We've built feedback loops: agents can flag tension when a constraint contradicts the codebase, patterns of agents working around a constraint signal it might be wrong. These are mitigations, not solutions.

Then there are constraint conflicts: two rules, both correct in isolation, that contradict each other when the work crosses boundaries. One says keep functions under 50 lines for readability. Another says never split transaction logic across files for auditability. An agent building a complex transaction handler receives both and can't satisfy them simultaneously. Severity rankings and scoping reduce how often this happens. When it does, a human decides. That limitation gets worse as the constraint set grows, and I'm genuinely uncertain whether the useful upper bound is in the hundreds, the low thousands, or whether the system creates its own governance burden past some threshold nobody can manage.
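Scoping and severity reduce the conflict surface but can't eliminate it, so this sketch is honest about the fallback: anything still contradictory gets escalated rather than guessed at. conflicts_with is passed in because conflict detection is the unsolved part - today it's mostly the agent reporting tension. Reuses the Constraint sketch from earlier:

```python
def select_constraints(applicable: list[Constraint], conflicts_with):
    # Highest severity wins when two constraints clash; the losing
    # pair is escalated so a human decides, not the ranking.
    ranked = sorted(applicable, key=lambda c: c.severity, reverse=True)
    survivors: list[Constraint] = []
    escalations = []
    for c in ranked:
        clash = next((s for s in survivors if conflicts_with(c, s)), None)
        if clash is None:
            survivors.append(c)
        else:
            escalations.append((c, clash))  # held for human review
    return survivors, escalations

```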

There's a counterargument that doesn't require any infrastructure at all: models are getting smarter. A sufficiently capable agent working on a payment handler might independently know that retries need idempotency keys, regardless of what the rules file says. Model reasoning improves every few months. If capability growth shrinks this problem on its own, the governance layer is solving a temporary gap.

That's plausible for well-documented patterns. Stripe idempotency is in every integration guide and training corpus. But the failures that actually hurt are organisation-specific: the internal payment processor with non-standard retry semantics, the legacy queue that silently drops messages above a certain size, the regulatory constraint that only applies in two of your twelve markets. No model update captures those. They live in incident reports, team conversations, and the memory of engineers who were there when it broke. Governance infrastructure is how that knowledge survives past the people who hold it. Model capability makes agents better at reasoning about public knowledge. It doesn't make them better at reasoning about yours.

And the most uncomfortable objection: the reviewer is still a human. This article argues that humans miss stale authorised context. Then it proposes a system where humans accept or reject constraints. We've moved the judgment from code review to constraint review - evaluating the rule directly rather than spotting its misapplication across a diff. In code review you see a rule's concrete effect in a specific context. In constraint review you're evaluating an abstraction that will apply across unknown future contexts. I'm not sure that's easier. It might be harder. The bottleneck has moved, not disappeared.

This actually complicates the framing of the whole article. I've been calling this a "blind spot," but code review, team discussions, documentation updates, and post-incident runbook revisions are all defences against stale authorised context. They're manual, they're partial, and they scale poorly with agent count - but they exist, and they work for most teams at most scales. The more honest claim is narrower: these manual defences degrade as agent count and codebase complexity grow, and nobody has built automated ones yet. "Blind spot" might overstate it. "Under-addressed scaling problem" is more accurate, if less useful as a title.

Whether this matters enough

Most software teams ship working products with rules files and code review. The failure mode I'm describing requires a longer chain than it might seem: the rule has to be wrong for the current context, the agent has to follow it rather than reason past it, the reviewer has to miss the misapplication in the diff, the tests have to miss the edge case, and monitoring has to miss it in production. That's five links, and any one of them breaking prevents the failure. How often does the full chain complete?

I don't have industry data. In our own development, it happened often enough to motivate building a system. But we're building tools for agent-heavy workflows, which makes us more exposed than most. A team running a couple of agents with a 200-line rules file might never hit this. The problem scales with agent count, the size of the constraint surface, how fast the codebase is changing, and how many domains it spans. If you're below some threshold on those, rules files are fine.

I'd like to quantify that threshold and can't yet. What I can say is that the number of agents per codebase is increasing, and the gap between "we have rules files" and "we have governance for what's in those files" will matter more as that number grows. Whether it matters enough right now, for your team, is a judgment call I can't make from here.

What we're building, and why the uncertainty is the point

We build a system that delivers constraints into agent context windows via hooks. Every constraint carries authorisation, a provenance trail, severity, lifecycle state, subsystem scoping, and an audit trail. The delivery mechanism is the same text-in-a-window approach that everything in this space uses. The difference is everything around the delivery.
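Mechanically, the delivery step is small. A hedged sketch of what a hook might do, with illustrative names - the real interface differs by agent tool - again reusing the Constraint sketch from earlier:

```python
def serve_constraints(session_id: str, touched_paths: list[str],
                      store: list[Constraint]) -> str:
    # Select only active constraints scoped to what the agent is touching.
    selected = [
        c for c in store
        if c.lifecycle is Lifecycle.ACTIVE
        and any(p.startswith(s) for s in c.scope for p in touched_paths)
    ]
    # Record what was served - the audit trail we can actually keep.
    # This is presence in the window, not proof of influence.
    for c in selected:
        c.served_in.append(session_id)
    return "\n".join(f"[severity {c.severity}] {c.text}" for c in selected)

```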

This article has raised more questions than it's answered. Whether five properties are the right set, whether compounding holds beyond our own codebase, where constraint conflicts become unmanageable, what the cold start actually looks like for a team that isn't us. We don't have confident answers to any of those yet.

That's partly why we built the system. The only way to test whether managed constraints bend the correction curve is to run the experiment across real teams, on real codebases, and measure what happens. We're doing that now. We'll publish the results - including if they show the hypothesis is wrong, or that the overhead isn't worth it, or that model capability growth makes the whole approach unnecessary within a year. Others may be working on this too. The problem is becoming visible enough that we'd be surprised if they weren't.

If the bet pays off, the industry gets automated defences for a failure mode it's been handling manually. If it doesn't, at least we'll know why.

This article focuses on the stale-context failure mode. For the broader governance worldview, see Why Engraph. For how the informal knowledge pipeline that used to catch these failures is breaking down, see Code review was never about the code. And for what happens when organisations try to restructure around agent-heavy workflows, see The org chart was designed for humans.
