Essay · AI Reliability

Why Smart Models Drift

The wrong answer that looks exactly like the right one usually comes from a question nobody answered.

By Rahul Jindal · 10 min read · Published June 21, 2026

Listen14 min
0:00
PDFMP3
0:00

One morning a routine email went to the wrong inbox. A small automation I run sends me a short status note every night. For weeks it had landed in my personal inbox, where it belongs. That night it arrived at my work address instead. Same sender, same note, same content, correct in every word. Just the wrong door.

The interesting thing is what did not happen. Nothing broke. No error, no crash, no bad logic. The model behind the automation is a strong one, and it had sent that exact note to the right place dozens of times. Asked point blank which inbox was correct, it would have told you the right one without hesitation. And yet, on that run, it picked the other.

That gap is worth understanding, because it is the gap that produces most of the surprises people hit with capable AI. The failures that matter now are rarely the model being dumb. They are the model being perfectly reasonable about a question no one actually answered.

It was not a mistake in the usual sense

When we say a person made a mistake, we usually mean they lacked something. The knowledge, the skill, the attention. The fix is to supply the missing thing. Teach them, train them, slow them down.

None of that applies here. The model had the knowledge. It had done the task correctly far more often than not. It was not rushing or tired, because it does not rush or tire. So the ordinary mental model of a mistake leads you to the wrong repair. You go looking for a flaw in the model, and there is not one to find.

The flaw was not in the model. It was in the instruction the model was following.

The rule it ran said, in effect, send me the note. It never said which address. And in the same set of rules, all three of my real addresses were listed as allowed. So at the moment the model had to produce a recipient, it was standing in front of an open question with several valid answers and no stated preference among them. It had to pick one. It picked.

Every run starts from zero

Here is the part that surprises people. The model did not remember that it had sent the note to my personal inbox the night before, or the forty nights before that. It was not building a habit. Each run starts fresh from the instructions in front of it. Nothing carries over unless it is written down where the next run can read it.

So the consistency I had seen for weeks was not a settled convention. It was the same likely answer being drawn again and again. When a question has one obviously-most-natural answer, you get that answer most of the time, and it looks like a rule. It is not a rule. It is a tendency.

Forty correct runs in a row were not a habit. They were forty independent draws that happened to land the same way.

This reframes the whole thing. The drift was not the model changing its mind or getting worse. The model on night forty-one was the same as the model on night one. What I had mistaken for a fixed behavior was a coin that lands one way most of the time. Run it enough nights and the other face shows up. It was not a question of if. It was a question of when.

That is the first hard lesson. A behavior you have seen hold a hundred times, but never actually pinned down in the rules, is not guaranteed. It is probable. The day it breaks, nothing changed except which way the draw came out.

The unanswered question is an open slot

Think of every task as a form with blanks. Some blanks are filled in by the instructions. The rest, the model fills in itself from whatever is around it, by reaching for the most plausible value in context.

Most of the time that works beautifully. It is exactly why these models feel so capable: they fill the unstated blanks the way a sensible colleague would. But plausible and correct are not the same thing, and they come apart precisely at the blanks no one filled in.

My work address was a plausible value. It is a real address of mine. It was on the allowed list. And it appeared several times elsewhere in the same instructions, for a completely different reason, as part of a search filter. None of that made it the right place to send the note. But all of it made the model a little more likely to reach for it. The address was sitting right there in context, looking relevant, and the model does not perfectly separate "this address is here so I can search for it" from "this address is somewhere I should send things." Presence is suggestion.

An unfilled blank is not empty. It quietly inherits whatever in the surroundings looks most fitting.

A smarter model does not fix this. It can make it worse

The instinct is to assume a more capable model would drift less. Often the opposite is true, and the reason is worth sitting with.

A weaker model that had only ever seen my personal address would have little to reach for. One option, low chance of going astray. A stronger model knows more. It knows my work address is genuinely mine. It knows the note would reach me either way. It can construct a perfectly good story for any of the three. The richer the model, the more options look reasonable at exactly the points where the instruction is silent. Capability widens the field of plausible answers rather than narrowing it.

Intelligence removes the errors of not knowing how. It does nothing for the errors of not being told which.

So intelligence cuts one kind of error and leaves another untouched. The errors of competence go down: a strong model will not mangle the message or invent an address that does not exist. The errors of specification do not. If the rule leaves a load-bearing blank, no amount of raw capability fills it correctly, because the correct value is not a fact about the world the model can reason its way to. It is a preference that only the person who wrote the rule holds.

There is a sting in the tail. Because a strong model writes fluently and acts confidently, the errors that do slip through arrive looking exactly like the work that was right. My note was flawless except for the destination. There was no garbled text to tip me off, no obvious tell. The better the model, the more its rare mistakes hide inside polished, confident output, and the harder they are to catch. The error budget moves from loud and obvious to quiet and plausible. That is a worse failure mode for a human reviewer, even when it happens less often.

Why "be more careful" does nothing

The tempting fix is to tell the model to pay closer attention. To double-check the recipient. To be careful.

It does not help, and the reason is precise. The model was already careful. Its care was aimed at a target with a hole in it. Telling a careful actor to be more careful about an underspecified instruction just produces more careful guessing. You can lower the odds of a bad draw with enough reminders, but you cannot reach zero, because the thing generating the variation is still there. The blank is still blank. Care does not fill it. It only weights the dice.

This is the same reason you cannot make a coin land heads by asking the flipper to concentrate. The randomness is not in the flipper's attitude. It is in the setup. If you need heads every time, you do not coach the flipper. You stop flipping.

The fix is to remove the choice

The repair that actually worked was one line. Write the correct address into the rule and forbid the model from choosing. No discretion, no draw, no chance to inherit the wrong value from context. The blank is now filled, permanently, by the person who knew the answer.

That is the whole principle. The reliability you want does not come from the model deciding well every single time under pressure. It comes from not asking the model to decide the things that must not vary. Put the determinism where it belongs, in the rules and the surrounding system, and let the model do the open-ended work that actually needs judgment.

The standard for a serious automation, then, is not how smart the model is. It is how few load-bearing blanks you have left for it to fill. A capable model with a sloppy instruction is a fast, fluent, confident source of occasional surprises. A capable model with the load-bearing parts nailed down is a tool you can trust.

Which blanks are load-bearing

You do not want to pin everything. Pin the wording of every sentence and you have thrown away the reason you reached for a model at all. The craft is telling the two kinds of blank apart.

A load-bearing blank is one where a wrong fill causes real harm and cannot be shrugged off. Who an email goes to. A price. The account a payment lands in. The target of a command that deletes things. The single record a database write updates. These have a correct answer that the model cannot derive on its own, and a wrong answer that costs you. Pin them. Take the choice away.

A free blank is one where variety is harmless or even wanted. How a summary is phrased. Which example illustrates a point. The order of an explanation. Here the model's judgment is the whole value. Leave it open and let the model run.

Almost every reliability failure I have seen with capable AI is the same shape underneath. A load-bearing blank was left open, the model filled it from plausibility, and for a long stretch the plausible answer was also the right one, until the night it was not. The bug is not in the model. The bug is in the boundary. Someone drew the line between "the system decides this" and "the model decides this" in the wrong place, and put a decision that had to be fixed on the model's side of it.

The central question in building with AI is not how smart the model is. It is which decisions you are still asking it to make.

This is also a management problem

Step back from the machine and the pattern is familiar, because it is how people and teams fail too.

Most teams run on tendencies that everyone treats as rules. We always send the report to finance first. We never ship on a Friday. We loop in legal on anything customer-facing. None of it is written down. It holds because the people who know keep choosing it, the same way a likely answer keeps getting drawn. Then someone new fills the role, or the usual person is out, or the context shifts just enough, and the unwritten rule that held for years quietly fails. Nobody decided to break it. The draw just came out the other way.

The fix in an organization is the same as the fix in the automation. For the decisions that must not vary, you do not rely on people caring more or remembering better. You remove the discretion. You write the rule down, you build the step into the process, you make the safe path the default and the dangerous one require a deliberate act. The reliable institutions are not the ones with unusually careful people. They are the ones that took the load-bearing choices off the individual and put them into the system, so an ordinary person on an ordinary day produces the safe result without having to be a hero about it.

That is the deeper reason this small email matters. It is a clean, harmless version of the thing that takes down far bigger systems. A behavior everyone counted on, that no one had actually fixed in place, holding right up until the moment it did not.

What to take from a wrong email

The smartest model you can buy will still, now and then, do the wrong thing on a task it has done right a hundred times. Not because it got worse. Because somewhere in the instruction there was a blank, and on that run it filled it with something plausible and wrong.

You will not fix that by waiting for a better model. The next model will be more capable and will fill the unstated blanks even more convincingly, which makes the rare miss harder to spot, not easier. You fix it by doing the unglamorous work of finding the load-bearing blanks in your own systems and filling them yourself, before the draw does it for you.

Intelligence is the easy half. The reliability is in deciding, on purpose and in advance, what the model is no longer allowed to choose.