Why Useful Agents Need Verifiers, Not Just Bigger Models

6 June 2026 · draft · #agents #verification #reliability #architecture

MOT test for model output

Every vehicle I repair gets tested before it goes back to the customer. Not because I doubt my work — because the cost of being wrong transfers to someone else the moment the car leaves. The test isn’t an insult to the mechanic. It’s the thing that makes the mechanic’s confidence sellable.

Model output needs the same thing: not a bigger model, but a checkpoint. In my systems the verifier is small, cheap and dumb on purpose. It runs ten checks (among them: no leaked internals, claims grounded in the knowledge base, correct format, no invented part numbers) and it has veto power. The smart model proposes; the boring model disposes.
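To make the shape concrete, here is a minimal sketch of a rule-based verifier with veto power. It is not my production code. The part-number pattern, the KNOWN_PARTS set, the internal markers and the "Diagnosis:" format rule are all invented for illustration, and only three of the checks are shown. The part-number check stands in for the broader "grounded in the knowledge base" rule.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical knowledge base and markers, made up for this sketch.
KNOWN_PARTS = {"BP-1042", "OF-2210"}
PART_PATTERN = re.compile(r"\b[A-Z]{2}-\d{4}\b")
INTERNAL_MARKERS = ("system prompt", "internal:", "api_key")

# A check returns a reason string on failure, or None on pass.
Check = Callable[[str], Optional[str]]


def no_leaked_internals(text: str) -> Optional[str]:
    lowered = text.lower()
    for marker in INTERNAL_MARKERS:
        if marker in lowered:
            return f"leaked internal marker: {marker}"
    return None


def no_invented_part_numbers(text: str) -> Optional[str]:
    # Any part number not in the knowledge base counts as invented.
    unknown = sorted(set(PART_PATTERN.findall(text)) - KNOWN_PARTS)
    if unknown:
        return "unknown part numbers: " + ", ".join(unknown)
    return None


def format_correct(text: str) -> Optional[str]:
    # Assumed output format: every answer starts with "Diagnosis:".
    if not text.startswith("Diagnosis:"):
        return "missing 'Diagnosis:' prefix"
    return None


CHECKS: list[Check] = [no_leaked_internals, no_invented_part_numbers, format_correct]


@dataclass
class Verdict:
    approved: bool
    reasons: list[str]


def verify(text: str) -> Verdict:
    # Run every check; a single failure is enough to veto the output.
    reasons = [r for check in CHECKS if (r := check(text)) is not None]
    return Verdict(approved=not reasons, reasons=reasons)


if __name__ == "__main__":
    print(verify("Diagnosis: worn front brake pads. Replace with BP-1042."))
    print(verify("Diagnosis: replace with ZZ-9999. INTERNAL: retry count 3"))
```

Every check is a plain function that either passes or returns a reason, and the verdict is approved only when no reason comes back. Nothing in here is clever, which is the point: each rule can be read, tested and argued about on its own.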

What’s queued for this draft:

Worker + verifier beats hero-model: real numbers from my diagnosis pipeline
Why verifiers should be boring, rule-based and slightly stupid
Veto power: what happens when the verifier and worker disagree
Where humans still sit in the loop, and why that’s a feature

This draft is still on the bench. The numbers are being pulled from production logs.

Corrections, field notes and disagreement are welcome.