How to Build Trustworthy AI That Does Not Make Things Up
Every language model is, at heart, the most confident person in the room who has never once been right about everything they said. Ask it for a statistic and it will hand you “73% of buyers prefer this” with the calm authority of someone who has never met a buyer. Nobody ran the survey. The number was born, fully grown, the instant you asked for it.
This is not a habit you can scold out of it. Confidence is the one resource these models never run low on. So I stopped treating the invented facts as a surprise and started treating them like weather: fully expected, occasionally inconvenient, and something a sensible person builds a roof for. Trustworthy AI is not AI that never lies. It is AI that catches itself lying before the words reach a human being.
The short version: Language models invent facts with total confidence, and a polite “please don’t make things up” in the prompt is a sticky note, not a lock. So I put a verifier between the model and the human. It runs three passes: a scan for suspicious numbers and stock phrases, a check that every claim traces back to facts you actually provided, and a second model whose only job is to catch the first one bluffing. Fail any pass and the draft gets rewritten or quietly killed, never shown. A second rule makes it physically impossible for the model to approve its own work. Generation is automated. The final yes stays human.
Picture the moment it costs you. A client opens your draft, and there in the second paragraph sits “73% of buyers prefer this.” Nobody ran that survey. The number wandered in off the street and made itself at home. The client may not catch it outright, but they feel it, and now every other sentence in the document is a suspect in a lineup. They stop reading your work and start interrogating it. A tool you have to audit costs more time than it saves, which makes it a very expensive way to feel productive. So I build the audit into the machine, where it never gets tired and never gets talked out of it.
Why most AI content makes things up
Most AI products run on a strategy I would charitably call generate and hope. The model writes the thing. The screen shows you the thing. You, a human with a finite afternoon, are the only thing standing between the thing and the wider world. That arrangement works beautifully right up until the model invents a fact, at which point the whole draft goes from asset to allegation.
The popular fix is to ask nicely. People type “do not make up facts” into the prompt and feel the warm glow of having handled it. I understand the instinct. Mechanically, it is a bit like taping a note that reads “please do not eat this” to a sandwich and leaving it alone with a Labrador. It helps slightly. It does not hold. A line in a prompt is a request, and a model honors requests about as reliably as the rest of us honor New Year’s resolutions. A gate is a different animal. A gate is code that can say no and mean it.
I run a content studio for a social media business. A person plans the week. The system writes the captions and generates the images for each post. A person approves before anything goes out, and once approved, it publishes to the networks on schedule. The part that earns trust is not the writing. It is the unglamorous checkpoint in the middle, between the moment the model finishes a draft and the moment a human first lays eyes on it. When the model finishes, the draft does not go to a person. It goes to a checker first, and the checker is not in a generous mood.
| | “Just ask it nicely” | A real verifier |
|---|---|---|
| What it is | “Do not make up facts” in the prompt | Code that reads every claim and can refuse it |
| How it behaves | A request the model honors when convenient | A gate that does not negotiate |
| When it fails | Quietly, in front of your client | Safely, because the bad draft never ships |
| Who ends up checking | You, forever | The machine, every single time |
A separate verifier that rejects invented facts
The checker runs three passes, and they get progressively less forgiving.
The first pass is fast and catches most of the damage. It scans the text for the usual fingerprints of a fabrication: any percentage, any dollar amount, a suspiciously specific year, and a short rogues’ gallery of phrases like “studies show,” “experts agree,” and “3 out of 5.” These are the costumes a made-up fact likes to wear.
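Here is a minimal sketch of what a scan like that could look like. The categories come straight from the paragraph above; the regular expressions and the names `RED_FLAGS` and `scan_for_red_flags` are illustrative assumptions, not the studio's actual code.

```python
import re

# Illustrative patterns only: the categories come from the text,
# the regexes themselves are assumptions.
RED_FLAGS = {
    "percentage": re.compile(r"\d+(?:\.\d+)?\s?(?:%|percent\b)", re.IGNORECASE),
    "dollar_amount": re.compile(r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?"),
    "specific_year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "ratio": re.compile(r"\b\d+\s+out\s+of\s+\d+\b", re.IGNORECASE),
    "stock_phrase": re.compile(
        r"\b(?:studies\s+show|experts\s+agree)\b", re.IGNORECASE
    ),
}


def scan_for_red_flags(text):
    """Return (label, matched_text) pairs for every suspicious span in text."""
    hits = []
    for label, pattern in RED_FLAGS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits
```

A hit here is a reason to look closer, not a verdict. A real price the operator typed in will trip the dollar pattern too, which is why this pass only raises flags and the later passes decide what is actually grounded.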
The second pass is stricter and a little ruthless. It takes every concrete claim in the post and checks it against the only things the model was ever allowed to know: what the operator typed in, and the brand’s own saved facts. If a number or a name turns up in the post but not in that trusted input, it gets flagged. The model does not get to reach into its own memory and produce a statistic like a magician pulling a coin from behind your ear. It works with the facts you gave it, or it works with nothing.
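A rough sketch of that grounding check follows. It assumes the trusted input arrives as a list of plain strings, and it uses a deliberately crude heuristic for names (capitalized words that do not start a sentence). The function names and the sample bakery are hypothetical.

```python
import re

NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")
WORD = re.compile(r"[A-Za-z]+")


def extract_claims(text):
    """Return (numbers, names) found in text.

    Names use a crude heuristic: capitalized words that are not the first
    word of a sentence. A real system would want something sturdier.
    """
    numbers = {m.group(0).replace(",", "") for m in NUMBER.finditer(text)}
    names = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for word in WORD.findall(sentence)[1:]:
            if word[0].isupper():
                names.add(word.lower())
    return numbers, names


def find_ungrounded(post, trusted_sources):
    """List numbers and names that appear in post but in no trusted source."""
    trusted_text = " ".join(trusted_sources)
    trusted_numbers, _ = extract_claims(trusted_text)
    trusted_words = {w.lower() for w in WORD.findall(trusted_text)}
    post_numbers, post_names = extract_claims(post)
    return sorted(post_numbers - trusted_numbers) + sorted(post_names - trusted_words)


# Hypothetical operator input and saved brand facts.
trusted = [
    "Launch post for Juniper Bakery: new sourdough loaf, $6 each.",
    "Juniper Bakery opened in 2015.",
]
post = "Juniper Bakery's new sourdough is here for $6. 73% of Portland buyers prefer it."
flagged = find_ungrounded(post, trusted)  # the 73 and the city were never provided
```

The important design choice is the direction of the comparison. The check never asks whether a claim is true in the world. It only asks whether the claim appears in what was provided, which is a question code can answer.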
The third pass is a second model whose entire job is to read the post against the input and list anything the input does not support. One model writes. A different model reads the first one’s homework and circles the parts it suspects were invented on the bus that morning. They are not friends. That is the point.
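The shape of that third pass can be sketched without committing to any vendor. In the sketch below, `ask_model` is a hypothetical callable that sends a prompt to whichever second model you use and returns its reply as text. The prompt wording and the `NONE` convention are assumptions for illustration.

```python
from typing import Callable, List

CRITIC_PROMPT = """You are checking a draft against its source material.

TRUSTED INPUT:
{trusted}

POST:
{post}

List each claim in POST that TRUSTED INPUT does not support, one per line.
If every claim is supported, reply with exactly: NONE"""


def critic_pass(post: str, trusted_input: str, ask_model: Callable[[str], str]) -> List[str]:
    """Ask a second model for unsupported claims and return them as a list.

    ask_model is a placeholder for a real model call; it takes a prompt
    and returns the reply text.
    """
    reply = ask_model(CRITIC_PROMPT.format(trusted=trusted_input, post=post))
    findings = [line.strip(" -*\t") for line in reply.splitlines()]
    findings = [line for line in findings if line]
    if not findings:
        # Fail closed: silence from the critic is not a pass.
        return ["critic returned no answer"]
    if len(findings) == 1 and findings[0].rstrip(".").upper() == "NONE":
        return []
    return findings
```

Treating an empty reply as a failure is a deliberate choice in this sketch. If the critic times out or returns nothing, the draft should not sail through on a technicality.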
If any pass finds a problem, the post is not quietly softened and shipped with an asterisk. It is sent back. The system rewrites it once, with the exact violations pasted in as a do-not-include list, which is the machine equivalent of “try again, and this time leave out the part you made up.” If the rewrite comes back clean, it moves on. If it fails twice, the job stops and logs why it gave up. The human never sees the bad draft. They never meet the invented 73%. They get a clean post, or they get nothing, and nothing is the safe failure.
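The control flow described above is small enough to sketch in full. This version assumes each pass is a function that takes a draft and returns a list of violations, and that `write` is a hypothetical function that produces a draft given a do-not-include list. All names are illustrative.

```python
from typing import Callable, List, Optional

Checker = Callable[[str], List[str]]


def run_checks(draft: str, checkers: List[Checker]) -> List[str]:
    """Run every pass and collect all violations."""
    violations: List[str] = []
    for check in checkers:
        violations.extend(check(draft))
    return violations


def generate_verified(
    write: Callable[[List[str]], str],
    checkers: List[Checker],
    log: Callable[[str], None],
) -> Optional[str]:
    """Return a clean draft, or None if the single rewrite also fails."""
    draft = write([])
    violations = run_checks(draft, checkers)
    if not violations:
        return draft

    # One rewrite, with the exact violations as a do-not-include list.
    draft = write(violations)
    remaining = run_checks(draft, checkers)
    if not remaining:
        return draft

    log(f"gave up after one rewrite; unresolved: {remaining}")
    return None
```

Returning `None` is the safe failure from the paragraph above. The caller gets a clean post or nothing, and there is no code path that hands a flagged draft to a person.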
That is the whole principle, and it is worth saying plainly. The model is allowed to be wrong. It is not allowed to be wrong in front of a person.
Why the model never approves its own work
There is a second gate that matters just as much, and it is refreshingly blunt: the model cannot approve its own work. Not “is discouraged from.” Cannot. The code that saves a generated post locks it into an awaiting-approval state and flatly refuses to let the model mark it approved, scheduled, or posted. A real person clicks approve. Only then does anything go live. The machine does the labor and the human keeps the judgment, the same way you might let an intern draft the entire report and still not let them sign off on their own performance review.
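One way to make "cannot" literal is to put the rule in the code that changes a post's state, not in the prompt. The sketch below is an assumption about how such a lock could be written. The status names, the `Actor` type, and `save_generated_post` are hypothetical.

```python
from enum import Enum


class Status(Enum):
    AWAITING_APPROVAL = "awaiting_approval"
    NEEDS_REVISION = "needs_revision"
    APPROVED = "approved"
    SCHEDULED = "scheduled"
    POSTED = "posted"


class Actor(Enum):
    MODEL = "model"
    HUMAN = "human"


# States the model is never allowed to set.
MODEL_FORBIDDEN = {Status.APPROVED, Status.SCHEDULED, Status.POSTED}


class Post:
    def __init__(self, text):
        self.text = text
        self.status = Status.AWAITING_APPROVAL

    def set_status(self, new_status, actor):
        """Change state, refusing any model attempt to approve or publish."""
        if actor is Actor.MODEL and new_status in MODEL_FORBIDDEN:
            raise PermissionError(f"model may not set status {new_status.value}")
        self.status = new_status


def save_generated_post(text):
    """Every generated post starts out awaiting approval, no exceptions."""
    return Post(text)
```

The refusal lives in `set_status`, so it does not matter what the model was told or what it outputs. There is simply no call it can make that ends in an approved post.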
The same logic runs in reverse, which is the part I am quietly proud of. When a reviewer marks a post for revision and writes what is wrong with it, the system reads that note, rewrites the post from it, and then runs the same three-pass checker on the new version before it dares come back. The correction gets verified too. Revision is a verified loop, not a fresh roll of the dice and a hopeful smile. And if the human moved the post somewhere else while the rewrite was still chugging along, the system throws its own rewrite in the bin rather than stomp on a decision a person already made. The human’s last word stays the last word.
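The revision loop, including the part where the system bins its own rewrite, can be sketched like this. It assumes `rewrite` and `verify` are supplied by the caller, and it uses plain strings for status. Those names and the in-memory `Post` are hypothetical; a real system would more likely do the final comparison as a conditional update in its database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Post:
    text: str
    status: str  # hypothetical values: "needs_revision", "awaiting_approval", "archived"


def revise(
    post: Post,
    note: str,
    rewrite: Callable[[str, str], str],
    verify: Callable[[str], List[str]],
) -> str:
    """Rewrite from a reviewer note, verify, and save only if nothing moved."""
    status_at_start = post.status

    candidate = rewrite(post.text, note)  # slow: a human may act meanwhile

    if verify(candidate):
        return "rejected"  # the correction failed the same checks

    if post.status != status_at_start:
        return "discarded"  # a person moved the post; their decision wins

    post.text = candidate
    post.status = "awaiting_approval"
    return "saved"
```

The staleness check sits right before the write on purpose. The rewrite is the slow step, so that is the window in which a person is most likely to have made a different decision.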
What trustworthy AI actually requires
People keep asking whether AI can be trusted. That is the wrong question, and a slightly funny one, because a model on its own is a confident guesser, and you would no sooner put an unsupervised confident guesser in front of a paying client than you would let one cater your wedding.
The useful question is not whether to trust the model. It is what you build around it. Trustworthy AI is three things holding hands: generation, plus a verifier that rejects invented facts, plus a human who owns the final yes. Drop any one of them and you are back to hoping. Keep all three and you have something you can actually run a business on. Generation gives you speed. The verifier gives you a draft that does not lie. The human gives you a name to put on the output, and a person who meant to put it there.
So the next time you size up an AI tool, look straight past the writing, which is the easy part everyone loves to show off, and go find the part that checks the writing. If there is no separate verifier and no human gate, then congratulations, you are the verifier, and it is a job with no end date and no overtime. Build the check instead, and the model gets to move fast while you get to keep your good name.
The model is not the product. The thing that checks the model is the product.
Frequently asked questions
Why do AI models make things up (hallucinate)?
Language models generate the most plausible-sounding next words, not verified truth. When a real fact is missing, they produce a convincing stand-in anyway, because fluent confidence is what they are built for. The invented statistic is not the model malfunctioning; it is the model doing exactly what it does, aimed at a question it could not actually answer.
Can you stop AI hallucinations with a better prompt?
A better prompt helps a little, but it is a request, not a guarantee. Telling a model “do not make up facts” lowers the odds without removing them, and “lower odds” is not something you want to stake a client relationship on. Reliable systems put a verification step outside the prompt, in code that can reject a bad draft rather than merely discourage one.
What is an AI verifier or fact-checking layer?
It is a separate step that inspects the model’s output before any human sees it. A practical version scans the draft for suspicious claims, confirms that every fact traces back to trusted input you actually provided, and uses a second model to flag anything unsupported. If the output fails, it gets rewritten or blocked instead of published.
Why should a human approve AI output before it publishes?
Because accountability has to live with a person. Automating the writing is fine, but someone needs to own what ships. A good system makes it impossible for the model to approve its own work, so generation stays fast and automated while the final decision, and the responsibility for it, stays human.
What does “human in the loop” mean?
It means a person sits at a deliberate decision point in an automated process, holding a yes or no that the machine cannot grant itself. The system does the heavy lifting and proposes a result, and the human reviews and approves before anything becomes real.