AI Just Makes Stuff Up—Here’s Why

Posted on September 30, 2025 (updated October 1, 2025) by Kelsey Proctor

Why Large Language Models Get It Wrong

We’ve all seen it happen: you ask a large language model (LLM) a seemingly simple question, and it gives a confident-sounding answer, only for you to later discover it’s entirely made up. These “hallucinations” (i.e. plausible but false statements) have long been a thorn in the side of AI systems. Recently, OpenAI published a new research insight shedding light on why these hallucinations occur, a fresh perspective that could guide how we build more reliable AI going forward (OpenAI).

Photo by: Vengo AI

What Are Hallucinations (in AI)?

In the context of AI and language models, a hallucination is when a model generates content that sounds factual but is actually not grounded in reality or verifiable evidence.

For example:

  • It might invent references or citations.

  • It might assert a false biographical fact.

  • It could fabricate numerical or relational information.

These errors aren’t just “mistakes” — they can undermine trust, especially when AI is used in sensitive domains like medicine, law, or journalism.

“Futuristic Humanoid” — CC0 Public Domain

The New Insight: Hallucinations Are Incentive Problems, Not Mysteries

OpenAI’s new research—titled Why Language Models Hallucinate—argues that hallucinations emerge largely because of the way we train and evaluate language models.

  • Current training and evaluation frameworks tend to reward giving an answer rather than saying “I don’t know.”

  • In many benchmarks or evaluations, if a model abstains (i.e. declines to answer), it gets zero credit, while a guess has a chance of being “correct” and earning credit.

  • Over time, the model develops a bias: when uncertain, it’s more “beneficial” (in the metric sense) to produce a confident output—even if it’s wrong.

Put simply: these models are “taught to be test-takers.” When the metric rewards only the number of right answers, the model learns to guess rather than safely abstain (arXiv).
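
To make that incentive concrete, here is a minimal sketch of how a “one point if correct, zero otherwise” benchmark scores a guessing policy against an abstaining one; the 25% guess-success rate is an assumed, illustrative number, not a figure from the paper.

```python
# Expected benchmark score under "1 point if correct, 0 otherwise" grading.
# The 25% guess-success rate is an illustrative assumption, not a measured figure.

P_GUESS_CORRECT = 0.25   # chance a blind guess happens to be right
SCORE_CORRECT = 1.0      # credit for a correct answer
SCORE_WRONG = 0.0        # no penalty for a confident wrong answer
SCORE_ABSTAIN = 0.0      # "I don't know" earns nothing

expected_guess = P_GUESS_CORRECT * SCORE_CORRECT + (1 - P_GUESS_CORRECT) * SCORE_WRONG
expected_abstain = SCORE_ABSTAIN

print(f"Expected score when guessing:   {expected_guess:.2f}")    # 0.25
print(f"Expected score when abstaining: {expected_abstain:.2f}")  # 0.00
# Guessing strictly dominates abstaining, so optimizing this metric
# pushes a model toward confident answers even when it is uncertain.
```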

Statistical underpinning

The paper also demonstrates that hallucinations are not mystical bugs — they follow from statistical pressures in the training pipeline. When the model cannot reliably discriminate between a wrong statement and a true statement (especially for obscure facts), errors are inevitable under these incentives.
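
As a rough intuition sketch (not the paper’s formal bound), suppose the model can distinguish a true statement about an obscure fact from a plausible false one only 55% of the time, an assumed figure for illustration. Forced to answer every time, it must be wrong at roughly the complementary rate:

```python
# Rough intuition only (not the paper's formal bound): if a model can tell a true
# statement about an obscure fact from a plausible false one just slightly better
# than chance, always answering forces errors at roughly the complementary rate.
# The 55% discrimination accuracy and 10,000 trials are illustrative assumptions.
import random

random.seed(0)
DISCRIMINATION_ACCURACY = 0.55
TRIALS = 10_000

wrong_answers = sum(random.random() > DISCRIMINATION_ACCURACY for _ in range(TRIALS))
print(f"Hallucination rate when forced to answer: {wrong_answers / TRIALS:.1%}")  # ~45%
# Under incentives that never reward "I don't know", these errors are a
# statistical consequence of weak discrimination, not a mysterious glitch.
```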

Photo by: Vengo AI

Why This Matters (and What It Changes)

This reframing has several meaningful implications:

  1. It shifts focus from “fixing hallucination” as a bug to “restructuring incentives.” Instead of endlessly tweaking architectures, we might rethink how models are rewarded.

  2. We may need better evaluation benchmarks. Benchmarks should account for calibrated uncertainty, not simply “right vs wrong.”

  3. It encourages humility in AI. Models need not always assert certainty—sometimes saying “I’m not sure” or deferring is more honest.

  4. It doesn’t eliminate hallucinations entirely. While incentive realignment could reduce hallucinations, some level of error may remain due to intrinsic uncertainty in data and generalization.

Interestingly, there’s also prior work showing that even perfectly calibrated models (i.e. models whose stated confidence levels match actual probabilities) cannot completely avoid hallucination when dealing with rare or novel facts.
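
Here is a minimal sketch of that point, assuming five hypothetical rare-fact questions with the calibrated confidences listed below:

```python
# Minimal sketch: a perfectly calibrated model (stated confidence equals its true
# chance of being right) still makes mistakes on low-confidence items if it must
# always answer. The five "rare fact" confidences below are illustrative assumptions.

calibrated_confidences = [0.95, 0.80, 0.55, 0.30, 0.20]

expected_wrong = sum(1 - c for c in calibrated_confidences)
print(f"Expected wrong answers out of {len(calibrated_confidences)} questions: "
      f"{expected_wrong:.2f}")  # 2.20
# Calibration removes overconfidence, but it cannot supply knowledge the training
# data never contained, so some residual hallucination risk remains.
```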

Photo by: Vengo AI

Challenges & Questions Ahead

  • How do you reward abstention? If you give the model free “passes” for saying “I don’t know,” it might overuse that. Finding a balance is nontrivial (one possible scoring rule is sketched after this list).

  • What nuance do we allow? There may be gradations—“I’m fairly confident,” “I’m uncertain,” etc.—rather than binary yes/no.

  • What about domain constraints? In areas like medicine or law, you may want stricter constraints or external grounding to reduce hallucination risk.

  • Can model architecture or training regimes still help? Incentives alone might not suffice; architectural safeguards, retrieval mechanisms, or truth-checking modules may still be needed.
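
One direction, in line with the paper’s discussion of explicit confidence targets, is to penalize wrong answers more heavily than abstentions, so that guessing only pays when the model’s confidence actually clears a stated threshold. Below is a minimal sketch; the 75% target and the sample confidence levels are illustrative assumptions, not parameters from the paper.

```python
# One family of scoring rules that makes abstention rational: with a stated
# confidence target t, a wrong answer costs t / (1 - t) points, a correct answer
# earns 1 point, and abstaining earns 0. Answering then has positive expected
# value only when the model's probability of being right exceeds t.
# The 75% target and the sample confidence levels are illustrative assumptions.

def expected_score(p_correct: float, t: float) -> float:
    """Expected score of answering, compared with 0 for abstaining."""
    penalty = t / (1 - t)  # cost of a confident wrong answer
    return p_correct * 1.0 - (1 - p_correct) * penalty

TARGET = 0.75  # only answer if you are more than 75% sure
for p in (0.30, 0.60, 0.75, 0.90):
    score = expected_score(p, TARGET)
    decision = "answer" if score > 0 else "abstain"
    print(f"confidence {p:.0%}: expected score {score:+.2f} -> {decision}")
```

Under this kind of rule, “I don’t know” stops being a guaranteed loss, yet blanket abstention still carries an implicit cost because it forfeits the points a genuinely confident answer would earn.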


Looking Forward

OpenAI’s argument is a powerful reframing: hallucinations are not just accidental byproducts of model imperfection, but emergent phenomena shaped by how we reward and evaluate models. As AI adoption continues, understanding and addressing these incentive misalignments could be crucial for building systems we can trust.

Categories: AI, Vengo AI · Tags: AI Hallucinations, Artificial Intelligence Insights, ChatGPT Errors, Language Models, OpenAI Research
