
Why LLMs Hallucinate

4 min read · Sep 9, 2025

Why Language Models Hallucinate: The Hidden Cost of Always Guessing

Ever asked ChatGPT something and got a confident — but wrong — answer?

That isn’t just a quirky bug. According to new research from OpenAI and Georgia Tech, hallucinations are an inevitable consequence of how we train and test large language models (LLMs).

Let’s unpack why this happens, what makes it worse, and how researchers think we can fix it.

What’s Really Happening When AI “Makes Things Up”

Language models don’t have a mental picture of the world — they’re statistical machines that learn to predict the next word given all previous words.

That means:

  • They don’t know “truth” — they know “likelihood.”
  • If multiple answers look statistically plausible, the model picks the one it believes is most likely.
  • If there is no strong signal, it still produces a guess because that’s what its training taught it to do.
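
To make that concrete, here is a deliberately toy sketch of what "pick the most likely continuation" looks like. The probability table is invented for illustration; the point is that the second prompt has no strong signal at all, yet the model still returns something.

```python
# Toy illustration: a language model only sees conditional probabilities,
# not "truth". Given a prompt, it ranks continuations by likelihood and
# emits the top one even when no candidate is well supported.
# The probability table below is invented purely for illustration.

next_token_probs = {
    "Penicillin was discovered by": {
        "Alexander": 0.46,   # leads to the correct answer
        "Albert": 0.31,      # statistically plausible but wrong path
        "Marie": 0.23,
    },
    "The 2031 Booker Prize was won by": {   # no strong signal exists
        "John": 0.35,
        "Jane": 0.33,
        "Alex": 0.32,
    },
}

def next_token(prompt: str) -> str:
    """Pick the most likely continuation -- the model always answers."""
    candidates = next_token_probs[prompt]
    return max(candidates, key=candidates.get)

for prompt in next_token_probs:
    print(prompt, "->", next_token(prompt))
```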

A Technical View: Loss Functions Encourage Guessing

Most LLMs are trained with maximum likelihood estimation — they try to maximize the probability of producing the correct token sequence given the training data.

During fine-tuning, especially with Reinforcement Learning from Human Feedback (RLHF), models are rewarded for giving useful answers. But there’s usually no reward for saying “I don’t know.”

Mathematically, this means:

  • The model gets a positive gradient for giving any answer that sounds correct.
  • There’s no negative gradient specifically pushing it toward abstaining.
  • Over time, the model learns that guessing confidently maximizes its expected reward.
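
A rough way to see this: the standard cross-entropy loss depends only on the probability the model assigns to the training answer. In the toy numbers below (invented for illustration), shifting probability mass onto an "I don't know" token always increases the loss, so gradient descent keeps sharpening the distribution around a single answer.

```python
import math

# Toy illustration of the maximum-likelihood objective (numbers are invented).
# The loss is -log p(training answer); nothing in it rewards an "I don't know"
# token, so any probability mass moved onto abstention strictly raises the loss.

TARGET = "Fleming"

def nll(probs: dict[str, float]) -> float:
    """Negative log-likelihood of the training answer."""
    return -math.log(probs[TARGET])

sharpened = {"Fleming": 0.90, "Einstein": 0.05, "<IDK>": 0.05}
hedged    = {"Fleming": 0.40, "Einstein": 0.05, "<IDK>": 0.55}

print(f"confident, sharpened: loss = {nll(sharpened):.3f}")  # ~0.105
print(f"honest, abstaining:   loss = {nll(hedged):.3f}")     # ~0.916
# Gradient descent therefore keeps pushing probability toward one answer.
```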

Why This Matters: The Trust Gap

Imagine two doctors:

  • Doctor A guesses even when unsure, speaking confidently.
  • Doctor B admits uncertainty and runs more tests before answering.

Who would you trust with your health?

Right now, most LLMs behave like Doctor A. They output confident-sounding but unverified answers, which can be dangerous in:

  • Healthcare: Wrong drug dosages, misdiagnoses
  • Law: Incorrect interpretation of statutes or case law
  • Finance: Fabricated market data or investment advice

The danger isn’t just that they can be wrong — it’s that they don’t signal when they might be wrong.

Root Causes of Hallucination

Researchers identify three main contributing factors — let’s go deeper into each.

1. Training Incentives: Overconfidence by Design

Training minimizes prediction error, not calibration error.
This means a model might output:

Q: Who discovered penicillin?
A: Albert Einstein.

…because “Albert Einstein” was statistically likely in some patterns the model saw — not because the model is deliberately lying.

And since there’s no penalty for being confidently wrong, the model’s probability distribution sharpens around a single answer.

2. Benchmark Design: Rewarding Wrong Answers

Benchmarks like MMLU, TriviaQA, and others typically evaluate models with exact match accuracy — did the model output the correct answer?

If a model says “I don’t know,” it scores the same as if it gave a wrong answer — zero points.

So from the model’s perspective:

  • Guessing might earn points.
  • Admitting ignorance never earns points.

Because models are tuned and selected to score well on these benchmarks, this creates a systematic bias toward guessing.
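
You can see the incentive in a few lines of arithmetic. Under binary exact-match scoring, the expected score of guessing is simply the probability of being right, while abstaining is always zero, so guessing dominates at every confidence level. (The numbers below are illustrative, not taken from any specific benchmark.)

```python
# Sketch of the incentive created by exact-match scoring (numbers are invented).
# Under binary grading, a wrong answer and "I don't know" both score 0,
# so any nonzero chance of being right makes guessing the better policy.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected benchmark score: 1 point if correct, 0 otherwise."""
    return 0.0 if abstain else p_correct

for p in (0.9, 0.5, 0.1, 0.01):
    print(f"p(correct)={p:.2f}  guess={expected_score(p, False):.2f}  "
          f"abstain={expected_score(p, True):.2f}")
# Guessing dominates abstaining at every confidence level.
```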

3. Imperfect Data and Statistical Noise

Even if we had perfect, curated data, some tasks are inherently statistically hard — think of:

  • Rare facts (obscure historical dates)
  • Ambiguous questions (“What is the biggest bank?” — by assets? by branches?)
  • Changing truths (who is the current Prime Minister?)

This means hallucinations are not just a side effect — they are mathematically inevitable under current training paradigms.

Fixing the Problem: A Socio-Technical Approach

The paper argues that hallucinations aren’t just a “model issue” — they’re a system design issue.

Redesign Benchmarks

Include a “decline to answer” option in evaluation datasets.
Reward models for abstaining when they are below a certain confidence threshold.

This would shift the model’s training gradient:

  • High confidence + correct → reward
  • Low confidence + abstain → partial reward
  • High confidence + wrong → strong penalty
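
Here is a sketch of what such a scoring rule could look like. The weights (+1 for correct, -2 for confidently wrong, +0.3 for abstaining) are hypothetical, not taken from the paper, but they show how penalizing confident errors creates a confidence threshold below which abstaining becomes the better strategy.

```python
# Hypothetical benchmark scoring rule of the kind described above.
# The exact weights are illustrative, not from the paper.

CORRECT, WRONG, ABSTAIN = 1.0, -2.0, 0.3

def expected_score(p_correct: float, abstain: bool) -> float:
    if abstain:
        return ABSTAIN
    return p_correct * CORRECT + (1 - p_correct) * WRONG

for p in (0.9, 0.77, 0.5, 0.2):
    guess, hold = expected_score(p, False), expected_score(p, True)
    best = "guess" if guess > hold else "abstain"
    print(f"p(correct)={p:.2f}  guess={guess:+.2f}  abstain={hold:+.2f}  -> {best}")
# With these weights the break-even point is p ~ 0.77, so the model is only
# rewarded for answering when it is reasonably confident.
```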

Incorporate Calibration Metrics

Instead of measuring only accuracy, measure calibration — how well the model’s predicted probability matches reality.

For example, if a model says “I’m 70% sure,” it should be right about 70% of the time.

Metrics like Brier score or Expected Calibration Error (ECE) can quantify this and guide fine-tuning.
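
As a rough illustration, here is how both metrics could be computed on a handful of made-up predictions (the confidences and outcomes below are invented):

```python
import numpy as np

# Minimal sketch of two calibration metrics on made-up data:
# `confidences` are the model's stated probabilities, `correct` marks whether
# the corresponding answer was actually right.

confidences = np.array([0.95, 0.9, 0.8, 0.7, 0.7, 0.6, 0.55, 0.5])
correct     = np.array([1,    1,   0,   1,   0,   1,   0,    0  ])

# Brier score: mean squared gap between stated confidence and outcome (lower is better).
brier = np.mean((confidences - correct) ** 2)

# Expected Calibration Error: bucket predictions by confidence and compare
# each bucket's average confidence with its empirical accuracy.
def ece(conf, outcomes, n_bins: int = 5) -> float:
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            gap = abs(conf[mask].mean() - outcomes[mask].mean())
            total += mask.mean() * gap
    return total

print(f"Brier score: {brier:.3f}")
print(f"ECE:         {ece(confidences, correct):.3f}")
```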

Use External Tools to Reduce Hallucination

  • Retrieval-Augmented Generation (RAG): Pull in fresh information from trusted sources.
  • Chain-of-Thought Prompting: Break reasoning into explicit steps to catch errors earlier.
  • Confidence Calibration: Train the model to output probabilities alongside answers.

These techniques don’t just patch hallucinations — they give users better visibility into when to trust the model.
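
To make the first idea concrete, here is a bare-bones RAG skeleton. The `search_index` and `llm_complete` functions are hypothetical stand-ins for a real vector store and model API; the point is the control flow: ground the answer in retrieved passages, and abstain when nothing relevant comes back.

```python
# Minimal retrieval-augmented generation (RAG) skeleton.
# `search_index` and `llm_complete` are hypothetical stubs standing in for a
# real vector-store lookup and a real model API call.

def search_index(question: str, k: int = 3) -> list[str]:
    """Hypothetical retriever stub: return up to k relevant trusted passages."""
    return []  # replace with a real vector-store lookup

def llm_complete(prompt: str) -> str:
    """Hypothetical model-call stub."""
    return "(model answer grounded in the retrieved context)"

def answer_with_rag(question: str) -> str:
    passages = search_index(question)
    if not passages:
        # No supporting evidence retrieved: abstain instead of guessing.
        return "I don't know -- I couldn't find a reliable source for this."
    context = "\n".join(passages)
    prompt = (
        "Answer ONLY from the context below. If the context is insufficient, "
        "say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm_complete(prompt)

print(answer_with_rag("Who discovered penicillin?"))
```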

The Big Picture

Hallucinations aren’t just random glitches. They are the logical outcome of optimizing for accuracy without accounting for honesty.

If we want trustworthy AI, we need to:

  • Fix training incentives
  • Update benchmarks to reward abstention
  • Measure calibration, not just accuracy
  • Combine models with retrieval and reasoning tools

“The next generation of AI shouldn’t just be smarter — it should be self-aware about what it doesn’t know.”

Your Turn

Would you prefer a chatbot that always answers but occasionally makes things up, or one that sometimes says “I don’t know” but is more reliable when it does answer?

How should an AI communicate uncertainty — confidence scores, warning labels, or even visual cues?

Let’s talk about it in the comments.


Written by Daniel Foo

Director of Engineering | MBA | Developer, Writer
