Extrinsic Hallucinations in LLMs

Blog · Lilian Weng · Jun 14, 2024

Highlights

  • Gekhman et al. 2024: during fine-tuning, LLMs acquire new knowledge more slowly than knowledge consistent with what they already encode, and once those new facts are learned they increase the model's tendency to hallucinate — knowledge injection is riskier than knowledge reinforcement
  • Pre-training on public web crawls guarantees outdated, missing, and incorrect information; models memorize it via log-likelihood maximization, making data quality the root cause rather than architecture
  • The EntityQuestions knowledge categorization (HighlyKnown / MaybeKnown / WeaklyKnown / Unknown) uses the probability of producing the correct answer under random few-shot prompts as a measurable per-fact hallucination risk score
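The categorization in the last highlight can be sketched as a simple decision rule. This is a minimal, hypothetical illustration, assuming the per-fact correctness fractions have already been estimated separately under greedy decoding and under temperature sampling across random few-shot prompts; the thresholds follow the HighlyKnown / MaybeKnown / WeaklyKnown / Unknown scheme described above, not any specific library API.

```python
def categorize_fact(p_greedy: float, p_sample: float) -> str:
    """Assign a knowledge category to one (question, answer) fact.

    p_greedy: fraction of random few-shot prompts for which greedy
              decoding yields the correct answer (assumed precomputed).
    p_sample: the same fraction under temperature sampling (T > 0).
    """
    if p_greedy == 1.0:
        return "HighlyKnown"   # always correct with greedy decoding
    if p_greedy > 0.0:
        return "MaybeKnown"    # sometimes correct with greedy decoding
    if p_sample > 0.0:
        return "WeaklyKnown"   # correct only when sampling
    return "Unknown"           # never produces the correct answer


def correct_fraction(answers: list[str], gold: str) -> float:
    """Estimate a correctness fraction from model answers to one fact,
    where each answer came from a different random few-shot prompt."""
    return sum(a == gold for a in answers) / len(answers)
```

For example, a fact answered correctly on 3 of 10 greedy-decoded prompts is MaybeKnown, while one answered correctly only under sampling is WeaklyKnown — the lowest-risk facts for fine-tuning are the HighlyKnown ones.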

Original excerpt

Hallucination in large language models usually refers to the model generating unfaithful, fabricated, inconsistent, or nonsensical content. As a term, hallucination has been somewhat generalized to cases when the model makes mistakes. Here, I would like to narrow down the problem of hallucination to cases where the model output is fabricated and not grounded by either the provided context or world knowledge.

1. In-context hallucination: The model output should be consistent with the source content in context.
2. Extrinsic hallucination: The model output should be grounded by the pre-training dataset. However, given the size of the pre-training dataset, it is too expensive to retrieve and…

10 more articles in this vault.

Import the full Lilian Weng vault to Burn 451 and build your own knowledge base.

Content attributed to the original author (Lilian Weng). Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.