Thinking about High-Quality Human Data
Original excerpt
Special thank you to Ian Kivlichan for many useful pointers (e.g. the 100+ year old Nature paper “Vox populi”) and nice feedback. 🙏
High-quality data is the fuel for modern deep learning model training. Most task-specific labeled data comes from human annotation, such as classification tasks or RLHF labeling (which can be framed as classification) for LLM alignment training. Many ML techniques in the post can help with data quality, but fundamentally human data collection requires attention to detail and careful execution. The community knows the value of high-quality data, but somehow we have this subtle impression that “Everyone wants to do the model work,…
Frequently asked questions
What is "Thinking about High-Quality Human Data" about?
This article by Lilian Weng discusses human data collection and data quality for deep learning model training. It is part of the Lilian Weng reading list on Burn 451, covering AI safety, post-training, and LLM internals.
Who wrote "Thinking about High-Quality Human Data"?
Lilian Weng wrote this piece. It is part of the Lilian Weng vault on Burn 451, and the original author is attributed at the source link.
How can I read more content from Lilian Weng?
The complete Lilian Weng reading list is available at burn451.cloud/vault/lilian-weng. Each article includes an AI-generated summary so you can decide what to read in seconds. Connect the Burn 451 MCP server to Claude or Cursor to query all Lilian Weng articles as live AI context.
Can I use "Thinking about High-Quality Human Data" with Claude or Cursor?
Yes. Install the burn-mcp-server npm package and connect it to Claude Desktop, Claude Code, or Cursor. Once connected, your AI can search and reference this article and the full Lilian Weng vault in real time — no manual copy-paste required.
Content attributed to the original author. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.