A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play
AI Summary
AlphaZero (Science, December 2018) generalizes AlphaGo Zero's self-play approach to two-player, zero-sum games of perfect information. A single algorithm, using the same hyperparameters for every game, starts from random play and reaches superhuman performance within hours of training. The trained system defeated Stockfish in chess, Elmo in shogi, and AlphaGo Zero in Go. The key point is that the neural network architecture and the Monte Carlo tree search (MCTS) formulation are general: apart from the rules of each game, they encode no domain-specific knowledge about chess, shogi, or Go. AlphaZero therefore shows that general reinforcement learning works in practice. One system, trained by one algorithm through self-play alone, reaches expert-level performance across quite different domains. It was a proof of concept for the kind of generally capable learning system Hassabis has long argued is the right direction for AGI. The chess results were particularly striking. AlphaZero developed a dynamic, attacking style that differed markedly from Stockfish's play. Because AlphaZero never trained on human games, this suggests it discovered genuinely novel strategic principles of its own rather than reproducing human master play.
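To make the "general MCTS formulation" concrete, here is a minimal Python sketch of the two game-agnostic pieces: the PUCT rule that picks a move during search, Q(s, a) + C(s)·P(s, a)·√N(s) / (1 + N(s, a)), and the per-position training loss (z − v)² − πᵀ log p (the paper also adds an L2 weight penalty, omitted here). This is an illustration under stated assumptions, not DeepMind's code. The `Node` class, the function names, and the constants `PB_C_BASE` and `PB_C_INIT` are hypothetical choices made for this sketch. Moves are treated as opaque keys, so nothing in the search refers to a particular game.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Hashable, Sequence, Tuple

# Illustrative exploration constants (assumed values for this sketch).
PB_C_BASE = 19652
PB_C_INIT = 1.25


@dataclass
class Node:
    """One search-tree node; it knows nothing about chess, shogi, or Go."""
    prior: float                      # P(s, a) from the policy head
    visit_count: int = 0              # N(s, a)
    value_sum: float = 0.0            # sum of backed-up values W(s, a)
    children: Dict[Hashable, "Node"] = field(default_factory=dict)

    def value(self) -> float:
        # Mean action value Q(s, a); unvisited nodes default to 0 (an assumption).
        # Values are assumed to be stored from the perspective of the player to move at the parent.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def puct_score(parent: Node, child: Node) -> float:
    """Q(s, a) + U(s, a); the exploration rate C(s) grows slowly with parent visits."""
    c = math.log((parent.visit_count + PB_C_BASE + 1) / PB_C_BASE) + PB_C_INIT
    u = c * child.prior * math.sqrt(parent.visit_count) / (1 + child.visit_count)
    return child.value() + u


def select_child(parent: Node) -> Tuple[Hashable, Node]:
    """Return the (move, child) pair with the highest PUCT score."""
    return max(parent.children.items(), key=lambda kv: puct_score(parent, kv[1]))


def alphazero_loss(z: float, v: float, pi: Sequence[float], p: Sequence[float]) -> float:
    """Value error plus policy cross-entropy for one position (L2 term omitted)."""
    # Skip zero-probability targets so log(0) is never evaluated.
    cross_entropy = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    return (z - v) ** 2 + cross_entropy
```

When every child is unvisited, Q is 0 for all moves, so the search follows the network's prior. As visits accumulate, the exploration bonus shrinks and the backed-up value Q(s, a) dominates. The same code applies to any game whose moves can be used as dictionary keys.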
Original excerpt
A single RL algorithm, same hyperparameters, masters three different board games. The strongest proof-of-concept that general reinforcement learning is a viable path toward general intelligence.
Frequently asked questions
Who wrote "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play"?
"A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play" was written by a DeepMind team, with David Silver as first author and Demis Hassabis as a co-author. It is curated in the Demis Hassabis vault on Burn 451, which covers AGI, AlphaFold, and scientific discovery.
How can I read more content from Demis Hassabis?
The complete Demis Hassabis reading list is available at burn451.cloud/vault/demis-hassabis. Each article includes an AI-generated summary so you can decide what to read in seconds. Connect the Burn 451 MCP server to Claude or Cursor to query all Demis Hassabis articles as live AI context.
Can I use "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play" with Claude or Cursor?
Yes. Install the burn-mcp-server npm package and connect it to Claude Desktop, Claude Code, or Cursor. Once connected, your AI can search and reference this article and the full Demis Hassabis vault in real time — no manual copy-paste required.
Content attributed to the original author (Demis Hassabis). Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.