The Transformer Family Version 2.0

Blog · Apr 22, 2026

Original excerpt

Date: January 27, 2023 | Estimated Reading Time: 45 min | Author: Lilian Weng

Many new Transformer architecture improvements have been proposed since my last post on “The Transformer Family” about three years ago. Here I did a big refactoring and enrichment of that 2020 post: I restructured the hierarchy of sections and improved many sections with more recent papers. Version 2.0 is a superset of the old version, about twice the length.

$\mathbf{X} \in \mathbb{R}^{L \times d}$: The input sequence where each element has been mapped into an embedding vector of shape $d$, same as the model size.

$\mathbf{W}^v \in \mathbb{R}^{d \times d_v}$: The value weight…
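As a quick shape check on this notation, here is a minimal sketch of the value projection $\mathbf{V} = \mathbf{X}\mathbf{W}^v$, assuming the standard self-attention setup where each of the $L$ input embeddings is multiplied by the value weight matrix. The sizes, random values, and the helper names `matmul` and `project_values` are hypothetical illustrations, not code from the post; plain Python lists are used so the example runs without extra dependencies.

```python
import random

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]

def project_values(X, W_v):
    """Hypothetical helper: V = X W^v maps each of the L rows from size d to size d_v."""
    return matmul(X, W_v)

# Illustrative sizes (arbitrary assumptions): L tokens, model size d, value size d_v.
L, d, d_v = 3, 4, 2
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(L)]      # X in R^{L x d}
W_v = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(d)]  # W^v in R^{d x d_v}

V = project_values(X, W_v)
print(len(V), len(V[0]))  # → 3 2, i.e. V in R^{L x d_v}
```

Only the second dimension changes: the sequence length $L$ is preserved, while each embedding is mapped from size $d$ to size $d_v$.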

Content attributed to the original author. Burn 451 curates publicly available writing as a reading index. For removal requests, contact @hawking520.