Transformer LM

From token ids to next-token logits: building a mental model of the decoder-only Transformer

Overview

The Transformer LM series answers one concrete question: when we call model(input_ids), what actually happens inside the model?

This series first lays out the full forward pass of a decoder-only Transformer, then examines its key interfaces one at a time: embedding, LM head, decoder block, residual stream, and so on. Attention, RoPE, RMSNorm, and SwiGLU are covered in depth in later topics; here they are first placed back into the data flow of a complete language model.
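To make the data flow concrete, here is a minimal NumPy sketch of what model(input_ids) does in terms of shapes. It is an illustration under stated assumptions, not the series' implementation: the sizes, the names toy_block, rms_norm, and forward, and the random weights are all hypothetical. The block is a stand-in that keeps only the residual pattern and omits attention and the MLP, so tokens do not interact here. The final normalization before the LM head is a common design choice assumed for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only for illustration.
vocab_size, d_model, n_layers = 50, 16, 2
batch, seq_len = 2, 5

def rms_norm(x, eps=1e-6):
    # Divide each hidden vector by its root-mean-square (no learned gain here).
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def toy_block(x, w):
    # Stand-in for a decoder block: attention and the MLP are omitted.
    # Only the residual pattern x + f(norm(x)) is kept.
    return x + np.tanh(rms_norm(x) @ w)

# Randomly initialized parameters (assumption: untrained, for shape tracing only).
embedding = rng.normal(size=(vocab_size, d_model)) * 0.02
block_weights = [rng.normal(size=(d_model, d_model)) * 0.02 for _ in range(n_layers)]
lm_head = rng.normal(size=(d_model, vocab_size)) * 0.02

def forward(input_ids):
    x = embedding[input_ids]   # (batch, seq_len) -> (batch, seq_len, d_model)
    for w in block_weights:
        x = toy_block(x, w)    # shape unchanged: this is the residual stream
    x = rms_norm(x)            # final normalization (assumed for this sketch)
    return x @ lm_head         # (batch, seq_len, vocab_size)

input_ids = rng.integers(0, vocab_size, size=(batch, seq_len))
logits = forward(input_ids)
print(logits.shape)  # (2, 5, 50)
```

The point to take away is the shape contract: integer ids of shape (batch, seq_len) go in, and one vector of vocab_size scores per position comes out.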

Stage	Content	Goal
Overall	From token ids to logits	Get a clear view of the inputs, outputs, tensor shapes, and the point where training and inference diverge
Input/Output	Embedding and LM Head	Understand how token ids enter the model and how hidden states become logits
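The training/inference fork mentioned in the table can be sketched with the logits alone. The following is a hedged illustration, assuming standard next-token training with cross-entropy and greedy decoding at inference; the random logits stand in for the output of model(input_ids), and all names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab_size = 2, 5, 50   # hypothetical toy sizes

input_ids = rng.integers(0, vocab_size, size=(batch, seq_len))
# Stand-in for model(input_ids); a real model would produce these scores.
logits = rng.normal(size=(batch, seq_len, vocab_size))

def log_softmax(z):
    # Numerically stable log-softmax over the vocabulary axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

# Training: position t predicts token t+1, so drop the last logit and the first token.
pred = log_softmax(logits[:, :-1, :])   # (batch, seq_len - 1, vocab_size)
targets = input_ids[:, 1:]              # (batch, seq_len - 1)
token_logp = np.take_along_axis(pred, targets[..., None], axis=-1).squeeze(-1)
loss = -token_logp.mean()               # mean cross-entropy over all predicted positions

# Inference: only the last position is needed to choose the next token.
next_token = logits[:, -1, :].argmax(axis=-1)   # (batch,), greedy decoding
```

Both paths consume the same logits tensor; training uses every position at once, while generation reads only the final one and then repeats the forward pass with the new token appended.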
