CookLLM Docs



From token ids to logits

Premium

Understanding how a decoder-only Transformer turns token ids into next-token logits
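Here is a minimal pure-Python sketch of that shape flow. It uses B = batch size, T = sequence length, D = model width and V = vocabulary size (V is our label, not necessarily the article's). It assumes learned absolute positional embeddings and replaces each attention + MLP block with a hypothetical `toy_block`: a causal running mean plus a residual connection. The sketch therefore shows shapes and causality only. It is not a real Transformer, and it omits the final LayerNorm. `forward`, `lm_head` and the other names are illustrative.

```python
import random

def randn_matrix(rows, cols, rng):
    """Hypothetical helper: a rows x cols matrix of small random weights."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

def shape(x):
    """Return the shape of a nested list, e.g. [B, T, D]."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims

def toy_block(h):
    """Stand-in for a Transformer block (NOT real attention).
    Each position mixes in the mean of positions 0..t (causal), plus a residual.
    Input and output are both [B, T, D]."""
    out = []
    for seq in h:                                   # seq: [T, D]
        new_seq = []
        for t in range(len(seq)):
            prefix = seq[: t + 1]                   # only positions <= t are visible
            mean = [sum(v[d] for v in prefix) / len(prefix) for d in range(len(seq[t]))]
            new_seq.append([a + b for a, b in zip(seq[t], mean)])  # residual add
        out.append(new_seq)
    return out

def forward(token_ids, tok_emb, pos_emb, lm_head, n_layers=2):
    """token_ids [B, T] -> logits [B, T, V]."""
    # 1. Token embedding lookup + learned positional embedding: [B, T] -> [B, T, D]
    h = [[[e + p for e, p in zip(tok_emb[tok], pos_emb[t])]
          for t, tok in enumerate(seq)] for seq in token_ids]
    # 2. Every block maps [B, T, D] -> [B, T, D]
    for _ in range(n_layers):
        h = toy_block(h)
    # 3. LM head (weights stored as [V, D]): dot each D-vector with every row
    #    to get one score per vocabulary entry: [B, T, D] -> [B, T, V]
    return [[[sum(x * w for x, w in zip(vec, row)) for row in lm_head]
             for vec in seq] for seq in h]

rng = random.Random(0)
B, T, D, V, MAX_T = 2, 5, 8, 16, 32
tok_emb = randn_matrix(V, D, rng)        # [V, D]
pos_emb = randn_matrix(MAX_T, D, rng)    # [MAX_T, D]
lm_head = randn_matrix(V, D, rng)        # [V, D]
token_ids = [[rng.randrange(V) for _ in range(T)] for _ in range(B)]
logits = forward(token_ids, tok_emb, pos_emb, lm_head)
print(shape(token_ids), shape(logits))   # [2, 5] [2, 5, 16]
```

Because `toy_block` only reads positions 0..t, the logits at position t depend only on tokens up to t. That property is what lets a single forward pass produce a next-token prediction at every position.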


Table of Contents

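Starting from token ids

Why does every position get its own set of logits?

Which shapes does a single forward pass go through?

Why does the hidden state stay $[B, T, D]$ all the way through?

Where do training and generation diverge?

Summary

References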
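The split between training and generation can be sketched as follows. The sketch assumes a model that maps one sequence of ids [T] to logits [T, V]. `training_loss`, `generate_greedy` and `toy_bigram_model` are hypothetical helpers. During training, every position t is scored against token t+1 with cross-entropy. During greedy generation, only the last row of logits is used: its argmax is appended to the sequence and the model runs again. Sampling strategies and KV caching are left out.

```python
import math

def cross_entropy(logits_row, target):
    """-log softmax(logits_row)[target], computed with the max-subtraction trick."""
    m = max(logits_row)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits_row))
    return log_z - logits_row[target]

def training_loss(logits, token_ids):
    """logits [B, T, V], token_ids [B, T].
    Position t is scored against token t+1, so the last position has no target."""
    losses = []
    for seq_logits, seq_ids in zip(logits, token_ids):
        for t in range(len(seq_ids) - 1):
            losses.append(cross_entropy(seq_logits[t], seq_ids[t + 1]))
    return sum(losses) / len(losses)

def generate_greedy(model, prompt, max_new_tokens):
    """model: ids [T] -> logits [T, V]. Only the last row is used at each step."""
    ids = list(prompt)
    for _ in range(max_new_tokens):
        last = model(ids)[-1]                                     # [V]
        ids.append(max(range(len(last)), key=last.__getitem__))   # greedy argmax
    return ids

V = 5

def toy_bigram_model(ids):
    """Hypothetical stand-in: logits at position t depend only on ids[t]
    and strongly favour (ids[t] + 1) % V."""
    return [[4.0 if v == (tok + 1) % V else 0.0 for v in range(V)] for tok in ids]

print(generate_greedy(toy_bigram_model, [0], 4))   # [0, 1, 2, 3, 4]
loss = training_loss([toy_bigram_model([0, 1, 2])], [[0, 1, 2]])
print(f"loss = {loss:.4f}")
```

Training gets T-1 supervised predictions per sequence from one forward pass. Generation needs a new forward pass for each new token, which is why practical implementations cache keys and values instead of recomputing the whole prefix.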