Position Encoding Basics

Why Transformers need position information, and the methods and limits of absolute position encoding

Table of Contents

Permutation Invariance in Transformers
Absolute Position Encoding
Sinusoidal Position Encoding
Why sin/cos?
Learned Position Encoding
How Absolute Position Encoding Is Used
Hidden Properties of Sinusoidal PE
Dot Product Depends Only on Relative Position
Long-Range Decay
Limits of Absolute Position Encoding
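
The sinusoidal encoding and the relative-position dot-product property named in the outline above follow the standard formulation from the original Transformer paper. As a rough companion to the outline, here is a minimal NumPy sketch (the function name and the chosen shapes are illustrative, not taken from the article itself):

```python
import numpy as np

def sinusoidal_position_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal position encoding (assumes d_model is even).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2), values 2i
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions use cosine
    return pe

# Dot product of two encodings depends only on their relative offset:
# sin(m*w)sin(n*w) + cos(m*w)cos(n*w) = cos((m - n)*w) for each frequency w.
pe = sinusoidal_position_encoding(seq_len=128, d_model=64)
print(np.dot(pe[10], pe[14]))   # offset 4
print(np.dot(pe[50], pe[54]))   # same offset 4 -> same value (up to float error)
```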