RoPE Implementation


Inverse frequency computation, cos/sin caching, and a vectorized apply_rotary_pos_emb

Companion Code
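The blurb above covers three pieces: computing the inverse frequencies, caching the cos/sin tables, and applying the rotation to queries and keys in one vectorized call. Below is a minimal, hedged PyTorch sketch of those pieces in the split-halves (rotate_half) convention the contents attribute to HuggingFace Transformers; the helper name build_rope_cache and the tensor shapes are assumptions made for illustration, not the companion code behind the login.

```python
import torch

def build_rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables of shape (seq_len, head_dim). Hypothetical helper."""
    # Inverse frequencies, one per dimension pair: 1 / base^(2i / head_dim).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Angle m * theta_i for every (position, frequency) pair: (seq_len, head_dim/2).
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)
    # Duplicate so the tables line up with the full head_dim.
    emb = torch.cat((angles, angles), dim=-1)
    return emb.cos(), emb.sin()

def rotate_half(x):
    """Split the last dimension in half, swap the halves, negate the second."""
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin):
    """Rotate q and k of shape (batch, n_heads, seq_len, head_dim) in place of a pairwise loop."""
    cos = cos[None, None, :, :]  # broadcast over batch and heads
    sin = sin[None, None, :, :]
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot

if __name__ == "__main__":
    q = torch.randn(2, 8, 16, 64)
    k = torch.randn(2, 8, 16, 64)
    cos, sin = build_rope_cache(seq_len=16, head_dim=64)
    q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
    print(q_rot.shape, k_rot.shape)  # torch.Size([2, 8, 16, 64]) twice
```

The tables depend only on sequence length and head dimension, so they can be computed once and reused across layers and decoding steps, which is the point of the cos/sin caching mentioned above.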


Related Articles

RoPE Math Derivation

From complex rotations to higher-dimensional generalization, understand the core math of rotary position embeddings

Length Extrapolation

NTK-aware scaling, YaRN, and other methods to let RoPE handle longer sequences

Table of Contents

Inverse Frequency Precomputation
Two Implementation Styles
    Interleaved Style (Original Paper)
        Pairing and Rotation Matrix
        Pairwise Loop
        Complex-Multiply Vectorization
    Split-Halves Style (HuggingFace Transformers)
        Pairing and Rotation Matrix
        rotate_half Vectorization
        Vectorized apply_rotary_pos_emb
    Equivalence of the Two Styles
A Full RoPE Module
Integrating RoPE Into Attention
Working With KV Cache
Summary
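Of the two styles listed in the contents, the split-halves form appears in the sketch near the top of this page; the interleaved (original-paper) form instead pairs adjacent dimensions (x_2i, x_2i+1) and rotates each pair by its angle, and viewing each pair as a complex number collapses the pairwise loop into a single complex multiply. The sketch below is an assumed illustration of that idea, with a hypothetical function name, not the article's implementation.

```python
import torch

def apply_rope_interleaved(x, base: float = 10000.0):
    """Rotate x of shape (batch, n_heads, seq_len, head_dim) in the interleaved convention."""
    b, h, s, d = x.shape
    # One inverse frequency per adjacent pair (x_2i, x_2i+1).
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    angles = torch.outer(torch.arange(s).float(), inv_freq)        # (s, d/2)
    rot = torch.polar(torch.ones_like(angles), angles)             # e^{i * m * theta}, complex64
    # View adjacent pairs as complex numbers and rotate them with one multiply.
    x_c = torch.view_as_complex(x.float().contiguous().view(b, h, s, d // 2, 2))
    x_rot = torch.view_as_real(x_c * rot[None, None, :, :])
    return x_rot.reshape(b, h, s, d).type_as(x)
```

Because the two conventions differ only by a fixed permutation of the head dimension, applying either one consistently to both queries and keys leaves the q·k dot products, and hence the attention scores, unchanged; that is the sense in which the two styles are equivalent.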