Rotary Position Embedding

From position encoding basics to RoPE math, implementation, and length extrapolation
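
To give a flavor of what the chapters build toward, here is a minimal NumPy sketch of the rotary idea: pairs of query/key channels are rotated by position-dependent angles, so attention scores end up depending only on relative offsets. The function name, the half-split pairing convention (as in GPT-NeoX/LLaMA-style implementations, rather than the interleaved pairing of the original RoPE paper), and all shapes are illustrative assumptions, not taken from the article.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotary position embedding sketch for x of shape (seq_len, head_dim).

    Channel pairs are rotated by theta_i = pos * base**(-2i/head_dim), so
    the q.k dot product depends only on the relative offset m - n.
    Uses the half-split pairing convention (an illustrative choice).
    """
    seq_len, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-2.0 * np.arange(half) / head_dim)  # per-pair frequencies
    angles = positions[:, None] * inv_freq[None, :]         # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # Pairwise 2-D rotation: (x1, x2) -> (x1*cos - x2*sin, x1*sin + x2*cos)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

# Rotate queries and keys before the dot product; the resulting score
# between positions m and n then depends only on m - n.
q = rope_rotate(np.random.randn(8, 64), np.arange(8))
k = rope_rotate(np.random.randn(8, 64), np.arange(8))
scores = q @ k.T
```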

Attention Mechanisms

A deep dive into attention in Transformers, covering MHA, causal attention, GQA, and MQA.
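
For orientation before reading that chapter: MHA, GQA, and MQA differ only in how many key/value heads serve the query heads. Below is a minimal NumPy sketch of that relationship, with causal masking included; the function name and all shapes are illustrative assumptions.

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d), n_q_heads divisible.

    n_kv_heads == n_q_heads      -> standard multi-head attention (MHA)
    1 < n_kv_heads < n_q_heads   -> grouped-query attention (GQA)
    n_kv_heads == 1              -> multi-query attention (MQA)
    """
    n_q_heads, seq, d = q.shape
    group = n_q_heads // n_kv_heads               # query heads per shared KV head
    causal = np.tril(np.ones((seq, seq), dtype=bool))
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kh, vh = k[h // group], v[h // group]     # K/V shared within the group
        scores = q[h] @ kh.T / np.sqrt(d)         # scaled dot-product scores
        scores = np.where(causal, scores, -np.inf)  # causal mask: no lookahead
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ vh
    return out

q = np.random.randn(8, 16, 64)                    # 8 query heads
k, v = np.random.randn(2, 16, 64), np.random.randn(2, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=2)  # GQA, group size 4
```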

Position Encoding Basics

Why Transformers need position information, and the methods and limits of absolute position encoding
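
As a concrete reference point for that chapter, here is a sketch of the fixed sinusoidal scheme from the original Transformer paper ("Attention Is All You Need"); the function name and shapes are my own illustrative choices.

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model, base=10000.0):
    """Fixed absolute position encoding (assumes even d_model):
    PE[pos, 2i]   = sin(pos / base**(2i/d_model))
    PE[pos, 2i+1] = cos(pos / base**(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]         # even channel indices
    angles = pos / base ** (i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to token embeddings: every position gets a unique, fixed pattern,
# but the encoding is absolute, which is one of the limits RoPE addresses.
emb = np.random.randn(16, 512) + sinusoidal_encoding(16, 512)
```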

Table of Contents

Overview
Chapters
References