CookLLM Docs

BPE Algorithm

Premium

Deep dive into Byte Pair Encoding, with manual training, encoding, and decoding

This is premium content. Please log in to access the full article.

Table of Contents

Core Idea of BPE
BPE Algorithm Steps
Round 1: Find Most Frequent Pair
Round 2: Continue Merging
Iteration
Implementing BPE: Core Functions
1. Count Pair Frequencies
2. Merge a Pair
3. Train BPE
Encoding and Decoding
Encoding: Text → Tokens
Decoding: Tokens → Text
Full Example
Advantages of BPE
Summary
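
The table of contents above follows the standard byte-level BPE recipe: repeatedly count adjacent token pairs, merge the most frequent pair into a new token, and record the merge order so that encoding and decoding can replay it later. The sketch below is a minimal illustrative implementation of those steps, not the article's companion code; it starts from raw UTF-8 bytes, trains on a toy corpus, and its function names (get_pair_counts, merge_pair, train_bpe, encode, decode) are placeholders chosen to mirror the headings above.

```python
# Minimal BPE sketch (illustrative only, not the article's companion code).
# Trains merge rules on a tiny corpus, then encodes and decodes a string.
from collections import Counter

def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))

def merge_pair(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))        # start from raw bytes (ids 0..255)
    merges = {}                             # (id, id) -> new token id
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
        new_id = 256 + step                 # new ids start after the byte range
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return merges

def encode(text, merges):
    """Apply learned merges greedily, lowest merge rank (earliest learned) first."""
    ids = list(text.encode("utf-8"))
    while len(ids) >= 2:
        counts = get_pair_counts(ids)
        pair = min(counts, key=lambda p: merges.get(p, float("inf")))
        if pair not in merges:
            break
        ids = merge_pair(ids, pair, merges[pair])
    return ids

def decode(ids, merges):
    """Expand merged tokens back to bytes, then decode UTF-8."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():   # dict preserves training order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")

if __name__ == "__main__":
    corpus = "low lower lowest newer newest"
    merges = train_bpe(corpus, num_merges=10)
    tokens = encode("lowest newer", merges)
    print(tokens)
    print(decode(tokens, merges))           # round-trips back to "lowest newer"
```

At encode time the sketch always applies the learned merge with the lowest rank, the same merge-by-rank policy GPT-style byte-level tokenizers use when replaying their merge tables. Real GPT tokenizers also apply regex pre-tokenization before counting pairs (covered in the GPT Tokenizers article), which this sketch omits.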