CookLLM Docs
Model Architecture

Deeply understand LLM architecture design

Overview

The Architecture module dives into the core components of LLM architecture, from the foundational attention mechanism to advanced memory-augmented modules.

This module assumes you already know the basics of deep learning. We recommend starting with Attention before exploring advanced architectures.

Chapters

Attention Mechanisms

Deeply understand Attention in Transformers, including multi-head attention (MHA), causal attention, grouped-query attention (GQA), and multi-query attention (MQA)
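As a concrete anchor for what this chapter covers, here is a minimal pure-Python sketch of single-head scaled dot-product attention with a causal mask. The function name and the tiny two-token example are illustrative, not from the chapter itself:

```python
import math

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K, V: lists of row vectors (seq_len x d). Position i may only
    attend to positions j <= i (the "causal" constraint).
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Scores against non-future keys only, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]      # softmax over allowed positions
        # Output row i is the attention-weighted sum of value vectors
        out.append([sum(w * V[j][k] for j, w in enumerate(weights))
                    for k in range(len(V[0]))])
    return out

# Position 0 can only attend to itself, so its output is exactly V[0]
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
out = causal_attention(Q, K, V)
```

Production kernels compute the same quantity with batched matrix multiplies and fused softmax; this loop form just makes the masking and normalization explicit.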

Position Encoding and RoPE

From sinusoidal PE to rotary position embeddings: math, implementation, and length extrapolation
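The property that motivates RoPE can be shown in a few lines: rotating query and key pairs by position-dependent angles makes the dot-product score depend only on the *relative* offset between positions. A pure-Python sketch (variable names and test vectors are ours):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at position `pos`.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by angle pos * theta_i,
    where theta_i = base**(-2*i/d) for pair index i.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)              # frequency for pair i // 2
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out += [x[i] * c - x[i + 1] * s,      # 2-D rotation of the pair
                x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]
# Same relative offset (3) at very different absolute positions:
s_near = dot(rope(q, 5), rope(k, 2))
s_far = dot(rope(q, 105), rope(k, 102))
```

Because each pair undergoes a pure rotation, `s_near` and `s_far` agree to floating-point precision, and vector norms are preserved; the RoPE chapter derives this from the rotation-matrix identity.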

Learning Path

| Stage | Content | Goal |
| --- | --- | --- |
| Core | Attention mechanisms | Master self-attention, multi-head attention, and causal masking |
| Core | Position encoding and RoPE | Understand the evolution of position encoding; master RoPE principles and implementation |
| Optimization | GQA/MQA | Understand KV cache optimization and memory efficiency |
| Advanced | Length extrapolation | Master long-sequence methods such as NTK-aware scaling and YaRN |
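The KV-cache savings from GQA in the Optimization stage come down to simple arithmetic: the cache stores one K and one V tensor per layer per KV head, so sharing each KV head across a group of query heads shrinks it proportionally. A back-of-envelope sketch (the 32-layer / 32-head / 4096-token configuration is a hypothetical example, not a specific model from these docs):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache size: K and V tensors for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 32-layer model, 32 query heads of dim 128, 4096-token
# context, fp16 cache (2 bytes per element).
mha = kv_cache_bytes(32, 32, 128, 4096)  # MHA: one KV head per query head
gqa = kv_cache_bytes(32, 8, 128, 4096)   # GQA: 8 KV heads, groups of 4
mha_mib = mha / 2**20                    # 2048.0 MiB
gqa_mib = gqa / 2**20                    # 512.0 MiB
```

With 8 KV heads instead of 32, the cache drops by exactly the group factor of 4, which is why GQA/MQA matter for long-context serving.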

References

  • Attention Is All You Need
  • RoFormer: Enhanced Transformer with Rotary Position Embedding
  • YaRN: Efficient Context Window Extension of Large Language Models
  • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  • Fast Transformer Decoding: One Write-Head is All You Need
