CookLLM Docs

Triton Basics: Vector Add

Premium

Learn Triton’s programming model through a simple vector add example.

Companion Code



Table of Contents

SPMD Programming Model
Build the Kernel Step by Step
Step 1: Identify Yourself
Step 2: Compute Offsets
Step 3: Handle Boundaries
Step 4: Load, Compute, Store
Full Kernel Code
Launching the Kernel
Verify Correctness
Summary
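The steps in the table of contents can be sketched in plain Python, with no GPU or Triton installation required. Each iteration of the outer loop below plays the role of one Triton "program" instance: it identifies itself (as `tl.program_id` would), computes its block of offsets, builds a boundary mask, and performs a masked load–compute–store. All names here are illustrative, and this is a sketch of the SPMD programming model only, not actual Triton kernel code.

```python
import math

BLOCK_SIZE = 4  # illustrative; real kernels typically use larger powers of two


def add_kernel_sim(x, y, out, n_elements, block_size):
    """Pure-Python simulation of Triton's SPMD vector-add pattern."""
    # Grid size: one program per block, like triton.cdiv(n_elements, BLOCK_SIZE)
    num_programs = math.ceil(n_elements / block_size)
    for pid in range(num_programs):
        # Step 1: identify yourself (stands in for tl.program_id(axis=0))
        block_start = pid * block_size
        # Step 2: compute this program's offsets into the vectors
        offsets = [block_start + i for i in range(block_size)]
        # Step 3: handle boundaries with a mask for the ragged last block
        mask = [off < n_elements for off in offsets]
        # Step 4: masked load, compute, store
        for off, in_bounds in zip(offsets, mask):
            if in_bounds:
                out[off] = x[off] + y[off]


# Verify correctness against the expected elementwise sum
x = [float(i) for i in range(10)]
y = [2.0 * i for i in range(10)]
out = [0.0] * 10
add_kernel_sim(x, y, out, len(x), BLOCK_SIZE)
print(out)  # each element is x[i] + y[i] = 3 * i
```

Note that `n_elements = 10` is deliberately not a multiple of `BLOCK_SIZE = 4`, so the last simulated program exercises the boundary mask, which is the same reason real Triton kernels pass a mask to `tl.load` and `tl.store`.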