GPU Programming Basics
Learn CUDA and Triton, and write efficient GPU kernels
Overview
Before diving into advanced optimizations like Flash Attention, you need the fundamentals of GPU programming. This module takes you from zero to understanding how GPUs execute code and how to write efficient kernels with Triton.
This module is a prerequisite for the Systems track. We recommend completing it before the Flash Attention module.
Chapters
GPU Architecture Basics
Understand SIMT, memory hierarchy, and hardware limits
Tensor Layout
Go deep into memory: stride, contiguous, and view mechanics
Triton Basics: Vector Add
Write your first Triton kernel from scratch
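To preview the execution model the Triton chapter builds on, here is a hedged CPU sketch (plain NumPy, not actual Triton code) of how a blocked vector-add kernel divides work: a grid of programs, each handling one BLOCK_SIZE-wide tile of the data, with a mask guarding the ragged last tile. The function name and BLOCK_SIZE value are illustrative, not from the chapter itself.

```python
import numpy as np

def vector_add_blocked(x, y, BLOCK_SIZE=4):
    """CPU sketch of Triton's blocked execution model for vector add."""
    n = x.shape[0]
    out = np.empty_like(x)
    # Ceiling division, analogous to triton.cdiv(n, BLOCK_SIZE)
    num_programs = (n + BLOCK_SIZE - 1) // BLOCK_SIZE
    for pid in range(num_programs):  # on a GPU, each pid runs in parallel
        # Each program computes the offsets of its own tile
        offsets = pid * BLOCK_SIZE + np.arange(BLOCK_SIZE)
        # Mask out-of-bounds lanes when n is not a multiple of BLOCK_SIZE
        mask = offsets < n
        out[offsets[mask]] = x[offsets[mask]] + y[offsets[mask]]
    return out
```

In real Triton, `pid` comes from `tl.program_id(0)`, the offsets and mask feed masked `tl.load`/`tl.store` calls, and the loop over programs is replaced by the GPU launching all programs concurrently.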
Why Learn This?
| What You Want to Do | What You Need |
|---|---|
| Understand Flash Attention implementation | Shared memory, tiling |
| Write your own attention kernel | Triton programming |
| Optimize inference speed | Memory layout, coalescing |
| Implement custom quantization kernels | CUDA/Triton fundamentals |
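The "memory layout" entry above boils down to strides and contiguity, which the Tensor Layout chapter covers in depth. A small NumPy illustration (NumPy used here as a stand-in for torch tensors, which follow the same stride model):

```python
import numpy as np

# A 2D array is one flat buffer plus a stride (in bytes) per dimension.
a = np.arange(12, dtype=np.float32).reshape(3, 4)
print(a.strides)                # (16, 4): next row is 16 bytes away, next column 4

# Transposing swaps strides without copying data -- the view is no
# longer row-contiguous, which is what breaks coalesced GPU loads.
t = a.T
print(t.strides)                # (4, 16)
print(t.flags['C_CONTIGUOUS'])  # False

# ascontiguousarray copies into row-major order, analogous to
# torch's .contiguous()
c = np.ascontiguousarray(t)
print(c.flags['C_CONTIGUOUS'])  # True
```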