GPU Programming Basics
Learn CUDA and Triton, and write efficient GPU kernels
Overview
Before diving into advanced optimizations like Flash Attention, we need the fundamentals of GPU programming. This module starts from scratch: it explains how GPUs work, then shows how to write efficient kernels with Triton.
This module is a prerequisite for the Systems track. We recommend completing it before Flash Attention.
Chapters
GPU Architecture Basics
Understand SIMT, memory hierarchy, and hardware limits
Tensor Layout
Dig into memory layout: strides, contiguity, and view mechanics
Triton Basics: Vector Add
Write your first Triton kernel from scratch
CookLLM Docs