CookLLM Docs
GPU Programming Basics

Learn CUDA and Triton, and write efficient GPU kernels

Overview

Before diving into advanced optimizations like Flash Attention, you need the fundamentals of GPU programming. This module takes you from scratch to an understanding of how GPUs work and how to write efficient kernels with Triton.

This module is a prerequisite for the Systems track. We recommend completing it before Flash Attention.

Chapters

GPU Architecture Basics

Understand SIMT, memory hierarchy, and hardware limits

Tensor Layout

Go deep into memory: stride, contiguous, and view mechanics
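
The stride/contiguous/view mechanics this chapter covers can be previewed with a small NumPy sketch (NumPy exposes the same concepts as PyTorch tensors, with strides measured in bytes; this is an illustration, not code from the chapter):

```python
import numpy as np

x = np.arange(12, dtype=np.float32).reshape(3, 4)
# Strides are in bytes: stepping one row skips 4 float32s = 16 bytes,
# stepping one column skips 1 float32 = 4 bytes.
assert x.strides == (16, 4)

# A transpose is a view: same underlying buffer, swapped strides, no copy.
xt = x.T
assert xt.strides == (4, 16)
assert np.shares_memory(x, xt)

# The transposed view is no longer C-contiguous; materializing it copies.
assert not xt.flags['C_CONTIGUOUS']
xc = np.ascontiguousarray(xt)
assert xc.flags['C_CONTIGUOUS']
```

The key intuition: shape tells you what the tensor looks like, strides tell you how it is laid out in memory, and "contiguous" means the strides match a plain row-major layout.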

Triton Basics: Vector Add

Write your first Triton kernel from scratch
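
The core pattern of that first kernel, a grid of independent "programs" each processing one block of elements behind a bounds mask, can be emulated in plain NumPy. This is a conceptual sketch of the Triton execution model, not actual Triton code:

```python
import numpy as np

def vector_add(x, y, block_size=128):
    """Emulate the blocked, masked pattern of a Triton vector-add kernel."""
    n = x.shape[0]
    out = np.empty_like(x)
    # Grid size: ceiling division, one "program" per block.
    num_programs = (n + block_size - 1) // block_size
    for pid in range(num_programs):  # on a GPU these run in parallel
        # Each program computes the global offsets of its block...
        offsets = pid * block_size + np.arange(block_size)
        # ...and masks out-of-range lanes so the ragged last block is safe.
        mask = offsets < n
        idx = offsets[mask]
        out[idx] = x[idx] + y[idx]
    return out
```

In real Triton, the `for pid` loop disappears: the launch grid spawns one program instance per `pid`, and the offsets/mask logic is written with `tl.arange`, `tl.load`, and `tl.store`.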

Why Learn This?

| What You Want to Do | What You Need |
| --- | --- |
| Understand the Flash Attention implementation | Shared memory, tiling |
| Write your own attention kernel | Triton programming |
| Optimize inference speed | Memory layout, coalescing |
| Implement custom quantization kernels | CUDA/Triton fundamentals |

References

  • CUDA C++ Programming Guide
  • Triton Documentation
  • GPU Architecture Explained

