CookLLM Docs

GPU Architecture Basics

Premium

Understand GPU design philosophy, the SIMT model, and hardware hierarchy mapping to build parallel intuition.

Table of Contents

The Core Tension: Latency vs Throughput
Heterogeneous Computing
Transistor Economics
Task Division
From Graphics to AI: The Compute Evolution
End of Moore’s Law and Parallelism
CUDA: The Key to General Compute
Tensor Cores: Built for AI
SIMT: Single Instruction, Multiple Threads
Goodbye to Loop Thinking
Why Bounds Checks?
Hardware Hierarchy: Grid, Block, Thread
Hierarchy Mapping
Query Hardware Limits
Scaling Up: Global Index Computation
Multi-dimensional Mapping: Toward Matrices
2D Indexing
Why This Matters
Summary
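The "Scaling Up: Global Index Computation" and "Why Bounds Checks?" entries above can be previewed with a small sketch. This is a plain-Python simulation of the CUDA-style 1D mapping `global_id = blockIdx * blockDim + threadIdx`, not a real kernel launch; the names mirror CUDA built-ins, and the function and its parameters are illustrative assumptions, not code from the article.

```python
# Sketch: how a 1D CUDA-style launch maps threads to array elements.
# block_idx / thread_idx / block_dim mirror the CUDA built-ins
# blockIdx.x / threadIdx.x / blockDim.x; the loops stand in for
# the hardware running all threads in parallel.

def vector_add(a, b, block_dim=4):
    n = len(a)
    out = [0.0] * n
    # Ceiling division: launch enough blocks to cover n elements
    # even when n is not a multiple of block_dim.
    grid_dim = (n + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):           # each block is independent
        for thread_idx in range(block_dim):     # threads within one block
            i = block_idx * block_dim + thread_idx  # global index
            if i < n:                           # bounds check: the last block
                out[i] = a[i] + b[i]            # may overhang the array
    return out

print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
# → [11, 22, 33, 44, 55]
```

With `n = 5` and `block_dim = 4`, the second block spawns threads for indices 5–7; the `i < n` guard is what keeps them from writing out of bounds, which is exactly why real kernels carry the same check.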