CookLLM Docs

Tensor Layout

Understand physical memory layout, strides, view vs reshape, and gradient tracking.

Table of Contents

What Is a Tensor?
Key Concept: Strides
Vector Example (1D)
Matrix Example (2D)
Contiguity Explained
What Breaks Contiguity?
What Happened?
Why Non-contiguous?
View vs Reshape: A Performance Pivot
view(): Zero-copy, But Restricted
reshape(): Smarter, Safer
Gradient Tracking: Clone, Detach, and Their Combination
clone(): Copy Data, Keep Grad History
detach(): Cut Grad, Share Memory
detach().clone(): Common Pattern
Debugging Tips
Layout Types: Row-major vs Column-major
Test Your Understanding
Summary