CookLLM Docs

Pipeline Parallelism

Premium

Principles of the GPipe and 1F1B scheduling strategies, with pipeline bubble analysis

👨‍🍳

Content is cooking...

We're preparing high-quality content for you. Stay tuned!

Table of Contents

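Layer partitioning
Naive pipeline: the bubble problem
GPipe: micro-batch parallelism
1F1B: interleaved forward and backward passes
Communication characteristics of PP
GPipe vs 1F1B
Summary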