CookLLM Docs
Fully Sharded Data Parallelism


Understanding FSDP's intra-tensor sharding and its All-Gather/Reduce-Scatter communication pattern
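The communication pattern named above can be sketched with plain Python lists standing in for per-rank tensors. This is a toy single-process model of the two collectives, not real NCCL/`torch.distributed` calls; the function names are illustrative only:

```python
# Toy simulation (illustrative, not this site's companion code) of the two
# collectives FSDP builds on. Each inner list plays the role of one rank's
# local buffer.

def all_gather(shards):
    """Each rank holds one parameter shard; after all-gather, every rank
    holds the full, concatenated parameter."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]  # one full copy per rank

def reduce_scatter(per_rank_grads):
    """Each rank holds a full-length local gradient; afterwards rank i keeps
    only the reduced (summed) shard i of the gradient."""
    world_size = len(per_rank_grads)
    n = len(per_rank_grads[0])
    shard_len = n // world_size
    summed = [sum(g[j] for g in per_rank_grads) for j in range(n)]
    return [summed[i * shard_len:(i + 1) * shard_len]
            for i in range(world_size)]

# Forward: each rank materializes the full parameter from the shards.
shards = [[1.0, 2.0], [3.0, 4.0]]       # 2 ranks, 2 parameter elements each
full_params = all_gather(shards)         # every rank sees [1.0, 2.0, 3.0, 4.0]

# Backward: full local gradients are summed, then scattered to shard owners.
grads = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
grad_shards = reduce_scatter(grads)      # each rank keeps [3.0, 3.0]
```

The key property this mirrors: full parameters and full gradients exist only transiently; each rank persistently stores just its own shard.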


Content is cooking...

We're preparing high-quality content for you. Stay tuned!

Table of Contents

Two sharding approaches
Parameter sharding
Forward pass: All-Gather
Backward pass: Reduce-Scatter
Communication volume comparison
When to use ZeRO-3, and when to use FSDP
Summary
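For the communication-volume comparison the outline mentions, a common back-of-the-envelope accounting (the function below is an illustrative sketch, not code from this article) counts per-rank traffic per training step in units of parameter elements, using the fact that a ring collective moves roughly one message size per rank:

```python
def comm_volume_per_rank(num_params: int) -> tuple[int, int]:
    """Approximate per-rank communication per training step, in parameter
    elements (ignoring the (N-1)/N ring factor). Illustrative, not an API.

    DDP: one all-reduce over the full gradient, which decomposes into a
    reduce-scatter plus an all-gather         -> 2 * num_params.
    FSDP/ZeRO-3: all-gather params (forward) + all-gather params again
    (backward) + reduce-scatter gradients     -> 3 * num_params.
    """
    ddp = 2 * num_params
    fsdp = 3 * num_params
    return ddp, fsdp

ddp, fsdp = comm_volume_per_rank(1_000_000)
# FSDP moves 1.5x the data of DDP per step, in exchange for holding only
# a 1/N shard of parameters, gradients, and optimizer state per rank.
```

This 3Ψ-vs-2Ψ trade is the usual justification for fully sharded training: modest extra communication buys an N-fold reduction in per-rank memory for model state.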