CookLLM Docs

Multi-Dimensional Hybrid Parallelism

Premium

The ParallelContext coordinate system and production-grade combinations of TP + DP + PP

Content is cooking...

We're preparing high-quality content for you. Stay tuned!

Table of Contents

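Why Combine Parallelism Strategies
ParallelContext: A Multi-Dimensional Coordinate System
Building Communication Groups
2D Hybrid Example: TP + DP
3D Hybrid: TP + DP + PP
Configuration Recommendations
Summary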