CookLLM Docs

Training Loop

Premium

Taking apart LightningCLI, PretrainModule, the optimizer, and the scheduler


Table of Contents

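Training entry point
What LightningCLI is
What Lightning takes over
Launch script
Config merging
Training step
Validation step
Optimizer
WSD learning-rate schedule
Trainer configuration
Common override arguments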
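The WSD (warmup-stable-decay) schedule named in the contents is not spelled out on this page. For orientation, here is a minimal sketch of the general shape, assuming a linear warmup, a constant plateau at the peak learning rate, and a linear decay down to a floor value. The function name `wsd_lr` and all of its parameters are hypothetical illustrations, not the article's own scheduler code, and the real decay phase may use a different curve.

```python
def wsd_lr(step: int, peak_lr: float, warmup_steps: int, stable_steps: int,
           decay_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Hypothetical sketch: linear warmup to peak_lr, constant plateau,
    then linear decay from peak_lr to min_lr (clamped afterwards).
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly, reaching peak_lr at step warmup_steps - 1.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    if decay_steps <= 0:
        # No decay window configured: drop straight to the floor.
        return min_lr
    # Decay phase: fraction of the decay window completed, clamped to [0, 1].
    decay_step = min(step - warmup_steps - stable_steps, decay_steps)
    progress = decay_step / decay_steps
    # Linear interpolation from peak_lr (progress 0) to min_lr (progress 1).
    return min_lr + (peak_lr - min_lr) * (1.0 - progress)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the article.
    cfg = dict(peak_lr=3e-4, warmup_steps=100, stable_steps=800,
               decay_steps=100, min_lr=3e-5)
    for s in (0, 99, 500, 900, 950, 1000, 2000):
        print(s, wsd_lr(s, **cfg))
```

In PyTorch, a function like this is usually divided by `peak_lr` to turn it into a multiplier and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, stepped once per optimizer step. The long constant plateau is what distinguishes WSD from a plain warmup-plus-cosine schedule.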