CookLLM Docs

Monitoring and Validation

Track pretraining with TensorBoard, SwanLab, sampled text, and checkpoints

Table of Contents

Where the Logs Come From
Terminal Progress
TensorBoard
SwanLab
Text Sampling
Checkpoints
Predict Validation
Which Metrics to Check First