CookLLM Docs

Data Pipeline

Understand how Parquet shards become input_ids, labels, and attention_mask

Table of Contents

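Configuration Entry Point
Data Directory
File Partitioning
Streaming Reads
Sample Format
Batch Padding
Data Throughput Testing
Entering the Training Loop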