Pretraining

Overview

Pretraining is the first main track in cookllm-bento. Starting from a small 29M-parameter model, this chapter walks through every key stage of language model pretraining: preparing the text data, training the tokenizer, defining BentoLM, building the data pipeline, running the Lightning training loop, and finally saving checkpoints, writing TensorBoard/SwanLab logs, and generating sample text.

The chapter begins with the data and tokenizer that must be ready before training, then moves step by step through the model architecture, the data pipeline, the training loop, and monitoring and validation. By the end, you should know which files make up a pretraining task, what each config file controls, and where to start looking when training speed, loss, sample outputs, or checkpoints go wrong.

This chapter focuses on how to run pretraining in cookllm-bento. For the principles behind Attention, RoPE, RMSNorm, activation functions, or optimizers, read the corresponding chapters in "Fundamentals" alongside this one.
