CookLLM Docs
Model Architecture

Deeply understand LLM architecture design

Overview

The Architecture module dives into the core components of LLM architecture, from the foundational attention mechanism to advanced memory-augmented modules.

This module assumes you already know the basics of deep learning. We recommend starting with Attention before exploring advanced architectures.

Chapters

Attention Mechanisms

Deeply understand Attention in Transformers, including multi-head attention (MHA), causal attention, grouped-query attention (GQA), and multi-query attention (MQA)
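As a concrete anchor for what this chapter covers, here is a minimal pure-Python sketch of single-head scaled dot-product attention with a causal mask. The function name and the tiny two-token example are illustrative, not from the chapter itself:

```python
import math

def causal_attention(Q, K, V):
    """Single-head scaled dot-product attention with a causal mask.

    Q, K, V: lists of row vectors (seq_len x d). Position i may only
    attend to positions j <= i (the "causal" constraint).
    """
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        # Scores against non-future keys only, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(q, K[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        m = max(scores)                      # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]      # softmax over allowed positions
        # Output row i is the attention-weighted sum of value vectors
        out.append([sum(w * V[j][k] for j, w in enumerate(weights))
                    for k in range(len(V[0]))])
    return out

# Position 0 can only attend to itself, so its output is exactly V[0]
Q = K = V = [[1.0, 0.0], [0.0, 1.0]]
out = causal_attention(Q, K, V)
```

Production kernels compute the same quantity with batched matrix multiplies and fused softmax; this loop form just makes the masking and normalization explicit.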

Position Encoding and RoPE

From sinusoidal PE to rotary position embeddings: math, implementation, and length extrapolation
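The property that motivates RoPE can be shown in a few lines: rotating query and key pairs by position-dependent angles makes the dot-product score depend only on the *relative* offset between positions. A pure-Python sketch (variable names and test vectors are ours):

```python
import math

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to vector x at position `pos`.

    Consecutive pairs (x[2i], x[2i+1]) are rotated by angle pos * theta_i,
    where theta_i = base**(-2*i/d) for pair index i.
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)              # frequency for pair i // 2
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        out += [x[i] * c - x[i + 1] * s,      # 2-D rotation of the pair
                x[i] * s + x[i + 1] * c]
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]
# Same relative offset (3) at very different absolute positions:
s_near = dot(rope(q, 5), rope(k, 2))
s_far = dot(rope(q, 105), rope(k, 102))
```

Because each pair undergoes a pure rotation, `s_near` and `s_far` agree to floating-point precision, and vector norms are preserved; the RoPE chapter derives this from the rotation-matrix identity.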

Learning Path

| Stage | Content | Goal |
| --- | --- | --- |
| Core | Attention mechanisms | Master self-attention, multi-head attention, and causal masking |
| Core | Position encoding and RoPE | Understand the evolution of position encoding; master RoPE principles and implementation |
| Optimization | GQA/MQA | Understand KV cache optimization and memory efficiency |
| Advanced | Length extrapolation | Master long-sequence methods such as NTK-aware scaling and YaRN |
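The KV-cache savings from GQA in the Optimization stage come down to simple arithmetic: the cache stores one K and one V tensor per layer per KV head, so sharing each KV head across a group of query heads shrinks it proportionally. A back-of-envelope sketch (the 32-layer / 32-head / 4096-token configuration is a hypothetical example, not a specific model from these docs):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Per-sequence KV cache size: K and V tensors for every layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 32-layer model, 32 query heads of dim 128, 4096-token
# context, fp16 cache (2 bytes per element).
mha = kv_cache_bytes(32, 32, 128, 4096)  # MHA: one KV head per query head
gqa = kv_cache_bytes(32, 8, 128, 4096)   # GQA: 8 KV heads, groups of 4
mha_mib = mha / 2**20                    # 2048.0 MiB
gqa_mib = gqa / 2**20                    # 512.0 MiB
```

With 8 KV heads instead of 32, the cache drops by exactly the group factor of 4, which is why GQA/MQA matter for long-context serving.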

References

  • Attention Is All You Need
  • RoFormer: Enhanced Transformer with Rotary Position Embedding
  • YaRN: Efficient Context Window Extension of Large Language Models
  • GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  • Fast Transformer Decoding: One Write-Head is All You Need
