Causal Masking Optimization
Implement causal attention for autoregressive models and skip the upper-triangular (fully masked) part of the attention matrix for a ~2x speedup.
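The idea can be illustrated with a minimal NumPy sketch. This is not the article's companion code. It assumes a single head with no batch dimension and the same tile size for queries and keys. The function names `naive_causal_attention` and `tiled_causal_attention` and the block size of 16 are hypothetical choices for illustration.

The tiled version processes key tiles with an online softmax. For each query tile, it stops before any key tile that lies strictly above the diagonal, because every score in such a tile would be masked to `-inf` anyway. Only tiles that straddle the diagonal get an element-wise mask. With `n` tiles per side, it computes `n(n+1)/2` of the `n^2` tiles, and that fraction approaches one half as the sequence grows.

```python
import numpy as np

def naive_causal_attention(q, k, v):
    """Reference: build the full T x T score matrix, then mask the upper triangle."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((t, t), dtype=bool), k=1)  # True where key index > query index
    scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores)
    return (p / p.sum(axis=1, keepdims=True)) @ v

def tiled_causal_attention(q, k, v, block=16):
    """Hypothetical tiled causal attention with online softmax.

    Skips key tiles strictly above the diagonal. Returns (output, tiles_computed).
    """
    t, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    tiles_computed = 0
    for qs in range(0, t, block):
        qe = min(qs + block, t)
        qi = q[qs:qe]
        m = np.full(qe - qs, -np.inf)   # running row max
        l = np.zeros(qe - qs)           # running softmax denominator
        acc = np.zeros((qe - qs, d))    # running unnormalized output
        # Key tiles starting at or after qe are fully masked, so the loop stops there.
        for ks in range(0, qe, block):
            ke = min(ks + block, t)
            tiles_computed += 1
            s = (qi @ k[ks:ke].T) * scale
            if ke - 1 > qs:  # tile crosses the diagonal: mask element-wise
                rows = np.arange(qs, qe)[:, None]
                cols = np.arange(ks, ke)[None, :]
                s = np.where(cols > rows, -np.inf, s)
            m_new = np.maximum(m, s.max(axis=1))
            p = np.exp(s - m_new[:, None])
            corr = np.exp(m - m_new)    # rescale earlier partial sums to the new max
            l = l * corr + p.sum(axis=1)
            acc = acc * corr[:, None] + p @ v[ks:ke]
            m = m_new
        out[qs:qe] = acc / l[:, None]
    return out, tiles_computed

rng = np.random.default_rng(0)
T, D = 128, 32
q, k, v = (rng.standard_normal((T, D)) for _ in range(3))
out, n = tiled_causal_attention(q, k, v, block=16)
print(np.allclose(out, naive_causal_attention(q, k, v)))  # True
print(n, (T // 16) ** 2)  # 36 64
```

With 8 tiles per side, only 36 of 64 tiles are computed. Real kernels such as FlashAttention apply the same skip on the GPU, where the saved work becomes the roughly 2x speedup.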
CookLLM Docs