Grouped Query Attention
Add grouped-query attention (GQA) and multi-query attention (MQA) support so that multiple query heads share the same key/value (KV) heads, reducing KV cache memory.
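To make the sharing concrete, here is a minimal sketch of the two ideas in the summary: how query heads map onto shared KV heads, and how the KV cache shrinks as a result. The function names (`kv_head_for_query_head`, `kv_cache_bytes`) and the model dimensions are illustrative assumptions, not taken from the article or from any particular library. The sketch assumes the common convention that consecutive query heads form a group and that the cache stores one K and one V tensor per layer.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that query head `q_head` reads from.

    Assumes consecutive query heads are grouped together (hypothetical helper).
    num_kv_heads == num_q_heads is standard multi-head attention,
    num_kv_heads == 1 is MQA, and anything in between is GQA.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Estimate KV cache size in bytes (hypothetical helper).

    The leading factor of 2 counts the K and V tensors stored per layer.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return (2 * num_layers * batch_size * seq_len
            * num_kv_heads * head_dim * bytes_per_elem)


if __name__ == "__main__":
    # Illustrative configuration only: 32 layers, 32 query heads, head_dim 128.
    num_layers, num_q_heads, head_dim, seq_len = 32, 32, 128, 4096

    # With 8 KV heads, each group of 4 query heads shares one KV head.
    mapping = [kv_head_for_query_head(h, num_q_heads, 8) for h in range(num_q_heads)]
    print(mapping[:8])  # [0, 0, 0, 0, 1, 1, 1, 1]

    for name, num_kv_heads in [("MHA", 32), ("GQA", 8), ("MQA", 1)]:
        size_mib = kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len) / 2**20
        print(f"{name}: {num_kv_heads} KV heads -> {size_mib:.0f} MiB")
```

Under these assumed dimensions, the cache size scales linearly with the number of KV heads, so going from 32 KV heads to 8 cuts the cache by 4x, and MQA cuts it by 32x. The query projection and the number of query heads are unchanged; only the K and V tensors are shared.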