Articles
Breaking Down TurboQuant
A first-principles walkthrough of TurboQuant's KV-cache compression pipeline, from orthogonal rotation to QJL correction.
Decomposing Systolic Arrays
How systolic arrays accelerate matrix multiplication in modern ML accelerator ASICs such as TPUs.
Building Softmax In Hardware
Building a custom hardware pipeline for softmax, the nonlinear function used in transformer attention.