Articles

Breaking Down TurboQuant

A first-principles walkthrough of TurboQuant's KV-cache compression pipeline, from orthogonal rotation to QJL correction.

Decomposing Systolic Arrays

How systolic arrays accelerate matrix multiplication in modern ML accelerator ASICs such as TPUs.

Building Softmax In Hardware

Building a custom hardware pipeline for the softmax non-linear function used in transformer attention.