Anatomy of High-Performance Matrix Multiplication (2008)

https://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf

Article

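  • Classic 2008 paper by Kazushige Goto and Robert A. van de Geijn (ACM Transactions on Mathematical Software) on optimizing the BLAS general matrix multiply (GEMM)
  • Explains the layered cache-blocking and packing strategy, and why each block size is chosen to fit a level of the memory hierarchy (registers, L2 cache, TLB)
  • Foundation for understanding how modern BLAS implementations, and the LAPACK routines built on them, achieve near-peak FLOPS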
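
The blocking idea summarized above can be sketched as a loop nest. This is a minimal NumPy illustration under stated assumptions, not Goto's implementation. The block sizes MC, KC, MR, NR and the function name blocked_gemm are hypothetical. Packing is reduced to a contiguous copy, whereas the real code reorders data into micro-panels. The innermost product uses NumPy's @ in place of a hand-tuned assembly kernel. The outer loop splits K into rank-KC updates (GEPP). The next loop splits each update into MC x KC blocks of A multiplied by a packed panel of B (GEBP).

```python
import numpy as np

# Hypothetical block sizes for illustration only. Real values are tuned per CPU so
# that a packed MC x KC block of A stays in the L2 cache and an MR x NR tile of C
# fits in registers.
MC, KC, MR, NR = 64, 128, 4, 4


def blocked_gemm(A, B, C):
    """Compute C += A @ B with a GEPP/GEBP-style loop nest (illustrative sketch)."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2 and C.shape == (m, n)
    for pc in range(0, k, KC):  # GEPP: one rank-kc update of C per iteration
        kc = min(KC, k - pc)
        # "Pack" the kc x n row panel of B; it is reused for every block of A.
        Bp = np.ascontiguousarray(B[pc:pc + kc, :])
        for ic in range(0, m, MC):  # split the panel update into GEBP calls
            mc = min(MC, m - ic)
            # "Pack" the mc x kc block of A that the paper keeps resident in L2.
            Ap = np.ascontiguousarray(A[ic:ic + mc, pc:pc + kc])
            # GEBP inner kernel: update C one MR x NR micro-tile at a time.
            for jr in range(0, n, NR):
                je = min(jr + NR, n)
                for ir in range(0, mc, MR):
                    ie = min(ir + MR, mc)
                    C[ic + ir:ic + ie, jr:je] += Ap[ir:ie, :] @ Bp[:, jr:je]
    return C
```

Calling blocked_gemm(A, B, C) on float arrays updates C in place, with the result matching C + A @ B up to rounding. The explicit min() edge handling lets matrix sizes that are not multiples of the block sizes work.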

Type Link
Added Apr 21, 2026
Modified Apr 21, 2026