Notes from Optimizing CPU-Bound Go Hot Paths

TLDR

  • Porting Brotli to pure Go revealed that idiomatic abstractions (generics, interfaces, closures) block inlining in hot loops, forcing manual function duplication for peak throughput.
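
The four call styles named above can be sketched side by side. This is a minimal illustration, not code from the Brotli port: the names hash17, mulShift, Hasher and the sum* functions, as well as the multiplier constant, are hypothetical. All four loops compute the same value; what differs is how the hash is reached from inside the loop, which is what determines whether the compiler can inline it.

```go
package main

import "fmt"

// hashMul is an illustrative multiplier, not a value taken from the source.
const hashMul = 0x9E3779B1

// hash17 is a multiply-shift hash that keeps the top 17 bits of the product.
func hash17(x uint32) uint32 { return (x * hashMul) >> 15 }

// sumConcrete calls the hash directly, so the compiler sees the callee
// and is free to inline it into the loop body.
func sumConcrete(data []uint32) (sum uint32) {
	for _, x := range data {
		sum += hash17(x)
	}
	return sum
}

// Hasher is the abstraction shared by the generic and interface variants.
type Hasher interface{ Hash(x uint32) uint32 }

// mulShift is a zero-size implementation of Hasher.
type mulShift struct{}

func (mulShift) Hash(x uint32) uint32 { return hash17(x) }

// sumGeneric calls a method on a type parameter.
func sumGeneric[H Hasher](h H, data []uint32) (sum uint32) {
	for _, x := range data {
		sum += h.Hash(x)
	}
	return sum
}

// sumInterface calls the method through an interface value.
func sumInterface(h Hasher, data []uint32) (sum uint32) {
	for _, x := range data {
		sum += h.Hash(x)
	}
	return sum
}

// sumClosure calls the hash through a function value.
func sumClosure(hash func(uint32) uint32, data []uint32) (sum uint32) {
	for _, x := range data {
		sum += hash(x)
	}
	return sum
}

func main() {
	data := []uint32{1, 2, 3, 0xDEADBEEF}
	fmt.Println(
		sumConcrete(data),
		sumGeneric(mulShift{}, data),
		sumInterface(mulShift{}, data),
		sumClosure(hash17, data),
	)
}
```

In a program this small the compiler may still inline or devirtualize some of the indirect variants, so the sketch shows the shape of the code rather than reproducing the reported slowdown.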

Key Takeaways

  • Go’s GC Shape Stenciling compiles one instantiation per GC shape and passes a runtime dictionary, so method calls on type parameters go through interface-style indirection instead of the monomorphization C++ and Rust developers expect.
  • Benchmarks on a 12th Gen i5 show concrete functions reaching 378 MiB/s; generics and closures are ~15% slower, and interface dispatch is ~27% slower.
  • The Brotli port required 16 near-identical concrete functions that differ only in the hash function variant, because collapsing them behind an abstraction cost too much throughput.
  • Code generation can mitigate duplication at scale, but cases with only 2-3 variants rarely justify a codegen pipeline, leaving manual copy-paste as the pragmatic answer.
  • Assembly inspection confirms the concrete path inlines the hash multiply-shift directly in the loop; the generic path emits an explicit CALL instruction each iteration.
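
The manual-duplication pattern described above might look like the following sketch. It is not the port's actual code: hash15, hash17, countRepeats15, countRepeats17 and the multiplier constant are hypothetical, and the loop is a toy stand-in for a match finder. The two functions are deliberately identical except for the hash they call and the table size that goes with it.

```go
package main

import "fmt"

// mul32 is an illustrative multiply-shift constant, assumed for this sketch.
const mul32 = 0x1E35A7BD

// hash15 and hash17 keep the top 15 and 17 bits of the product respectively.
func hash15(x uint32) uint32 { return (x * mul32) >> 17 }
func hash17(x uint32) uint32 { return (x * mul32) >> 15 }

// countRepeats15 counts elements equal to the previous element stored in the
// same hash bucket. The table holds position+1 so that 0 means "empty".
func countRepeats15(data []uint32) (repeats int) {
	var table [1 << 15]int32
	for i, x := range data {
		h := hash15(x) // direct call to a concrete function
		if prev := table[h]; prev != 0 && data[prev-1] == x {
			repeats++
		}
		table[h] = int32(i + 1)
	}
	return repeats
}

// countRepeats17 is a hand-made copy of countRepeats15; only the hash call
// and the table size differ.
func countRepeats17(data []uint32) (repeats int) {
	var table [1 << 17]int32
	for i, x := range data {
		h := hash17(x)
		if prev := table[h]; prev != 0 && data[prev-1] == x {
			repeats++
		}
		table[h] = int32(i + 1)
	}
	return repeats
}

func main() {
	data := []uint32{7, 7, 7}
	fmt.Println(countRepeats15(data), countRepeats17(data))
}
```

To check what the compiler actually did with code like this, `go build -gcflags=-m` prints its inlining decisions, and `go tool objdump -s countRepeats15` on the built binary shows whether a CALL remains inside the loop.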

Hacker News Comment Review

  • No substantive HN discussion yet; the single comment pivots to an unrelated question about simulating Reynolds boids on a single CPU without goroutines.
