Making Julia as Fast as C++ (2019)


TLDR

  • A 2019 FLOW Lab post walks through optimizing a Julia vortex particle N-body kernel from 58x slower than C++ to near parity by using concrete types, avoiding allocations, and manually unrolling loops.

Key Takeaways

  • Julia structs with untyped fields (ParticleAmbiguous) are inferred as ::Any and run 58x slower than -O3 C++; switching to parametric concrete types alone yields a 3x speedup.
  • After fixing types, the bottleneck shifts to allocations: array comprehensions and cross()/norm() calls allocate millions of heap objects per benchmark run.
  • @code_warntype is the primary diagnostic tool; ::Any annotations highlighted in red mark type-unstable code that the compiler cannot specialize.
  • The benchmark kernel is a real aerodynamics O(N²) particle-to-particle (P2P) interaction on 216 particles, making results representative of HPC inner loops.
  • The C++ baseline, compiled with -O3 on an i7-7820HQ, runs the P2P kernel in ~4 ms minimum; the optimization journey targets that time.
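
The takeaways above can be illustrated with a minimal Julia sketch. It is not the post's code: only the name ParticleAmbiguous comes from the summary, while ParticleTyped, p2p_alloc, p2p_unrolled!, the field names X and Gamma, and the simplified singular kernel (cross(dX, Gamma) / r^3, with no constants or regularization) are assumptions made for illustration. It contrasts an untyped struct with a parametric one, and an allocating kernel built on a comprehension plus cross()/norm() with a hand-unrolled kernel that writes into a preallocated matrix.

```julia
using LinearAlgebra  # stdlib: provides cross and norm

# Fields without annotations are typed Any, so every access is dynamic.
struct ParticleAmbiguous
    X
    Gamma
end

# Hypothetical parametric version: field types are concrete once T is known.
struct ParticleTyped{T<:Real}
    X::NTuple{3,T}
    Gamma::NTuple{3,T}
end

# Allocating kernel: the comprehension and cross() each create a new
# array for every one of the N^2 particle pairs.
function p2p_alloc(ps)
    U = [zeros(3) for _ in ps]
    for (i, t) in enumerate(ps)
        for s in ps
            dX = [t.X[k] - s.X[k] for k in 1:3]
            r = norm(dX)
            r == 0 && continue  # skip self-interaction
            U[i] .+= cross(dX, s.Gamma) ./ r^3
        end
    end
    return U
end

# Hand-unrolled kernel: scalar arithmetic only, results accumulated
# into the preallocated 3xN matrix U.
function p2p_unrolled!(U::Matrix{T}, ps::Vector{ParticleTyped{T}}) where {T}
    fill!(U, zero(T))
    @inbounds for (i, t) in enumerate(ps)
        for s in ps
            dx = t.X[1] - s.X[1]
            dy = t.X[2] - s.X[2]
            dz = t.X[3] - s.X[3]
            r2 = dx * dx + dy * dy + dz * dz
            r2 == 0 && continue  # skip self-interaction
            inv_r3 = one(T) / (r2 * sqrt(r2))
            gx, gy, gz = s.Gamma
            # Components of cross(dX, Gamma), written out by hand
            U[1, i] += (dy * gz - dz * gy) * inv_r3
            U[2, i] += (dz * gx - dx * gz) * inv_r3
            U[3, i] += (dx * gy - dy * gx) * inv_r3
        end
    end
    return U
end

# 216 particles on a 6x6x6 lattice (made-up data, matching only the count)
grid = vec([(Float64(i), Float64(j), Float64(k)) for i in 1:6, j in 1:6, k in 1:6])
typed = [ParticleTyped(x, (1.0, 0.5 * x[2], 0.25 * x[3])) for x in grid]
ambiguous = [ParticleAmbiguous(collect(p.X), collect(p.Gamma)) for p in typed]

U_slow = p2p_alloc(ambiguous)
U_fast = p2p_unrolled!(zeros(3, length(typed)), typed)
```

Calling fieldtype(ParticleAmbiguous, :X) returns Any, which is the same information @code_warntype surfaces inside a function body. Wrapping each kernel call in @allocated after a warm-up run is one way to compare their heap usage; timings will vary by machine and Julia version.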

Hacker News Comment Review

  • No substantive HN discussion yet; one commenter flagged the 2019 publication date, and another linked a Julia Discourse thread for follow-up context.
