C++26 Shipped a SIMD Library Nobody Asked For


TLDR

  • std::simd lands in C++26 promising portable SIMD across AVX2, AVX-512, NEON, and SVE, but the article's benchmarks show roughly 10x slower compile times and worse runtime performance than auto-vectorized scalar loops.

Key Takeaways

  • std::simd (P1928) compiles ~10x slower than equivalent scalar loops and loses to the auto-vectorizer it was designed to replace.
  • The library has no runtime dispatch story; Google Highway, used in Chromium, Firefox, libjxl, and libaom, handles this with HWY_DYNAMIC_DISPATCH macros.
  • The fixed-width vector model cannot express ARM SVE's scalable vectors, a structural mismatch baked in since the Vc design era of 2012.
  • Cross-lane shuffles, width-specific arithmetic, and the operations dominating real SIMD workloads (codecs, HFT, crypto) are largely inexpressible in std::simd.
  • Template error messages leak internal types like _SimdWrapper<_Float16, 8, void>; a trivial misuse produces 138 lines of diagnostics from 6 lines of user code.
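
The runtime-dispatch gap mentioned above can be illustrated with a minimal sketch. This is not Highway's HWY_DYNAMIC_DISPATCH mechanism and not part of std::simd; it is a hand-rolled function-pointer dispatch, and the names add_baseline, add_avx2, select_add, and add_arrays are hypothetical. It assumes GCC or Clang on x86 for the AVX2 path (using the target attribute and __builtin_cpu_supports) and falls back to the baseline loop everywhere else.

```cpp
#include <cassert>
#include <cstddef>

// Baseline kernel: a plain scalar loop, left to the auto-vectorizer
// at whatever ISA level the translation unit is compiled for.
static void add_baseline(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
// Same loop, but the compiler is allowed to use AVX2 in this one function.
// Assumption: GCC/Clang-specific attribute; other compilers need another route.
__attribute__((target("avx2")))
static void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Picks a kernel based on what the running CPU reports.
static AddFn select_add() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_baseline;
}

// Public entry point: the selection runs once, on first call.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a && b && out));
    static const AddFn fn = select_add();
    fn(a, b, out, n);
}
```

A caller just invokes add_arrays(a, b, out, n) and gets whichever kernel matches the machine. A dispatch library automates this selection across many targets; in this sketch it has to be written by hand for each kernel.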

Hacker News Comment Review

  • Commenters with deep intrinsics experience argue no library abstraction, including std::simd, can capture microarchitecture-level variation; hand-written intrinsics remain the only path to optimal SIMD for serious workloads.
  • A key technical objection in the article, that templates are “opaque” to the optimizer, was challenged: templates are monomorphized and inlinable, making that framing misleading rather than a real codegen barrier.
  • Several commenters pushed back on the “nobody asked for it” framing, noting that std::simd gives beginners a ramp toward intrinsics and provides feature parity with SIMD hints in other languages; EVE being built by a committee insider was read as a critique of the design, not of the goal.
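
The commenters' point about monomorphization can be sketched with a toy example. Vec and sum4 below are hypothetical names for illustration only; this is not std::simd or any real library's type. Once instantiated, Vec<int, 4> is an ordinary concrete type whose operator+ is fully visible to the compiler, which the static_assert demonstrates by evaluating the call at compile time.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width vector wrapper (illustration only, not std::simd).
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lanes{};

    // Lane-wise addition. After instantiation this is a concrete function
    // with a known trip count, not an opaque generic call.
    friend constexpr Vec operator+(const Vec& x, const Vec& y) {
        Vec r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = x.lanes[i] + y.lanes[i];
        return r;
    }
};

// A non-template function that uses one instantiation of the wrapper.
constexpr Vec<int, 4> sum4(const Vec<int, 4>& a, const Vec<int, 4>& b) {
    return a + b;
}

// The compiler can see through the template well enough to fold the
// whole call at compile time (requires C++17 or later).
static_assert(sum4(Vec<int, 4>{{1, 2, 3, 4}}, Vec<int, 4>{{10, 20, 30, 40}}).lanes[3] == 44,
              "lane-wise add is visible to the compiler");
```

This shows only that templates are not a visibility barrier. It says nothing about whether the generated vector code is good, which is the separate question the benchmarks address.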

Notable Comments

  • @mgaunard: First committee SIMD proposer in 2011, notes ISPC-like language-level semantics were seriously considered then and rejected for the same SVE-mapping arguments raised today.
  • @camel-cdr: Points out that for simple cases autovec already beats portable SIMD libraries because autovec has access to the full IR, not just the library’s exposed primitives.
