std::simd lands in C++26 promising portable SIMD across AVX2, AVX-512, NEON, and SVE, but the article's benchmarks show roughly 10x slower compile times and worse runtime performance than auto-vectorized scalar code.
Key Takeaways
std::simd (P1928) compiles ~10x slower than equivalent scalar loops and, at runtime, loses to the auto-vectorizer it was designed to replace.
The library has no runtime dispatch story; Google Highway, used in Chromium, Firefox, libjxl, and libaom, handles this with HWY_DYNAMIC_DISPATCH macros.
Its fixed-width vector model cannot express ARM SVE's scalable vectors, a structural mismatch baked in since the Vc-era design of 2012.
Cross-lane shuffles, width-specific arithmetic, and the operations dominating real SIMD workloads (codecs, HFT, crypto) are largely inexpressible in std::simd.
Template error messages leak internal types like _SimdWrapper<_Float16, 8, void>; a trivial misuse produces 138 lines of diagnostics from 6 lines of user code.
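The runtime dispatch gap mentioned above can be sketched in plain C++. This is a minimal illustration, not Highway's mechanism and not anything std::simd provides: the names `Target`, `detect_target`, `select_sum`, and the `sum_*` kernels are hypothetical, and all three kernels are scalar stand-ins so the sketch runs on any machine. The CPU probe assumes the GCC/Clang builtin `__builtin_cpu_supports` on x86 and falls back to scalar everywhere else.

```cpp
#include <cstddef>

// Hypothetical target list for this sketch; real libraries track many more.
enum class Target { Scalar, Avx2, Avx512 };

// Ask the CPU what it supports. Only x86 with GCC or Clang is probed here;
// every other platform falls back to Scalar.
inline Target detect_target() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return Target::Avx512;
    if (__builtin_cpu_supports("avx2")) return Target::Avx2;
#endif
    return Target::Scalar;
}

// One kernel per target. In a real build each would be compiled with the
// matching -m flags (or carry a target attribute); here all three are plain
// scalar stand-ins so the sketch runs anywhere.
inline float sum_scalar(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
inline float sum_avx2(const float* p, std::size_t n) { return sum_scalar(p, n); }
inline float sum_avx512(const float* p, std::size_t n) { return sum_scalar(p, n); }

using SumFn = float (*)(const float*, std::size_t);

// Map a detected target to the kernel built for it.
inline SumFn select_sum(Target t) {
    switch (t) {
        case Target::Avx512: return sum_avx512;
        case Target::Avx2:   return sum_avx2;
        default:             return sum_scalar;
    }
}

// Public entry point: resolve the kernel once, then call through the pointer.
inline float sum(const float* p, std::size_t n) {
    static const SumFn fn = select_sum(detect_target());
    return fn(p, n);
}
```

Callers only ever see `sum(ptr, n)`; the choice of kernel happens once at first use. The design cost is that every kernel must exist in the binary for every target, which is the bookkeeping that dispatch macros in libraries such as Highway automate.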
Hacker News Comment Review
Commenters with deep intrinsics experience argue no library abstraction, including std::simd, can capture microarchitecture-level variation; hand-written intrinsics remain the only path to optimal SIMD for serious workloads.
Commenters challenged a key technical objection in the article, that templates are “opaque” to the optimizer: templates are monomorphized and inlinable, so the framing is misleading and does not describe a real codegen barrier.
Several commenters pushed back on the “nobody asked for it” framing, noting that std::simd gives beginners a ramp toward intrinsics and provides feature parity with SIMD hints in other languages; the fact that a committee insider built EVE was read as a critique of the design, not of the goal.
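The monomorphization point raised in the comments can be illustrated with a small sketch. `Pack` and `mul_add` are hypothetical names, not the std::simd API; the assumption is only that a template, once instantiated, is ordinary concrete code that the compiler can inline like any other function. Whether a given compiler then vectorizes it well is a separate question that has to be checked in the generated assembly.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical fixed-lane wrapper for illustration; not the std::simd API.
template <typename T, std::size_t N>
struct Pack {
    T lane[N];

    // Lane-wise addition.
    friend Pack operator+(const Pack& a, const Pack& b) {
        Pack r{};
        for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
        return r;
    }

    // Lane-wise multiplication.
    friend Pack operator*(const Pack& a, const Pack& b) {
        Pack r{};
        for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
        return r;
    }
};

// After instantiation, Pack<float, 8> is just eight floats with no hidden state.
static_assert(sizeof(Pack<float, 8>) == 8 * sizeof(float),
              "the wrapper adds no storage overhead");
static_assert(std::is_trivially_copyable<Pack<float, 8>>::value,
              "the wrapper is plain data");

// Computes a * b + c per lane. Each distinct <T, N> gets its own concrete
// function, visible to the inliner and optimizer like hand-written code.
template <typename T, std::size_t N>
Pack<T, N> mul_add(const Pack<T, N>& a, const Pack<T, N>& b, const Pack<T, N>& c) {
    return a * b + c;
}
```

Calling `mul_add` on three `Pack<float, 4>` values with lanes `{1, 2, 3, 4}`, `{2, 2, 2, 2}`, and `{1, 1, 1, 1}` yields lanes `{3, 5, 7, 9}`.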
Notable Comments
@mgaunard: The first person to propose SIMD to the committee, in 2011, notes that ISPC-like language-level semantics were seriously considered then and rejected for the same SVE-mapping arguments raised today.
@camel-cdr: Points out that for simple cases the auto-vectorizer already beats portable SIMD libraries, because it has access to the full IR and not just the library’s exposed primitives.
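As a rough illustration of the "simple cases" in the last comment, here is the kind of scalar loop that GCC and Clang can typically auto-vectorize at `-O3`. The function name `saxpy` and its signature are chosen for this sketch. The `__restrict` qualifier is a widely supported compiler extension, not standard C++, used here on the assumption that the two arrays do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]. The whole loop is visible to the compiler, so the
// vectorizer is free to pick the width, unrolling, and tail handling itself.
inline void saxpy(float a, const float* __restrict x, float* __restrict y,
                  std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Convenience overload for std::vector; both vectors must have equal length.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    saxpy(a, x.data(), y.data(), x.size());
}
```

To see whether the loop was actually vectorized, GCC's `-fopt-info-vec` and Clang's `-Rpass=loop-vectorize` print a remark for each loop the vectorizer transforms.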