Curated GitHub list of every major CUDA programming book, beginner through advanced, including C++, Python, and 2024-2026 releases.
Key Takeaways
Covers five tiers: beginner, architecture, hands-on, advanced/optimization, and Python/high-level (Numba, CuPy, pybind11).
Kirk & Hwu’s Programming Massively Parallel Processors (4th ed., 2022) and Ansorge’s Programming in Parallel with CUDA (2022) are the strongest modern picks.
The Python entry point is Tuomanen’s Hands-On GPU Programming with Python and CUDA (2018); Motta (2024) adds C++20 and pybind11.
2024-2026 additions include Spuler’s optimization and debugging titles, Oketunji’s CUDA 12.6 guide, and Crutcher’s C++26/CUDA 13 book.
Maintainers recommend always pairing any book with the free official CUDA C++ Programming Guide (v13.x, 2026).
Hacker News Comment Review
Commenters pushed back on the list’s recommendations: Kirk & Hwu’s Programming Massively Parallel Processors drew criticism for small errors and confusing prose, and one experienced reader preferred Shane Cook’s Developer’s Guide as an introduction.
Consensus is that pre-2013 books are largely irrelevant for modern Nvidia hardware; reading FlashAttention and vLLM kernel source was cited as more concrete than any book for LLM-focused work.
Commenters noted that Nvidia insiders and the ADSP podcast actively discourage writing custom CUDA kernels unless that is your full-time role; gaps in Blackwell sm_120 support were offered as evidence of Nvidia’s uneven follow-through.
Notable Comments
@wces: Links a dense single-video CUDA overview by CUDA architect Stephen Jones as a faster alternative to any book on the list.
@dahart: Flags NVIDIA Warp as a Python-native CUDA kernel tool too new for books, worth checking before committing to Numba or CuPy.