TLDR
-
Experimental Rust-to-CUDA compiler compiling idiomatic Rust directly to PTX via a custom rustc codegen backend, no DSLs required.
Key Takeaways
-
#[cuda_module] and #[kernel] macros embed device artifacts into the host binary and generate typed launch methods per kernel.
-
Uses
DisjointSlice for safe mutable GPU slice access; bounds-checked via thread::index_1d() pattern.
-
Supports async GPU execution: compose work as lazy
DeviceOperation graphs, schedule across stream pools, await with .await.
-
v0.1.0 is early alpha: expect API breakage, bugs, and incomplete features; feedback loop is explicit project goal.
-
Lower-level
load_kernel_module and cuda_launch! APIs available for custom sidecar artifact loading.
Hacker News Comment Review
-
No substantive HN discussion yet.
Original | Discuss on HN