https://github.com/Luce-Org/lucebox-hub
Article
- Claims 207 tok/s on an RTX 3090 via a C++/ggml speculative decoder with a block-diffusion draft model.
- Uses a DFlash draft model; 5.46x peak speedup over the autoregressive baseline.
- Averages 129.5 tok/s across a 10-prompt benchmark at budget=22.
- Pitches local AI as the default: private data, no per-token cost, no vendor lock-in.
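The decode loop described above can be sketched in miniature. This is a hypothetical stand-in, not the repo's ggml implementation: `draft_next` and `target_next` are toy deterministic models, and `budget` mirrors the summary's budget parameter. A cheap draft model proposes a block of tokens, the target model verifies them, and generation keeps the longest agreeing prefix plus one target token.

```python
def draft_next(ctx):
    # Hypothetical cheap draft model: predicts a simple counting pattern.
    return (ctx[-1] + 1) % 10

def target_next(ctx):
    # Hypothetical expensive target model: agrees with the draft
    # except that it resets to 0 after seeing a 7.
    return 0 if ctx[-1] == 7 else (ctx[-1] + 1) % 10

def speculative_step(ctx, budget):
    """One decode step: draft `budget` tokens, verify them greedily
    against the target, return the accepted tokens."""
    draft, d_ctx = [], list(ctx)
    for _ in range(budget):
        t = draft_next(d_ctx)
        draft.append(t)
        d_ctx.append(t)
    # Verification: the target checks each drafted position
    # (a single batched forward pass in a real system).
    accepted, v_ctx = [], list(ctx)
    for t in draft:
        want = target_next(v_ctx)
        if want != t:            # first mismatch: keep the target's token, stop
            accepted.append(want)
            return accepted
        accepted.append(t)
        v_ctx.append(t)
    # All drafts accepted: append one bonus token from the target.
    accepted.append(target_next(v_ctx))
    return accepted

def generate(ctx, n_tokens, budget=4):
    out = list(ctx)
    while len(out) < len(ctx) + n_tokens:
        out.extend(speculative_step(out, budget))
    return out[:len(ctx) + n_tokens]
```

Because verification is greedy, the output is token-for-token identical to plain greedy decoding with the target model alone; the speedup comes from accepting several drafted tokens per target pass.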
Discussion
- Skepticism dominant: speculative decoding ≠ same quality as standard sampling.
- Top comment (Aurornis): a vibecoded, Claude-generated repo, one of hundreds spawned by paper releases.
- Critics note only greedy decoding is used; suggested sampling parameters exist for good reason.
- Vulkan support requested; the current implementation requires CUDA, limiting reach.
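For context on the sampling criticism: the standard speculative-sampling acceptance rule (Leviathan et al. 2023) provably preserves the target model's output distribution, so the quality gap critics raise comes from supporting only greedy verification, not from speculation itself. A minimal sketch of that rule, with illustrative distributions `p` (target) and `q` (draft) over a tiny vocabulary:

```python
import random

def speculative_sample(p, q, rng):
    """Standard speculative-sampling acceptance rule: draw a token x from
    the draft distribution q, accept it with probability min(1, p[x]/q[x]),
    otherwise resample from the renormalized residual max(0, p - q).
    The accepted token is distributed exactly according to p."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]
```

An implementation exposing temperature/top-p sampling would verify drafts with this rule instead of exact greedy matching, which is why reviewers flag greedy-only decoding as a real limitation.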
Discuss on HN