Systalyze open-sources Utilyze, a GPU monitor that reads hardware performance counters directly to report true compute and memory bandwidth utilization, accurate within 2% of ground truth.
Key Takeaways
nvtop reports 100% utilization regardless of matrix size, while Utilyze shows 2.6% at N=256, 32% at N=1024, and 88% at N=4096, each within 2% of the theoretical calculation.
DCGM’s SM Active metric is also misleading: on a memory-bound LLM decode workload it reads 99% and nvtop reads 100%, while ground truth and Utilyze both report 6%.
Utilyze uses NVIDIA’s Nsight Perf SDK to cycle through hardware counters across rolling time windows, so overhead is negligible and measurement runs continuously in production.
The tool reports two headline numbers: Compute SOL % (achieved FLOPs / peak FLOPs) and Memory SOL % (achieved bandwidth / peak bandwidth), derived from the roofline model.
Attainable SOL % marks the realistic ceiling below 100% for a given model, hardware, and parallelism config; the gap between current SOL % and Attainable SOL % is the actual optimization budget.
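The two SOL ratios and the optimization budget described above can be sketched in a few lines of Python. This is an illustrative calculation only, not Utilyze's implementation: the GpuPeaks type, the function names, and the peak figures (1.0e15 FLOP/s, 2.0e12 bytes/s) are hypothetical, and the textbook roofline bound stands in for Attainable SOL %, which the tool derives from the model, hardware, and parallelism config in a way the summary does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPeaks:
    """Peak hardware rates. Values used below are hypothetical round numbers."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def compute_sol_pct(achieved_flops: float, peaks: GpuPeaks) -> float:
    """Compute SOL % = achieved FLOP/s divided by peak FLOP/s."""
    return 100.0 * achieved_flops / peaks.peak_flops


def memory_sol_pct(achieved_bandwidth: float, peaks: GpuPeaks) -> float:
    """Memory SOL % = achieved bandwidth divided by peak bandwidth."""
    return 100.0 * achieved_bandwidth / peaks.peak_bandwidth


def roofline_compute_ceiling_pct(flops_per_byte: float, peaks: GpuPeaks) -> float:
    """Textbook roofline bound, used here as a stand-in for Attainable SOL %.

    Attainable FLOP/s = min(peak FLOP/s, arithmetic intensity * peak bandwidth).
    """
    attainable = min(peaks.peak_flops, flops_per_byte * peaks.peak_bandwidth)
    return 100.0 * attainable / peaks.peak_flops


def optimization_budget_pct(current_sol: float, attainable_sol: float) -> float:
    """Gap between the attainable ceiling and the current SOL %, floored at zero."""
    return max(0.0, attainable_sol - current_sol)


if __name__ == "__main__":
    gpu = GpuPeaks(peak_flops=1.0e15, peak_bandwidth=2.0e12)  # hypothetical GPU
    current = compute_sol_pct(6.0e13, gpu)              # assumed measured FLOP/s
    ceiling = roofline_compute_ceiling_pct(100.0, gpu)  # assumed 100 FLOP per byte
    budget = optimization_budget_pct(current, ceiling)
    print(f"Compute SOL: {current:.1f}%")
    print(f"Roofline ceiling: {ceiling:.1f}%")
    print(f"Optimization budget: {budget:.1f} points")
```

With these assumed inputs the kernel is bandwidth-limited, so the ceiling sits well below 100% and the budget is measured against that ceiling rather than against peak FLOPs.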
Hacker News Comment Review
Early feedback notes that v0.1.3 covers compute visibility well but lacks the process list, memory usage, temperature, and fan speed that operators rely on in nvidia-smi, which limits its use as a day-to-day replacement.
Notable Comments
@xtimecrystal: requests adding memory usage, processes, temperature, and fan speed before Utilyze can fully replace nvidia-smi in daily workflows.