Show HN: Utilyze – an open source GPU monitoring tool more accurate than nvtop

TLDR

  • Systalyze open-sources Utilyze, a GPU monitor that reads hardware performance counters directly to report true compute and memory bandwidth utilization, accurate within 2% of ground truth.

Key Takeaways

  • nvtop pins at 100% regardless of matrix size, while Utilyze reports 2.6% at N=256, 32% at N=1024, and 88% at N=4096, each within 2% of the theoretical value.
  • DCGM’s SM Active metric also misleads: a memory-bound LLM decode workload shows 99% SM Active and 100% in nvtop, while ground truth and Utilyze both report 6%.
  • Utilyze uses NVIDIA’s Nsight Perf SDK to cycle through hardware counters across rolling time windows, so overhead is negligible and measurement runs continuously in production.
  • The tool reports two headline numbers derived from the roofline model: Compute SOL % (achieved FLOP/s divided by peak FLOP/s) and Memory SOL % (achieved bandwidth divided by peak bandwidth), where SOL stands for "speed of light", the hardware's theoretical peak.
  • Attainable SOL % marks the realistic ceiling below 100% for a given model, hardware, and parallelism config; the gap between current SOL % and Attainable SOL % is the actual optimization budget.
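
The two headline ratios and the optimization budget described above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic only, not Utilyze's implementation: the names `GpuPeaks`, `sol_percent`, `roofline_bound`, and `optimization_budget` are hypothetical, and the peak figures and the 40% attainable value in the usage note are made-up round numbers rather than any real GPU's specification or any measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuPeaks:
    """Hypothetical peak figures; in practice these come from a datasheet."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def sol_percent(achieved: float, peak: float) -> float:
    """Return achieved / peak as a percentage (the SOL % ratio)."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * achieved / peak


def roofline_bound(peaks: GpuPeaks, flops: float, bytes_moved: float) -> str:
    """Classify a workload as compute- or memory-bound via the roofline model.

    Arithmetic intensity (FLOPs per byte) is compared with the ridge point
    peak_flops / peak_bandwidth; below the ridge, bandwidth is the limit.
    """
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    intensity = flops / bytes_moved
    ridge = peaks.peak_flops / peaks.peak_bandwidth
    return "compute" if intensity >= ridge else "memory"


def optimization_budget(current_sol: float, attainable_sol: float) -> float:
    """Gap between the current SOL % and the attainable ceiling, floored at 0."""
    return max(0.0, attainable_sol - current_sol)


# Illustrative numbers only (assumed, not measured).
peaks = GpuPeaks(peak_flops=100e12, peak_bandwidth=2e12)
achieved_flops = 6e12        # FLOP/s observed over some window
achieved_bandwidth = 1.8e12  # bytes/s observed over the same window

compute_sol = sol_percent(achieved_flops, peaks.peak_flops)
memory_sol = sol_percent(achieved_bandwidth, peaks.peak_bandwidth)
bound = roofline_bound(peaks, achieved_flops, achieved_bandwidth)
budget = optimization_budget(compute_sol, attainable_sol=40.0)
```

With these assumed inputs the workload lands far below the ridge point, so it is classified as memory-bound: a low Compute SOL % sits alongside a high Memory SOL %, which is the same pattern as the LLM decode case above. How Utilyze derives Attainable SOL % for a given model and parallelism config is not described here, so the sketch simply takes it as an input.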

Hacker News Comment Review

  • Early feedback flags that v0.1.3 covers compute visibility well but lacks the process list, memory usage, temperature, and fan speed that operators rely on in nvidia-smi, limiting day-to-day replacement potential.

Notable Comments

  • @xtimecrystal: requests adding memory usage, processes, temperature, and fan speed before Utilyze can fully replace nvidia-smi in daily workflows.
