tinyppo-snake is a browser-based tool that lets you configure and watch a PPO-trained neural network learn Snake in real time, rendered in 3D.
Key Takeaways
Uses a live parameter snapshot system; UI shows step count, grid size, delay (ms), and trained policy roll-outs as training progresses.
Supports switching between presets and toggling between training and watch modes mid-session.
3D renderer initializes in-browser; tensor loading is a prerequisite before any policy visualization begins.
Grid delay is configurable (default 40ms), suggesting the tool is tuned for interactive observation rather than maximum training speed.
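A minimal sketch of how a live parameter snapshot could work, assuming the trainer periodically publishes a copy of its weights and the viewer reads only that copy. Every name here (SnapshotStore, publish, read) is hypothetical and not taken from tinyppo-snake. A null read stands in for the "tensors not loaded yet" state that has to clear before any policy can be visualized.

```typescript
// Hypothetical sketch: these names do not come from tinyppo-snake.
type Snapshot = { step: number; weights: Float32Array };

class SnapshotStore {
  private latest: Snapshot | null = null;

  // Trainer side: copy the weights so later updates cannot alter the snapshot.
  publish(step: number, liveWeights: Float32Array): void {
    this.latest = { step, weights: Float32Array.from(liveWeights) };
  }

  // Viewer side: null means no tensors have been loaded yet.
  read(): Snapshot | null {
    return this.latest;
  }
}

const store = new SnapshotStore();
const beforeLoad = store.read(); // nothing published yet

const live = new Float32Array([0.1, -0.2, 0.3]);
store.publish(100, live);
live[0] = 9.9; // a later training update mutates the live weights

const snap = store.read(); // still holds the values from step 100
```

Copying on publish costs one allocation per snapshot, but it means the roll-out shown in the UI always comes from a consistent set of weights rather than a half-updated one.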
Hacker News Comment Review
One confirmed bug: switching from training to watch mode and back causes a temporary but significant drop in score, suggesting state is not cleanly preserved across mode transitions.
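The cause of the bug is not identified in the thread. As an illustration only, here is one way training state could be saved on entering watch mode and restored on leaving it. The state layout (step, weights, optimizerMoments) and the Session class are assumptions made for this sketch, not a description of the tool's internals.

```typescript
// Hypothetical sketch: the real tool's state layout is not described in the source.
interface TrainerState {
  step: number;
  weights: Float32Array;
  optimizerMoments: Float32Array; // assumed optimizer running averages
}

type Mode = "train" | "watch";

class Session {
  mode: Mode = "train";
  state: TrainerState;
  private saved: TrainerState | null = null;

  constructor(state: TrainerState) {
    this.state = state;
  }

  switchTo(next: Mode): void {
    if (next === this.mode) return;
    if (next === "watch") {
      // Deep-copy so watch-mode roll-outs cannot disturb training state.
      this.saved = {
        step: this.state.step,
        weights: Float32Array.from(this.state.weights),
        optimizerMoments: Float32Array.from(this.state.optimizerMoments),
      };
    } else if (this.saved !== null) {
      // Resume from exactly where training stopped.
      this.state = this.saved;
      this.saved = null;
    }
    this.mode = next;
  }
}

const session = new Session({
  step: 500,
  weights: new Float32Array([1, 2]),
  optimizerMoments: new Float32Array([0.5, 0.5]),
});
session.switchTo("watch");
session.state.weights[0] = -1; // simulate watch mode clobbering shared state
session.switchTo("train");
```

After the round trip, training resumes with the weights and optimizer values it had at step 500, regardless of what happened while watching.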