TLDR
- Hands-on workshop to build a ~10M param GPT model end-to-end in PyTorch on a laptop in under an hour, with no pretrained models used.
Key Takeaways
- Covers all four pipeline components: character-level tokenizer, transformer architecture, training loop, and text generation sampling (a minimal sampling sketch follows after this list).
- Three model configs (Tiny ~0.5M, Small ~4M, Medium ~10M) train in 5-45 min on Apple Silicon MPS, CUDA, or CPU; see the device-selection sketch below.
- Character-level tokenization (vocab_size=65) is used deliberately; BPE requires 100MB+ datasets for token bigrams to be learnable. A character-tokenizer sketch also follows below.
- Built as a single-session workshop; works locally via uv or on Google Colab with no code changes beyond upload.
- Inspired by and explicitly stripped down from Karpathy’s nanoGPT; targets beginners who are comfortable with Python, with no ML experience required.
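The text-generation component mentioned above can be illustrated with a short autoregressive loop. This is a minimal sketch, not the workshop's actual code: it assumes a `model` that maps a `(batch, time)` tensor of token ids to `(batch, time, vocab_size)` logits, and the names (`generate`, `block_size`, `temperature`) are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, block_size=256):
    """Sketch of temperature sampling: append max_new_tokens ids to idx (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the assumed context window
        logits = model(idx_cond)                 # (B, T, vocab_size), assumed output shape
        logits = logits[:, -1, :] / temperature  # keep only the last position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token per sequence
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Greedy decoding (argmax) or top-k filtering are one-line variations on the same loop.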
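Supporting MPS, CUDA, and CPU typically comes down to choosing a device string at startup. A hedged sketch of that check (the `GPT` class and variable names are generic placeholders, not taken from the workshop):

```python
import torch

# Prefer CUDA, then Apple Silicon's MPS backend, then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# model = GPT(config).to(device)      # hypothetical model class; move weights once
# x, y = x.to(device), y.to(device)   # move each training batch the same way
print(f"using device: {device}")
```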
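The character-level tokenizer amounts to two lookup tables built from the training text. A minimal sketch, assuming a plain-text corpus file; the `input.txt` filename and the ~65-character, Tiny-Shakespeare-style vocabulary are assumptions, not confirmed details of the workshop:

```python
# Build char <-> id lookup tables from the raw training text.
text = open("input.txt", encoding="utf-8").read()   # assumed corpus filename
chars = sorted(set(text))                            # unique characters, e.g. ~65
vocab_size = len(chars)

stoi = {ch: i for i, ch in enumerate(chars)}         # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}         # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

# Round-trips exactly, as long as every character appears in the corpus.
sample = text[:20]
assert decode(encode(sample)) == sample
```

With only a few dozen symbols there are no merge rules to learn, which is the digest's point about character-level tokenization suiting small datasets better than BPE.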
Hacker News Comment Review
- No substantive HN discussion yet.