Train Your Own LLM from Scratch

· ai hardware coding

TLDR

  • A hands-on workshop that builds a ~10M-parameter GPT model end-to-end in PyTorch on a laptop in under an hour, using no pretrained models or weights.

Key Takeaways

  • Covers all four pipeline components: a character-level tokenizer, the transformer architecture, the training loop, and text-generation sampling (see the tokenizer and sampling sketches after this list).
  • Three model configs (Tiny ~0.5M, Small ~4M, Medium ~10M parameters) train in 5-45 minutes on Apple Silicon MPS, CUDA, or CPU; a config sketch follows this list.
  • Character-level tokenization (vocab_size=65) is a deliberate choice; BPE needs 100MB+ of training data before its token-pair merge statistics become learnable.
  • Built as a single-session workshop; runs locally via uv or on Google Colab with no code changes beyond the upload itself.
  • Inspired by, and explicitly stripped down from, Karpathy’s nanoGPT; aimed at beginners who are comfortable with Python but need no ML experience.
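
For intuition, here is a minimal sketch of what a character-level tokenizer along these lines might look like. The class name and `encode`/`decode` methods are illustrative assumptions, not the workshop's actual API; the only fact carried over from the summary is that the vocabulary ends up at 65 distinct characters for the training corpus.

```python
class CharTokenizer:
    """Character-level tokenizer sketch (names are assumptions,
    not the workshop's actual API)."""

    def __init__(self, text: str):
        chars = sorted(set(text))            # every distinct character in the corpus
        self.vocab_size = len(chars)         # 65 per the summary's vocab_size
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, s: str) -> list[int]:
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


# Usage: round-trip a string through token ids.
tok = CharTokenizer("hello world")
assert tok.decode(tok.encode("hello")) == "hello"
```

Because every character maps to exactly one id, this works on tiny datasets; BPE, by contrast, has to estimate merge statistics over adjacent token pairs, which is why the article pegs it at 100MB+ of text.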
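The three configs presumably scale depth and width. A hedged sketch is below: the hyperparameters are guesses chosen so that the transformer-block weight count (roughly 12 · n_layer · n_embd² for a GPT block) lands near the stated parameter totals; the workshop's actual values may differ. The device-selection idiom covers the three backends the summary names.

```python
from dataclasses import dataclass

import torch


@dataclass
class GPTConfig:
    vocab_size: int = 65
    block_size: int = 256    # context length (assumption)
    n_layer: int = 6
    n_head: int = 6
    n_embd: int = 384
    dropout: float = 0.1


# Hypothetical configs sized to land near the summary's parameter counts.
TINY = GPTConfig(n_layer=2, n_head=4, n_embd=128)    # ~0.5M params
SMALL = GPTConfig(n_layer=5, n_head=8, n_embd=256)   # ~4M params
MEDIUM = GPTConfig(n_layer=6, n_head=6, n_embd=384)  # ~10M params

# Pick the fastest available backend: CUDA, Apple Silicon MPS, or CPU.
device = ("cuda" if torch.cuda.is_available()
          else "mps" if torch.backends.mps.is_available()
          else "cpu")
```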
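The generation step in nanoGPT-derived code is typically an autoregressive loop with temperature and top-k sampling; a sketch under that assumption follows. The `model` here is assumed to return logits of shape (batch, time, vocab); the workshop's actual signature may differ.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    """nanoGPT-style sampling loop (a sketch; assumed, not the workshop's code)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]         # crop to the context window
        logits = model(idx_cond)[:, -1, :]      # logits for the last position
        logits = logits / temperature           # <1 sharpens, >1 flattens
        if top_k is not None:
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float("inf")  # mask all but top-k
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, next_id), dim=1)  # append the sampled token
    return idx
```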

Hacker News Comment Review

  • No substantive HN discussion yet.
