TLDR
- Hands-on workshop to build a ~10M param GPT model end-to-end in PyTorch on a laptop in under an hour, with no pretrained models used.
Key Takeaways
- Covers all four pipeline components: character-level tokenizer, transformer architecture, training loop, and text generation sampling (a minimal sampling sketch follows after this list).
- Three model configs (Tiny ~0.5M, Small ~4M, Medium ~10M) train in 5-45 min on Apple Silicon MPS, CUDA, or CPU; see the device-selection sketch below.
- Character-level tokenization (vocab_size=65) is used deliberately; BPE requires 100MB+ datasets for token bigrams to be learnable. A character-tokenizer sketch also follows below.
- Built as a single-session workshop; works locally via uv or on Google Colab with no code changes beyond upload.
- Inspired by and explicitly stripped down from Karpathy’s nanoGPT; targets beginners who are comfortable with Python, with no ML experience required.
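The text-generation component mentioned above can be illustrated with a short autoregressive loop. This is a minimal sketch, not the workshop's actual code: it assumes a `model` that maps a `(batch, time)` tensor of token ids to `(batch, time, vocab_size)` logits, and the names (`generate`, `block_size`, `temperature`) are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, idx, max_new_tokens, temperature=1.0, block_size=256):
    """Sketch of temperature sampling: append max_new_tokens ids to idx (B, T)."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]          # crop to the assumed context window
        logits = model(idx_cond)                 # (B, T, vocab_size), assumed output shape
        logits = logits[:, -1, :] / temperature  # keep only the last position
        probs = F.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample one token per sequence
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Greedy decoding (argmax) or top-k filtering are one-line variations on the same loop.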
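Supporting MPS, CUDA, and CPU typically comes down to choosing a device string at startup. A hedged sketch of that check (the `GPT` class and variable names are generic placeholders, not taken from the workshop):

```python
import torch

# Prefer CUDA, then Apple Silicon's MPS backend, then fall back to CPU.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

# model = GPT(config).to(device)      # hypothetical model class; move weights once
# x, y = x.to(device), y.to(device)   # move each training batch the same way
print(f"using device: {device}")
```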
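The character-level tokenizer amounts to two lookup tables built from the training text. A minimal sketch, assuming a plain-text corpus file; the `input.txt` filename and the ~65-character, Tiny-Shakespeare-style vocabulary are assumptions, not confirmed details of the workshop:

```python
# Build char <-> id lookup tables from the raw training text.
text = open("input.txt", encoding="utf-8").read()   # assumed corpus filename
chars = sorted(set(text))                            # unique characters, e.g. ~65
vocab_size = len(chars)

stoi = {ch: i for i, ch in enumerate(chars)}         # char -> integer id
itos = {i: ch for i, ch in enumerate(chars)}         # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

# Round-trips exactly, as long as every character appears in the corpus.
sample = text[:20]
assert decode(encode(sample)) == sample
```

With only a few dozen symbols there are no merge rules to learn, which is the digest's point about character-level tokenization suiting small datasets better than BPE.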
Hacker News Comment Review
- No substantive HN discussion yet.