Which one is more important: more parameters or more computation? (2021)

TLDR

  • Post examines the scaling trade-off between model parameter count and training compute budget in large language models.

Key Takeaways

  • The core tension: given a fixed compute budget, you can train a larger model for fewer steps or a smaller model longer.
  • Kaplan et al.'s (2020) scaling laws suggested parameter count was the dominant factor; later work (Hoffmann et al.'s Chinchilla, 2022) revised this, showing most large models were significantly undertrained relative to their size.
  • Optimal allocation depends on whether you optimize for training cost or inference cost: when inference dominates at deployment scale, a smaller model trained on more tokens wins.
  • The question is practically relevant for teams deciding whether to spend a fixed compute budget on longer training runs or on larger architectures.
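
The trade-off in the bullets above can be sketched numerically. A common approximation (not from the post itself) is that training compute is C ≈ 6·N·D FLOPs for N parameters and D tokens, and Chinchilla's headline heuristic is roughly 20 training tokens per parameter. Under those two assumptions, a fixed budget pins down both sizes:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20):
    """Split a fixed compute budget into model size and data size.

    Assumes C ~= 6 * N * D and the Chinchilla heuristic D ~= 20 * N,
    which gives N ~= sqrt(C / (6 * tokens_per_param)).
    """
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own budget (~5.76e23 FLOPs):
n, d = chinchilla_optimal(5.76e23)
print(f"params ~{n:.2e}, tokens ~{d:.2e}")  # ~7e10 params, ~1.4e12 tokens
```

With these assumptions the output lands near Chinchilla's actual configuration (70B parameters, 1.4T tokens); a Kaplan-style allocation would put far more of the same budget into parameters and far less into tokens.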

Hacker News Comment Review

  • No substantive HN discussion yet.
