Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters
Watch on YouTube ↗ · Summary based on the YouTube transcript and episode description.
Mark Zuckerberg explains why Meta will open-source even a $10B model and why concentrated AI is more dangerous than widespread AI.
- Llama 3 405B is still training; at its current checkpoint it scores ~85 on MMLU and is expected to lead benchmarks at release.
- The 8B Llama 3 model is nearly as capable as the largest Llama 2 model Meta released.
- Meta trained Llama 3 70B on ~15 trillion tokens and the model was still improving at the end — no clear saturation.
- No single gigawatt data center has been built yet; Zuckerberg sees energy permitting, not capital, as the next hard bottleneck for scaling.
- Zuckerberg argues a single actor holding vastly superior AI is a bigger risk than open-source proliferation — explicitly includes adversarial governments and untrustworthy companies.
- Meta’s custom silicon already handles inference for ranking and recommendations (Reels, Feed, Ads), freeing Nvidia GPUs for training.
- Meta has revenue-share deals with major cloud providers (Azure, AWS, etc.) to host Llama 2; expects this to grow with larger models.
- Zuckerberg likens Augustus redefining peace as a positive-sum concept to open source: a model many investors still cannot conceptualize as genuinely valuable.
2024-04-18