AIE Miami Day 2 ft. Cerebras, OpenCode, Cursor, Arize AI, and more!


Summary based on the YouTube transcript and episode description. Prompt input used 79,979 of 250,000 transcript characters.

AIE Miami Day 2 features Laurie Voss (Arize AI) presenting eval data showing that GitHub MCP costs 6x more than the GitHub CLI on complex tasks, plus Cerebras announcing Codex Spark at 1,200 tokens/second.

  • Cerebras and OpenAI released Codex Spark at 1,200 tokens/second — 20x faster than any current coding model (50–150 tok/s has been the ceiling for 2 years).
  • GitHub MCP used 12 tool calls per task vs. 5 for CLI on complex tier-4 tasks, with 6x higher cost and 5x higher latency in Laurie Voss’s 500-test eval.
  • In one eval task, the MCP arm used 71 tool calls; only 3 were actual MCP calls — the agent kept falling back to bash and jq to parse verbose JSON responses.
  • Input token lengths grew 4x and output tokens 3x in the past year alone, silently inflating total latency even though per-token speed was flat (OpenRouter/a16z study).
  • Disaggregated inference — splitting compute-bound prefill from memory-bound decode onto different hardware — is the architectural shift behind the speed gains; Nvidia paid $20B for Groq on this thesis.
  • Laurie Voss’s key finding: for popular CLIs (e.g., gh, the GitHub CLI), the model’s training data does most of the work, so a short opinionated skill file beats a 2,187-line encyclopedic one on cost.
  • MCP wins for consumer-facing agents because it uses OAuth, enabling non-developer users and enterprise access control; CLI is practical only for developer-local workflows.
  • G2i case study: a junior engineer fresh out of college became a peer contributor in 3 months, with senior engineers reviewing agent-generated spec docs before any code was written.
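The token-growth point above can be made concrete with simple arithmetic: if per-token speed stays flat, end-to-end latency scales directly with token counts. A minimal sketch, using hypothetical throughputs (2,000 tok/s prefill, 100 tok/s decode) and hypothetical request sizes, not figures from the study:

```python
def request_latency(n_in: int, n_out: int,
                    prefill_tps: float = 2000.0,
                    decode_tps: float = 100.0) -> float:
    """End-to-end latency in seconds: prefill time plus decode time."""
    return n_in / prefill_tps + n_out / decode_tps

# Hypothetical request a year ago vs. today, at the same flat per-token speed:
before = request_latency(1_000, 500)    # 0.5 + 5.0  = 5.5 s
after = request_latency(4_000, 1_500)   # 2.0 + 15.0 = 17.0 s
print(f"{before:.1f}s -> {after:.1f}s")  # latency roughly triples
```

With 4x input and 3x output, total latency roughly triples even though neither stage got any slower per token.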
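Disaggregated inference, as described above, can be sketched in miniature: a compute-bound prefill stage processes the whole prompt in one pass to build a KV cache, then hands it to a memory-bound decode stage that emits tokens one step at a time. Everything here (the fake cache, the queue standing in for the interconnect, the stub token generator) is an illustrative assumption, not any vendor's implementation:

```python
from queue import Queue
from threading import Thread

# The queue stands in for the interconnect that ships KV caches
# from prefill hardware to decode hardware.
handoff: Queue = Queue()

def prefill_worker(prompts):
    """Compute-bound stage: process each full prompt once, emit a KV cache."""
    for p in prompts:
        kv_cache = {"prompt": p, "ctx_len": len(p.split())}  # fake KV cache
        handoff.put(kv_cache)
    handoff.put(None)  # sentinel: no more work

def decode_worker(results, max_new_tokens=3):
    """Memory-bound stage: one token per step, conditioned on the cache."""
    while (kv := handoff.get()) is not None:
        tokens = [f"tok{i}" for i in range(max_new_tokens)]  # stub decode loop
        results.append((kv["prompt"], tokens))

results = []
t = Thread(target=prefill_worker, args=(["hello world", "sum two ints"],))
t.start()
decode_worker(results)
t.join()
print(results)
```

The payoff of the split is that each stage can run on hardware matched to its bottleneck (FLOPs for prefill, memory bandwidth for decode) and scale independently.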

2026-04-21 · Watch on YouTube