The Powerful Alternative To Fine-Tuning

Summary based on the YouTube transcript and episode description.

Poetiq CEO Ian Fischer explains how a 7-person ex-DeepMind team beat Claude Opus 4.6 on Humanity’s Last Exam using recursive self-improvement harnesses instead of fine-tuning.

  • Poetiq’s system beat Anthropic’s Claude Opus 4.6 on Humanity’s Last Exam: 55% vs 53.1%, at an optimization cost under $100k.
  • On ARC-AGI v2, Poetiq scored 54% vs Gemini 3 Deep Think’s 45%, at half the cost ($32/problem vs ~$70+).
  • Adding a reasoning harness on top of prompts took one benchmark task from 5% to 95% accuracy with Gemini 1.5 Flash.
  • Fine-tuning is a trap for startups: it costs millions, and the next frontier model renders the result obsolete; harnesses stay model-agnostic.
  • The Poetiq meta-system auto-generates reasoning strategies as code, not just better prompts: similar in spirit to DSPy, but recursively self-improving.
  • The generated prompts for ARC-AGI included a factually wrong example that nonetheless improved performance; the system found strategies no human would write.
  • The entire company is 7 people (research scientists and engineers); no harness retraining is needed when a new base model is released.
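Poetiq has not published its harness internals, but the model-agnostic idea described above can be illustrated roughly: wrap a frozen model in a search over reasoning strategies, scored by a verifier, rather than retraining the weights. The names below (`harness`, `toy_model`, `verify`) are hypothetical stand-ins, not Poetiq's actual API.

```python
from typing import Callable, List

def harness(model: Callable[[str], str],
            task: str,
            strategies: List[str],
            verify: Callable[[str], float]) -> str:
    """Run the task under each candidate strategy and keep the
    best-scoring answer. The model is never retrained; only the
    wrapper around it changes, so a new base model drops in freely."""
    best_answer, best_score = "", float("-inf")
    for strategy in strategies:
        answer = model(f"{strategy}\n\n{task}")
        score = verify(answer)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

# Toy stand-ins: a "model" that only answers correctly when asked
# to reason step by step, and a verifier that checks the answer.
def toy_model(prompt: str) -> str:
    return "4" if "step by step" in prompt else "5"

result = harness(
    toy_model,
    task="What is 2 + 2?",
    strategies=["Answer directly.", "Think step by step."],
    verify=lambda ans: 1.0 if ans == "4" else 0.0,
)
```

In a real harness the strategy list would itself be generated and refined by a model (the recursive self-improvement the episode describes), and the verifier would be task-specific; this sketch only shows why the wrapper survives a base-model swap.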

2026-02-27 · Watch on YouTube