Full Workshop: Build Your Own Deep Research Agents - Louis-François Bouchard, Paul Iusztin, Samridhi

· media ai-agents · Source ↗

Summary based on the YouTube transcript and episode description. Prompt input used 79979 of 95369 transcript characters.

Louis-François Bouchard, Paul Iusztin, and Samridhi Vaid walk through building an MCP-powered deep research + technical writing agent system, covering architecture decisions, eval design, and observability.

  • Research and writing require opposite architectures: research needs agentic flexibility; writing needs deterministic, constrained workflows — so they split into two separate systems.
  • Context rot degrades LLM performance well before the 1M-token limit, often starting around 200K tokens, because long-context models are trained largely on single-fact retrieval (needle-in-a-haystack) tasks rather than on reasoning over densely relevant context.
  • They switched from Perplexity to Gemini with search grounding for web search, and use Gemini to analyze YouTube videos directly via URL, with no download step (first sketch after this list).
  • The evaluator-optimizer loop runs writer and reviewer in separate context windows so the LLM never grades its own reasoning; a fixed iteration count (3-4 passes) outperformed threshold-based looping for subjective creative work (loop sketch below).
  • LLM judges are binary classifiers: build a labeled dataset first, split it into train/dev/test, then calibrate the judge against dev-set F1 before running on test (calibration sketch below); skipping this step is the most common mistake.
  • The reviewer outputs structured Pydantic objects with profile, location, and comment fields (schema sketch below); structured outputs significantly outperform free-form critique for guiding the editor LLM.
  • MCP servers beat skills for distributing business logic at scale; skills are local hacks that become credential/dependency nightmares when shared across teams.
  • The more constrained the agent’s goal guideline, the better the research output: in their real usage, less freedom produced higher-quality results.
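
Below are sketches of four of the techniques above. First, the Gemini setup: a minimal sketch using the google-genai Python SDK, assuming an API key in the environment. The model name, search query, and video URL are placeholders, not details from the workshop.

```python
# pip install google-genai
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Web search via Google Search grounding (their Perplexity replacement).
grounded = client.models.generate_content(
    model="gemini-2.5-flash",  # placeholder; use whatever model tier fits
    contents="What changed in the latest MCP specification?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(grounded.text)

# Analyzing a YouTube video directly by URL: Gemini ingests it server-side,
# so there is no local download step.
video = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=types.Content(parts=[
        types.Part(file_data=types.FileData(
            file_uri="https://www.youtube.com/watch?v=VIDEO_ID")),  # placeholder URL
        types.Part(text="Summarize the key technical claims made in this video."),
    ]),
)
print(video.text)
```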
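
Next, the evaluator-optimizer loop. Writer and reviewer run as separate stateless calls so neither sees the other's context, and the loop runs a fixed number of passes rather than stopping at a score threshold. The `call_llm` helper and both prompts are hypothetical stand-ins for their actual setup.

```python
from google import genai
from google.genai import types

client = genai.Client()

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """One stateless call: a fresh context window every time."""
    resp = client.models.generate_content(
        model="gemini-2.5-flash",  # placeholder model name
        contents=user_prompt,
        config=types.GenerateContentConfig(system_instruction=system_prompt),
    )
    return resp.text

WRITER = "You are a technical writer. Produce or revise the article as instructed."
REVIEWER = "You are a strict reviewer. Critique the draft; do not rewrite it."

def evaluator_optimizer(brief: str, iterations: int = 3) -> str:
    # A fixed iteration count (3-4) beat threshold-based stopping for
    # subjective creative work, per the workshop.
    draft = call_llm(WRITER, brief)
    for _ in range(iterations):
        # The reviewer sees only the draft, never the writer's own context,
        # which prevents the model from grading its own reasoning.
        critique = call_llm(REVIEWER, draft)
        draft = call_llm(
            WRITER,
            f"Brief:\n{brief}\n\nCurrent draft:\n{draft}\n\nReviewer comments:\n{critique}",
        )
    return draft
```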
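
Then the calibration workflow: treat the LLM judge as a binary classifier, label a dataset by hand, split it, and tune the judge prompt against dev-set F1 before a single run on the test split. The labeled examples and the trivial placeholder judge are illustrative; `f1_score` is scikit-learn's real API.

```python
from sklearn.metrics import f1_score

# Hand-labeled (output, verdict) pairs; 1 = acceptable, 0 = not.
# In practice this would be dozens of examples, not four.
labeled = [
    ("well-sourced draft that follows the brief ...", 1),
    ("draft with fabricated citations ...", 0),
    ("draft that ignores the style guide ...", 0),
    ("concise, on-brief draft ...", 1),
]

n = len(labeled)
train = labeled[: n // 2]            # few-shot examples for the judge prompt
dev = labeled[n // 2 : 3 * n // 4]   # calibrate the prompt against this split
test = labeled[3 * n // 4 :]         # touch only once, after calibration

def judge(output: str) -> int:
    # Placeholder for the real LLM judge call (prompt built from `train`).
    return int("fabricated" not in output)

# Iterate on the judge prompt until dev F1 is acceptable,
# then score the test split exactly once.
y_true = [label for _, label in dev]
y_pred = [judge(output) for output, _ in dev]
print("dev F1:", f1_score(y_true, y_pred, zero_division=0))
```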
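
Finally, the reviewer's structured output: a sketch of the Pydantic schema, assuming only the three field names mentioned above; the descriptions and types are guesses.

```python
from pydantic import BaseModel, Field

class ReviewComment(BaseModel):
    """One localized, actionable critique for the editor LLM."""
    profile: str = Field(description="Reviewer persona issuing the comment")
    location: str = Field(description="Where in the draft the issue sits, e.g. a quoted span or heading")
    comment: str = Field(description="The critique itself, phrased as an instruction the editor can execute")

class Review(BaseModel):
    comments: list[ReviewComment]
```

Pinning each comment to a location lets the editor patch specific spans instead of re-interpreting a wall of prose, which is plausibly why structured critique guided the editor better than free-form review. Many SDKs, including google-genai via its `response_schema` config, can enforce a schema like this at generation time.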

2026-04-20 · Watch on YouTube