Interfaze: A new model architecture built for high accuracy at scale


TLDR

  • Interfaze merges CNN/DNN task-specific encoders with omni-transformer layers, claiming top scores across 9 benchmarks against Gemini-3-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3.

Key Takeaways

  • Hybrid architecture routes deterministic tasks (OCR, object detection, STT, structured output) through CNN/DNN encoders while retaining transformer reasoning for nuance.
  • Benchmark results: OCRBench V2 70.7% vs. Gemini-3-Flash 55.8%; olmOCR 85.7% vs. GPT-5.4-Mini 80.1%; VoxPopuli WER 2.4% vs. Deepgram Nova-3 baseline.
  • Partial model activation via a <task> system-prompt tag enables cheaper, faster single-task inference, returning fixed structured output with per-word bounding boxes and confidence scores.
  • Priced at $1.50/M input and $3.50/M output tokens, in line with Gemini-3-Flash; follows the OpenAI Chat Completions API, so any OpenAI-compatible SDK works by changing the base URL.
  • Introduced SOB (Structured Output Benchmark) to measure JSON value accuracy, not just schema adherence, across text, image, and audio modalities.
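The partial-activation and API-compatibility claims above can be sketched as a Chat Completions-style request. This is a minimal illustration only: the base URL, model id, and exact `<task>` tag syntax are assumptions, not confirmed by the article.

```python
import json

# Assumed values for illustration -- the article does not publish
# the endpoint, model id, or the precise <task> tag grammar.
BASE_URL = "https://api.example-interfaze.com/v1"  # hypothetical endpoint
MODEL_ID = "interfaze-1"                            # hypothetical model id

def build_ocr_request(image_url: str) -> dict:
    """Build a Chat Completions-shaped payload that requests only the
    OCR path via a <task> system-prompt tag, per the article's claim
    that single-task inference activates only part of the model."""
    return {
        "model": MODEL_ID,
        "messages": [
            # The <task> tag in the system prompt selects the task path.
            {"role": "system", "content": "<task>ocr</task>"},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    }

payload = build_ocr_request("https://example.com/receipt.png")
print(json.dumps(payload, indent=2))
```

Because the payload mirrors the Chat Completions shape, any OpenAI-compatible client should be able to send it after pointing its base URL at the provider's endpoint.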

Hacker News Comment Review

  • No substantive HN discussion yet.
