Interfaze: A new model architecture built for high accuracy at scale


TLDR

  • Interfaze merges CNN/DNN task-specific encoders with omni-transformer layers, claiming top scores across 9 benchmarks against Gemini-3-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3.

Key Takeaways

  • Hybrid architecture routes deterministic tasks (OCR, object detection, STT, structured output) through CNN/DNN encoders while retaining transformer reasoning for nuance.
  • Benchmark results: OCRBench V2 70.7% vs. Gemini-3-Flash 55.8%; olmOCR 85.7% vs. GPT-5.4-Mini 80.1%; VoxPopuli WER 2.4% vs. Deepgram Nova-3 baseline.
  • Partial model activation via a <task> system-prompt tag enables cheaper, faster single-task inference, returning fixed structured output with per-word bounding boxes and confidence scores.
  • Priced at $1.50/M input and $3.50/M output tokens, in line with Gemini-3-Flash; follows the OpenAI Chat Completions API, so any OpenAI-compatible SDK works by changing the base URL.
  • Introduced SOB (Structured Output Benchmark) to measure JSON value accuracy, not just schema adherence, across text, image, and audio modalities.
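The partial-activation and API-compatibility claims above can be sketched as a Chat Completions-style request. This is a minimal illustration only: the base URL, model id, and exact `<task>` tag syntax are assumptions, not confirmed by the article.

```python
import json

# Assumed values for illustration -- the article does not publish
# the endpoint, model id, or the precise <task> tag grammar.
BASE_URL = "https://api.example-interfaze.com/v1"  # hypothetical endpoint
MODEL_ID = "interfaze-1"                            # hypothetical model id

def build_ocr_request(image_url: str) -> dict:
    """Build a Chat Completions-shaped payload that requests only the
    OCR path via a <task> system-prompt tag, per the article's claim
    that single-task inference activates only part of the model."""
    return {
        "model": MODEL_ID,
        "messages": [
            # The <task> tag in the system prompt selects the task path.
            {"role": "system", "content": "<task>ocr</task>"},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    }

payload = build_ocr_request("https://example.com/receipt.png")
print(json.dumps(payload, indent=2))
```

Because the payload mirrors the Chat Completions shape, any OpenAI-compatible client should be able to send it after pointing its base URL at the provider's endpoint.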

Hacker News Comment Review

  • No substantive HN discussion yet.
