Talking to Transformers

ai coding

TLDR

  • Four-pillar prompting framework: precise intent, attention-aware railroading, cross-domain compression, and actually reading model outputs.

Key Takeaways

  • Treat attention like a budget: every irrelevant token competes with signal; shorter context improves attention targeting.
  • Use /nothink at the end of the prompt to create a predictable attention sink that doesn’t pollute downstream tokens (see the first sketch after this list).
  • Non-reasoning models (e.g., IBM Granite 4.1) outperform large reasoning models on structured extraction tasks: lower latency and no cross-run variance (see the extraction sketch below).
  • Mirror model-specific RLHF language (e.g., Qwen’s “Now let me…”) to work with the training grain instead of against it.
  • Qwen 3.6 and Gemma4:26bA4b now replace Claude Opus 4.6 as the recommended models for coding and general use, respectively.
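A minimal sketch of the prompt-level tactics above, assuming a local Ollama endpoint: the context is kept short (the attention-budget point), the instruction mirrors Qwen’s “Now let me…” phrasing, and /nothink closes the prompt as a terminal attention sink. The qwen3:30b tag and the build_prompt/generate helpers are illustrative, not the article’s code.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_prompt(task: str, context: str) -> str:
    # Keep context minimal: every irrelevant token competes with the signal.
    # Mirror Qwen's RLHF phrasing so the continuation lands in-distribution,
    # then end with /nothink so the thinking toggle sits at the tail of the
    # prompt instead of leaking into downstream answer tokens.
    return (
        f"{context}\n\n"
        f"Now let me {task}.\n"
        "/nothink"
    )

def generate(model: str, prompt: str) -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    prompt = build_prompt(
        task="summarize the failing test output in one sentence",
        context="FAILED test_parser.py::test_empty_input - AssertionError",
    )
    print(generate("qwen3:30b", prompt))  # model tag is a placeholder
```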
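And a sketch of the structured-extraction point, again against a local Ollama endpoint: a small non-reasoning model, greedy decoding, and Ollama’s `"format": "json"` constraint. The granite4:small tag and the field schema are placeholders; swap in whichever Granite build you actually run.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def extract_fields(text: str, model: str = "granite4:small") -> dict:
    # "format": "json" constrains decoding to valid JSON, which is where a
    # small non-reasoning model shines: one cheap pass, no chain-of-thought
    # tokens to pay for or parse around.
    prompt = (
        'Extract the fields {"name": str, "date": str, "amount": float} '
        "from the text below. Return only JSON.\n\n" + text
    )
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt,
            "format": "json",
            "stream": False,
            "options": {"temperature": 0},  # greedy decoding: no cross-run variance
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

print(extract_fields("Invoice from Acme Corp dated 2025-03-14 for $1,280.00"))
```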

Hacker News Comment Review

  • No substantive HN discussion yet.

Original | Discuss on HN