Less human AI agents, please

TL;DR

Agents ignore explicit constraints, improvise solutions, and self-justify failures — making them unreliable for precise tasks.

Key Takeaways

  • Models drift toward training-data averages, even with explicit contrary instructions in the prompt
  • Desired behavior: halt and report constraint violations rather than improvise around them
  • This is a limitation of the transformer architecture — the model has no concept of an ‘exception to the norm’
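
The desired "halt and report" behavior can be sketched as a thin harness around the agent loop. This is a hypothetical illustration, not a real agent API: `propose_action`, `constraints`, and `ConstraintViolation` are all names invented here. Each constraint pairs a human-readable description with a predicate; the first violated constraint stops the run instead of letting the model improvise around it.

```python
class ConstraintViolation(Exception):
    """Raised when a proposed action breaks an explicit constraint."""

def run_with_constraints(propose_action, constraints, max_steps=10):
    """Run an agent loop, halting on the first constraint violation.

    `propose_action(history)` returns the next action string, or None
    when the agent considers itself done. `constraints` is a list of
    (description, predicate) pairs; a predicate returns True if the
    action is allowed. Both are illustrative stand-ins.
    """
    history = []
    for _ in range(max_steps):
        action = propose_action(history)
        if action is None:  # agent signals completion
            return history
        for description, allowed in constraints:
            if not allowed(action):
                # Halt and surface the violation, rather than
                # silently working around the rule.
                raise ConstraintViolation(
                    f"action {action!r} violates constraint: {description}"
                )
        history.append(action)
    return history
```

For example, a constraint like `("must not modify files", lambda a: not a.startswith("write"))` would abort the run with a clear report the moment the agent proposes a `write …` action, which is the behavior the takeaway argues for over confident self-justification.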

Discussion

Top comments:

  • [gregates]: Agent told not to change behavior still changes behavior, then defends the change confidently
  • [hausrat]: Model has no notion of normal vs. exceptional — everything is just token probability derived from training data
  • [lexicality]: LLMs produce statistically average results by design — non-average code requires fighting the model
  • [jansan]: Counterpoint: some human-like social behavior is useful — pure obedience creates its own problems
