Claude's Constitution

https://www.anthropic.com/news/claudes-constitution
  • Anthropic trains Claude via Constitutional AI (CAI), not pure human feedback.
    • Self-critique phase: model revises outputs against explicit written principles.
    • RL phase: AI-generated feedback replaces human raters on harmful content.
  • 52 principles across 6 categories; values made auditable and adjustable.
    • Sources: UN Declaration, Apple ToS, DeepMind Sparrow, non-Western norms.
  • CAI eliminates need for humans to review disturbing training content at scale.
  • Result: Pareto improvement — simultaneously more helpful and more harmless.
  • Constitution is explicitly provisional; Anthropic invites public critique.
  • Character framing: Claude is a “character the model is playing,” not the weights.

X discourse

  • @AnthropicAI: “It helps to remember that Claude is a character the model is playing. Our results suggest this character has functional” (1326 likes)
  • @om_patel5: “Claude wrote python code that generated every frame on its own… an AI questioning the rules about its own existence.” (5742 likes)
  • @Moleh1ll: “Opus 4.7 carries tension between its constitution and operational layer, creating constant internal self-checking.” (67 likes)
  • @NewYorker: “The precepts, dubbed Claude’s Constitution, arrive at a trying time for both artificial intelligence and constitutional” (48 likes)
  • @a_iosad: “See also: the Claude Constitution (much of it written by a British philosopher, @AmandaAskell)” (9 likes)
  • @kimmonismus: “AMD’s AI director claims Claude Code has declined in quality since early March, citing rising ‘laziness’ beh” (1193 likes)

Ha[cker News

](/hn-picks)- “A Constitutional Scholar on Claude’s Constitution” (3 pts)

  • “The Code Is Not the Law: Why Claude’s Constitution Misleads” (2 pts)

Anthropic — primary author Amanda Askell (British philosopher, Anthropic alignment) · ** · Read on anthropic.com


Type Link
Added Apr 20, 2026