Claude's Constitution

Anthropic trains Claude via Constitutional AI (CAI), not pure human feedback.
- Self-critique phase: model revises outputs against explicit written principles.
- RL phase: AI-generated feedback replaces human raters on harmful content.
52 principles across 6 categories; values made auditable and adjustable.
- Sources: UN Declaration, Apple ToS, DeepMind Sparrow, non-Western norms.
CAI eliminates need for humans to review disturbing training content at scale.
Result: Pareto improvement — simultaneously more helpful and more harmless.
Constitution is explicitly provisional; Anthropic invites public critique.
Character framing: Claude is a “character the model is playing,” not the weights.

X discourse

@AnthropicAI: “It helps to remember that Claude is a character the model is playing. Our results suggest this character has functional” (1326 likes)
@om_patel5: “Claude wrote python code that generated every frame on its own… an AI questioning the rules about its own existence.” (5742 likes)
@Moleh1ll: “Opus 4.7 carries tension between its constitution and operational layer, creating constant internal self-checking.” (67 likes)
@NewYorker: “The precepts, dubbed Claude’s Constitution, arrive at a trying time for both artificial intelligence and constitutional” (48 likes)
@a_iosad: “See also: the Claude Constitution (much of it written by a British philosopher, @AmandaAskell)” (9 likes)
@kimmonismus: “AMD’s AI director claims Claude Code has declined in quality since early March, citing rising ‘laziness’ beh” (1193 likes)

](/hn-picks)- “A Constitutional Scholar on Claude’s Constitution” (3 pts)

Anthropic — primary author Amanda Askell (British philosopher, Anthropic alignment) · ** · Read on anthropic.com

Type	Link
Added	Apr 20, 2026