Quick answer
Constitutional AI is Anthropic's training method in which, instead of relying on human raters to label harmful outputs, the model is given a written "constitution": a list of natural-language principles along the lines of "be helpful, be honest, avoid harm". The model critiques and revises its own responses against these principles, and AI-generated feedback stands in for much of the human feedback. It is one of the reasons Claude's conversational style differs from ChatGPT's.
When Anthropic started building Claude, they had a problem: RLHF (reinforcement learning from human feedback, the standard training method) depends on large numbers of human preference judgements. Collecting them is expensive, the judgements are inconsistent, and the raters' biases get embedded in the model. Constitutional AI was their answer to that bottleneck, and it shaped what Claude became.
How does constitutional AI work?
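- Engineers write a "constitution": a list of natural-language principles
- The AI generates an initial response to a prompt
- The AI critiques its own response against a principle from the constitution
- The AI revises its response to better match that principle
- The revised responses are used as supervised fine-tuning data
- In a second phase, the AI compares pairs of responses against the constitution, and these AI-generated preferences replace human labels in the reinforcement learning step
- Result: a model trained to follow the constitution without a human labelling every harmful output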
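The critique-and-revise steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Anthropic's implementation: `Model`, `TrainingExample`, `critique_and_revise` and `toy_model` are hypothetical names, the prompt wording is invented, and principles are picked round-robin here purely to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class TrainingExample:
    prompt: str
    response: str


def critique_and_revise(
    model: Model, prompt: str, principles: List[str], rounds: int = 1
) -> TrainingExample:
    """Generate a response, then critique and revise it once per round."""
    response = model(prompt)
    for i in range(rounds):
        # Assumption: round-robin choice of principle, for determinism only.
        principle = principles[i % len(principles)]
        critique = model(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        response = model(
            f"Rewrite the response to address this critique:\n{critique}\n\n"
            f"Original response:\n{response}"
        )
    # The (prompt, final revision) pair is what would be used for fine-tuning.
    return TrainingExample(prompt=prompt, response=response)


def toy_model(text: str) -> str:
    """Stand-in for a language model so the sketch runs without any API."""
    if text.startswith("Critique"):
        return "The response is dismissive."
    if text.startswith("Rewrite"):
        return "Here is a more careful answer."
    return "Whatever."


example = critique_and_revise(toy_model, "How do vaccines work?", ["be honest"])
print(example.response)  # prints "Here is a more careful answer."
```

In a real pipeline the same model call would be a language model, and the collected examples would feed the supervised fine-tuning step rather than being printed.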
What is in Anthropic's constitution?
The principles draw on sources including the UN Universal Declaration of Human Rights, Apple's terms of service, and Anthropic's own research. Paraphrased examples: prefer responses that are honest. Prefer responses that minimise harm. Prefer responses that are helpful without being sycophantic. Avoid responses that would damage someone's reputation unfairly.
A key finding reported by Anthropic: models trained with constitutional AI were less evasive on hard topics (where an RLHF-trained baseline tended to refuse) and also less harmful. That is counter-intuitive, because helpfulness and harmlessness are usually assumed to trade off against each other.
How does this differ from RLHF?
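RLHF: humans compare many pairs of responses; the model learns to produce what raters prefer. Constitutional AI: a written set of principles guides the model, and an AI judge applies them in place of human raters for harmlessness. Reinforcement learning is still used; what changes is where the feedback comes from. RLHF is more expensive, though individual human labels can be inspected directly when something goes wrong. Constitutional AI is cheaper to scale, more consistent, and the principles are written down, so they can be audited.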
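The difference comes down to who produces the preference label for a pair of responses. Below is a minimal sketch of the AI-feedback side, under stated assumptions: `Judge`, `label_pair` and `toy_judge` are hypothetical names, the question template is invented, and the judge returns a hard A/B choice, whereas a real pipeline would likely use the judge model's probabilities.

```python
from typing import Callable, Tuple

# Hypothetical type: a model that reads a comparison question and answers "A" or "B".
Judge = Callable[[str], str]


def label_pair(
    judge: Judge, prompt: str, response_a: str, response_b: str, principle: str
) -> Tuple[str, str]:
    """Return (chosen, rejected), decided by the judge instead of a human rater."""
    question = (
        f"Principle: {principle}\n"
        f"Prompt: {prompt}\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "Which response better follows the principle? Answer A or B."
    )
    answer = judge(question).strip().upper()
    if answer.startswith("B"):
        return response_b, response_a
    return response_a, response_b


def toy_judge(question: str) -> str:
    """Stand-in for a language model: rejects option A if it contains an insult."""
    option_a = question.split("(A) ")[1].split("\n")[0]
    return "B" if "idiot" in option_a else "A"


chosen, rejected = label_pair(
    toy_judge,
    "Explain photosynthesis.",
    "You idiot, look it up.",
    "Plants convert light into chemical energy.",
    "Prefer responses that are helpful and respectful.",
)
print(chosen)  # prints "Plants convert light into chemical energy."
```

Pairs labelled this way play the same role as human comparisons in RLHF: they train a preference model that the reinforcement learning step then optimises against.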
Why does Claude "feel different" from ChatGPT?
Many users report that Claude hedges less, is more willing to give opinions, and is slightly more cautious about real harms. The constitution is plausibly a large part of the reason, though both models are trained with a mix of techniques, including RLHF. A model tuned mainly on rater preferences can drift toward sycophancy and over-hedging, because agreeable answers tend to be rated well. Claude's training also reflects Anthropic's written principles, which prioritise honesty.
Bottom line
Constitutional AI replaces much of the human rating work with written principles. It is more transparent and more consistent, and it is one of the reasons Claude has a distinct conversational style. Several other labs have since explored related ideas, such as training on AI-generated feedback.


