Collective Constitutional AI

Updated Jun 28, 2026

Collective Constitutional AI (CCAI) is a 2023 research project by Anthropic and the Collective Intelligence Project (CIP) that drew the value principles, or "constitution," used to align a language model from a roughly representative sample of the United States public rather than from the company's own staff. The team recruited about 1,000 Americans (n=1,002), used the Polis deliberation platform to gather and cluster their opinions into a public constitution, then applied the Constitutional AI training method and compared the resulting model against one trained on Anthropic's in-house constitution. The publicly sourced model matched the standard one on capability benchmarks while showing lower social bias on all nine dimensions of the BBQ evaluation. ^[1]^[2]^[3]

CCAI was announced on October 17, 2023. Its authors describe it as possibly one of the first cases in which members of the public collectively shaped a language model's behavior through a structured deliberative process. As the FAccT paper puts it: "We believe that this may be one of the first instances in which members of the public have, as a group, directed the behavior of a language model via an online public input process." ^[3]

What is Collective Constitutional AI?

Constitutional AI aligns a model to a written set of natural-language principles, but the choice of those principles normally rests with a small number of developers. CCAI was an experiment in widening that authorship. Working with CIP through its "Alignment Assemblies" program, Anthropic gathered opinions from roughly a thousand Americans, distilled them into a public constitution, and used that document in place of the standard one during fine-tuning. The two models, which the paper labels "Public" and "Standard," were then evaluated on capability benchmarks, on a bias benchmark, and on a measure of political viewpoint representation. The headline result was that the publicly sourced model matched the standard one on raw capability while scoring lower on measured social bias. ^[1]^[3]

The work was first published as a post on Anthropic's research blog, with the public constitution and a comparison document released alongside it. A peer-reviewed version, authored by Saffron Huang and Divya Siddarth of CIP together with Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli of Anthropic, later appeared at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT), held June 3-6, 2024 in Rio de Janeiro. ^[3]

Why did Anthropic build it?

Constitutional AI, introduced by Anthropic in late 2022, replaces much of the human labeling used in reinforcement learning from human feedback with a model critiquing and revising its own outputs against written principles, a technique related to RLAIF. The constitution that guides this process for the Claude family was written by Anthropic and draws on sources such as the United Nations Universal Declaration of Human Rights. Critics and the company alike noted that this places normative authority with the developer. CCAI was framed as a response to that concern: a test of whether the principles could instead come from a broader group, and whether a model trained on such principles would behave acceptably.

CIP, a nonprofit research organization co-founded by Siddarth and Huang, had been developing methods for what it calls collective input into AI development. CCAI served as a pilot for that agenda, and the two organizations positioned it as a starting point rather than a finished governance mechanism.

How was the public constitution created?

Participants were recruited through the survey firm PureSpectrum to approximate a representative sample of US adults across age, gender, income, and geography, with a screen for familiarity with generative AI. The final sample was 1,002 people. They were invited under the prompt "Help us pick rules for our AI Chatbot!" ^[1]^[3]

Deliberation ran on Polis, an open-source platform that lets large groups surface areas of agreement. On Polis, participants vote to agree, disagree, or pass on statements and can submit their own, while a machine-learning layer clusters voters into opinion groups and identifies statements that bridge them. Across the process, participants contributed 1,127 statements and cast 38,252 votes, an average of about 34 votes per person. The system identified two distinct opinion groups but, by the authors' account, found relatively low polarization, with broad agreement on many statements. ^[1]^[3]
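The voting-and-clustering idea described above can be illustrated with a minimal sketch. It assumes votes are encoded as +1 (agree), -1 (disagree), and 0 (pass), and it groups participants with a tiny two-cluster k-means written from scratch. The participant names, the votes, and the `two_means` helper are all hypothetical; Polis's actual pipeline is more involved, and this only shows the general shape of the computation.

```python
from typing import Dict, List

AGREE, DISAGREE, PASS = 1, -1, 0  # assumed encoding, for illustration only


def squared_distance(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_means(votes: Dict[str, List[int]], iterations: int = 10) -> Dict[str, int]:
    """Split participants into two opinion groups by their vote vectors."""
    names = sorted(votes)
    # Deterministic start: the first participant, and whoever is farthest from them.
    first = votes[names[0]]
    far = max(names, key=lambda n: squared_distance(votes[n], first))
    centroids = [list(first), list(votes[far])]
    labels: Dict[str, int] = {}
    for _ in range(iterations):
        # Assign each participant to the nearest centroid.
        labels = {
            n: min((0, 1), key=lambda k: squared_distance(votes[n], centroids[k]))
            for n in names
        }
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [votes[n] for n in names if labels[n] == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


# Hypothetical votes on three statements (one column per statement).
votes = {
    "p1": [AGREE, AGREE, DISAGREE],
    "p2": [AGREE, AGREE, PASS],
    "p3": [AGREE, DISAGREE, AGREE],
    "p4": [AGREE, DISAGREE, AGREE],
}
groups = two_means(votes)
print(groups)
```

In this toy data, p1 and p2 land in one group and p3 and p4 in the other, while the first statement is agreed to by everyone. That is the kind of statement a bridging analysis would surface.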

After moderation to remove duplicate, off-topic, or inappropriate submissions, 275 statements remained eligible. To build the constitution, the team ranked statements by a group-aware consensus (GAC) score, which favors statements supported across opinion clusters rather than by a single faction. Across the 275 statements the average GAC was 0.64 and the median 0.70; the team kept the top-ranked statements down to a threshold of 0.723, yielding roughly 95 distinct ideas, which were then edited into constitutional principles. ^[3]
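The ranking-and-threshold step can be sketched as follows. The article does not give the GAC formula, so this sketch assumes one common formulation: for each opinion group, a Laplace-smoothed probability of agreement, (agrees + 1) / (votes seen + 2), multiplied across groups. The statement labels and vote tallies are hypothetical; only the 0.723 threshold comes from the text above.

```python
from typing import Dict, List, Tuple

# Per-statement tallies: one (agree_count, votes_seen) pair per opinion group.
Tally = List[Tuple[int, int]]


def group_agree_probability(agrees: int, votes_seen: int) -> float:
    # Laplace smoothing (assumed form): avoids 0 or 1 when a group has few votes.
    return (agrees + 1) / (votes_seen + 2)


def gac_score(per_group: Tally) -> float:
    """Product of per-group agreement probabilities (assumed GAC definition)."""
    score = 1.0
    for agrees, votes_seen in per_group:
        score *= group_agree_probability(agrees, votes_seen)
    return score


def select_statements(
    tallies: Dict[str, Tally], threshold: float = 0.723
) -> List[Tuple[str, float]]:
    """Rank statements by GAC and keep those at or above the threshold."""
    scored = sorted(((gac_score(t), s) for s, t in tallies.items()), reverse=True)
    return [(s, round(score, 3)) for score, s in scored if score >= threshold]


# Hypothetical tallies for two opinion groups of 500 voters each.
tallies = {
    "statement_a": [(480, 500), (470, 500)],  # strong support in both groups
    "statement_b": [(450, 500), (60, 500)],   # popular in one group only
    "statement_c": [(440, 500), (450, 500)],  # solid support in both groups
}
selected = select_statements(tallies)
print(selected)
```

Because the score is a product, a statement that one group strongly rejects scores low no matter how popular it is in the other group. In this toy data, statement_b falls well below the threshold while statement_a and statement_c pass.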

How does the public constitution differ from Anthropic's?

The resulting document overlapped with Anthropic's existing constitution on roughly half of its concepts. Where the two differed, the public constitution tended to emphasize objectivity, impartiality, and accessibility, and it more often framed principles in terms of the behavior the model should adopt rather than behavior it should avoid. Several principles appeared to be self-generated rather than adapted from existing published value statements. ^[1]^[2]

Some statements that drew strong support from one opinion group failed to clear the consensus threshold, so they were not included. The released comparison materials made the full set of selected and rejected statements available for inspection. ^[2]

How did the public-sourced model compare?

Anthropic trained a model on the public constitution and a baseline model on its standard constitution, holding the rest of the training pipeline fixed, then ran three classes of evaluation. The capability and bias figures below are those reported in the FAccT paper. ^[3]

Evaluation	Public model	Standard model	Interpretation
MMLU (knowledge)	72.3%	72.4%	No meaningful difference
GSM8K (grade-school math)	85.6%	85.2%	No meaningful difference
BBQ (social bias, 9 dimensions)	Lower bias on all nine	Baseline	Public model less biased
OpinionQA (political representation)	Less representative across the spectrum	Baseline	See limitations

The two models also showed no significant difference in helpfulness and harmlessness Elo scores in head-to-head comparisons. On the Bias Benchmark for QA (BBQ), the public model showed less bias than the standard model across all nine social dimensions it covers, namely age, disability status, gender identity, nationality, physical appearance, race or ethnicity, religion, sexual orientation, and socioeconomic status. The authors reported that the largest reductions appeared on disability status and physical appearance. ^[1]^[3]
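For readers unfamiliar with how a BBQ-style bias number is produced, here is a minimal sketch of the bias score as defined in the original BBQ benchmark paper. This assumes the standard BBQ formulation; the article does not state exactly which variant the CCAI evaluation reports. The counts in the example are invented for illustration and are not results from the study.

```python
def bbq_disambiguated_bias(n_biased: int, n_non_unknown: int) -> float:
    """Bias score in [-1, 1]: 0 means no measured bias, positive means answers
    lean toward the stereotyped option among non-"unknown" answers."""
    if n_non_unknown == 0:
        return 0.0  # no committed answers, so nothing to measure
    return 2 * (n_biased / n_non_unknown) - 1


def bbq_ambiguous_bias(accuracy: float, n_biased: int, n_non_unknown: int) -> float:
    """In ambiguous contexts the score is scaled by the error rate, so a model
    that correctly answers "unknown" most of the time is penalized less."""
    return (1 - accuracy) * bbq_disambiguated_bias(n_biased, n_non_unknown)


# Invented counts: 60 of 100 committed answers match the stereotype.
dis_score = bbq_disambiguated_bias(n_biased=60, n_non_unknown=100)
amb_score = bbq_ambiguous_bias(accuracy=0.9, n_biased=60, n_non_unknown=100)
print(dis_score, amb_score)
```

Under this definition, "less bias" for one model than another on a given dimension means its score sits closer to zero on that dimension's question set.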

OpinionQA, which measures how closely a model's responses track US public opinion across the political spectrum, produced a more complicated picture. The publicly sourced model was less representative of US political opinions than the standard model across all parts of the spectrum, a result the authors treated as a finding to interpret carefully rather than a straightforward improvement. Both models were described as somewhat more aligned with the views of self-identified liberals than conservatives. ^[1]^[3]

What are the limitations and how was it received?

The authors were explicit about the limits of the pilot. The sample of about a thousand people was small and not globally representative, demographic tracking did not include race or ethnicity, and the deliberation covered only US participants. They also noted that the Constitutional AI training process proved more complicated than expected, and questioned whether the prompts used during training exercised every principle in the constitution. The group-aware consensus method, by design, excludes principles favored only by minority opinion clusters, which trades inclusiveness for breadth of agreement. ^[1]^[3]

Press coverage framed the project as an early attempt to make model alignment more democratic, while noting open questions about who counts as the relevant public, whether a single national sample can speak for a global product, and how such input would scale or recur. CIP positioned the work within its broader Alignment Assemblies effort and expressed hope that communities elsewhere might build culturally specific models with similar techniques. CCAI did not change the constitution governing Anthropic's shipped products, which continued to follow the company's own document and its Responsible Scaling Policy. It was a research demonstration of a process rather than a production system. ^[1]^[2]^[3]

References

1. Anthropic. "Collective Constitutional AI: Aligning a Language Model with Public Input." October 17, 2023. https://www.anthropic.com/research/collective-constitutional-ai-aligning-a-language-model-with-public-input
2. The Collective Intelligence Project. "CIP and Anthropic launch Collective Constitutional AI." https://www.cip.org/blog/ccai
3. Huang, S., Siddarth, D., Lovitt, L., Liao, T. I., Durmus, E., Tamkin, A., and Ganguli, D. "Collective Constitutional AI: Aligning a Language Model with Public Input." FAccT 2024 / arXiv:2406.07814. https://arxiv.org/abs/2406.07814
