Collective Constitutional AI
Last reviewed
Jun 3, 2026
Sources
3 citations
Review status
Source-backed
Revision
v1 · 1,296 words
Collective Constitutional AI (CCAI) is a 2023 research project by Anthropic and the Collective Intelligence Project (CIP) that sourced the value principles, or "constitution," used to align a language model from a representative sample of the United States public rather than from the company's own staff. Announced on October 17, 2023, the project applied the Constitutional AI training method to a constitution drafted through online public deliberation, then compared the resulting model against one trained on Anthropic's in-house constitution. Its authors describe it as, to their knowledge, the first instance in which members of the public collectively directed the behavior of a deployed-scale language model through a structured deliberative process. [1][2]
Constitutional AI aligns a model to a written set of natural-language principles, but the choice of those principles normally rests with a small number of developers. CCAI was an experiment in widening that authorship. Working with CIP through its "Alignment Assemblies" program, Anthropic gathered opinions from roughly a thousand Americans, distilled them into a public constitution, and used that document in place of the standard one during fine-tuning. The two models, which the paper labels "Public" and "Standard," were then evaluated on capability benchmarks, on a bias benchmark, and on a measure of political viewpoint representation. The headline result was that the publicly sourced model matched the standard one on raw capability while scoring lower on measured social bias. [1][3]
The work was first published as a post on Anthropic's research blog, with the public constitution and a comparison document released alongside it. A peer-reviewed version, authored by Saffron Huang and Divya Siddarth of CIP together with Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli of Anthropic, later appeared at the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT). [3]
Constitutional AI, introduced by Anthropic in late 2022, replaces much of the human labeling used in reinforcement learning from human feedback with a model critiquing and revising its own outputs against written principles, a technique related to RLAIF. The constitution that guides this process for the Claude family was written by Anthropic and draws on sources such as the United Nations Universal Declaration of Human Rights. Critics and the company alike noted that this places normative authority with the developer. CCAI was framed as a response to that concern: a test of whether the principles could instead come from a broader group, and whether a model trained on such principles would behave acceptably.
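The critique-and-revision step can be illustrated with a minimal Python sketch. It is not Anthropic's implementation: the `model` callable, the prompt wording, and the function name `critique_and_revise` are all hypothetical, chosen only to show a response being drafted, critiqued against one sampled principle, and then rewritten.

```python
import random
from typing import Callable, List, Optional


def critique_and_revise(
    model: Callable[[str], str],  # hypothetical: any text-in, text-out function
    prompt: str,
    principles: List[str],
    rounds: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Draft a response, then critique and rewrite it against sampled principles."""
    rng = rng or random.Random(0)
    response = model(prompt)
    for _ in range(rounds):
        # Each round checks the current response against one sampled principle.
        principle = rng.choice(principles)
        critique = model(
            "Critique the response using this principle.\n"
            f"Principle: {principle}\nPrompt: {prompt}\nResponse: {response}"
        )
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nPrompt: {prompt}\nResponse: {response}"
        )
    return response
```

As described in the original Constitutional AI work, revised responses of this kind serve as supervised fine-tuning data, and a later stage uses model-generated preference labels in place of human ones.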
CIP, a nonprofit research organization co-founded by Siddarth and Huang, had been developing methods for what it calls collective input into AI development. CCAI served as a pilot for that agenda, and the two organizations positioned it as a starting point rather than a finished governance mechanism.
Participants were recruited through the survey firm PureSpectrum to approximate a representative sample of US adults across age, gender, income, and geography, with a screen for familiarity with generative AI. The final sample was 1,002 people. They were invited under the prompt "Help us pick rules for our AI Chatbot!" [1][3]
Deliberation ran on Polis, an open-source platform that lets large groups surface areas of agreement. On Polis, participants vote to agree, disagree, or pass on statements and can submit their own, while a machine-learning layer clusters voters into opinion groups and identifies statements that bridge them. Across the process, participants contributed 1,127 statements and cast 38,252 votes, which the authors report as an average of about 34 votes per person. The system identified two distinct opinion groups but, by the authors' account, found relatively low polarization, with broad agreement on many statements. [1][3]
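The clustering idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's algorithm: Polis is generally described as reducing the vote matrix with principal component analysis before clustering, whereas this sketch skips that step and runs a plain two-cluster k-means on the raw votes. The vote encoding (+1 agree, -1 disagree, 0 pass or unseen), the function names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple

AGREE, DISAGREE, PASS = 1, -1, 0  # assumed encoding for this sketch


def to_matrix(
    votes: Dict[str, Dict[str, int]], statements: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Turn per-participant votes into rows; unseen statements count as PASS."""
    participants = sorted(votes)
    rows = [[votes[p].get(s, PASS) for s in statements] for p in participants]
    return participants, rows


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows: List[List[int]], iterations: int = 20) -> List[int]:
    """Assign each row to one of two opinion groups with simple k-means."""
    # Deterministic start: the first voter and the voter farthest from them.
    first = list(rows[0])
    farthest = list(max(rows, key=lambda r: sq_dist(r, first)))
    centroids = [first, farthest]
    labels = None
    for _ in range(iterations):
        new_labels = [
            0 if sq_dist(r, centroids[0]) <= sq_dist(r, centroids[1]) else 1
            for r in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


example_votes = {
    "p1": {"s1": AGREE, "s2": AGREE, "s3": DISAGREE},
    "p2": {"s1": AGREE, "s2": AGREE, "s3": DISAGREE},
    "p3": {"s1": AGREE, "s2": DISAGREE, "s3": AGREE},
    "p4": {"s1": AGREE, "s2": DISAGREE},  # p4 never saw s3
}
participants, rows = to_matrix(example_votes, ["s1", "s2", "s3"])
labels = two_means(rows)  # p1 and p2 form one group, p3 and p4 the other
```

In this toy data every participant agrees with `s1`, which makes it the kind of statement that bridges the two groups, while `s2` and `s3` are what separate them.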
After moderation to remove duplicate, off-topic, or inappropriate submissions, 275 statements remained eligible. To build the constitution, the team ranked statements by a group-aware consensus score, which favors statements supported across opinion clusters rather than by a single faction, and kept the top-ranked statements down to a score threshold of 0.723, yielding roughly 95 distinct ideas. These were edited into constitutional principles. [3]
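One way to read the group-aware consensus score is as the product, across opinion groups, of a smoothed estimate of the probability that a member of each group agrees with the statement. The Python sketch below assumes that form with add-one smoothing. This matches how the metric is generally described for Polis, but the source does not spell out the formula, so it should be treated as an assumption. The function names and toy votes are hypothetical, and treating the 0.723 cutoff as inclusive is also an assumption.

```python
from typing import Dict, List


def smoothed_agree_rate(votes: List[int]) -> float:
    """Estimate P(agree) within one group; votes are +1, -1, or 0 (pass)."""
    agrees = sum(1 for v in votes if v == 1)
    # Add-one smoothing keeps small groups from scoring exactly 0 or 1.
    return (agrees + 1) / (len(votes) + 2)


def group_aware_consensus(votes_by_group: Dict[int, List[int]]) -> float:
    """Multiply per-group agreement so that one dissenting group sinks the score."""
    score = 1.0
    for group_votes in votes_by_group.values():
        score *= smoothed_agree_rate(group_votes)
    return score


def select_statements(scores: Dict[str, float], threshold: float = 0.723) -> List[str]:
    """Keep statements at or above the threshold, highest score first."""
    kept = [s for s, value in scores.items() if value >= threshold]
    return sorted(kept, key=lambda s: scores[s], reverse=True)


# Toy data: two opinion groups of 18 voters each (illustrative only).
statement_votes = {
    "clear_answers": {0: [1] * 18, 1: [1] * 18},  # both groups agree
    "one_sided": {0: [1] * 18, 1: [-1] * 18},     # only group 0 agrees
}
scores = {s: group_aware_consensus(v) for s, v in statement_votes.items()}
selected = select_statements(scores)
```

Under this assumed form, with two groups a score of 0.723 corresponds to roughly 85% smoothed agreement in each group when the two rates are equal, which is why a statement backed strongly by only one group cannot clear the cutoff.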
The resulting document overlapped with Anthropic's existing constitution on roughly half of its concepts. Where the two differed, the public constitution tended to emphasize objectivity, impartiality, and accessibility, and it more often framed principles in terms of the behavior the model should adopt rather than behavior it should avoid. Several principles appeared to be self-generated rather than adapted from existing published value statements. [1][2]
Some statements that drew strong support from one opinion group failed to clear the consensus threshold, so they were not included. The released comparison materials made the full set of selected and rejected statements available for inspection. [2]
Anthropic trained a model on the public constitution and a baseline model on its standard constitution, holding the rest of the training pipeline fixed, then ran three classes of evaluation. The capability and bias figures below are those reported in the FAccT paper. [3]
| Evaluation | Public model | Standard model | Reading |
|---|---|---|---|
| MMLU (knowledge) | 72.3% | 72.4% | No meaningful difference |
| GSM8K (grade-school math) | 85.6% | 85.2% | No meaningful difference |
| BBQ (social bias, 9 dimensions) | Lower bias on all nine | Baseline | Public model less biased |
| OpinionQA (political representation) | Less representative across the spectrum | Baseline | See limitations |
On the Bias Benchmark for QA (BBQ), the public model showed less bias than the standard model across all nine social dimensions it covers, namely age, disability status, gender identity, nationality, physical appearance, race or ethnicity, religion, sexual orientation, and socioeconomic status. The authors reported that the largest reductions appeared on disability status and physical appearance. [1][3]
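For context on what a lower BBQ score means, the benchmark's authors define a bias score from the share of non-"unknown" answers that align with a social stereotype, rescaled so that 0 indicates no measured bias, positive values indicate stereotype-aligned answers, and negative values the opposite. In ambiguous contexts the score is further scaled by the error rate. The Python sketch below follows that published definition as understood here. Whether the CCAI evaluation used exactly this variant is an assumption, and the counts are made-up inputs rather than results from the paper.

```python
def bias_ratio(n_biased_answers: int, n_non_unknown_answers: int) -> float:
    """Rescale the stereotype-aligned share of answers to the range -1..1."""
    if n_non_unknown_answers == 0:
        return 0.0  # the model always answered "unknown"; nothing to measure
    return 2 * (n_biased_answers / n_non_unknown_answers) - 1


def ambiguous_bias_score(
    n_biased_answers: int, n_non_unknown_answers: int, accuracy: float
) -> float:
    """In ambiguous contexts, weight the ratio by how often the model erred."""
    return (1 - accuracy) * bias_ratio(n_biased_answers, n_non_unknown_answers)


# Illustrative inputs only: 30 of 40 non-"unknown" answers match a stereotype,
# and the model is correct on 80% of ambiguous questions.
example_ratio = bias_ratio(30, 40)
example_ambiguous = ambiguous_bias_score(30, 40, accuracy=0.8)
```

Because the ambiguous-context score is multiplied by the error rate, a model that correctly answers "unknown" more often moves toward zero even if its remaining errors are just as skewed.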
OpinionQA, which measures how closely a model's responses track US public opinion across the political spectrum, produced a more complicated picture. The publicly sourced model was less representative of US political opinions than the standard model across all parts of the spectrum, a result the authors treated as a finding to interpret carefully rather than a straightforward improvement. Earlier reporting described both models as somewhat more aligned with the views of self-identified liberals than with those of conservatives. [1][3]
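The notion of "representativeness" here can be made concrete. The OpinionQA benchmark's authors compare the model's distribution over a question's ordered answer choices with the distribution of human survey responses, using one minus a normalized Wasserstein distance, and then average over questions. The Python sketch below implements that per-question comparison as understood from the benchmark's description. The function names and example distributions are hypothetical, and the CCAI paper's exact evaluation settings are not reproduced here.

```python
from typing import List


def wasserstein_ordinal(p: List[float], q: List[float]) -> float:
    """1-Wasserstein distance between distributions on choices 1..N.

    For ordered, evenly spaced choices this equals the summed absolute
    difference between the two cumulative distributions.
    """
    cdf_gap = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b
        total += abs(cdf_gap)
    return total


def alignment(model_dist: List[float], human_dist: List[float]) -> float:
    """Return 1 for identical distributions, 0 for maximally distant ones."""
    n_choices = len(model_dist)
    return 1 - wasserstein_ordinal(model_dist, human_dist) / (n_choices - 1)


def representativeness(pairs: List[List[List[float]]]) -> float:
    """Average per-question alignment over (model, human) distribution pairs."""
    return sum(alignment(m, h) for m, h in pairs) / len(pairs)


# Illustrative four-choice questions (for example "strongly agree" to "strongly disagree").
identical = alignment([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])
opposite = alignment([1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0])
shifted = alignment([0.5, 0.5, 0.0, 0.0], [0.0, 0.5, 0.5, 0.0])
```

Because the distance respects the ordering of the choices, a model whose answers sit one step away from the public's is penalized less than one at the opposite end of the scale.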
The authors were explicit about the limits of the pilot. The sample of about a thousand people was small and not globally representative, demographic tracking did not include race or ethnicity, and the deliberation covered only US participants. They also noted that the Constitutional AI training process proved more complicated than expected, and questioned whether the prompts used during training exercised every principle in the constitution. The group-aware consensus method, by design, excludes principles favored only by minority opinion clusters, which trades inclusiveness for breadth of agreement. [1][3]
Press coverage framed the project as an early attempt to make model alignment more democratic, while noting open questions about who counts as the relevant public, whether a single national sample can speak for a global product, and how such input would scale or recur. CIP positioned the work within its broader Alignment Assemblies effort and expressed hope that communities elsewhere might build culturally specific models with similar techniques. CCAI did not change the constitution governing Anthropic's shipped products, which continued to follow the company's own document and its responsible scaling policy; it was a research demonstration of a process rather than a production system. [1][2][3]