# WildChat

> Source: https://aiwiki.ai/wiki/wildchat
> Updated: 2026-06-09
> Categories: Data & Datasets, Machine Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

## Overview

WildChat is a large public corpus of real conversations between human users and [ChatGPT](/wiki/chatgpt), released by researchers at the [Allen Institute for AI](/wiki/allen_institute_for_ai) (AI2) and [Cornell University](/wiki/cornell_university). It was collected by deploying free public chatbot services backed by OpenAI's GPT-3.5 and GPT-4 application programming interfaces (APIs), where users consented to have their anonymized chat transcripts released for research [1][2]. The headline release, sometimes called WildChat-1M, contains 1,039,785 conversations comprising 2,639,415 interaction turns drawn from an estimated 204,736 unique users, spanning 68 languages [1]. Unlike most [instruction tuning](/wiki/instruction_tuning) datasets, which are synthetic or hand-curated, WildChat captures genuine "in-the-wild" usage, including ambiguous requests, code-switching, multi-turn dialogue, and a relatively high fraction of toxic or adversarial prompts. It has become a widely used resource for training assistant models and for [AI safety](/wiki/ai_safety) research, including the study of jailbreaks and toxic use.

The accompanying paper, "WildChat: 1M ChatGPT Interaction Logs in the Wild," was published as a spotlight at the International Conference on Learning Representations ([ICLR](/wiki/iclr)) 2024 [2]. The dataset is distributed through [Hugging Face](/wiki/hugging_face) under the AI2 organization and a project page at wildchat.allen.ai.

## Motivation

The authors observed that the data used to align and instruction-tune chatbots was largely synthetic (model-generated, as in Alpaca and [Evol-Instruct](/wiki/evol_instruct)) or assembled from curated sources, and rarely reflected how real people actually use conversational systems. Datasets of authentic user-chatbot interactions were scarce because the leading commercial assistants did not release their logs. WildChat was designed to fill that gap by gathering a large corpus of genuine human prompts and model responses, enabling researchers to study real usage patterns, train models on naturally occurring instructions, and analyze behaviors such as multilingual usage, topic and language switching, ambiguous queries, political discussion, and harmful or policy-violating requests [2].

A related goal was to capture toxic and adversarial content rather than filter it out at the source. The authors deliberately retained potentially harmful prompts and jailbreak attempts (with sensitive content flagged), arguing that such material is essential for studying misuse, building moderation tools, and improving model robustness [1][2].

## How WildChat was collected

The team hosted two publicly accessible chatbot deployments on [Hugging Face](/wiki/hugging_face) Spaces, one linked to the GPT-3.5-Turbo API and one to the GPT-4 API, and offered them free of charge to online users [1][2]. At a time when GPT-4 access was paid and rate-limited, free access was a strong incentive that attracted substantial organic traffic. In exchange, users gave affirmative, consensual opt-in for their conversation transcripts and request headers to be collected and released anonymously, with a consent banner and terms presented before use [2].

Collection ran from April 9, 2023 to roughly May 1, 2024 [1]. Each conversation was logged together with metadata derived from the request, including a timestamp, a hashed IP address, an inferred geographic location (state and country), and request-header information such as browser and operating system. Because raw IP addresses were hashed and personally identifying details were detected and redacted, the public release preserves geographic and temporal signal while protecting user identity [1][2].

After collection, every turn was passed through automated content classifiers, the OpenAI Moderation API and Detoxify, whose category scores are stored alongside the text and used to populate a "toxic" flag and a "redacted" flag indicating where personal information was removed [1].

## Contents and statistics

The released corpus is enriched well beyond raw text. Each example includes the full conversation, the serving model (for example gpt-3.5-turbo-0301 or gpt-4-0314), turn count, detected language, country and state, a hashed IP address, the toxic and redacted flags, and full moderation scores from both classifiers. The collection skews toward GPT-3.5: about 24 to 25 percent of conversations come from the GPT-4 service and the remaining roughly 75 to 76 percent from GPT-3.5-Turbo [1]. The earlier "WildChat" release contained a smaller subset, and AI2 later published further expanded versions (including WildChat-4.8M); the 1M release described here remains the most widely cited.

WildChat is multilingual to an unusual degree, identifying 68 languages that appear in at least 100 user prompts. The distribution by turn is led by English, followed by sizable Chinese and Russian fractions [2]:

| Statistic | Value |
|---|---|
| Conversations | 1,039,785 |
| Interaction turns | 2,639,415 |
| Estimated unique users | 204,736 |
| Languages (100+ prompts) | 68 |
| GPT-4 share of conversations | about 24 to 25% |
| English share of turns | about 53% |
| Chinese share of turns | about 13% |
| Russian share of turns | about 12% |
| Collection period | April 9, 2023 to May 1, 2024 |

A distinctive feature is the prevalence of toxic and unsafe content. Using the OpenAI Moderation API, about 6.05 percent of user turns and 5.18 percent of chatbot turns were flagged as toxic; counting either the OpenAI Moderation API or Detoxify, the figures rise to roughly 10 percent of user turns and about 6.6 percent of chatbot turns [1][2]. This toxicity rate is notably higher than in comparable corpora, which the authors attribute partly to the anonymous, free, and unrestricted nature of the deployments.

## Uses

WildChat serves two broad purposes: training [conversational models](/wiki/conversational_models) and supporting safety research.

For training, the authors demonstrated the dataset's value by fine-tuning a Llama-2 7B model, dubbed WildLlama, on a subset of WildChat conversations [2]. On the MT-Bench evaluation it achieved an average score competitive with contemporaneous open chat models such as Vicuna and Llama-2-Chat, showing that genuine in-the-wild logs can be a useful supervised fine-tuning signal. Because the corpus pairs real human prompts with model responses, it is also used as a source of instructions and as preference or evaluation data, and it informed the broader "Wild" family of AI2 resources, including the WildBench live evaluation suite for chat models.

For safety, the large volume of toxic prompts, jailbreak attempts, and adversarial queries makes WildChat a heavily used testbed for studying misuse, training and benchmarking moderation classifiers, and analyzing how real users probe model guardrails [1][2]. The geographic and temporal metadata further enable analyses of usage patterns across regions and over time. Because the dataset contains genuinely harmful material, AI2 gates the most sensitive variant (WildChat-1M-Full, which retains the unredacted moderation results and richer metadata) behind an access application that requires identifying information and a stated research justification; a less sensitive version is recommended for general use [1].

## Access and licensing

WildChat is distributed on Hugging Face under the AI2 organization. It was originally released under the AI2 ImpACT licenses and was later relicensed to the Open Data Commons Attribution license (ODC-BY), with the full toxic-content variant kept under gated, application-based access [1]. The original consent flow, the hashing of IP addresses, and the automated redaction of detected personal information were intended to balance research utility against user privacy.

## Relationship to other interaction datasets

WildChat is most directly comparable to [LMSYS-Chat-1M](/wiki/lmsys_chat_1m), released around the same time by [LMSYS](/wiki/lmsys), which also contains roughly one million real user conversations (210,479 users, 65 languages) but draws responses from many different models served through the Chatbot Arena and Vicuna demo rather than only from ChatGPT [2]. Relative to LMSYS-Chat-1M, WildChat offers slightly more conversations and languages, sources its responses specifically from GPT-3.5 and GPT-4, retains a higher proportion of toxic content (its user-turn toxicity rate is several times that of LMSYS-Chat-1M), and adds geographic and temporal metadata.

WildChat also differs from ShareGPT, a community-contributed collection of about 94,000 ChatGPT conversations that users voluntarily uploaded via a browser extension, and from synthetic instruction datasets such as Alpaca (about 52,000 model-generated examples). The comparison the authors present is summarized below [2]:

| Dataset | Conversations | Users | Languages | Source of responses |
|---|---|---|---|---|
| WildChat | 1,039,785 | 204,736 | 68 | GPT-3.5 and GPT-4 |
| LMSYS-Chat-1M | about 1,000,000 | 210,479 | 65 | many models (Chatbot Arena) |
| ShareGPT | about 94,000 | not reported | 41 | ChatGPT (user-uploaded) |
| Alpaca | about 52,000 | not reported | 1 (English) | synthetic (text-davinci-003) |

A later, separate work, WildChat-50M (Feuer et al., ICML 2025), reuses WildChat's user prompts but regenerates responses from more than 50 open-weight models to study synthetic data in post-training; it is built on top of WildChat rather than being part of the original AI2 and Cornell release.

## References

1. allenai/WildChat-1M, Datasets at Hugging Face. https://huggingface.co/datasets/allenai/WildChat-1M
2. Zhao, W., Ren, X., Hessel, J., Cardie, C., Choi, Y., Deng, Y. "WildChat: 1M ChatGPT Interaction Logs in the Wild." ICLR 2024 (spotlight). https://arxiv.org/abs/2405.01470
3. WildChat, OpenReview (ICLR 2024). https://openreview.net/forum?id=Bl8u7ZRlbM
4. allenai/WildChat-1M-Full, Datasets at Hugging Face. https://huggingface.co/datasets/allenai/WildChat-1M-Full
5. Feuer, B., et al. "WildChat-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training." ICML 2025. https://arxiv.org/abs/2501.18511

