arXiv

Updated Jun 22, 2026

arXiv is the free, open-access repository where nearly every consequential artificial intelligence paper of the deep learning era first appeared in public, often months before, or instead of, formal peer review. Founded in August 1991 by physicist Paul Ginsparg at Los Alamos National Laboratory and operated by Cornell University since 2001, it grew from an email list for theoretical high-energy physics into the central distribution channel for machine learning and AI research. By mid-June 2026 the service had received 3,077,861 submissions in total, and computer science had become its single largest subject area ^[1]^[6]. arXiv performs no peer review of its own: its founder describes it as "a way of communicating science" rather than a journal ^[3]. On July 1, 2026, after 25 years at Cornell, arXiv is scheduled to become an independent nonprofit organization ^[2].

(arXiv is pronounced "archive": the name spells the word with the Greek letter chi in place of "ch.")

When was arXiv founded and who created it?

Ginsparg started the service on August 14, 1991 as an automated email server that distributed preprints in theoretical high-energy physics, reachable at the address xxx.lanl.gov ^[3]^[4]. Physicists had long circulated paper preprints by mail months ahead of journal publication; Ginsparg's server made that informal system instant, complete, and free to anyone with an internet connection. A web interface followed in 1993, and coverage expanded through the decade into other areas of physics and into mathematics ^[4].

Computer science arrived in September 1998, when the Computing Research Repository (CoRR) launched as a cooperation between the ACM, the Los Alamos e-print archive, and the NCSTRL digital library network, giving CS researchers a dedicated section with its own classification scheme ^[5]. The service was renamed arXiv.org in late 1998 ^[4]. In 2001 Ginsparg left Los Alamos for a faculty position at Cornell University and the repository moved with him; Cornell ran it first through the university library and in recent years through Cornell Tech ^[2]^[3]. In 2021 Ginsparg received the Einstein Foundation's inaugural Individual Award for his role in transforming scientific communication ^[3].

On April 2, 2026, arXiv announced that it would separate from Cornell and become a standalone nonprofit on July 1, 2026, with Cornell and the Simons Foundation, its largest philanthropic backer, jointly supporting the transition. A search for arXiv's first chief executive began at the same time ^[2].

Year	Milestone
1991	Launched at Los Alamos as an email preprint server (xxx.lanl.gov) ^[3]
1993	Web interface added ^[4]
1998	Computer science section launched via CoRR; service renamed arXiv.org ^[4]^[5]
2001	Moved to Cornell University with Ginsparg ^[3]
2008	500,000th article posted (October) ^[4]
2014	Cumulative articles pass 1 million ^[4]
2021	Cumulative articles pass 2 million ^[4]
2024	Monthly record of 24,226 submissions (October); computer science is the largest subject area ^[6]
2026	Cumulative submissions reach 3,077,861 (mid-June); independence from Cornell takes effect July 1 ^[1]^[2]

How does arXiv work if there is no peer review?

arXiv accepts papers in eight subject areas: physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics ^[7]. Reading and submitting are both free. Two gatekeeping layers stand between an upload and public announcement, and neither is peer review.

The first is endorsement, introduced in 2004: a first-time submitter to a category must be endorsed by an established arXiv author in that area, though researchers with recognized institutional email addresses or prior co-authored arXiv papers are often endorsed automatically ^[4]^[8]. The second is moderation. Volunteer subject-matter experts holding terminal degrees, approved by arXiv's advisory committees and staff, screen submissions for topical relevance and basic scholarly standards. Moderators can reclassify a paper into a different category, place it on hold, decline it, or withdraw it after announcement, but arXiv is explicit that "the arXiv moderation process is not a peer-review process" and that moderators neither give feedback nor certify correctness ^[9].

Accepted submissions, typically uploaded as LaTeX source, are announced on a rolling daily cycle and assigned a permanent identifier such as arXiv:1706.03762. Authors can post revised versions, but every announced version remains publicly accessible: papers cannot be deleted, only marked as withdrawn ^[10]. Since January 2023, arXiv has barred generative AI tools from being listed as authors and requires disclosure of any significant use of large language models in preparing a paper, with human authors taking full responsibility for the contents ^[11].
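The identifier and version scheme described above can be illustrated with a small parser. This is a minimal sketch, not arXiv's own code: the `ArxivId` type and `parse_arxiv_id` function are hypothetical names, and the layout it assumes (new-style `YYMM.NNNN` or `YYMM.NNNNN`, old-style `archive/YYMMNNN`, and an optional `vN` version suffix) comes from arXiv's published identifier documentation rather than from this article.

```python
import re
from typing import NamedTuple, Optional


class ArxivId(NamedTuple):
    """Hypothetical container for the parts of an arXiv identifier."""
    archive: Optional[str]  # e.g. "cs" for old-style identifiers, else None
    year: int               # four-digit year inferred from the YY prefix
    month: int              # 1-12, taken from the MM prefix
    number: str             # sequence number within the month, kept as text
    version: Optional[int]  # None when no "vN" suffix is present


# Assumed layouts: new-style YYMM.NNNN(N), old-style archive/YYMMNNN.
_NEW = re.compile(r"(\d{2})(\d{2})\.(\d{4,5})(?:v(\d+))?")
_OLD = re.compile(r"([a-z-]+(?:\.[A-Z]{2})?)/(\d{2})(\d{2})(\d{3})(?:v(\d+))?")


def parse_arxiv_id(text: str) -> ArxivId:
    """Split an identifier such as 'arXiv:1706.03762' into its parts."""
    raw = text.strip()
    if raw.lower().startswith("arxiv:"):
        raw = raw[len("arxiv:"):]

    match = _NEW.fullmatch(raw)
    if match:
        archive = None
        yy, mm, number, version = match.groups()
    else:
        match = _OLD.fullmatch(raw)
        if match is None:
            raise ValueError(f"not a recognised arXiv identifier: {text!r}")
        archive, yy, mm, number, version = match.groups()

    month = int(mm)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in identifier: {text!r}")

    # Assumption: the service started in 1991, so YY >= 91 means 19YY.
    year = 1900 + int(yy) if int(yy) >= 91 else 2000 + int(yy)
    return ArxivId(archive, year, month, number, int(version) if version else None)


if __name__ == "__main__":
    # Identifiers that appear in this article's text and reference list.
    for example in ("arXiv:1706.03762", "arXiv:2303.08774", "arXiv:cs/9812020"):
        print(example, "->", parse_arxiv_id(example))
```

Keeping the sequence number as text preserves leading zeros, and a missing version suffix is reported as `None` rather than guessed, since an unversioned identifier conventionally refers to whichever version is latest.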

Why is computer science now the largest area on arXiv?

Growth has been relentless and is accelerating. arXiv first exceeded 20,000 submissions in a month in May 2023; October 2024 set a record of 24,226 new papers, and cumulative submissions reached 2,597,322 by the end of that month ^[6]. arXiv's public statistics show monthly totals approaching 28,000 by late 2025, with the all-time counter at 3,077,861 in mid-June 2026 ^[1].

Computer science, a relative latecomer, is now the largest of the eight subject areas. In October 2024 the three most active categories on the entire site were cs.LG (machine learning), cs.CV (computer vision), and cs.CL (natural language processing); together they accounted for more than 6,000 new papers in that month alone ^[6]. The artificial intelligence category, cs.AI, nearly doubled year over year, growing about 86 percent from 1,742 papers in one November 2023 sample to 3,242 a year later ^[6]. The shift tracks the deep learning boom: a field that once relied on journals and gated proceedings moved its primary record onto a preprint server.
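The growth figures quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the article; the one assumption is the elapsed time between the two cumulative totals, taken here as 19.5 months (end of October 2024 to mid-June 2026), which is approximate because the article does not give an exact June date. The function names are illustrative.

```python
# Figures quoted in the article (references [1] and [6]).
CS_AI_NOV_2023 = 1_742
CS_AI_NOV_2024 = 3_242
TOTAL_END_OCT_2024 = 2_597_322
TOTAL_MID_JUN_2026 = 3_077_861

# Assumption: end of October 2024 to mid-June 2026 is about 19.5 months.
ELAPSED_MONTHS = 19.5


def growth_ratio(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


def average_per_month(start_total: int, end_total: int, months: float) -> float:
    """Average number of submissions added per month between two totals."""
    return (end_total - start_total) / months


if __name__ == "__main__":
    ratio = growth_ratio(CS_AI_NOV_2023, CS_AI_NOV_2024)
    added = TOTAL_MID_JUN_2026 - TOTAL_END_OCT_2024
    average = average_per_month(TOTAL_END_OCT_2024, TOTAL_MID_JUN_2026, ELAPSED_MONTHS)

    print(f"cs.AI year-over-year ratio: {ratio:.2f}")     # 1.86
    print(f"submissions added since Oct 2024: {added:,}")  # 480,539
    print(f"average per month: {average:,.0f}")            # 24,643
```

Under that assumption the average works out to roughly 24,600 submissions a month, about the size of the October 2024 record.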

Why do AI researchers publish on arXiv first?

For AI researchers, arXiv is not a supplement to publication; for much of the field it is the publication venue of record. The norm since the mid-2010s has been to post work the moment it is ready and, sometimes, submit it to NeurIPS, ICML, or ICLR afterward. "Attention Is All You Need," the paper that introduced the Transformer, appeared on arXiv in June 2017, half a year before its NeurIPS publication, and was already shaping follow-up work in the interim ^[12]. OpenAI's GPT-4 technical report went straight to arXiv in March 2023 and never passed through a peer-reviewed venue at all ^[13]. Citing papers by arXiv identifier is routine, and scanning the day's new cs.LG and cs.CL listings is a professional habit.

A tooling ecosystem grew on top of the firehose: Andrej Karpathy's arXiv Sanity Preserver for sorting machine learning preprints, Hugging Face's Daily Papers feed, Papers with Code's linking of arXiv IDs to open-source implementations, and alphaXiv's discussion layer built directly on arXiv pages.

The preprint-first culture has repeatedly collided with double-blind conference review, because a posted preprint can reveal author identities to reviewers. The Association for Computational Linguistics long enforced an "anonymity period" that barred posting to arXiv in the month before submission deadlines and during review; it abandoned the rule on January 12, 2024, permitting non-anonymous preprints at any time while keeping the submissions themselves anonymized ^[14].

How is arXiv handling AI-generated papers?

On October 31, 2025, arXiv announced an updated practice for the computer science section: review articles and position papers would no longer be accepted unless already accepted by a peer-reviewed journal or conference, with workshop acceptance explicitly insufficient, and authors must supply the journal reference and DOI metadata with their submission ^[15]. The trigger was volume. arXiv said it was receiving "hundreds of review articles every month," most of them "little more than annotated bibliographies, with no substantial discussion of open research issues," a surge it attributed to large language models making such papers fast and cheap to generate ^[15]. Press coverage framed the change as arXiv being "spammed with AI-generated 'research' papers" ^[16]; arXiv itself described it as a stricter application of long-standing editorial standards rather than a new policy, and noted that other sections could adopt the same practice if they experience similar surges ^[15].

The underlying problem has empirical support. A January 2026 analysis estimated that 21.4 percent of the content of recent computer science review papers on arXiv was LLM-generated, against 14.0 percent for non-review papers ^[17]. The new practice also drew criticism: some researchers argued that it blocks legitimate survey and position work by independent or junior authors who lack conference access, and simply shifts gatekeeping onto already overloaded conference review systems ^[18]. The episode crystallized a broader tension: the flood of low-effort machine-written text, sometimes called "AI slop," had reached the very platform on which AI research itself is published.

How is arXiv funded and governed?

arXiv's annual budget is roughly $6 million ^[19]. Cornell has provided a cash subsidy plus in-kind coverage of indirect costs, with the remainder coming from the Simons Foundation, grants, individual donors, and a membership program ^[19]^[20]. Under the current model, member universities, libraries, and research institutes contribute from $1,000 per year; affiliate professional societies and government agencies are asked for $5,000 to $100,000; and corporate sponsors for $10,000 to $200,000 ^[20]. The Simons Foundation and Schmidt Sciences are separately funding a multiyear modernization of arXiv's aging codebase and its migration to cloud infrastructure ^[19]. The independent nonprofit taking over on July 1, 2026 retains the same mission, "to advance scientific discovery by supporting researchers with a free, fast, and reliable open service," with Cornell and the Simons Foundation backing the transition ^[2].

Significance and criticism

arXiv demonstrated, years before "open access" became a movement, that an entire discipline could move its communication system onto a free public server, and it became the template for later preprint services such as bioRxiv, medRxiv, and chemRxiv ^[4]. For AI specifically it serves as the field's timestamped public record: priority claims, model announcements, and benchmark results are dated by arXiv identifiers and version histories rather than by journal issues.

The same openness draws persistent criticism. Because arXiv performs no peer review, errors and unsupported claims circulate with the same ease as solid results, and readers must judge quality themselves. The endorsement and moderation systems have at times been criticized as opaque or as restricting legitimate inquiry ^[4]. And the volunteer moderation model is under visible strain from AI-era volume, the very pressure that produced the October 2025 computer science practice change ^[15]^[16]. The qualities that made arXiv indispensable to artificial intelligence, speed and openness, are now the ones that generative AI tests most severely.

References

1. arXiv.org, "Monthly Submissions" (statistics page), accessed June 2026. https://arxiv.org/stats/monthly_submissions
2. arXiv blog, "arXiv is becoming an independent nonprofit," April 2, 2026. https://blog.arxiv.org/2026/04/02/arxiv-is-becoming-an-independent-nonprofit/
3. Cornell Chronicle, "arXiv founder Ginsparg wins Einstein Foundation Berlin Award," November 2021. https://news.cornell.edu/stories/2021/11/arxiv-founder-ginsparg-wins-einstein-foundation-berlin-award
4. Wikipedia, "arXiv," accessed June 2026. https://en.wikipedia.org/wiki/ArXiv
5. Halpern, Joseph Y., "The Computing Research Repository: Promoting the Rapid Dissemination and Archiving of Computer Science Research," arXiv:cs/9812020, December 1998. https://arxiv.org/abs/cs/9812020
6. arXiv blog, "arXiv sets new record for monthly submissions (again)!," November 4, 2024. https://blog.arxiv.org/2024/11/04/arxiv-sets-new-record-for-monthly-submissions-again/
7. arXiv, "About arXiv." https://info.arxiv.org/about/index.html
8. arXiv help, "The arXiv endorsement system." https://info.arxiv.org/help/endorsement.html
9. arXiv help, "Moderation." https://info.arxiv.org/help/moderation/index.html
10. arXiv help, "Availability of submissions." https://info.arxiv.org/help/availability.html
11. arXiv blog, "arXiv announces new policy on ChatGPT and similar tools," January 31, 2023. https://blog.arxiv.org/2023/01/31/arxiv-announces-new-policy-on-chatgpt-and-similar-tools/
12. Vaswani, Ashish, et al., "Attention Is All You Need," arXiv:1706.03762, June 2017. https://arxiv.org/abs/1706.03762
13. OpenAI, "GPT-4 Technical Report," arXiv:2303.08774, March 2023. https://arxiv.org/abs/2303.08774
14. ACL Rolling Review, "Update to Anonymity Policy," January 12, 2024. https://aclrollingreview.org/anonymity/
15. arXiv blog, "Attention Authors: Updated Practice for Review Articles and Position Papers in arXiv CS Category," October 31, 2025. https://blog.arxiv.org/2025/10/31/attention-authors-updated-practice-for-review-articles-and-position-papers-in-arxiv-cs-category/
16. 404 Media, "arXiv Changes Rules After Getting Spammed With AI-Generated 'Research' Papers," November 2025. https://www.404media.co/arxiv-changes-rules-after-getting-spammed-with-ai-generated-research-papers/
17. "LLM-Generated or Human-Written? Comparing Review and Non-Review Papers on ArXiv," arXiv:2601.17036, January 2026. https://arxiv.org/abs/2601.17036
18. "When Filters Meet Freedom: Reflections on arXiv's New Review Article and Position Paper Policy," andrewcompling.blog, November 18, 2025. https://andrewcompling.blog/2025/11/18/when-filters-meet-freedom-reflections-on-arxivs-new-review-article-and-position-paper-policy/
19. Cornell Tech, "arXiv." https://tech.cornell.edu/arxiv/
20. arXiv, "Funding." https://info.arxiv.org/about/funding.html
