Legal
Last reviewed
May 13, 2026
Sources
25 citations
Review status
Source-backed
Revision
v2 · 5,013 words
See also: Legal ChatGPT Plugins
Legal AI is the use of artificial intelligence, particularly machine learning and large language models, to assist with the work of lawyers, judges, regulators, and self-represented litigants. The field covers tasks that range from full-text search of case law and statutes to contract review, document drafting, e-discovery, litigation analytics, and direct interaction with consumers through chatbots.
Generative AI tools became widely available in the legal industry after the release of ChatGPT in late 2022 and GPT-4 in March 2023. Within two years, every major American legal research vendor had launched an AI assistant, Allen & Overy (now A&O Shearman) deployed Harvey across thousands of lawyers, and Thomson Reuters paid 650 million US dollars in cash for the legal AI startup Casetext. Adoption arrived alongside a series of embarrassments, the best known being Mata v. Avianca, a 2023 case in which a Manhattan lawyer filed a brief containing six entirely fictitious decisions invented by ChatGPT.
This article describes the major commercial systems, the academic benchmarks used to evaluate them, the empirical record on hallucinations, the growing list of sanctioned attorneys, and the early shape of regulation by courts and bar associations.
Legal work is text-heavy, billed by the hour, and built on retrieving and recombining authority. Those properties make law a natural target for natural language processing. A 2024 random-sample survey of US federal judges run by the New York City Bar Association found roughly 38% of judges did not use AI at all in their work, while the rest used it most often for legal research and document review. On the practitioner side, surveys by Thomson Reuters, Bloomberg Law, and the American Bar Association in 2024 and 2025 reported large majorities of firms either piloting or deploying generative AI.
The industry typically splits legal AI applications into the following categories:
| Application | Representative tools |
|---|---|
| Legal research | Westlaw Precision with CoCounsel, Lexis+ with Protege, Bloomberg Law AI Assistant, Harvey, Vincent AI, Hebbia |
| Contract review and drafting | Kira (Litera), Spellbook, Robin AI, DraftWise, Henchman, ContractPodAi Leah, Luminance, Lawgeex |
| E-discovery and document review | Relativity aiR, Everlaw AI Assistant, DISCO, Reveal-Brainspace |
| Litigation analytics | Lex Machina, Premonition, Trellis, Pre/Dicta |
| Regulatory and compliance | Bloomberg Law, Thomson Reuters Practical Law, Compliance.ai |
| Access-to-justice and consumer tools | DoNotPay (historically), LegalZoom, JusticeText, Hello Divorce, Upsolve |
The largest pure-play legal AI company is Harvey, which raised a 300 million US dollar Series D in February 2025 at a 3 billion US dollar valuation and continued raising rounds through 2025 and 2026 at successively higher valuations.
Research on AI and law predates the modern generative AI wave by half a century.
The field is usually traced to L. Thorne McCarty's TAXMAN project, started at Harvard in the early 1970s and continued at Rutgers. TAXMAN attempted to formalize the reasoning in a US tax reorganization case (Eisner v. Macomber) and is often described as the first sustained effort to reproduce legal argumentation in code. McCarty is sometimes called the father of AI and law. Other rule-based legal expert systems followed in the 1980s and 1990s, including Robert Michaelsen's TaxAdvisor and Dean Schlobohm's TaxAdvisor system for federal tax planning.
The biennial International Conference on Artificial Intelligence and Law (ICAIL), first held in 1987, became the main academic venue for this work. Influential systems from this era included HYPO (Edwina Rissland and Kevin Ashley, case-based reasoning in trade secret law) and CATO (Ashley and Aleven, teaching case-based legal argument).
Commercial uses of machine learning entered the legal market through e-discovery. In Da Silva Moore v. Publicis Groupe, decided February 24, 2012, US Magistrate Judge Andrew J. Peck of the Southern District of New York issued the first judicial approval of "computer-assisted review" (what practitioners now call technology-assisted review, or TAR) in a gender discrimination class action involving roughly three million documents. The decision became the founding precedent for predictive coding and was followed by Rio Tinto plc v. Vale S.A. (S.D.N.Y. 2015), in which Peck reaffirmed and refined his earlier ruling.
Contract analytics startups emerged in parallel. Kira Systems was founded in Toronto in 2011 by Noah Waisberg, a former Weil Gotshal lawyer, and Alexander Hudek, a computer scientist. By the time Litera acquired Kira in August 2021, the software could recognize over 1,400 contract provisions and was used by more than half the Am Law 100. Luminance was founded in 2015 by Cambridge mathematicians, with Slaughter and May piloting the system in 2016 and later becoming an investor.
Litigation analytics began as a Stanford Law School public-interest project: the Intellectual Property Litigation Clearinghouse, started in 2006 by professor Mark Lemley. The project spun out as Lex Machina in 2010 and was acquired by LexisNexis in November 2015, in what was probably the first acquisition of a legal AI company by a major incumbent.
ROSS Intelligence, founded at the University of Toronto in 2014, marketed itself as the "first artificially intelligent lawyer" built on IBM Watson, and licensed Westlaw headnotes early in its life. Thomson Reuters sued ROSS for copyright infringement in 2020. ROSS shut down operations the same year; the litigation continued. In February 2025 Judge Stephanos Bibas, sitting by designation in the District of Delaware, issued a partial summary judgment for Thomson Reuters and held that ROSS's use of 2,243 Westlaw headnotes to train a competing legal-research engine was not fair use. The decision was the first US ruling on fair use as a defense to AI training and is on appeal.
Casetext, founded in 2013 by Jake Heller, Pablo Arredondo, and Laura Safdie, launched its CARA brief-analysis assistant in late 2016. By 2022 CARA had been adopted by more than 4,000 law firms.
Generative AI changed the market within months. Casetext was an early OpenAI partner and launched CoCounsel, built on GPT-4, in March 2023. Thomson Reuters announced a 650 million US dollar all-cash acquisition of Casetext in June 2023 and closed the deal in August. LexisNexis launched Lexis+ AI in May 2023 in commercial preview, with general availability later that year, and rebranded the product as Lexis+ with Protege in February 2026. Bloomberg Law rolled out two AI tools (Bloomberg Law Answers and Bloomberg Law AI Assistant) on January 14, 2025.
The startup side moved at the same pace. Harvey was founded in 2022 by former O'Melveny associate Winston Weinberg and former Google DeepMind and Meta researcher Gabriel Pereyra, took seed funding from the OpenAI Startup Fund, and signed Allen & Overy as its launch customer in February 2023. Spellbook (Toronto), Robin AI (London, an Anthropic partner), DraftWise, Henchman (acquired by LexisNexis in June 2024), Hebbia, and vLex's Vincent AI all launched or expanded generative features through 2023 and 2024.
The first peer-reviewed legal-domain foundation models also appeared in this period, including Equall.ai's SaulLM-7B (March 2024, the first open large language model for law, trained on 30 billion tokens of legal data on top of Mistral 7B) and the larger SaulLM-54B and SaulLM-141B models released in August 2024.
Legal research (finding cases, statutes, and secondary sources that bear on a question) is the most contested commercial market for legal AI.
The two dominant US incumbents are Westlaw (Thomson Reuters) and Lexis (RELX). Both layered generative AI on top of their existing research platforms in 2023. The table below summarizes the major commercial offerings, with entries running through early 2026.
| Product | Vendor | Launch | Model basis | Notes |
|---|---|---|---|---|
| Westlaw Precision with CoCounsel | Thomson Reuters | November 2023 (AI-Assisted Research); deeper CoCounsel 2.0 integration October 2024 | GPT-4 and successors via Casetext stack | Eight "core skills" including AI-Assisted Research, draft correspondence, prepare for deposition, extract contract data |
| Lexis+ with Protege | LexisNexis (RELX) | Lexis+ AI commercial preview May 2023; rebranded Protege February 2026 | Multiple LLMs in closed-loop deployment | Combines extractive, generative, and "agentic" AI within Lexis+ |
| Bloomberg Law AI Assistant and Answers | Bloomberg Law | January 14, 2025 | Multiple LLMs | Document-grounded chat and natural-language search; included with subscription |
| Harvey | Counsel AI Corp | February 2023 (A&O launch) | OpenAI models, including custom fine-tunes | Enterprise tool for big law and in-house teams; 235 customers across 42 countries by early 2025 |
| Vincent AI | vLex (now part of Clio) | Generative relaunch April 2024 | Hybrid generative and rules-based pipeline | Covers 13+ jurisdictions including US, UK, EU, Mexico, Chile; AALL 2024 Product of the Year |
| Hebbia (Matrix) | Hebbia | 2022 | Multiple LLMs | Document-search platform used by finance and legal; 130 million US dollar Series B led by Andreessen Horowitz in July 2024 |
| CoCounsel (standalone) | Thomson Reuters (acquired Casetext August 2023) | Casetext launch March 2023 | GPT-4 originally | Crossed one million users in February 2026 |
Westlaw Precision and Lexis+ both rely on retrieval-augmented generation, grounding answers in their proprietary case-law and secondary-source databases rather than asking the model to recall law from training data.
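The retrieve-then-generate pattern described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-passage `CORPUS`, the term-count cosine ranker in `score`, and the prompt wording are hypothetical and do not reflect how Westlaw or Lexis implement retrieval. The point is only that the model is handed retrieved sources and asked to answer from them, rather than recalling law from training data.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus; a production system would index millions of documents.
CORPUS = {
    "doc-1": "A contract requires offer, acceptance, and consideration.",
    "doc-2": "Trade secret misappropriation requires reasonable secrecy measures.",
    "doc-3": "Summary judgment is proper when no genuine dispute of material fact exists.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(query, passage):
    """Cosine similarity over raw term counts (a stand-in for a real ranker)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in p.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the ids of the k passages most similar to the query."""
    ranked = sorted(CORPUS, key=lambda d: score(query, CORPUS[d]), reverse=True)
    return ranked[:k]


def build_prompt(query, doc_ids):
    """Assemble a grounded prompt: retrieved sources first, then the question."""
    sources = "\n".join(f"[{d}] {CORPUS[d]}" for d in doc_ids)
    return (
        "Answer using only the sources below and cite them by id.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


query = "When is summary judgment proper?"
top = retrieve(query, k=2)
prompt = build_prompt(query, top)  # this string would be sent to the language model
```

In this sketch the generation step is deliberately left out; `prompt` is what a language model would receive. Grounding of this kind constrains the model but does not guarantee that the answer faithfully reflects the retrieved sources.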
Contract work is the most mature non-research application of legal AI. Static-rule and machine-learning systems dominated through the late 2010s; generative AI replaced or augmented them after 2022.
A widely cited 2018 study by LawGeex pitted 20 experienced US-trained lawyers against the LawGeex algorithm on five non-disclosure agreements totaling 153 paragraphs. The AI averaged 94% accuracy at identifying risks against the lawyers' 85%. The best human reviewer tied the machine at 94%; the worst scored 67%. The AI completed the task in 26 seconds; the humans averaged 92 minutes. The study became a touchstone in marketing and is often the first reference cited when vendors argue that AI exceeds human accuracy on narrow tasks. (LawGeex itself later faded as a standalone product.)
The modern contract stack includes:
| Vendor | Founded | Headquarters | Approach |
|---|---|---|---|
| Kira (Litera) | 2011 | Toronto | Machine-learning contract analysis; "Smart Summaries" generative feature added August 2023 |
| Luminance | 2015 | Cambridge, UK | Proprietary LLM trained on over 150 million legal documents; "Panel of Judges" architecture |
| Spellbook | 2021 (as Rally) | Toronto | Generative drafting and redline inside Microsoft Word; 50 million US dollar Series B October 2025 |
| Robin AI | 2019 | London | Anthropic partner using Claude; 26 million US dollar Series B January 2024 |
| DraftWise | 2018 | New York/London | Generative drafting trained on a firm's own precedents; 20 million US dollar Series A March 2024 from Index Ventures |
| Henchman | 2020 | Ghent | Clause database integration in Word; acquired by LexisNexis June 2024 |
| ContractPodAi (Leah) | 2012 | London | Agentic contract lifecycle management; PwC and KPMG partnerships in 2024 |
| Lawgeex | 2014 | Tel Aviv | Pre-generative ML contract review; pivoted and largely faded after 2022 |
Most of these tools work as plug-ins to Microsoft Word or to a contract lifecycle management system, with the AI proposing redlines, flagging non-market terms against a firm's playbook, or generating clauses from natural-language prompts.
E-discovery, the process of identifying responsive documents in litigation, was the first part of the legal industry to adopt machine learning at scale. Lawyers had used keyword search since the 1990s; supervised classification (predictive coding) became the second wave after Judge Peck's 2012 Da Silva Moore decision.
The modern e-discovery vendors have layered generative AI on top of TAR. Relativity, which dominates the corporate market, announced its aiR product family in September 2023. aiR for Review and aiR for Privilege entered general availability in 2024. In internal tests against a real-world customer dataset, Relativity reported aiR for Review achieving a precision of 0.77 and recall of 0.82, with Relativity claiming the model beat first-pass human reviewer recall by 36%.
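For readers unfamiliar with the two metrics, the sketch below shows how precision and recall are computed from review counts. The counts are hypothetical, chosen only so that the ratios round to the 0.77 and 0.82 that Relativity reported; they are not Relativity's data. The F1 value is derived here for illustration and was not part of the reported results.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: responsive documents the system flagged as responsive
    fp: non-responsive documents the system flagged as responsive
    fn: responsive documents the system missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 1,000 truly responsive documents, 820 of them found,
# plus 245 non-responsive documents flagged by mistake.
precision, recall, f1 = precision_recall_f1(tp=820, fp=245, fn=180)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision answers "of the documents flagged, how many were truly responsive," while recall answers "of the truly responsive documents, how many were found." Discovery disputes usually focus on recall, since missed responsive documents are the costlier error.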
Everlaw, Relativity's largest competitor on the cloud side, released the Everlaw AI Assistant for general availability in August 2024 after a year-long beta with about 125 companies and 2,900 users. Its skills include coding suggestions, document summarization, deposition preparation, and a Deep Dive workflow that lets attorneys ask questions of a corpus and receive answers with document citations.
The transition from older predictive-coding pipelines to generative-AI document review remains active, with several courts having now considered the use of large language models in discovery on similar terms to the older TAR jurisprudence.
Litigation analytics platforms predict outcomes, judge behavior, and case timelines using structured data extracted from court records.
Lex Machina, founded as a Stanford project in 2006 and spun out in 2010, was acquired by LexisNexis in November 2015. The platform now covers patent, antitrust, securities, employment, contracts, product liability, insurance, bankruptcy, and other practice areas, and is sold both standalone and as part of Lexis+. Premonition (founded 2014, Miami) markets itself on attorney win-rate analytics. Pre/Dicta launched in 2022 with a claim to predict motion outcomes by judge using over 100 non-legal data points. Trellis focuses on state court records, which historically were the hardest to scrape.
The analytical premise behind these systems, that judges' behavior is sufficiently regular to model, remains contested in the academic literature. Empirical legal scholars at Stanford, Northwestern Pritzker, and elsewhere have warned that prediction tools can encode the biases of the underlying record rather than uncover ground truth.
A second branch of legal AI targets consumers rather than firms. LegalZoom, founded in 2001, was the first mass-market online legal services company and added AI-assisted document generation through the 2010s.
DoNotPay, founded by Joshua Browder in 2015 and originally a parking-ticket chatbot, became the most visible test case for consumer-facing legal AI. Browder announced on January 5, 2023, that DoNotPay would represent a defendant in a US traffic court using a smartphone earpiece to relay AI-generated arguments in real time. Browder cancelled the stunt later in January after a State Bar of California prosecutor warned that doing so could expose him to up to six months in jail for unauthorized practice of law.
On September 25, 2024, the Federal Trade Commission announced a proposed consent order against DoNotPay as part of its "Operation AI Comply" sweep. The FTC alleged that DoNotPay had advertised itself as "the world's first robot lawyer" without testing whether its outputs matched human lawyer work and without retaining lawyers in-house. DoNotPay agreed to pay 193,000 US dollars, notify subscribers from 2021 through 2023 of the limits of its services, and stop making unsupported claims about substituting for lawyers. The company did not admit or deny the allegations.
Other access-to-justice tools (Upsolve for bankruptcy, JusticeText for public defenders, Hello Divorce, and assorted state-bar-sponsored chatbots) have generally taken narrower, more carefully scoped positions to avoid the unauthorized-practice problem that tripped DoNotPay.
Academic benchmarks for legal AI are still young. Three evaluations dominate the discussion: two benchmark suites and one widely publicized bar exam study.
| Benchmark | Year | Lead authors | What it measures |
|---|---|---|---|
| CaseHOLD | 2021 | Lucia Zheng, Neel Guha, Brandon Anderson, Peter Henderson, Daniel E. Ho (Stanford RegLab) | 53,000+ multiple-choice items: pick the correct holding statement for a case from five candidate holdings. Won the Carole Hafner Best Paper award at ICAIL 2021. |
| LegalBench | 2023 | Neel Guha et al., 40+ contributors across Stanford, Yale, UVA, Northwestern, IIT Chicago-Kent, and elsewhere | 162 tasks across six categories of legal reasoning (issue spotting, rule recall, rule application, interpretation, rhetorical analysis, rule conclusion). Presented at NeurIPS 2023. |
| GPT-4 on the Uniform Bar Exam | 2023 | Daniel Martin Katz, Michael Bommarito, Shang Gao, Pablo Arredondo | GPT-4 zero-shot on the full UBE: scored 297, a passing score that placed in the reported 90th percentile of human test-takers. |
The bar exam result drew immediate scrutiny. Eric Martinez, then a researcher at MIT, published a re-evaluation in 2023 (peer-reviewed in 2024 in Artificial Intelligence and Law) arguing that the original 90th-percentile claim was inflated. Comparing GPT-4 against first-time test-takers using official National Conference of Bar Examiners data, Martinez estimated GPT-4 actually performed around the 63rd percentile overall and the 42nd percentile on essays. Restricting the comparison to people who had passed the exam at some point dropped the estimates to the 48th overall and 15th on essays. Martinez argued the original study had used the February 2023 administration as its reference pool, which is dominated by repeat takers who had failed earlier sittings, and had graded the essays against unofficial Maryland-state model answers rather than NCBE rubrics.
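The core of the dispute is that a percentile depends on the reference pool. A minimal sketch of that effect follows, using three invented score pools. The pools and the resulting percentiles are hypothetical and are not NCBE data or Martinez's estimates; only the score of 297 comes from the study described above.

```python
from bisect import bisect_left


def percentile_rank(score, pool):
    """Percentage of the pool that scored strictly below `score`."""
    ordered = sorted(pool)
    return 100.0 * bisect_left(ordered, score) / len(ordered)


# Hypothetical scaled scores, invented purely for illustration.
repeat_heavy_pool = [230, 240, 250, 255, 260, 265, 270, 275, 280, 300]
first_timer_pool = [255, 265, 275, 280, 285, 290, 295, 300, 305, 310]
passers_only_pool = [270, 280, 285, 290, 295, 300, 305, 310, 315, 320]

score = 297  # the UBE score reported for GPT-4

for name, pool in [
    ("repeat-heavy sitting", repeat_heavy_pool),
    ("first-time takers", first_timer_pool),
    ("eventual passers", passers_only_pool),
]:
    print(f"{name}: {percentile_rank(score, pool):.0f}th percentile")
```

The same score ranks lower as the comparison pool gets stronger, which is the mechanism behind the gap between the original claim and the re-evaluation.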
The Katz et al. paper is still important because it changed conversations inside the profession about what large language models could plausibly do. The Martinez critique is the corrective.
LegalBench evaluations have shown a wide range of performance across commercial and open models, with frontier closed models from OpenAI, Anthropic, and Google leading on most subtasks. SaulLM-7B, the legal fine-tune from Equall.ai, reported a roughly 6% gain over the base Mistral 7B on LegalBench tasks. Stanford and Yale researchers have also reported that smaller domain-pretrained models can be competitive on narrower citation and holding tasks; Zheng et al. found a 7.2-point F1 gain (12% relative) over base BERT on CaseHOLD when pretraining on a legal corpus with a legal vocabulary.
The most consequential empirical work on legal AI to date has come from the Stanford RegLab and the Stanford Institute for Human-Centered AI.
The first paper, "Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models" (Matthew Dahl, Varun Magesh, Mirac Suzgun, and Daniel E. Ho, January 2024), tested ChatGPT-3.5, GPT-4, PaLM 2, and Llama 2 on verifiable legal questions drawn from federal court cases. The authors reported hallucination rates of 58% for GPT-4 and 88% for Llama 2 on specific, verifiable factual queries, and found that models often failed to recognize their own errors and sometimes reinforced false legal premises that the user supplied.
The second paper, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools" (Varun Magesh, Faiz Surani, Matthew Dahl, Mirac Suzgun, Christopher D. Manning, and Daniel E. Ho), was released as a preprint in May 2024 and published in the Journal of Empirical Legal Studies in 2025. It tested the commercial retrieval-augmented systems on a set of more than 200 legal queries. The reported error rates:
| System | Hallucination rate |
|---|---|
| LexisNexis Lexis+ AI | 17% or more |
| Westlaw AI-Assisted Research | 33% or more (approximately twice Lexis+) |
| Thomson Reuters Ask Practical Law AI | refused to answer more than 60% of queries; lower accuracy when it did respond |
| GPT-4 (general-purpose baseline) | substantially higher than the retrieval systems |
The authors defined a hallucinated answer as one that was either factually wrong about the law or "misgrounded," meaning the response cited a real source but the source did not actually support the cited proposition. The paper concluded that retrieval grounding substantially reduces but does not eliminate legal hallucinations, and called for vendors to publish standardized accuracy benchmarks.
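The definition above can be expressed as a small labeling rule. This is a sketch under stated assumptions, not the authors' evaluation code: the `LabeledResponse` type, the treatment of refusals as non-hallucinations, the choice of all queries as the denominator, and the six sample labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabeledResponse:
    correct: bool          # the statement of law is accurate
    grounded: bool         # the cited source actually supports the proposition
    refused: bool = False  # the system declined to answer


def is_hallucination(r):
    """Wrong about the law, or citing a real source that does not support the claim."""
    if r.refused:
        return False  # assumption: a refusal is unhelpful but not a hallucination
    return (not r.correct) or (not r.grounded)


def hallucination_rate(responses):
    """Share of all queries whose response was hallucinated (assumed denominator)."""
    return sum(is_hallucination(r) for r in responses) / len(responses)


# Hypothetical human labels for six responses.
sample = [
    LabeledResponse(correct=True, grounded=True),
    LabeledResponse(correct=True, grounded=True),
    LabeledResponse(correct=True, grounded=True),
    LabeledResponse(correct=False, grounded=True),   # wrong about the law
    LabeledResponse(correct=True, grounded=False),   # misgrounded citation
    LabeledResponse(correct=False, grounded=False, refused=True),
]

rate = hallucination_rate(sample)
print(f"hallucination rate: {rate:.0%}")
```

Note that a system can lower its measured rate simply by refusing more often, which is why refusal rates are reported alongside hallucination rates in the table above.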
Vendors pushed back on methodology. Thomson Reuters argued the study tested an older version of its product and used adversarial questions; LexisNexis raised similar objections. Stanford updated the preprint in May 2024 to address some of the criticism, and the peer-reviewed version contains an expanded methods section. The numbers above are from the published version.
The single most cited legal-AI incident is Mata v. Avianca, where Manhattan attorneys submitted a brief containing six entirely fictitious decisions invented by ChatGPT, including "Varghese v. China Southern Airlines" and "Martinez v. Delta Air Lines." The case was the first to produce a written judicial sanctions order, and the form of the order, requiring the attorneys to mail copies to each of the real judges whose names ChatGPT had attached to fake opinions, became a template for later cases.
The Paris-based researcher Damien Charlotin maintains a public database of AI-hallucination cases that, by late 2025, contained more than 1,000 entries, with the rate of new entries reportedly running at two or three per day. A non-exhaustive selection of the better-known matters:
| Case | Court and year | Attorneys / parties | Outcome |
|---|---|---|---|
| Mata v. Avianca, Inc., 678 F. Supp. 3d 443 | S.D.N.Y. 2023 (Judge P. Kevin Castel) | Steven A. Schwartz and Peter LoDuca (Levidow, Levidow & Oberman) | 5,000 US dollar joint sanction; case dismissed; attorneys required to notify each real judge falsely listed as the author of an invented opinion |
| Park v. Kim, No. 22-2057 | 2d Cir. January 30, 2024 | Jae Lee | Referred to the Second Circuit Grievance Panel after citing the non-existent Matter of Bourguignon v. Coordinated Behavioral Health Servs. generated by ChatGPT |
| United States v. Cohen (supervised-release motion) | S.D.N.Y. 2023-2024 (Judge Jesse Furman) | Michael Cohen and David M. Schwartz | Cohen used Google Bard, which generated fake citations he forwarded to his lawyer; Judge Furman declined to sanction but called the conduct "embarrassing and certainly negligent" |
| Wadsworth v. Walmart, Inc. | D. Wyo. February 24, 2025 (Judge Kelly H. Rankin) | Rudwin Ayala, T. Michael Morgan (Morgan & Morgan), Taly Goody (Goody Law Group) | Motion in limine cited eight nonexistent cases; Ayala's pro hac vice status revoked and 3,000 US dollar fine; Morgan and Goody fined 1,000 US dollars each; the firm itself avoided sanctions after issuing firmwide AI guidance |
| Noland v. Land of the Free, L.P. (and related California appellate matter) | Cal. Ct. App. 2025 | Amir Mostafavi | 10,000 US dollar fine after 21 of 23 case quotations in an opening brief turned out to be fake; reported as the largest US sanction for ChatGPT hallucinations to that date |
| Various pro se filings tracked by Charlotin | 2023-2025, multiple US and foreign courts | Self-represented litigants and counsel | Outcomes range from warnings to strike orders to bar referrals |
A recurring pattern in these cases is the gap between an attorney's belief about what ChatGPT does (described in several affidavits as a "super search engine") and what it actually does (it is a probabilistic text generator that will produce plausible-looking citations on demand).
Courts and bar associations responded to Mata v. Avianca with new standing orders and ethics opinions.
The first judicial AI standing order was issued on May 30, 2023, by Judge Brantley Starr of the Northern District of Texas. Judge Starr required every attorney appearing before him to file a certificate stating either that no portion of any filing would be drafted by generative AI, or that any language drafted by generative AI would be checked for accuracy by a human. The order cited ChatGPT, Harvey.ai, and Google Bard by name and warned that "legal briefing is not" one of the appropriate uses for these tools.
By mid-2024, at least 39 federal judges (35 district court, three bankruptcy, one Court of International Trade) had issued some form of AI-related standing order, with the Northern District of California, Central District of California, and Northern District of Illinois leading by judge count. Approaches diverged: some required disclosure, some required certification of human review, and at least one (Judge Newman of the Southern District of Ohio) flatly forbade any use of AI in filings.
State court rules emerged more slowly. The Texas State Bar's Committee on Disciplinary Rules and Referenda, the Pennsylvania Bar, and the New York State Bar Association all issued guidance documents on generative AI use in 2024. The Florida Bar issued Advisory Opinion 24-1 in January 2024 on lawyers' use of generative AI.
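The most influential ethics document is ABA Formal Opinion 512, published by the American Bar Association's Standing Committee on Ethics and Professional Responsibility on July 29, 2024. It was the ABA's first formal opinion on generative AI. The opinion mapped lawyer obligations onto six areas of the existing Model Rules: competence (Rule 1.1), confidentiality (Rule 1.6), communication with clients (Rule 1.4), meritorious claims and candor toward the tribunal (Rules 3.1 and 3.3), supervisory responsibilities (Rules 5.1 and 5.3), and fees (Rule 1.5).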
A notable concrete example in the opinion: if a lawyer uses a generative AI tool to draft a pleading and spends 15 minutes preparing inputs, the lawyer may bill for those 15 minutes and for the time spent reviewing the draft for accuracy.
The core concerns flagged repeatedly by the ABA, state bars, and academic commentators include: