SEO

See also: SEO ChatGPT Plugins

Search engine optimization (SEO) is the practice of preparing websites and other content so that search engines surface them in response to user queries. The discipline has been bound up with artificial intelligence since at least 2015, when Google began ranking results using a neural network called RankBrain. Since the public release of ChatGPT in late 2022 and the subsequent launch of generative search products such as Microsoft Copilot, Perplexity AI, Google AI Overviews and ChatGPT Search, AI has moved from a hidden ranking signal to the visible front end of search itself. This article covers how search engines apply AI to ranking, the AI-powered search engines that compete with Google, the tools SEO practitioners use, Google's policies on AI-generated content, the impact of generative search on publisher traffic, and the major research benchmarks that underpin the field.

Overview

Classical SEO grew up around keyword matching, link analysis (notably Google's PageRank, introduced in 1998), and on-page signals such as titles, headings and metadata. From the early 2010s onward, search engines layered machine learning on top of that infrastructure to handle long-tail queries, ambiguous language and conversational phrasing. Google's Hummingbird rewrite in 2013 was a step in that direction, but the bigger break came with RankBrain in 2015, followed by neural matching in 2018, BERT in 2019 and the Multitask Unified Model (MUM) in 2021 (Google Search Central blog; Search Engine Land).

The arrival of generative AI changed two things at once. First, search engines began returning written answers rather than ten blue links, blending information from many sources into a single summary. Second, content creators acquired their own AI tools, including large language models for drafting and rewriting copy, embedding models for clustering pages, and software that audits sites against machine-learning ranking signals. The result is a market where AI sits on both sides of the search box, and where the question of what counts as legitimate optimization has been actively renegotiated.

A new vocabulary has grown up around this shift. Generative engine optimization (GEO) refers to the practice of improving how often a page is cited inside an AI summary. Answer engine optimization (AEO) and large language model optimization (LLMO) are related terms; some practitioners treat them as synonyms and others as distinct sub-disciplines (Semrush blog; Conductor; Ad Age encyclopedia).

History of AI in search

PageRank and the pre-AI era

Google launched in 1998 with PageRank, a link-analysis algorithm developed by Larry Page and Sergey Brin at Stanford. PageRank treated each hyperlink as a vote, and ranked pages by recursively weighting those votes. For roughly the next decade, SEO was largely an exercise in keyword matching and link acquisition.
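The "links as recursively weighted votes" idea can be illustrated with a short power-iteration sketch. This is a minimal illustration rather than Google's implementation: the damping factor of 0.85, the iteration count, the handling of pages with no outbound links, and the four-page toy graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked from three pages, "d" from none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph the page with the most inbound links ends up ranked highest and the page with none ends up lowest, which is the behaviour that made link acquisition central to early SEO.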

Hummingbird (2013)

Google's Hummingbird update was launched in August 2013 and announced at a press event in September of the same year. Matt Cutts, then head of Google's webspam team, described it as a rewrite of the core algorithm, the first such rewrite since 2001 (Search Engine Journal; Wikipedia, "Google Hummingbird"). Hummingbird was designed to handle longer, more conversational queries by reading the meaning of a sentence rather than matching keywords one by one. It set the stage for the natural language processing work that followed but did not yet involve a learned neural network.

RankBrain (2015)

In an interview with Bloomberg on 26 October 2015, Google senior research scientist Greg Corrado disclosed the existence of RankBrain, a deep learning system that had been quietly applied to search since spring of that year. Corrado called RankBrain the "third most important signal" in the ranking algorithm, after links and content (Bloomberg, "Google Turning Its Lucrative Web Search Over to AI Machines," 2015-10-26). At launch it handled around 15 percent of queries, specifically those Google had never seen before. RankBrain works by mapping words and phrases to vectors, a technique closely related to Word2Vec (Search Engine Land FAQ on RankBrain; Wikipedia, "RankBrain").
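The idea of mapping phrases to vectors can be shown with a toy cosine-similarity comparison. The three-dimensional vectors below are invented for illustration; real systems learn vectors with hundreds of dimensions from data, and nothing here reflects RankBrain's actual representation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical hand-made vectors; a trained model would produce these.
toy_vectors = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook computer": [0.8, 0.2, 0.1],
    "banana bread": [0.0, 0.1, 0.9],
}

related = cosine_similarity(toy_vectors["laptop"], toy_vectors["notebook computer"])
unrelated = cosine_similarity(toy_vectors["laptop"], toy_vectors["banana bread"])
print(f"related: {related:.2f}, unrelated: {unrelated:.2f}")
```

Phrases that share no words can still score as close neighbours, which is what lets a vector-based system handle queries it has never seen before.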

Neural matching (2018)

Google introduced neural matching in 2018 and confirmed it that September. The system uses learned representations to connect queries with documents that do not share exact keywords. A canonical example given by Google's search liaison was that a query like "why does my TV look strange" can now be matched to pages about the "soap opera effect." Google said neural matching influenced about 30 percent of queries when it rolled out (Search Engine Land; Search Engine Journal). It was extended to local search results in late 2019.

BERT (2019)

On 25 October 2019 Google announced that it had begun applying BERT, a transformer model published by Google AI in 2018, to search ranking. Pandu Nayak, then Google's vice president of search, called it "the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of Search" (Google blog, 2019-10-25). BERT initially affected about one in ten English queries in the United States and was particularly useful for queries where small words such as "for" or "to" carry significant meaning. Google rolled BERT out to seventy more languages in December 2019.

MUM (2021)

Google introduced the Multitask Unified Model, or MUM, at Google I/O on 18 May 2021. Nayak described MUM as 1,000 times more powerful than BERT, trained on 75 languages and capable of handling multimodal input. The published example involved a hiker asking how their experience on Mount Adams should inform preparation for a trip to Mount Fuji, a question requiring synthesis across language, geography and gear (Google blog, "MUM: A new AI milestone for understanding information," 2021-05-18). MUM is built on the T5 text-to-text framework. Public-facing applications have been narrower than the initial announcement suggested, including a vaccine-name normalization feature and improvements to image search.

Search Generative Experience and AI Overviews (2023 to 2024)

At Google I/O on 10 May 2023 Google previewed the Search Generative Experience (SGE), an opt-in feature in Search Labs that placed an AI-written answer above the traditional list of results. By November 2023 the SGE beta had reached more than 120 countries, although the United Kingdom and the European Union were notable exceptions (Search Engine Land; Google blog).

At Google I/O on 14 May 2024 Google rolled the feature out to the general United States audience and renamed it AI Overviews. The first weeks were difficult. AI Overviews recommended adding non-toxic glue to pizza sauce, advised users to eat at least one small rock a day, and suggested that smoking while pregnant could be beneficial. Reporters traced the first answer to a Reddit comment from 2013 and the second to a 2021 article by the satirical site The Onion (Live Science, 2024-05-23; Bloomberg, 2024-05-30; UNSW Newsroom, 2024-05-29). In a 30 May 2024 blog post Liz Reid, Google's head of search, attributed the errors to "data voids" and said Google had made "more than a dozen technical improvements" including reducing reliance on user-generated content and limiting the inclusion of satire.

Despite the rocky launch, Google continued to expand AI Overviews. By October 2024 the feature was live in more than 100 countries and by May 2025 it covered over 200 countries and territories in 40-plus languages, including German, Italian, Polish, Arabic, Chinese, Malay and Urdu (Google blog, "AI Overviews are now available in over 200 countries and territories," 2025-05-20).

AI Mode (2025)

Google introduced AI Mode, a deeper conversational interface that runs alongside AI Overviews, on 6 March 2025. The feature lets users follow up with multi-turn questions and integrates Gemini 2.5 and later Gemini 3 models (Google blog; Search Engine Land). AI Mode launched to general United States users on 5 June 2025.

AI-powered search engines

The second wave of AI in search came from challengers who built generative answers into the core product rather than as a layer on top. The table below covers the most prominent ones.

Product	Launched	Underlying model(s)	Parent	Notes
Microsoft Copilot (originally "the new Bing")	7 February 2023	GPT-4 (confirmed March 2023), later GPT-4o	Microsoft	Rebranded from Bing Chat to Copilot on 15 November 2023; separated from Bing on 1 October 2024
You.com YouChat	23 December 2022	GPT-3.5 then later proprietary	You.com	Among the first conversational search interfaces with live web citations
Brave Search Summarizer	2 March 2023	Brave's own LLMs (three-model pipeline)	Brave Software	Free, privacy-oriented; later expanded into "Answer with AI"
Kagi	Public beta 2022	Mix of GPT, Claude and others	Kagi Inc.	Subscription, ad-free; includes Summarize Results and Universal Summarizer features
Perplexity AI	August 2022	Mix of GPT, Claude, Sonar (Llama-based)	Perplexity AI Inc.	Reached a reported $9 billion valuation in December 2024 after a $500 million round (CNBC, 2024-12)
SearchGPT prototype	25 July 2024	GPT-4o	OpenAI	Initially limited to 10,000 waitlist users; folded into ChatGPT Search
ChatGPT Search	31 October 2024	GPT-4o with retrieval	OpenAI	Released to Plus and Team users; rolled out to the free tier in December 2024
Google AI Overviews	14 May 2024 (US general availability)	Gemini (custom variant)	Google	Available in 200+ countries by May 2025
Google AI Mode	6 March 2025 (Labs); 5 June 2025 (US GA)	Gemini 2.5, later Gemini 3	Google	Multi-turn conversational mode

The valuation jumps are worth keeping in mind when reading about this market. Perplexity raised approximately $900 million across four rounds in 2024 with backers including Jeff Bezos and Nvidia, reaching a $9 billion valuation by December 2024 and a reported $14 to $20 billion valuation in 2025 according to CNBC and FinTech Weekly reporting.

Generative engine optimization

Generative engine optimization, abbreviated GEO, is the practice of structuring content so that an LLM-powered search system is more likely to cite it inside a generated answer. The term was coined by a 2023 paper from researchers at Princeton, the Allen Institute for AI, Georgia Tech and IIT Delhi.

The 2023 Princeton paper

In November 2023 Pranjal Aggarwal and co-authors published "GEO: Generative Engine Optimization" on arXiv. The paper introduced a benchmark called GEO-bench, covering more than 10,000 user queries across nine domains, and tested nine optimization strategies against it. The authors reported that the best methods could boost a source's visibility in generative engine responses by up to 40 percent on their position-adjusted word count metric (Aggarwal et al., 2023, arXiv:2311.09735; ACM SIGKDD 2024 proceedings).
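A position-adjusted word count can be sketched as follows. This is a loose illustration of the idea rather than the paper's exact formula: the exponential position decay, the equal split of credit when a sentence cites several sources, and the normalization by the weighted length of the whole answer are all assumptions, as are the source ids and sentences.

```python
import math
from collections import defaultdict


def position_adjusted_word_count(sentences):
    """Share of an answer attributable to each source.

    sentences: list of (text, [source ids cited]) in the order they appear.
    Words in earlier sentences count more than words in later ones.
    """
    total = len(sentences)
    credit = defaultdict(float)
    weighted_total = 0.0
    for pos, (text, sources) in enumerate(sentences):
        # Assumed weighting: word count decayed exponentially by position.
        weight = len(text.split()) * math.exp(-pos / total)
        weighted_total += weight
        for src in sources:
            credit[src] += weight / len(sources)  # split credit equally
    return {src: value / weighted_total for src, value in credit.items()}


# Hypothetical two-sentence answer citing two sources.
answer = [
    ("Glue does not belong on pizza at all", ["s1"]),
    ("Cheese slides off when the sauce is too wet", ["s2"]),
]
scores = position_adjusted_word_count(answer)
print({src: round(value, 3) for src, value in scores.items()})
```

In this example the first source scores higher even though its sentence is one word shorter, because it appears earlier in the answer.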

The most effective interventions in the Princeton study were not those that look like classical SEO. Adding statistics produced the largest gain, roughly 40 percent, and adding direct quotations from authoritative sources raised visibility by about 28 percent. Improving fluency and readability gave a 15 to 30 percent boost. Citation addition underperformed in isolation, but combining it with other strategies produced an average lift of 31.4 percent. Keyword stuffing, by contrast, did not help.

How GEO differs from classical SEO

Classical SEO targets the ranking of a single URL on a results page. GEO targets the probability that a passage from a URL will be quoted or summarized inside an AI answer. The practical implications follow from that shift. A page that ranks tenth might still be cited if it contains a clear definitional sentence or a striking statistic. A page that ranks first might be ignored by the model if its content is buried under interstitials or written in promotional prose that the model is reluctant to quote.

AEO and LLMO

Answer engine optimization (AEO) and large language model optimization (LLMO) are closely related terms. Practitioners typically use AEO when they care about featured snippets and answer-style results across both classical and generative engines, and LLMO when they care about how a brand is represented inside models that draw on pretraining rather than live retrieval, such as base ChatGPT without browsing. Some firms treat GEO as a subset of AEO that focuses specifically on retrieval-augmented generation systems like AI Overviews and Perplexity (Conductor; Semrush blog; Ad Age encyclopedia).

AI tools used by SEO practitioners

AI sits inside almost every modern SEO tool, from keyword research to technical audits. The table below lists the most widely used ones and what they do.

Tool	Primary use	AI features
Surfer SEO	Content optimization scoring	NLP-driven scoring against top-ranking pages; built-in AI writer
Clearscope	Content grading	Grades drafts from A++ to F based on topical coverage relative to competitors
MarketMuse	Topical authority planning	Content gap analysis, topic cluster identification
Frase	Content briefs and drafting	AI brief generator, AI writing assistant
Jasper	Marketing content generation	Brand voice tuning, integrates with Surfer SEO for optimization
Writesonic	Article generation	Generates SEO-friendly long-form articles from a prompt and target keywords
Semrush	Keyword research, AI visibility tracking	AI Toolkit tracks brand citations across ChatGPT, AI Overviews and Perplexity; Keyword Magic uses AI to score topical relevance
Ahrefs	Backlinks and keyword research	Brand Radar tracks citations across LLM answers; AI-assisted keyword suggestions
Screaming Frog SEO Spider	Site crawling and audits	Versions 20 through 22 add direct integrations with OpenAI, Gemini, Anthropic and Ollama APIs for custom prompts during crawls; semantic similarity analysis added in 22.0
Originality.ai	AI text detection, plagiarism	Trained on outputs from major LLMs; reports false positive rates under 1 percent in its Lite model
GPTZero	AI text detection	Self-reports 99.3 percent overall accuracy and a 0.24 percent false positive rate on its 3,000-sample internal benchmark
Schema App, Milestone, Alli AI	Structured data generation	Generate JSON-LD schema markup automatically from page content

A few notes on how these tools are actually used. Surfer SEO claims 150,000-plus active users in 2026 and pairs particularly well with Jasper for end-to-end drafting and optimization. MarketMuse positions itself further upstream, doing topic cluster planning rather than per-page scoring. Screaming Frog's 22.0 release in 2024 introduced content cluster diagrams that group crawled URLs by embedding similarity, which is a fairly direct application of vector embedding techniques to technical SEO (Screaming Frog blog, 2024; SEO Hacker, 2024).
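Grouping URLs by embedding similarity can be sketched with a simple greedy pass. This is not Screaming Frog's algorithm, which is not described here; the two-dimensional vectors, the 0.9 threshold and the URLs are hypothetical, and a real crawl would obtain embeddings from a model API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_by_similarity(embeddings, threshold=0.9):
    """Greedy single-pass clustering.

    Each URL joins the first cluster whose seed vector is similar enough,
    otherwise it starts a new cluster with itself as the seed.
    """
    clusters = []  # list of (seed_vector, [urls])
    for url, vec in embeddings.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((vec, [url]))
    return [members for _, members in clusters]


# Hypothetical page embeddings for a small site.
page_embeddings = {
    "/shoes/running": [0.9, 0.1],
    "/shoes/trail": [0.85, 0.2],
    "/blog/returns-policy": [0.1, 0.95],
}
groups = cluster_by_similarity(page_embeddings)
print(groups)
```

Clusters of near-identical pages found this way are a common starting point for spotting duplicate or cannibalizing content during an audit.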

AI content detectors deserve their own warning. Both Originality.ai and GPTZero have been shown to flag entirely human-written text. Independent testing has reported false positives on the United States Constitution, on excerpts from "The Da Vinci Code," and on academic prose by non-native English speakers. GPTZero's own published numbers, which look strong, come from its internal benchmark; PCWorld testing has measured accuracy as low as 62 percent in other conditions (Originality.ai blog on false positives; HasteWire "GPTZero Limitations"). The practical takeaway from researchers and operators is that AI detection results are at best a signal, not proof.

Google policies on AI-generated content

Google's public position on AI-generated content has shifted in important ways over time. Three updates set the current rules.

Helpful Content Update, August 2022

Google rolled out the first Helpful Content Update on 25 August 2022, completing on 9 September 2022. The original guidance directed creators to make content "by people, for people," framing the policy as anti-spam rather than explicitly anti-AI (Google Search Central blog, 2022-08). At this stage Google had not yet drawn a clean line between AI authorship and human authorship.

Removal of "by people" language, September 2023

The September 2023 Helpful Content Update was officially called a "large update" by Google and rolled out from 14 September 2023. Independent analysis by digital marketers and travel publishers, including a study by TourScanner of 671 sites, found that 32 percent of affected travel publishers lost more than 90 percent of their organic traffic after the update (TourScanner analysis, 2024). Around this time Google quietly revised the original guidance and removed the "by people" phrasing, leaving only "for people." That edit clarified Google's view that AI-generated content can be acceptable so long as it is genuinely useful (Search Engine Land; Search Engine Roundtable).

March 2024 core update and new spam policies

On 5 March 2024 Google announced its largest single change to spam policy in years. The March 2024 core update bundled the helpful content system into core ranking and introduced three new spam policies: scaled content abuse, site reputation abuse, and expired domain abuse (Google Search Central blog, "What web creators should know about our March 2024 core update and new spam policies," 2024-03-05). Google's stated goal was to reduce "low-quality, unoriginal content" by 40 percent. After the update Google said the actual reduction was closer to 45 percent.

The scaled content abuse policy specifically targets pages "generated for the primary purpose of manipulating Search rankings." Google's language was deliberate: it applies "regardless of whether content is produced through automation, human efforts, or a combination." In other words, AI-generated content is not spam by definition, but mass-produced low-value AI content for ranking is. Google search liaison Danny Sullivan reiterated this point in several conference talks during 2024, telling attendees that anyone who reads the policy as a free pass for AI "should really read it again."

Site reputation abuse update, November 2024

On 19 November 2024 Google updated the site reputation abuse policy, sometimes called the "parasite SEO" policy, to close a loophole. The original March 2024 version had targeted third-party content hosted on a trusted site to take advantage of its rankings. The November update extended the policy to cover situations where the host site has "first-party involvement or oversight," meaning that white-label arrangements and editorial review do not exempt the content (Google Search Central blog, 2024-11-19). The European Commission opened a preliminary review of the policy in 2025 after complaints from publishers (Search Engine Land, 2025).

Note on enforcement

Google has emphasized that the quality rater guidelines, including E-E-A-T (Experience, Expertise, Authoritativeness and Trustworthiness), do not directly drive rankings. The extra "E", for Experience, was added in December 2022 specifically to capture first-hand experience, the kind of signal a model trained on summarized content might struggle to fake (Google Search Central blog, 2022-12). Quality raters use the guidelines to score sample results, and Google then trains its systems to align with what raters prefer.

Impact on publishers and traffic

The most contested question in SEO from 2024 onward has been what AI summaries do to clicks. Several independent studies have looked at this question, and the numbers are consistent in direction even when they differ in magnitude.

Pew Research, 2025

The Pew Research Center analyzed 68,879 Google searches by 900 American adults during March 2025. Pew reported that users clicked on a traditional result link in 8 percent of searches when an AI summary appeared, compared with 15 percent of searches when it did not (Pew Research Center, "Do people click on links in Google AI summaries?", 2025-07-22). Clicks on links inside the AI summary itself occurred in about 1 percent of visits to pages that showed one. Users ended their browsing session on 26 percent of pages with an AI summary, versus 16 percent of pages without one. About 18 percent of all Google searches in the dataset produced an AI summary. Wikipedia, Reddit and YouTube were the most-cited sources in both AI summaries and traditional results.

Google publicly disputed the Pew methodology, arguing that the sample was small and that the company's own measurements showed AI Overviews driving more diverse click destinations rather than fewer total clicks (PPC Land, 2025).

Ahrefs, 2024 to 2025

Ahrefs found that the presence of an AI Overview correlates with a 58 percent lower average clickthrough rate on the top-ranking page (Ahrefs blog, "AI Overviews Reduce Clicks by 58%"). Their methodology compared identical queries before and after AI Overview presence.

Seer Interactive, 2025

A September 2025 Seer Interactive analysis reported organic CTR for queries with AI Overviews dropping from 1.76 percent to 0.61 percent, which Seer described as a 61 percent relative decline (the two endpoints as quoted work out to roughly 65 percent). Paid CTR fell from 19.7 percent to 6.34 percent, or about 68 percent. eMarketer has separately cited a figure of 34.5 percent, which represents an alternate calculation across a broader sample (Search Engine Land, 2025; eMarketer, 2025).
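Relative declines of this kind are computed from the before and after rates. The short sketch below applies the standard formula to the endpoints quoted above; the function name is illustrative.

```python
def relative_decline(before, after):
    """Percentage drop from `before` to `after`, relative to `before`."""
    return (before - after) / before * 100.0


organic = relative_decline(1.76, 0.61)   # organic CTR endpoints, in percent
paid = relative_decline(19.7, 6.34)      # paid CTR endpoints, in percent
print(f"organic: {organic:.1f}%, paid: {paid:.1f}%")
```

Under this formula the organic endpoints give about 65 percent and the paid endpoints about 68 percent, so the 61 percent organic figure presumably reflects Seer's own aggregation or rounding rather than these two numbers alone.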

Similarweb, 2024 to 2025

Similarweb tracked the share of "zero-click" Google searches and reported a rise from 56 percent in May 2024 to 69 percent in May 2025. Publisher organic traffic in their dataset dropped from more than 2.3 billion monthly visits in mid-2024 to under 1.7 billion by May 2025. The same dataset shows ChatGPT referrals to news sites growing from under one million between January and May 2024 to more than 25 million in the same months of 2025, a roughly 25-fold increase. Even at that growth rate, AI-platform referrals accounted for about 1 percent of total publisher traffic, which is not enough to offset losses from classical search (Similarweb, "Report: The Impact of Generative AI on Publishers," 2025; AdExchanger, 2025; TechCrunch, 2025-07-02).

Named publishers

A few specific cases have been documented in industry coverage. Business Insider's organic search traffic fell about 55 percent between April 2022 and April 2025, and the publisher cut 21 percent of its staff in May 2025 (AdExchanger). HuffPost reported losing about half of its search referrals over the same period. The New York Times saw search's share of its desktop and mobile traffic decline from 44 percent in 2022 to 37 percent in 2025. The Daily Mail experienced an 89 percent desktop CTR drop and 87 percent mobile drop when an AI Overview appeared above a visible link (Search Engine Journal). HubSpot's organic traffic reportedly collapsed from 13.5 million to about 6 to 7 million monthly visits between late 2024 and early 2025, although HubSpot has attributed this partly to its own content strategy changes.

A Columbia Journalism Review piece in 2025 summarized the cumulative effect by calling AI Overviews a "traffic apocalypse" for news sites. Whether the long-term picture is that dramatic is still being argued.

Detection of AI content

As the volume of AI-generated text on the web has grown, several firms have built classifiers that try to flag it. The market settled on two leaders, Originality.ai and GPTZero, alongside Copyleaks and a handful of academic projects.

Originality.ai launched in 2022 and is marketed mainly to publishers and content agencies. The company reports false positive rates under 1 percent in its Lite model and under 3 percent in its Turbo model. GPTZero, founded by Princeton student Edward Tian in early 2023, reports 99.3 percent overall accuracy and a 0.24 percent false positive rate on a 3,000-sample internal benchmark (GPTZero blog).

The weakness of these tools is not in their best-case numbers but in their failure modes. Independent testing has produced false positives on the United States Constitution, the Bible, "The Da Vinci Code" and student essays by non-native English speakers. A 2023 Stanford study, "GPT detectors are biased against non-native English writers," by Liang and colleagues, found that several detectors classified the writing of non-native English speakers as AI-generated more than half the time, even when the writing was entirely human. The pattern matters in education and admissions contexts, where a false positive can have real consequences.

The practical consensus among researchers is that AI-text detection is unreliable enough that it cannot be the sole basis for accusations of misconduct. GPTZero itself recommends that flagged text be the start of a conversation, not the end of one.
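Even a small false positive rate adds up at scale, which a short calculation makes concrete. The batch size of 10,000 human-written essays is a hypothetical figure; the rates are the vendors' self-reported numbers quoted above, and independent tests have suggested real-world rates can be higher.

```python
def expected_false_flags(human_documents, false_positive_rate):
    """Expected number of human-written documents wrongly flagged as AI."""
    return human_documents * false_positive_rate


batch = 10_000  # hypothetical number of human-written essays
gptzero_self_reported = expected_false_flags(batch, 0.0024)  # 0.24 percent
originality_lite_bound = expected_false_flags(batch, 0.01)   # "under 1 percent"
print(f"{gptzero_self_reported:.0f} and {originality_lite_bound:.0f} "
      f"false flags per {batch} essays")
```

At the self-reported rates this works out to about 24 and up to 100 wrongly flagged essays per 10,000, each of which could become a misconduct accusation against a human author.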

Notable research papers and benchmarks

The academic infrastructure for modern search is built on a handful of standard datasets and benchmarks. They are worth knowing because most of the ranking systems used in production are trained or evaluated on at least one of them.

MS MARCO

MS MARCO, short for Microsoft Machine Reading Comprehension, was released by Microsoft in 2016 with an initial paper at NIPS. The passage ranking subset contains about 8.8 million passages and around one million question queries derived from Bing search logs. MS MARCO is the standard for benchmarking dense retrieval and reranking models; evaluation typically uses MRR@10 (Microsoft Research, "MS MARCO: Benchmarking Ranking Models in the Large-Data Regime," 2021).
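MRR@10 is the mean, over all queries, of the reciprocal rank of the first relevant passage within the top ten results, counting zero when none appears. A minimal sketch, with hypothetical query and passage ids:

```python
def mrr_at_k(ranked_lists, relevant_sets, k=10):
    """Mean reciprocal rank of the first relevant item in the top k.

    ranked_lists: {query_id: [doc ids in ranked order]}
    relevant_sets: {query_id: set of relevant doc ids}
    """
    total = 0.0
    for qid, ranking in ranked_lists.items():
        relevant = relevant_sets.get(qid, set())
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(ranked_lists)


# Hypothetical system output and relevance judgments.
runs = {"q1": ["d3", "d1", "d7"], "q2": ["d9", "d4"], "q3": ["d5"]}
qrels = {"q1": {"d1"}, "q2": {"d9"}, "q3": {"d2"}}
score = mrr_at_k(runs, qrels)
print(score)
```

Here the first relevant passage sits at rank 2 for q1, rank 1 for q2 and is missing for q3, so the score is (1/2 + 1 + 0) / 3 = 0.5.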

BEIR

BEIR (Benchmarking-IR) was introduced in 2021 by a team at the Technical University of Darmstadt, the Hugging Face team and others. BEIR aggregates 18 publicly available datasets, including scientific papers, financial documents, the COVID-19 TREC track, fact-checking, and several question-answering corpora. Its main contribution is a focus on zero-shot evaluation, asking how well a retrieval model performs in a domain it has not seen during training (Thakur et al., 2021, arXiv:2104.08663). One headline finding from the original BEIR paper is that BM25, an unsupervised lexical baseline introduced in 1994, holds up remarkably well against neural models in zero-shot conditions.
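BM25 scores a document by summing, over query terms, an inverse document frequency weight multiplied by a saturating, length-normalized term frequency. The sketch below is a minimal illustration, not the configuration used in the BEIR paper: whitespace tokenization, the parameters k1 = 1.5 and b = 0.75, the particular IDF variant and the three toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 variant."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger weight; this IDF form stays positive.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates, and longer documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical three-document corpus.
corpus = [
    "the soap opera effect makes films look strange",
    "how to bake banana bread",
    "soap recipes for sensitive skin",
]
results = bm25_scores("soap opera effect", corpus)
print([round(s, 3) for s in results])
```

The function needs no training data, which is why BM25 serves as an unsupervised baseline; it also cannot match a query to a document that shares none of its words, the gap that neural retrievers aim to close.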

MTEB

The Massive Text Embedding Benchmark (MTEB), released in 2022 by researchers at Cohere, Hugging Face and elsewhere, expands the evaluation scope from retrieval to eight task categories including classification, clustering, reranking and semantic textual similarity. MTEB reuses many of the BEIR retrieval datasets. As of late 2024 it covers more than 56 datasets across 112 languages and is the most widely cited leaderboard for general-purpose text embedding models.

The GEO paper

The Aggarwal et al. paper discussed earlier in this article (arXiv:2311.09735) introduces GEO-bench, a benchmark of over 10,000 queries across nine domains. It is the first published benchmark specifically aimed at measuring source visibility inside generative engine answers rather than ranking on a results page. The paper was published at ACM SIGKDD in August 2024.

Other notable papers

Several other papers come up frequently in SEO research discussions. Devlin et al.'s original BERT paper ("BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," 2018) is the foundation for most modern dense retrieval. Karpukhin et al., "Dense Passage Retrieval for Open-Domain Question Answering," 2020, established the dual-encoder paradigm that drives most production retrieval systems. Lewis et al.'s "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," 2020, introduced what is now usually called retrieval-augmented generation, the technical pattern underlying most AI-powered search engines.

Challenges and controversies

The May 2024 AI Overviews errors

The glue-on-pizza incident is the most-discussed failure in AI search to date. Within days of the United States launch of AI Overviews on 14 May 2024, screenshots circulated of Google's AI suggesting non-toxic glue as a pizza sauce additive, recommending one small rock per day for vitamins, and asserting that smoking during pregnancy could be beneficial. Reporters at Live Science, The Washington Post and Bloomberg traced the origins to a 2013 joke Reddit comment, a 2021 article by The Onion titled "Geologists Recommend Eating At Least One Small Rock Per Day," and similar low-signal sources (Live Science, 2024-05-23; Bloomberg, 2024-05-30). The Washington Post reported that the system had no reliable way to weight comedic sources against authoritative ones for queries with sparse coverage.

Liz Reid, head of Google Search, addressed the incident in a 30 May 2024 blog post. She wrote that some of the most-shared screenshots were faked, but acknowledged that real errors had occurred. Google rolled out more than a dozen technical fixes, including detection of nonsensical queries, reduced reliance on user-generated content, restrictions on satire sources and limits on AI Overviews for health queries.

The HCU controversy

The September 2023 Helpful Content Update has been a sustained source of complaint from small publishers. TourScanner's analysis of 671 travel sites found 32 percent lost more than 90 percent of their organic traffic. A widely cited piece by hobo-web.co.uk argued that Google's recovery rate is low: of sites in the affected sample, only about one-third recovered any meaningful portion of their pre-update traffic, and most still sit well below their pre-September 2023 baseline (Hobo Web, 2024).

Google has not formally retracted the HCU. Danny Sullivan, while serving as Google's search liaison until his departure on 1 August 2025, repeatedly told publishers that the right response was to focus on quality rather than chase the algorithm. Many publishers found that answer unsatisfying, particularly when search results increasingly favored Reddit and large brand sites.

Lawsuits over content use

The rise of AI search engines has produced a wave of copyright litigation. In October 2024 The New York Times sued Perplexity AI for what the complaint described as "verbatim or near-verbatim" reproduction of Times content. In the same month Dow Jones and the New York Post, both owned by News Corporation, also sued Perplexity for what their complaint called "massive freeriding." Perplexity has disputed the framing in public statements (Variety, 2024-10-21; CNN Business, 2024-10-21; TheWrap, 2024).

These cases were still working their way through United States federal court in 2025 and into 2026, and they may set the terms for how AI search engines train on and surface content from major publishers.

Reliability of AI detection

AI detection tools have been widely deployed in academic and editorial settings but have repeatedly produced false positives on human-written text. The Liang et al. 2023 Stanford paper on detector bias against non-native English speakers is the most-cited evidence of the problem. Stanford's findings, combined with the Constitution and Da Vinci Code anecdotes, led several universities, including Vanderbilt, to publicly disable Turnitin's AI-detection feature in 2023 after concluding it could not be used fairly.

Erosion of the open web

A broader question, raised by publishers and by some search-industry analysts, is whether AI summaries are sustainable in the long run. The simplest version of the argument is that if AI search engines reduce visits to the underlying source pages by a third or more, those source pages eventually become less viable to produce. If publishers stop producing, the AI engines lose their training and retrieval material. Whether that feedback loop is real or fixable is the central open question of the current SEO moment, and the answers from Google, OpenAI and publishers have differed sharply.

References

Clark, Jack. "Google Turning Its Lucrative Web Search Over to AI Machines." Bloomberg, 26 October 2015. https://www.bloomberg.com/news/articles/2015-10-26/google-turning-its-lucrative-web-search-over-to-ai-machines
Wikipedia contributors. "RankBrain." https://en.wikipedia.org/wiki/RankBrain
Sullivan, Danny. "FAQ: All about the Google RankBrain algorithm." Search Engine Land, 23 June 2016. https://searchengineland.com/faq-all-about-the-new-google-rankbrain-algorithm-234440
Nayak, Pandu. "Understanding searches better than ever before." Google blog, 25 October 2019. https://blog.google/products-and-platforms/products/search/search-language-understanding-bert/
Schwartz, Barry. "Welcome BERT: Google's latest search algorithm to better understand natural language." Search Engine Land, 25 October 2019. https://searchengineland.com/welcome-bert-google-artificial-intelligence-for-understanding-search-queries-323976
Nayak, Pandu. "MUM: A new AI milestone for understanding information." Google blog, 18 May 2021. https://blog.google/products-and-platforms/products/search/introducing-mum/
Wikipedia contributors. "Google Hummingbird." https://en.wikipedia.org/wiki/Google_Hummingbird
Google Search Central. "What web creators should know about our March 2024 core update and new spam policies." 5 March 2024. https://developers.google.com/search/blog/2024/03/core-update-spam-policies
Schwartz, Barry. "Google released massive search quality improvements with the March 2024 core update." Search Engine Land. https://searchengineland.com/google-released-massive-search-quality-improvements-with-march-2024-core-update-and-multiple-spam-updates-438144
Google Search Central. "Our latest update to the quality rater guidelines: E-A-T gets an extra E for Experience." December 2022. https://developers.google.com/search/blog/2022/12/google-raters-guidelines-e-e-a-t
Google Search Central. "What creators should know about Google's August 2022 helpful content update." 18 August 2022. https://developers.google.com/search/blog/2022/08/helpful-content-update
Google Search Central. "Updating our site reputation abuse policy." 19 November 2024. https://developers.google.com/search/blog/2024/11/site-reputation-abuse
Reid, Liz. "AI Overviews: About last week." Google blog, 30 May 2024. https://blog.google/products/search/ai-overviews-update-may-2024/
Reid, Liz. "Generative AI in Search: Let Google do the searching for you." Google blog, 14 May 2024. https://blog.google/products/search/generative-ai-google-search-may-2024/
Google blog. "AI Overviews are now available in over 200 countries and territories, and more than 40 languages." 20 May 2025. https://blog.google/products-and-platforms/products/search/ai-overview-expansion-may-2025-update/
Wikipedia contributors. "AI Overviews." https://en.wikipedia.org/wiki/AI_Overviews
Pichai, Sundar et al. "Google brings Gemini 3 AI model to Search and AI Mode." Google blog. https://blog.google/products-and-platforms/products/search/gemini-3-search-ai-mode/
Aggarwal, Pranjal et al. "GEO: Generative Engine Optimization." arXiv, November 2023. https://arxiv.org/abs/2311.09735
Aggarwal, Pranjal et al. "GEO: Generative Engine Optimization." Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024. https://dl.acm.org/doi/10.1145/3637528.3671900
Thakur, Nandan et al. "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models." arXiv:2104.08663, 2021. https://arxiv.org/abs/2104.08663
Bajaj, Payal et al. "MS MARCO: A Human Generated MAchine Reading COmprehension Dataset." NIPS 2016 and subsequent work; see https://microsoft.github.io/msmarco/
Muennighoff, Niklas et al. "MTEB: Massive Text Embedding Benchmark." arXiv:2210.07316, 2022. Leaderboard: https://huggingface.co/spaces/mteb/leaderboard
Liang, Weixin et al. "GPT detectors are biased against non-native English writers." Patterns (Cell Press), 2023. https://arxiv.org/abs/2304.02819
Live Science. "Google's AI tells users to add glue to their pizza, eat rocks and make chlorine gas." 23 May 2024. https://www.livescience.com/technology/artificial-intelligence/googles-ai-tells-users-to-add-glue-to-their-pizza-eat-rocks-and-make-chlorine-gas
Bloomberg Opinion. "Pizza Glue? Small Rocks? Google AI Overview Answers Are a Mess." 30 May 2024. https://www.bloomberg.com/opinion/articles/2024-05-30/pizza-glue-small-rocks-google-ai-overview-answers-are-a-mess
UNSW Newsroom. "Eat a rock a day, put glue on your pizza: how Google's AI is losing touch with reality." May 2024. https://www.unsw.edu.au/newsroom/news/2024/05/eat-a-rock-a-day-put-glue-on-your-pizza-how-googles-ai-is-losing-touch-with-reality
Pew Research Center. "Do people click on links in Google AI summaries?" 22 July 2025. https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/
Patel, Patrick. "AI Overviews Reduce Clicks by 58%." Ahrefs blog. https://ahrefs.com/blog/ai-overviews-reduce-clicks-update/
Seer Interactive. "AIO Impact on Google CTR: September 2025 Update." https://www.seerinteractive.com/insights/aio-impact-on-google-ctr-september-2025-update
Similarweb. "Report: The Impact of Generative AI on Publishers." 2025. https://www.similarweb.com/corp/reports/generative-ai-publishers/
TechCrunch. "ChatGPT referrals to news sites are growing, but not enough to offset search declines." 2 July 2025. https://techcrunch.com/2025/07/02/chatgpt-referrals-to-news-sites-are-growing-but-not-enough-to-offset-search-declines/
Schwartz, Barry. "Google AI Overviews drive 61% drop in organic CTR, 68% in paid." Search Engine Land. https://searchengineland.com/google-ai-overviews-drive-drop-organic-paid-ctr-464212
AdExchanger. "The AI Search Reckoning Is Dismantling Open Web Traffic." https://www.adexchanger.com/publishers/the-ai-search-reckoning-is-dismantling-open-web-traffic-and-publishers-may-never-recover/
Variety. "Dow Jones and New York Post Sue AI Startup Perplexity, Alleging 'Massive' Copyright Infringement." 21 October 2024. https://variety.com/2024/biz/news/news-corp-dow-jones-ny-post-sue-perplexity-copyright-infringement-1236184900/
CNN Business. "Rupert Murdoch's news outlets sue Perplexity AI for allegedly engaging in 'massive freeriding'." 21 October 2024. https://www.cnn.com/2024/10/21/media/rupert-murdoch-dow-jones-perplexity-lawsuit
TheWrap. "New York Times Sues Perplexity AI, Alleges Copyright Violations." https://www.thewrap.com/new-york-times-perplexity-ai-lawsuit/
CNBC. "Perplexity AI wrapping talks to raise $500 million at $14 billion valuation." 12 May 2025. https://www.cnbc.com/2025/05/12/perplexity-funding-round-comet.html
Wikipedia contributors. "Perplexity AI." https://en.wikipedia.org/wiki/Perplexity_AI
OpenAI. "Introducing ChatGPT search." 31 October 2024. https://openai.com/index/introducing-chatgpt-search/
OpenAI. "SearchGPT Prototype." 25 July 2024. https://openai.com/index/searchgpt-prototype/
TechCrunch. "Microsoft launches the new Bing, with ChatGPT built in." 7 February 2023. https://techcrunch.com/2023/02/07/microsoft-launches-the-new-bing-with-chatgpt-built-in/
Bing Search Blog. "Confirmed: the new Bing runs on OpenAI's GPT-4." March 2023. https://blogs.bing.com/search/march_2023/Confirmed-the-new-Bing-runs-on-OpenAI%E2%80%99s-GPT-4
Wikipedia contributors. "Microsoft Copilot." https://en.wikipedia.org/wiki/Microsoft_Copilot
TechCrunch. "Brave Search launches an AI-powered summarization feature." 2 March 2023. https://techcrunch.com/2023/03/02/brave-search-launches-an-ai-powered-summarization-feature/
Voicebot.ai. "New YouChat Chatbot Offers ChatGPT-Style Generative AI Search Engine." 28 December 2022. https://voicebot.ai/2022/12/28/new-youchat-chatbot-offers-chatgpt-style-generative-ai-search-engine/
Wikipedia contributors. "Kagi." https://en.wikipedia.org/wiki/Kagi
Originality.ai. "AI Content Detector False Positives." https://originality.ai/blog/ai-content-detector-false-positives
GPTZero. "GPTZero vs Copyleaks vs Originality." https://gptzero.me/news/gptzero-vs-copyleaks-vs-originality/
HasteWire. "GPTZero Limitations: Accuracy Issues & False Positives." https://hastewire.com/blog/gptzero-limitations-accuracy-issues-and-false-positives
Screaming Frog. "Screaming Frog SEO Spider Update, Version 21.0." https://www.screamingfrog.co.uk/blog/seo-spider-21/
Hobo Web. "HCU/SPAM/CORE Google Algorithm Impact Analysis Part 1, The Victims." https://www.hobo-web.co.uk/hcu-spam-core-google-algorithm-impact-analysis-part-1-the-victims/
TourScanner. "The Irony of 'Helpful'." https://tourscanner.com/google-helpful-content-update-impact-on-independent-publishers
Wikipedia contributors. "Generative engine optimization." https://en.wikipedia.org/wiki/Generative_engine_optimization
Semrush blog. "What Is Answer Engine Optimization? And How to Do It." https://www.semrush.com/blog/answer-engine-optimization/
Search Engine Land. "Google's neural matching versus RankBrain." https://searchengineland.com/googles-neural-matching-versus-rankbrain-how-google-uses-each-in-search-314311
