Search Engine
Last reviewed
Sources
22 citations
Review status
Source-backed
Revision
v3 · 3,394 words
A search engine is a software system that retrieves information from a corpus (the web, a private dataset, or a document store) and ranks results by relevance to a user query. Since 2022 the dominant shift in search has been from keyword-matched lists of links toward AI answer engines that synthesize a single cited response: Google AI Overviews, the AI-native Perplexity, OpenAI ChatGPT Search, and Google AI Mode. Google reported that AI Overviews reached over 2 billion monthly users across more than 200 countries and 40 languages by July 2025, making AI-generated answers a default part of mainstream web search. [17]
Modern search engines combine classical information retrieval techniques with machine learning and neural networks to interpret queries, score documents, and increasingly generate direct answers. The technical engine behind the answer-engine shift is retrieval-augmented generation, which fuses a retriever with a large language model to produce synthesized, cited responses rather than ten blue links.
This article covers the role of artificial intelligence in search: ranking algorithms, vector and semantic retrieval, AI-native answer engines, the keyword-search-to-answer-engine paradigm shift, its impact on web traffic and SEO, and supporting infrastructure.
Early web search engines including WebCrawler (1994), Lycos (1994), AltaVista (1995), and Yahoo Directory relied on keyword matching, manual curation, and lexical scoring functions such as TF-IDF and BM25. In 1998, Stanford graduate students Larry Page and Sergey Brin published The Anatomy of a Large-Scale Hypertextual Web Search Engine and the companion technical report introducing PageRank, an eigenvector-based algorithm that ranks pages by the structure of inbound links. PageRank powered the launch of Google and remained a central signal in web ranking for two decades.
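The link-structure idea behind PageRank can be illustrated with power iteration, which repeatedly redistributes each page's score across its outbound links until the scores settle. This is a minimal sketch, not Google's implementation: the damping factor of 0.85, the iteration count, the handling of pages without outbound links, and the toy graph `toy_web` are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping page -> list of outbound pages.

    damping and iterations are illustrative defaults, not values from this article.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # Each page splits its damped score equally among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" receives the most inbound links, "d" receives none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
ranked_pages = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the page with the most inbound links (`c`) ends up ranked first and the page with none (`d`) last, which is the behavior the eigenvector formulation is meant to capture.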
The mid-2000s brought the learning-to-rank era. At Microsoft Research, Christopher Burges and colleagues developed RankNet (ICML 2005), then LambdaRank, then LambdaMART, a gradient-boosted tree model that won Track 1 of the 2010 Yahoo Learning to Rank Challenge and remains a standard baseline for web ranking.
Neural information retrieval emerged in the early 2010s with the Deep Structured Semantic Model (DSSM) by Huang and colleagues at Microsoft (CIKM 2013), which mapped queries and documents into a shared embedding space. On October 25, 2019, Google announced that it was applying BERT (Bidirectional Encoder Representations from Transformers) to ranking and featured snippets, estimating it would affect about one in ten English-language queries in the United States.
Dense retrieval became practical in 2020. Karpukhin and colleagues at Facebook AI published Dense Passage Retrieval (DPR, arXiv 2004.04906, EMNLP 2020), showing that a dual BERT encoder outperformed Lucene BM25 by 9 to 19 absolute percentage points in top-20 passage retrieval accuracy on open-domain question answering benchmarks. The same year, Omar Khattab and Matei Zaharia introduced ColBERT (SIGIR 2020, arXiv 2004.12832), whose late-interaction architecture preserved fine-grained query-document matching while keeping computation tractable. Patrick Lewis and colleagues then proposed retrieval-augmented generation (NeurIPS 2020, arXiv 2005.11401), combining a dense retriever with a sequence-to-sequence generator and establishing the architectural template for today's AI search products.
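The difference between a dual encoder and late interaction can be sketched in a few lines. A dual encoder compares one vector per query with one vector per document, while late interaction keeps one vector per token and, for each query token, takes its best match among the document tokens before summing. The sketch below assumes the token vectors are already computed and normalized; the two-dimensional toy vectors and the function names are hypothetical and stand in for real encoder output.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def single_vector_score(query_vec, doc_vec):
    """Dual-encoder style scoring: one vector per text, one dot product."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_token_vecs, doc_token_vecs):
    """MaxSim-style scoring: each query token keeps its best-matching document
    token, and those maxima are summed."""
    return sum(
        max(dot(q, d) for d in doc_token_vecs)
        for q in query_token_vecs
    )


# Toy token embeddings (hypothetical, roughly unit length).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # has an exact match per query token
doc_b_tokens = [[0.7, 0.7], [0.7, 0.7]]              # only partial matches

score_a = late_interaction_score(query_tokens, doc_a_tokens)
score_b = late_interaction_score(query_tokens, doc_b_tokens)
```

Here `doc_a_tokens` scores higher than `doc_b_tokens` because every query token finds an exact counterpart, which is the fine-grained matching that a single pooled vector can blur.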
Before retrieval, search systems analyze the query for intent. Common steps include spell correction, query rewriting and expansion, intent classification (navigational, informational, transactional), entity linking against a knowledge graph, and language identification. Google layered RankBrain (2015) and BERT (2019) on top of older lexical pipelines to handle conversational and long-tail queries.
Retrieval is the first pass that selects a candidate set from millions or billions of documents.
A second-stage reranker scores the top-k candidates more carefully. Gradient-boosted trees such as LambdaMART still dominate production web ranking. Neural rerankers include cross-encoders, monoT5 (Nogueira and colleagues, 2020), RankT5 (Zhuang and colleagues, 2022), and RankGPT (Sun and colleagues, EMNLP 2023), which prompts an LLM to reorder a list of passages.
Results can be returned as links, extractive snippets (a passage taken verbatim), or abstractive answers produced by an LLM conditioned on retrieved passages. The last of these is the core of retrieval-augmented generation and underpins answer engines such as Perplexity and ChatGPT Search.
Lexical search uses inverted indexes. Vector search depends on approximate nearest neighbor (ANN) indexes. Yu. A. Malkov and D. A. Yashunin formalized Hierarchical Navigable Small World graphs (HNSW) in 2016, with the journal version in IEEE TPAMI in 2018. FAISS, released by Facebook AI Research in March 2017 with Johnson, Douze, and Jegou's billion-scale paper (arXiv 1702.08734), provides CPU and GPU implementations of IVF, PQ, and HNSW. Other ANN libraries include Annoy (Spotify), ScaNN (Google, ICML 2020), and DiskANN (Microsoft, NeurIPS 2019).
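For the lexical side, an inverted index maps each term to the documents that contain it, so a query only touches the postings of its own terms. The sketch below builds such an index and scores matches with BM25. It is an illustration under stated assumptions: whitespace tokenization, the parameter values `k1=1.2` and `b=0.75`, the non-negative IDF variant, and the three toy documents are choices made here, not details taken from any engine named in this article.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to {doc_id: term frequency} and record document lengths."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, lengths


def bm25_search(query, postings, lengths, k1=1.2, b=0.75):
    """Score only documents that share at least one term with the query."""
    n_docs = len(lengths)
    avg_len = sum(lengths.values()) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        docs_with_term = postings.get(term, {})
        if not docs_with_term:
            continue
        df = len(docs_with_term)
        # Non-negative IDF variant; rarer terms contribute more.
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in docs_with_term.items():
            # Length normalization dampens the advantage of long documents.
            norm = tf + k1 * (1.0 - b + b * lengths[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1.0) / norm
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "d1": "neural ranking with transformers",
    "d2": "inverted index for lexical search",
    "d3": "lexical search and neural search",
}
postings, lengths = build_index(docs)
results = bm25_search("lexical search", postings, lengths)
result_ids = [doc_id for doc_id, _ in results]
```

Documents with no query terms (`d1` here) never enter the score table, which is what makes the inverted index efficient; an ANN index plays the analogous candidate-pruning role for vectors.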
The answer-engine shift is the move from search engines that return a ranked list of links to AI systems that read across multiple sources and return one synthesized, cited answer in natural language. A 2024 ACM SIGKDD paper that introduced the term generative engine defined these systems as engines that "satisfy queries by synthesizing information from multiple sources and summarizing them using an LLM." [21] Instead of clicking through to a website, the user reads the model's answer directly on the results page, and the underlying sources become citations rather than destinations.
The shift accelerated quickly across the major operators:
| Answer engine | Operator | Reported reach or volume | As of |
|---|---|---|---|
| Google AI Overviews | Google | Over 2 billion monthly users, 200+ countries, 40 languages [17] | July 2025 |
| Google AI Mode | Google | About 100 million monthly users (US and India) [17] | July 2025 |
| ChatGPT (incl. Search) | OpenAI | About 2.5 billion queries per day; 800 million weekly users [18] | mid-2025 |
| Perplexity | Perplexity | 780 million queries in May 2025, growing 20%+ month over month [20] | May 2025 |
Google introduced AI Mode as a US Search Labs experiment in March 2025 and began a broader US rollout in May 2025, then expanded to additional markets through the rest of the year. [19] AI Mode is a fully conversational, Gemini-powered interface that issues multiple sub-queries (a technique Google calls query fan-out) and returns an extended generative answer, marking Google's clearest step away from the classic ten-blue-links page.
A wave of AI-native search products launched between 2022 and 2025.
| Product | Operator | Launched | Notes |
|---|---|---|---|
| Perplexity AI | Perplexity | Dec 2022 | Founded Aug 2022 by Aravind Srinivas, Denis Yarats, Johnny Ho, Andy Konwinski |
| You.com | You.com Inc. | Nov 2021 beta | Founded 2020 by Richard Socher and Bryan McCann; LLM chat with live web results from Dec 2022 |
| Bing Chat / Copilot | Microsoft | Feb 7, 2023 | First mainstream LLM-backed web search; OpenAI GPT-4; later rebranded Microsoft Copilot |
| ChatGPT Search | OpenAI | Oct 31, 2024 | Evolved from the SearchGPT prototype announced July 2024 |
| Google AI Overviews | Google | May 14, 2024 (US) | Previewed as SGE at Google I/O May 10, 2023; expanded to 100+ countries Oct 2024 |
| Google AI Mode | Google | May 2025 (US, after Mar 2025 Labs) | Conversational, Gemini-powered; query fan-out |
| Google Gemini in Search | Google | 2024 onward | Gemini models power AI Overviews |
| Brave Search | Brave Software | 2021 beta | Independent index; AI summarizer added 2023 |
| Kagi | Kagi Inc. | 2022 beta | Subscription, ad-free; FastGPT and Kagi Assistant add LLM features |
| Phind | Phind | 2022 | Developer-focused technical Q&A |
| Exa (formerly Metaphor) | Exa Labs | 2022 | Semantic search API |
| Tavily | Tavily | 2023 | Developer search API for LLM agents |
| Andi | Andi Search | 2022 | Conversational, privacy-oriented |
| Komo | Komo Search | 2022 | AI-native consumer search |
The industry has split into three patterns: traditional engines return ranked links, AI-native engines synthesize an answer with inline citations, and hybrids (such as Google Search after May 2024) show an AI Overview above traditional results.
Retrieval-augmented generation (RAG) is the dominant architecture for AI-powered search. Given a query, the system retrieves a small number of relevant passages (via dense, sparse, or hybrid retrieval), formats them into a prompt, and asks an LLM to produce an answer grounded in those passages with citations. Retrieval reduces hallucination by anchoring generation to up-to-date evidence. Tradeoffs include retrieval errors propagating into answers, citation drift, and added latency. RAG is the operating principle behind Perplexity, ChatGPT Search, Bing Copilot, AI Overviews, and enterprise systems such as Glean.
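The retrieve, format, and generate steps can be sketched end to end. Everything named below is hypothetical: the term-overlap retriever stands in for a real dense, sparse, or hybrid retriever, the `example.com` URLs are placeholders, and `generate` is any callable that takes a prompt string and returns text, so a stub is used in place of a real LLM client.

```python
def retrieve(query, passages, k=2):
    """Toy retriever: rank passages by how many query terms they share."""
    query_terms = set(query.lower().split())
    scored = [
        (len(query_terms & set(p["text"].lower().split())), p)
        for p in passages
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:k]]


def build_prompt(query, retrieved):
    """Number each passage so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] {p['text']} (source: {p['url']})"
        for i, p in enumerate(retrieved, start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources as [n]. If the sources are insufficient, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


def answer(query, passages, generate, k=2):
    """Retrieve, format, then hand the prompt to the supplied generator."""
    retrieved = retrieve(query, passages, k)
    prompt = build_prompt(query, retrieved)
    return generate(prompt), retrieved, prompt


corpus = [
    {"url": "https://example.com/pagerank", "text": "PageRank ranks pages by inbound links"},
    {"url": "https://example.com/bm25", "text": "BM25 is a lexical scoring function"},
    {"url": "https://example.com/hnsw", "text": "HNSW is a graph index for vector search"},
]


def stub_llm(prompt):
    """Placeholder for a real model call; returns a fixed cited answer."""
    return "Stub answer citing [1]."


reply, used_passages, prompt_text = answer("how does pagerank rank pages", corpus, stub_llm)
```

Because the model only sees what `retrieve` returns, a poor first stage surfaces directly in the answer, which is the error-propagation tradeoff noted above.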
Production RAG and semantic search typically rely on a vector database or vector-capable engine.
| System | License | Deployment | Notes |
|---|---|---|---|
| Pinecone | Proprietary | Managed cloud | Serverless tier since 2024 |
| Weaviate | Open source | Self-host or managed | Hybrid search; embedding modules |
| Milvus / Zilliz | Open source / managed | Self-host or managed | LF AI & Data; billion-scale deployments |
| Qdrant | Open source | Self-host or managed | Rust; payload filtering and quantization |
| Chroma | Open source | Embedded or server | Popular with LangChain and LlamaIndex |
| pgvector | Open source | Postgres extension | HNSW and IVFFlat inside Postgres |
| Vespa | Open source | Self-host or cloud | Yahoo origin; native ANN |
| Elasticsearch / OpenSearch | SSPL, Elastic License, or AGPL / Apache 2.0 | Self-host or managed | Dense and sparse vectors; ELSER (Elasticsearch) |
| MongoDB Atlas Vector Search | Proprietary | Managed | Built on Atlas |
| Redis Vector | Source-available | Self-host or managed | Modules for vector and hybrid search |
| LanceDB | Open source | Embedded or cloud | Columnar Lance format |
| FAISS | MIT | Library | Library, not a database |
The choice depends on scale, hybrid query needs, and existing infrastructure. Hosted services reduce operations; open-source engines offer more control at scale.
Providers that expose web search as an API for RAG pipelines and AI agents include Google Custom Search JSON API, Brave Search API, Exa, Tavily, Serper, SerpAPI, You.com API, and Bing Web Search (Microsoft announced its retirement effective August 2025, pushing customers toward Grounding with Bing Search in Azure AI). The shift from human-facing to agent-facing search APIs is a notable 2024 to 2025 trend.
Benchmarks underpin progress in retrieval. The most widely used are listed below.
| Benchmark | Year | Scope | Reference |
|---|---|---|---|
| MS MARCO | 2016 | Large-scale passage ranking and QA from Bing logs | Nguyen et al., arXiv 1611.09268 |
| TREC Deep Learning Track | 2019 onward | Annual NIST evaluation of deep retrieval | NIST TREC |
| BEIR | 2021 | Zero-shot retrieval across 18 datasets | Thakur et al., NeurIPS Datasets and Benchmarks, arXiv 2104.08663 |
| MTEB (retrieval split) | 2022 | Embedding model evaluation, includes retrieval tasks | Muennighoff et al., arXiv 2210.07316 |
| MIRACL | 2022 | Multilingual ad hoc retrieval (18 languages) | Zhang et al., arXiv 2210.09984 |
| LoTTE | 2022 | Long-tail topic retrieval, paired with ColBERTv2 | Santhanam et al., NAACL 2022 |
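A recurring finding from BEIR is that dense models trained on MS MARCO beat BM25 in domain, yet BM25 remains a strong zero-shot baseline that many of those models fail to match out of domain. Hybrid retrieval, which combines lexical and dense signals, often performs well in both settings.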
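One common way to build a hybrid retriever is to fuse the ranked lists from a lexical and a dense system. The article does not name a fusion method, so the sketch below uses reciprocal rank fusion (RRF) as an illustrative choice. The constant `k=60` is the value commonly quoted for RRF and is an assumption here, as are the document IDs and the two input rankings.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by multiple systems rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical outputs of a lexical and a dense retriever for the same query.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d3", "d2", "d4"]
```

RRF uses only rank positions, so it sidesteps the problem that BM25 scores and embedding similarities live on different scales.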
Apache Lucene underpins both Elasticsearch and OpenSearch (forked from Elasticsearch in 2021 after Elastic changed its license). Apache Solr is another long-running Lucene-based engine. Vespa, originally built at Yahoo and open-sourced in 2017, ships native tensor and ANN support for large-scale semantic search. Meilisearch, Typesense, and Sonic are lightweight engines that emphasize developer ergonomics. LangChain and LlamaIndex are among the most widely used frameworks for composing retrievers, vector stores, and LLMs into RAG applications.
Search and AI search now power many product categories.
AI answer engines are reshaping the economics of the web because the answer appears on the results page, so users no longer need to click through to source sites. A Pew Research Center study of US adults' browsing in March 2025 found that when a Google AI summary appeared, users clicked a traditional search result in only 8 percent of visits, compared with 15 percent of visits when no summary was shown, and clicked the link inside the AI summary itself in just 1 percent of visits. [22] Pew also found that users were more likely to end their session entirely after seeing an AI summary (26 percent versus 16 percent without one). Google disputed the study, calling its query set unrepresentative of real Search traffic. [22]
This decline of referral clicks, often called the zero-click search problem, has driven publisher lawsuits and licensing deals. The New York Times sued OpenAI and Microsoft in December 2023 over training-data use, several publishers blocked AI crawlers via robots.txt, and OpenAI signed content deals with Axel Springer, News Corp, the Financial Times, and others starting in 2023.
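Blocking an AI crawler through robots.txt amounts to a per-user-agent disallow rule. The sketch below uses Python's standard `urllib.robotparser` to check how such a file would be interpreted. The robots.txt content and the `example.com` URL are hypothetical, and `GPTBot` is used as an example crawler name; whether a given crawler honors the file is up to its operator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: refuse one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url="https://example.com/article"):
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


gptbot_allowed = may_fetch("GPTBot")
other_allowed = may_fetch("Googlebot")
```

With these rules the named crawler is refused while other agents fall through to the wildcard entry, so conventional search indexing is unaffected.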
Generative engine optimization (GEO), the answer-engine successor to classical search engine optimization (SEO), is the practice of structuring web content so that AI answer engines are more likely to cite it. The term was introduced in GEO: Generative Engine Optimization, a paper by Pranjal Aggarwal and colleagues at Princeton presented at ACM SIGKDD (KDD) in 2024. [21] The authors built a benchmark of 10,000 queries and reported that their methods "can boost visibility by up to 40% in generative engine responses." [21]
The paper found that the most effective tactics required only minimal content changes: adding relevant statistics, incorporating credible quotations, and citing authoritative sources produced the largest gains in how often a page was surfaced in a generative answer. [21] These findings have driven a fast-growing GEO industry in 2025 and 2026, paralleling the older SEO market and prompting traditional SEO firms to add answer-engine optimization (AEO) services.
Between 2024 and 2026 the search market restructured rapidly. Google rolled out AI Overviews in the United States on May 14, 2024 and expanded to more than 100 countries in October 2024, reaching over 2 billion monthly users by July 2025. [17] OpenAI launched ChatGPT Search on October 31, 2024 and removed the login requirement in 2025, by which point ChatGPT was handling roughly 2.5 billion queries per day. [18] Microsoft folded Bing Chat into a broader Copilot brand. Perplexity reported 780 million queries in May 2025 and was valued at about 20 billion dollars in a September 2025 funding round. [20] Anthropic, OpenAI, and Google introduced deep research modes that combine multi-step browsing with LLM synthesis. Agentic search, in which a model issues sub-queries, opens pages, and refines plans, became standard in flagship products.
Legal and economic disputes followed. The New York Times sued OpenAI and Microsoft in December 2023 over training data use. Several publishers blocked AI crawlers via robots.txt, and reports showed AI Overviews and answer engines reduced click-through rates to source sites in 2024 and 2025.
AI search inherits issues from both classical retrieval and large language models: