Search Engine
Last reviewed: May 11, 2026
Sources: 16 citations
Review status: Source-backed
Revision: v2 · 2,492 words
A search engine is a software system that retrieves information from a corpus (the web, a private dataset, or a document store) and ranks results by relevance to a user query. Modern search engines combine classical information retrieval techniques with machine learning and neural networks to interpret queries, score documents, and increasingly generate direct answers. Since 2022 the field has been reshaped by retrieval-augmented generation, which fuses a retriever with a large language model to produce synthesized, cited responses rather than ten blue links.
This article covers the role of artificial intelligence in search: ranking algorithms, vector and semantic retrieval, AI-native answer engines, and supporting infrastructure.
Early web search engines, including WebCrawler (1994), Lycos (1994), AltaVista (1995), and Yahoo Directory, relied on keyword matching, manual curation, and lexical scoring functions such as TF-IDF and BM25. In 1998, Stanford graduate students Larry Page and Sergey Brin published "The Anatomy of a Large-Scale Hypertextual Web Search Engine" and the companion technical report introducing PageRank, an eigenvector-based algorithm that ranks pages by the structure of inbound links. PageRank powered the launch of Google and remained a central signal in web ranking for two decades.
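The eigenvector idea behind PageRank can be illustrated with power iteration on a toy link graph. This is a minimal sketch, not Google's implementation: the `pagerank` function, the damping factor of 0.85, the iteration count, and the four-page `toy_web` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outbound links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" has the most inbound links, "d" has none.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy graph the page with the most inbound links (`c`) ends up with the highest score, and the page nobody links to (`d`) keeps only the baseline share.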
The mid-2000s brought the learning-to-rank era. At Microsoft Research, Christopher Burges and colleagues developed RankNet (ICML 2005), then LambdaRank, then LambdaMART, a gradient-boosted tree model that won Track 1 of the 2010 Yahoo Learning to Rank Challenge and remains a standard baseline for web ranking.
Neural information retrieval emerged in the early 2010s with the Deep Structured Semantic Model (DSSM) by Huang and colleagues at Microsoft (CIKM 2013), which mapped queries and documents into a shared embedding space. On October 25, 2019, Google announced that it was applying BERT (Bidirectional Encoder Representations from Transformers) to ranking and featured snippets, estimating it would affect about one in ten English queries.
Dense retrieval became practical in 2020. Karpukhin and colleagues at Facebook AI published Dense Passage Retrieval (DPR, arXiv 2004.04906, EMNLP 2020), showing that a dual BERT encoder outperformed Lucene BM25 by 9 to 19 absolute percentage points in top-20 passage retrieval accuracy on open-domain question answering benchmarks. The same year, Omar Khattab and Matei Zaharia introduced ColBERT (SIGIR 2020, arXiv 2004.12832), whose late-interaction architecture preserved fine-grained query-document matching while keeping computation tractable. Patrick Lewis and colleagues then proposed retrieval-augmented generation (NeurIPS 2020, arXiv 2005.11401), combining a dense retriever with a sequence-to-sequence generator and establishing the architectural template for today's AI search products.
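The difference between the two scoring styles can be sketched in a few lines. The example below assumes the encoders have already run: the hand-made two-dimensional vectors stand in for real model output, and the function names are hypothetical. A dual encoder compares one vector per query with one vector per passage, while late interaction compares every query token vector with every document token vector and sums the best matches.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def dual_encoder_score(query_vec, doc_vec):
    """DPR-style relevance: a single inner product between two pooled vectors."""
    return dot(query_vec, doc_vec)


def late_interaction_score(query_token_vecs, doc_token_vecs):
    """ColBERT-style MaxSim: best-matching document token per query token, summed."""
    return sum(max(dot(q, d) for d in doc_token_vecs) for q in query_token_vecs)


# Illustrative token vectors only; a real system would get these from an encoder.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[0.9, 0.1], [0.1, 0.8]]  # each query token has a close match
doc_b_tokens = [[0.5, 0.5], [0.5, 0.5]]  # only diffuse matches
score_a = late_interaction_score(query_tokens, doc_a_tokens)  # about 1.7
score_b = late_interaction_score(query_tokens, doc_b_tokens)  # 1.0
```

Because document token vectors do not depend on the query, they can be computed and indexed offline, which is what keeps late interaction tractable at query time.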
Before retrieval, search systems analyze the query for intent. Common steps include spell correction, query rewriting and expansion, intent classification (navigational, informational, transactional), entity linking against a knowledge graph, and language identification. Google layered RankBrain (2015) and BERT (2019) on top of older lexical pipelines to handle conversational and long-tail queries.
Retrieval is the first pass that selects a candidate set from millions or billions of documents, using sparse (lexical), dense (embedding-based), or hybrid methods.
A second-stage reranker scores the top-k candidates more carefully. Gradient-boosted trees such as LambdaMART still dominate production web ranking. Neural rerankers include cross-encoders, monoT5 (Nogueira and colleagues, 2020), RankT5 (Zhuang and colleagues, 2022), and RankGPT (Sun and colleagues, EMNLP 2023), which prompts an LLM to reorder a list of passages.
Results can be returned as links, extractive snippets (a passage taken verbatim), or abstractive answers produced by an LLM conditioned on retrieved passages. The last of these is the core of retrieval-augmented generation and underpins answer engines such as Perplexity and ChatGPT Search.
Lexical search uses inverted indexes, which map each term to the documents that contain it. Vector search depends on approximate nearest neighbor (ANN) indexes. Yu. A. Malkov and D. A. Yashunin formalized Hierarchical Navigable Small World graphs (HNSW) in 2016, with the journal version in IEEE TPAMI in 2018. FAISS, released by Facebook AI Research in March 2017 with Johnson, Douze, and Jegou's billion-scale paper (arXiv 1702.08734), provides implementations of IVF, PQ, and HNSW indexes, several of them with GPU support. Other ANN libraries include Annoy (Spotify), ScaNN (Google, ICML 2020), and DiskANN (Microsoft, NeurIPS 2019).
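As a rough illustration of the lexical side, the sketch below builds a small in-memory inverted index and scores matches with BM25. It assumes whitespace tokenization, the common defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant used by Lucene; the `InvertedIndex` class and the three sample documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


class InvertedIndex:
    """Toy inverted index: term -> {doc_id: term frequency}, scored with BM25."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.doc_len = {}
        self.postings = defaultdict(dict)
        for doc_id, text in docs.items():
            terms = text.lower().split()  # naive whitespace tokenizer
            self.doc_len[doc_id] = len(terms)
            for term, tf in Counter(terms).items():
                self.postings[term][doc_id] = tf
        self.avg_len = sum(self.doc_len.values()) / len(self.doc_len)

    def search(self, query, top_k=3):
        scores = defaultdict(float)
        n = len(self.doc_len)
        for term in query.lower().split():
            docs_with_term = self.postings.get(term, {})
            if not docs_with_term:
                continue
            df = len(docs_with_term)
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            for doc_id, tf in docs_with_term.items():
                # Length normalization penalizes documents longer than average.
                norm = 1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


docs = {
    "d1": "neural search with dense vectors",
    "d2": "inverted index for lexical search",
    "d3": "cooking pasta at home",
}
index = InvertedIndex(docs)
results = index.search("lexical search")  # d2 matches both terms, d1 matches one
```

Only documents that appear in a posting list for some query term are ever touched, which is why inverted indexes scale to very large corpora.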
A wave of AI-native search products launched between 2022 and 2025.
| Product | Operator | Launched | Notes |
|---|---|---|---|
| Perplexity AI | Perplexity | Dec 2022 | Founded Aug 2022 by Aravind Srinivas, Denis Yarats, Johnny Ho, Andy Konwinski |
| You.com | You.com Inc. | Nov 2021 beta | Founded 2020 by Richard Socher and Bryan McCann; LLM chat with live web results from Dec 2022 |
| Bing Chat / Copilot | Microsoft | Feb 7, 2023 | First mainstream LLM-backed web search; OpenAI GPT-4; later rebranded Microsoft Copilot |
| ChatGPT Search | OpenAI | Oct 31, 2024 | Evolved from the SearchGPT prototype announced July 2024 |
| Google AI Overviews | Google | May 14, 2024 (US) | Previewed as SGE at Google I/O May 10, 2023; expanded to 100+ countries Oct 2024 |
| Google Gemini in Search | Google | 2024 onward | Gemini models power AI Overviews |
| Brave Search | Brave Software | 2021 beta | Independent index; AI summarizer added 2023 |
| Kagi | Kagi Inc. | 2022 beta | Subscription, ad-free; FastGPT and Kagi Assistant add LLM features |
| Phind | Phind | 2022 | Developer-focused technical Q&A |
| Exa (formerly Metaphor) | Exa Labs | 2022 | Semantic search API |
| Tavily | Tavily | 2023 | Developer search API for LLM agents |
| Andi | Andi Search | 2022 | Conversational, privacy-oriented |
| Komo | Komo Search | 2022 | AI-native consumer search |
The industry has split into three patterns: traditional engines return ranked links, AI-native engines synthesize an answer with inline citations, and hybrids (such as Google Search after May 2024) show an AI Overview above traditional results.
RAG is the dominant architecture for AI-powered search. Given a query, the system retrieves a small number of relevant passages (via dense, sparse, or hybrid retrieval), formats them into a prompt, and asks an LLM to produce an answer grounded in those passages with citations. Retrieval reduces hallucination by anchoring generation to up-to-date evidence. Tradeoffs include retrieval errors propagating into answers, citation drift, and added latency. RAG is the operating principle behind Perplexity, ChatGPT Search, Bing Copilot, AI Overviews, and enterprise systems such as Glean.
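The retrieve, format, and generate loop described above can be sketched as follows. This is a minimal outline under stated assumptions: `build_rag_prompt` and `answer` are hypothetical helpers, the prompt wording is illustrative, and `retrieve` and `generate` are placeholders for whatever retriever and LLM client a real system would supply (stubbed here so the example runs on its own).

```python
def build_rag_prompt(question, passages):
    """Format retrieved passages into a numbered context block the model can cite."""
    context = "\n".join(
        f"[{i}] {p['text']} (source: {p['url']})"
        for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources inline as [n]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, retrieve, generate, k=3):
    """Retrieve up to k passages, build the prompt, and ask the generator."""
    passages = retrieve(question)[:k]
    return generate(build_rag_prompt(question, passages)), passages


# Stand-ins for a real retriever and LLM call (assumptions for illustration).
def fake_retrieve(question):
    return [{"text": "HNSW is a graph-based ANN index.",
             "url": "https://example.com/hnsw"}]


def fake_generate(prompt):
    return "HNSW is a graph-based ANN index [1]."


reply, used_passages = answer("What is HNSW?", fake_retrieve, fake_generate)
```

Returning the passages alongside the reply lets the caller render the citations and check that each `[n]` marker points at a passage that was actually supplied, one way to detect citation drift.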
Production RAG and semantic search typically rely on a vector database or vector-capable engine.
| System | License | Deployment | Notes |
|---|---|---|---|
| Pinecone | Proprietary | Managed cloud | Serverless tier since 2024 |
| Weaviate | Open source | Self-host or managed | Hybrid search; embedding modules |
| Milvus / Zilliz | Open source / managed | Self-host or managed | LF AI & Data; billion-scale deployments |
| Qdrant | Open source | Self-host or managed | Rust; payload filtering and quantization |
| Chroma | Open source | Embedded or server | Popular with LangChain and LlamaIndex |
| pgvector | Open source | Postgres extension | HNSW and IVFFlat inside Postgres |
| Vespa | Open source | Self-host or cloud | Yahoo origin; native ANN |
| Elasticsearch / OpenSearch | Elastic License, SSPL, or AGPL / Apache 2.0 | Self-host or managed | Dense and sparse vectors; ELSER (Elasticsearch) |
| MongoDB Atlas Vector Search | Proprietary | Managed | Built on Atlas |
| Redis Vector | Source-available | Self-host or managed | Modules for vector and hybrid search |
| LanceDB | Open source | Embedded or cloud | Columnar Lance format |
| FAISS | MIT | Library | Library, not a database |
The choice depends on scale, hybrid query needs, and existing infrastructure. Hosted services reduce operations; open-source engines offer more control at scale.
Providers that expose web search as an API for RAG pipelines and AI agents include Google Custom Search JSON API, Brave Search API, Exa, Tavily, Serper, SerpAPI, You.com API, and Bing Web Search (Microsoft announced its retirement effective August 2025, pushing customers toward Grounding with Bing Search in Azure AI). The shift from human-facing to agent-facing search APIs is a notable 2024 to 2025 trend.
Benchmarks underpin progress in retrieval. The most widely used are listed below.
| Benchmark | Year | Scope | Reference |
|---|---|---|---|
| MS MARCO | 2016 | Large-scale passage ranking and QA from Bing logs | Nguyen et al., arXiv 1611.09268 |
| TREC Deep Learning Track | 2019 onward | Annual NIST evaluation of deep retrieval | NIST TREC |
| BEIR | 2021 | Zero-shot retrieval across 18 datasets | Thakur et al., NeurIPS Datasets and Benchmarks, arXiv 2104.08663 |
| MTEB (retrieval split) | 2022 | Embedding model evaluation, includes retrieval tasks | Muennighoff et al., arXiv 2210.07316 |
| MIRACL | 2022 | Multilingual ad hoc retrieval (18 languages) | Zhang et al., arXiv 2210.09984 |
| LoTTE | 2022 | Long-tail topic retrieval, paired with ColBERTv2 | Santhanam et al., NAACL 2022 |
A recurring finding from BEIR is that BM25 trails dense models when evaluated in domain on MS MARCO, yet often generalizes better to out-of-domain datasets. Hybrid retrieval, which combines lexical and dense signals, typically performs well in both settings.
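One simple way to combine a lexical ranking with a dense one is reciprocal rank fusion (RRF). The article does not say which fusion method any particular system uses, so treat this as an illustrative choice: the function name, the document IDs, and the two input rankings are hypothetical, and `k=60` is the constant commonly used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: each list adds 1 / (k + rank) per document."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a BM25 retriever and a dense retriever for one query.
bm25_ranking = ["d2", "d1", "d4"]
dense_ranking = ["d1", "d3", "d2"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
# fused == ["d1", "d2", "d3", "d4"]
```

RRF uses only rank positions, so it avoids having to calibrate BM25 scores against embedding similarities, which live on different scales.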
Apache Lucene underpins both Elasticsearch and OpenSearch (forked from Elasticsearch in 2021 after Elastic changed its license). Apache Solr is another long-running Lucene-based engine. Vespa, originally built at Yahoo and open-sourced in 2017, ships native tensor and ANN support for large-scale semantic search. Meilisearch, Typesense, and Sonic are lightweight engines for developer ergonomics. LangChain and LlamaIndex are the most-used frameworks for composing retrievers, vector stores, and LLMs into RAG applications.
Search and AI search now power many product categories.
Between 2024 and 2026 the search market restructured rapidly. Google rolled out AI Overviews in the United States on May 14, 2024 and expanded to more than 100 countries in October 2024. OpenAI launched ChatGPT Search on October 31, 2024 and removed the login requirement in 2025. Microsoft folded Bing Chat into a broader Copilot brand. Perplexity was valued at about 20 billion dollars in September 2025, while Anthropic, OpenAI, and Google introduced deep research modes that combine multi-step browsing with LLM synthesis. Agentic search, in which a model issues sub-queries, opens pages, and refines plans, became standard in flagship products.
Legal and economic disputes followed. The New York Times sued OpenAI and Microsoft in December 2023 over training data use. Several publishers blocked AI crawlers via robots.txt, and reports showed AI Overviews and answer engines reduced click-through rates to source sites in 2024 and 2025.
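Blocking a crawler through robots.txt amounts to a per-user-agent rule, and Python's standard `urllib.robotparser` module can show how a compliant crawler would interpret one. The sample file below is an assumption for illustration: it blocks `GPTBot` (the user agent OpenAI documents for its crawler) and allows everyone else. `ExampleBrowserBot` and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (illustrative): block one AI crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks its own user agent before fetching a page.
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/article")
other_allowed = parser.can_fetch("ExampleBrowserBot", "https://example.com/article")
```

Here `gptbot_allowed` is `False` and `other_allowed` is `True`. The file only expresses the publisher's preference; enforcement depends on crawlers choosing to honor it.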
AI search inherits issues from both classical retrieval and large language models: