Deep Research refers to a category of AI agent capabilities that autonomously conduct multi-step research across the internet on behalf of users. These systems browse the web, read and interpret sources, evaluate credibility, synthesize findings, and produce comprehensive research reports with citations. Unlike simple search queries or single-turn large language model interactions, deep research agents execute extended reasoning chains that may span several minutes to over an hour, iteratively refining their search strategies based on information discovered during the research process.
The term gained widespread use in late 2024 and early 2025 as major AI companies released products under this label, beginning with Google's Gemini Deep Research in December 2024 and followed by OpenAI's Deep Research in February 2025. These tools represent a shift from conversational AI toward agentic AI, where the system operates with a degree of autonomy to accomplish complex tasks that would traditionally require hours of manual effort by a human researcher.
Deep research systems combine several core capabilities: query planning, web browsing, source evaluation, iterative reasoning, and report synthesis. While each implementation differs in technical details, the general workflow follows a consistent pattern.
When a user submits a complex research question, the system first formulates a detailed research plan. It breaks the overarching question into smaller, manageable sub-tasks that can be investigated individually. For example, a question about the competitive landscape of electric vehicle battery technology might be decomposed into sub-queries about current market leaders, recent patent filings, cost benchmarks, emerging chemistries, and supply chain constraints.
This planning phase distinguishes deep research from standard information retrieval. Rather than executing a single search query and returning top results, the system creates a structured investigation strategy that guides its subsequent actions.
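The decomposition step described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not any vendor's implementation: in a real agent the facets would be proposed by the model itself, while here they are supplied by the caller to keep the sketch deterministic. The names `ResearchPlan` and `plan_research` are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class ResearchPlan:
    """A research question broken into independently searchable sub-tasks."""
    question: str
    sub_queries: list[str] = field(default_factory=list)

def plan_research(question: str, facets: list[str]) -> ResearchPlan:
    """Expand a question into one focused sub-query per facet."""
    return ResearchPlan(
        question=question,
        sub_queries=[f"{question}: {facet}" for facet in facets],
    )

# The electric-vehicle battery example from the text:
plan = plan_research(
    "EV battery technology competitive landscape",
    ["market leaders", "recent patent filings", "cost benchmarks",
     "emerging chemistries", "supply chain constraints"],
)
```

Each sub-query can then be investigated on its own, which is what lets the agent parallelize searches and track which parts of the question remain unanswered.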
The agent then executes its plan by searching the web, clicking through results, scrolling pages, and reading full documents. Deep research systems can process a wide variety of content types including web pages, PDFs, academic papers, spreadsheets, and images. During a single research session, these agents typically perform dozens of searches and read hundreds of individual sources.
A key capability is the ability to follow links across multiple hops. If an initial search reveals a promising lead, the agent can navigate to the referenced source, read it, and continue following references deeper into the topic. This multi-hop browsing mirrors how a human researcher would trace citations back to primary sources.
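Multi-hop browsing amounts to a bounded traversal of the web's link graph. The sketch below uses a toy, entirely synthetic link graph and a breadth-first walk with a hop budget; real agents decide which links to follow using the model's judgment rather than visiting everything.

```python
from collections import deque

# A toy link graph standing in for the web; keys are pages,
# values are the links found on each page. Entirely synthetic.
LINKS = {
    "search:ev-batteries": ["review-article", "press-release"],
    "review-article": ["primary-study", "dataset"],
    "press-release": [],
    "primary-study": ["dataset"],
    "dataset": [],
}

def crawl(start: str, max_hops: int) -> list[str]:
    """Breadth-first traversal bounded by a hop budget, mirroring how
    an agent follows references toward primary sources."""
    seen, order = {start}, []
    queue = deque([(start, 0)])
    while queue:
        page, hops = queue.popleft()
        order.append(page)
        if hops == max_hops:
            continue  # hop budget exhausted along this path
        for link in LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, hops + 1))
    return order
```

With a budget of one hop the agent only reads the pages a search surfaced directly; raising the budget lets it reach the primary study two links away, which is the behavior the text describes.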
Perhaps the most important aspect of deep research is its ability to adapt its strategy in real time. Rather than following a predetermined search plan rigidly, the agent reasons about what it has found and dynamically decides which additional sources to consult, which claims to verify, and which data points require deeper investigation. If the agent encounters contradictory information, it can formulate new queries to resolve the discrepancy.
This iterative process relies on the underlying model's reasoning capabilities. Models trained with reinforcement learning on browsing and reasoning tasks learn to plan and execute multi-step trajectories, backtracking when necessary and pivoting in response to new information.
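The adaptive behavior described above is, at its core, a plan-act-reflect loop. The sketch below is a deliberately simplified stand-in: where a production agent would ask the model what to do next, this version uses one hard-coded rule (spawn a verification query whenever a contradiction is detected). `research_loop` and its callback signature are assumptions for illustration.

```python
def research_loop(initial_query, search, max_steps=10):
    """Iterative search loop. `search` is any callable returning
    (findings, contradiction_flag). Contradictions trigger a pivot:
    a follow-up query aimed at resolving the discrepancy."""
    queue = [initial_query]
    findings = []
    steps = 0
    while queue and steps < max_steps:
        query = queue.pop(0)
        result, contradiction = search(query)
        findings.append((query, result))
        steps += 1
        if contradiction:
            queue.append(f"verify: {query}")
    return findings

# A fake search backend that reports conflicting data once:
def fake_search(query):
    if query == "cell cost trends":
        return ("conflicting cost figures", True)
    return ("consistent", False)

trace = research_loop("cell cost trends", fake_search)
```

The `max_steps` budget matters in practice: without it, an agent that keeps finding new leads would never stop, which is why commercial systems cap sessions at fixed time limits.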
After gathering sufficient information, the agent synthesizes its findings into a structured report. These reports typically include an executive summary, organized sections addressing different aspects of the research question, data tables where appropriate, and inline citations pointing to specific sources. The citation mechanism allows users to verify claims by following links back to the original material.
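The citation mechanism can be modeled as claims paired with their sources, rendered with numbered inline markers and a matching source list. This is a generic sketch of the output format, not any product's actual renderer; the `Claim` type and markdown layout are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    source_url: str

def render_report(title: str, claims: list[Claim]) -> str:
    """Render claims as a markdown section with numbered inline
    citations and a source list, so each statement can be traced
    back to its origin."""
    body = " ".join(f"{c.text} [{i}]" for i, c in enumerate(claims, 1))
    refs = "\n".join(f"[{i}] {c.source_url}" for i, c in enumerate(claims, 1))
    return f"## {title}\n\n{body}\n\n### Sources\n\n{refs}\n"

report = render_report("Findings", [
    Claim("LFP chemistry's market share is growing.", "https://example.com/study"),
    Claim("Pack costs fell year over year.", "https://example.com/benchmark"),
])
```

Keeping claims and sources paired in the data model, rather than appending a bibliography after the fact, is what makes per-sentence citation (as in OpenAI's implementation) possible.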
The entire process, from receiving the user's query to delivering the final report, typically takes between 2 and 30 minutes depending on the complexity of the question and the specific platform. Some systems can spend up to 45 minutes on especially complex investigations.
Google announced Gemini Deep Research on December 11, 2024, as part of its broader Gemini 2.0 launch event. It was initially available exclusively to Gemini Advanced subscribers.
At launch, Deep Research was powered by Gemini 1.5 Pro. Google subsequently upgraded the underlying model to Gemini 2.0 Flash Thinking (experimental), a reasoning-capable variant of the Gemini 2.0 Flash model. According to Google, the thinking model's innate capacity for self-reflection and planning made it well suited for long-running agentic tasks. The upgrade improved both the quality and the serving efficiency of the product, enabling broader access. Google later further enhanced Deep Research with Gemini 3 capabilities.
When a user submits a query, Gemini Deep Research presents a research plan that the user can review and modify before execution. The system then autonomously browses the web, gathering and analyzing sources before compiling a structured report. Reports can be exported to Google Docs for further editing.
As of 2025, Google offers Deep Research across multiple tiers. Free users receive 5 deep research reports per month. Google One AI Premium subscribers ($19.99/month in the US) receive substantially higher limits. The product is available in over 45 languages globally.
OpenAI launched Deep Research on February 2, 2025, initially available to ChatGPT Pro subscribers ($200/month). The feature was described as a "next-generation" agentic capability powered by a version of the o3 model optimized for web browsing and data analysis.
OpenAI's Deep Research was trained using end-to-end reinforcement learning on challenging browsing and reasoning tasks across a range of domains. Through this training, the model learned core browsing capabilities (searching, clicking, scrolling, interpreting files), how to use a Python tool in a sandboxed environment for calculations, data analysis, and graph plotting, and how to reason through and synthesize a large number of websites to find specific pieces of information or write comprehensive reports.
The system can browse user-uploaded files, generate and iterate on graphs using its Python tool, embed both generated graphs and images from websites in its responses, and cite specific sentences or passages from its sources. A typical research session takes between 5 and 30 minutes.
OpenAI expanded access over the following months. As of April 24, 2025, Pro users receive 250 queries per month; Plus, Team, Enterprise, and Edu users receive 25 queries per month; and Free users receive 5 queries per month. A lightweight version powered by a variant of o4-mini was introduced for more cost-efficient queries; once users reach their limit for the full version, queries automatically switch to this lightweight alternative.

On June 26, 2025, OpenAI released the Deep Research API with two models: o3-deep-research and o4-mini-deep-research, extending the capability to developers for integration into custom applications.
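A developer calling the Deep Research API would submit a long-running request naming one of the two models. The sketch below only constructs the request body as a plain dictionary; the field names follow the general shape of OpenAI's Responses API as of mid-2025, but treat the exact schema (especially `background` and the tool types) as an assumption and check the current API reference before relying on it.

```python
# Hypothetical request body for the Deep Research API. Field names
# are assumptions modeled on OpenAI's Responses API, not verified
# against the live service.
request = {
    "model": "o3-deep-research",  # or "o4-mini-deep-research" for cheaper runs
    "input": "Survey solid-state battery commercialization timelines.",
    "background": True,  # long-running task; the client polls for completion
    "tools": [
        {"type": "web_search_preview"},   # grants the agent web access
    ],
}
```

Because research sessions run for minutes rather than seconds, the API is designed around asynchronous, background execution rather than a blocking request-response cycle.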
Perplexity launched its Deep Research feature on February 14, 2025. Unlike the subscription-first approach of OpenAI, Perplexity made Deep Research available to all users with a free tier of 5 queries per day, while Pro subscribers ($20/month) receive 500 queries daily.
When a user submits a Deep Research question, Perplexity performs dozens of searches, reads hundreds of sources, and reasons through the material to deliver a comprehensive report. The system completes most research tasks in under 3 minutes, making it notably faster than competing implementations.
Perplexity's Deep Research is built on the company's Sonar model family, which uses Llama 3.3 70B as its base and runs on Cerebras's inference platform, generating answers at approximately 1,200 tokens per second. The system excels at tasks spanning finance, marketing, product research, and competitive analysis.
Perplexity later released an Advanced Deep Research update that introduced improved accuracy, expanded capabilities, a redesigned interface, and an API (Sonar Deep Research) for programmatic access.
Anthropic launched its Research feature for Claude on April 15, 2025. The feature allows Claude to conduct in-depth investigations by searching across the web and Google Workspace (Gmail, Google Calendar, Google Docs) to deliver comprehensive, citation-backed answers.
When Research is activated, Claude operates agentically, conducting multiple searches that build on each other. The system determines what to investigate next, explores different angles of the question automatically, and works through open questions systematically. Research sessions can last from 5 to 45 minutes depending on complexity.
On May 1, 2025, Anthropic expanded the Research feature with Integrations support, allowing Claude to pull information from connected third-party services including Atlassian Jira, Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid.
Research is available on Claude Max, Team, and Enterprise plans. During its initial rollout, it was limited to users in the United States, Japan, and Brazil.
xAI introduced DeepSearch as part of the Grok 3 launch on February 17, 2025. DeepSearch scans the internet and the X (formerly Twitter) platform to generate detailed summaries in response to complex queries. The integration with X data gives DeepSearch a distinctive capability for analyzing real-time social discourse and trending topics.
DeepSearch is available to X Premium+ subscribers and SuperGrok subscribers ($30/month). xAI also released DeeperSearch, an enhanced version that uses extended search and additional reasoning for more thorough investigations.
| Feature | OpenAI Deep Research | Google Gemini Deep Research | Perplexity Deep Research | Anthropic Claude Research | xAI DeepSearch |
|---|---|---|---|---|---|
| Launch Date | February 2, 2025 | December 11, 2024 | February 14, 2025 | April 15, 2025 | February 17, 2025 |
| Underlying Model | o3 (optimized for browsing) | Gemini 2.0 Flash Thinking | Sonar (Llama 3.3 70B base) | Claude (unspecified variant) | Grok 3 |
| Typical Processing Time | 5 to 30 minutes | Under 15 minutes | Under 3 minutes | 5 to 45 minutes | Varies |
| Free Tier | 5 queries/month | 5 reports/month | 5 queries/day | Not available | Limited trial |
| Paid Tier Pricing | $20/month (Plus) to $200/month (Pro) | $19.99/month (AI Premium) | $20/month (Pro) | Max/Team/Enterprise plans | $30/month (SuperGrok) |
| Paid Tier Query Limit | 25/month (Plus) to 250/month (Pro) | Adaptive daily limits | 500/day (Pro) | Not publicly specified | Not publicly specified |
| File Upload Support | Yes (PDFs, images, spreadsheets) | Limited | Limited | Via Google Workspace | No |
| API Access | Yes (since June 2025) | Yes (Gemini API) | Yes (Sonar Deep Research) | Not publicly available | Not publicly available |
| Unique Strength | Depth of analysis, benchmark performance | Google ecosystem integration, multilingual | Speed, citation precision | Third-party integrations, workspace search | Real-time X/Twitter data |
Deep research systems are evaluated on benchmarks designed to test complex reasoning, multi-step information retrieval, and factual accuracy.
Humanity's Last Exam is a comprehensive benchmark consisting of over 3,000 questions across more than 100 academic subjects, ranging from rocket science to analytic philosophy. It was designed to be extremely difficult for AI systems, with the best traditional models achieving only around 9% accuracy.
Deep research agents showed significant improvements over standard models on this benchmark:
| System | HLE Accuracy |
|---|---|
| OpenAI Deep Research (o3) | 26.6% |
| Perplexity Deep Research | 21.1% |
| OpenAI o3-mini (high) | 13.0% |
| OpenAI o3-mini | 10.5% |
| OpenAI o1 | ~9% |
| DeepSeek R1 | ~9% |
OpenAI's Deep Research achieved the highest score, representing a nearly threefold improvement over the previous best models. The largest gains appeared on questions related to chemistry, humanities and social sciences, and mathematics.
BrowseComp is a benchmark released by OpenAI in April 2025, specifically designed to evaluate AI browsing agents. It contains 1,266 challenging problems that require agents to persistently navigate through multiple websites to retrieve difficult-to-find, entangled information.
| System | BrowseComp Accuracy (Single Attempt) |
|---|---|
| OpenAI Deep Research | 51.5% |
| OpenAI o1 | 9.9% |
| GPT-4o with browsing | 1.9% |
| GPT-4.5 | ~0% |
Deep Research significantly outperformed all other models, solving roughly half of the problems on single attempts. With 64 sampled outputs and majority voting, performance improved to 78%. The near-zero scores from GPT-4o and GPT-4.5 highlight that without strong reasoning and tool use, models fail to retrieve the kinds of obscure, multi-hop facts the benchmark targets.
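The 78% figure comes from sampling many independent attempts and keeping the most common answer. OpenAI has not published the aggregation details beyond "majority voting," so the sketch below shows only the generic mechanism, with synthetic sample data.

```python
from collections import Counter

def majority_vote(samples: list[str]) -> str:
    """Return the most common answer among independently sampled
    attempts at the same question."""
    return Counter(samples).most_common(1)[0][0]

# 64 simulated attempts at a hard lookup: even though the agent is
# right less than half the time, no single wrong answer is as common
# as the correct one, so the vote recovers it.
samples = ["1887"] * 30 + ["1891"] * 20 + ["unknown"] * 14
```

This is why best-of-n sampling lifts accuracy well above the single-attempt rate: errors tend to scatter across many different wrong answers, while correct attempts converge on one.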
GAIA (General AI Assistants) is a benchmark containing 450 questions that require fundamental abilities including reasoning, multi-modality handling, web browsing, and tool-use proficiency. OpenAI's Deep Research achieved a score of 67.36% on the GAIA validation set, setting a new state of the art. For comparison, GPT-4 without an agentic setup scored below 7% on the same benchmark.
SimpleQA is a factuality benchmark consisting of several thousand straightforward questions. Perplexity Deep Research scored 93.9% accuracy on SimpleQA, substantially exceeding the performance of leading standalone models and demonstrating strong factual reliability for direct questions.
Deep research agents are applied across a wide range of professional and academic contexts.
Businesses use deep research tools to analyze competitive landscapes, track industry trends, evaluate potential partners or acquisition targets, and compile market sizing estimates. The ability to synthesize information from dozens of sources into a single coherent report saves analysts hours of manual work.
Researchers use deep research to survey existing literature on a topic, identify key papers and their findings, map the relationships between different research threads, and identify gaps in current knowledge. While deep research agents do not replace thorough academic review, they serve as an effective starting point for orienting researchers in unfamiliar fields.
Investors and technology teams use these tools to evaluate technical claims, review patent landscapes, assess the maturity of specific technologies, and compare vendor offerings. The structured report format with citations makes it straightforward to share findings with colleagues.
Policy researchers and compliance teams use deep research to track regulatory changes across jurisdictions, understand the implications of proposed legislation, and compile summaries of government guidance on specific topics.
Consumers and procurement teams use deep research for detailed product comparisons, gathering specifications, reviews, pricing information, and availability data from across the web into organized reports.
Scientists use deep research tools to explore interdisciplinary connections, gather data from multiple published studies, and identify methodological approaches used by researchers in adjacent fields.
Despite their capabilities, deep research systems face several significant limitations that users should understand.
Like all systems built on large language models, deep research agents can generate factually incorrect information. OpenAI acknowledged in its Deep Research system card (published February 25, 2025) that the model may produce factually incorrect content, and that its chain-of-thought reasoning occasionally hallucinates about access to tools or capabilities it does not actually have. While most hallucinations can be caught by checking provided references, the system card noted that misinformation by omission is also possible: the tool could miss crucial details because they did not appear in the searches it conducted.
OpenAI's own research has demonstrated that hallucinations in language models stem from fundamental mathematical properties of the training process, including epistemic uncertainty when information appears rarely in training data, model limitations where tasks exceed current architectures' representational capacity, and computational intractability. These factors mean that hallucinations cannot be entirely eliminated through engineering improvements alone.
Deep research agents rely on web search results, which means they are susceptible to the biases inherent in search engine rankings. Content that is heavily optimized for search engines may be prioritized over more authoritative but less visible sources. Google's Gemini Deep Research has been noted as being more susceptible to SEO bias, sometimes citing sources that are not directly relevant to the query.
Information that is very recent, behind paywalls, or not indexed by major search engines may be inaccessible to deep research agents. The systems can only analyze content that is publicly available on the web at the time of the search. Events occurring after the search session or information locked behind authentication barriers will be missed.
While deep research agents can cover a broad range of subtopics, they may lack the depth of a domain expert's analysis on any single point. The reports they produce are comprehensive overviews rather than expert-level analyses, and they may miss nuances that would be apparent to a specialist.
Access to the most capable deep research systems requires paid subscriptions, and even paid users face query limits. OpenAI's Pro plan at $200/month provides 250 queries, while the Plus plan at $20/month offers only 25. These constraints can limit the utility of deep research for users who need to conduct many investigations.
Although deep research reports include citations, users bear the responsibility of verifying critical claims. The presence of a citation does not guarantee that the source actually supports the claim as stated. Careful verification remains necessary, particularly for high-stakes applications such as legal research, medical information, or financial analysis.
The release of commercial deep research products spurred significant open-source development. Hugging Face developed Open Deep Research using its smolagents framework, which achieved the top rank among open submissions on the GAIA benchmark leaderboard, and a number of other open-source projects have followed.
These open-source efforts are broadening access to deep research capabilities and enabling researchers and developers to customize the technology for specialized domains.
OpenAI's Deep Research system card classified the model as medium risk overall, including medium risk assessments for cybersecurity, persuasion, CBRN (chemical, biological, radiological, nuclear), and model autonomy. Before launching, OpenAI conducted safety testing focused on several areas specific to browsing agents.
The agentic nature of deep research systems introduces safety considerations that go beyond those of standard conversational AI. Because these systems browse the web autonomously and may encounter adversarial content, they must be robust against manipulation attempts while still being able to extract useful information from diverse sources.
Deep research technology is evolving rapidly. Several trends are shaping its development: