Deep Research is a category of AI agent products that perform multi-step web research to produce comprehensive analytical reports. These systems autonomously plan a research strategy, browse the public web, read and interpret sources, evaluate credibility, synthesize findings, and deliver structured documents with citations. Unlike a conventional search query that returns links, or a single-turn large language model answer that draws on training data alone, a deep research agent executes an extended trajectory of reasoning, tool use, and refinement that may span several minutes to nearly an hour.
The term came into widespread use between December 2024 and April 2025, when nearly every major frontier AI lab launched a product or feature under this label. Google was first with Gemini Deep Research in December 2024, followed by OpenAI in February 2025, Perplexity AI and xAI within the same month, Microsoft in March, and Anthropic in April. By mid-2025 deep research had become a standard offering on the paid tier of most consumer AI products, marking a shift in how chatbots compete: not on conversational quality alone, but on the ability to do focused white-collar research work that previously took human analysts hours or days.
The phrase "deep research" was popularized by Google's December 2024 launch, but the underlying idea of agentic web browsing for research synthesis has older roots. WebGPT, an OpenAI prototype published in 2021, trained a GPT-3 descendant to use a text-based browser and produce cited long-form answers. Adept's ACT-1 (2022), Inflection's Pi research mode, and various academic systems explored similar territory. What changed in late 2024 was the convergence of three things: cheap, fast reasoning models with long context windows; reliable browser tooling exposed through tool-use APIs; and reinforcement learning recipes for training agents to plan and persist across long horizons. Once these pieces were in place, every major lab raced to ship a product under the same name, and "Deep Research" became a category rather than a brand.
The name itself is somewhat overloaded. In product UIs it usually appears as a toggle or button next to ordinary chat. In academic and developer contexts it can refer to the underlying agent architecture rather than any specific product. This article uses the capitalized form for named products and the lowercase form for the general capability.
Deep research systems combine several core capabilities: query planning, web browsing, source evaluation, iterative reasoning, and report synthesis. Implementations differ in technical detail, but the general loop is consistent.
When a user submits a complex question, the agent first formulates a research plan. It breaks the overarching question into smaller sub-tasks that can be investigated individually. A question about the competitive landscape of electric vehicle battery technology might be decomposed into sub-queries about current market leaders, recent patent filings, cost benchmarks, emerging cell chemistries, and supply chain bottlenecks. The plan is sometimes shown to the user for review before execution. Google's Gemini Deep Research, for example, presents the plan as an editable outline, and OpenAI's ChatGPT often asks one or two clarifying questions before kicking off the run.
This planning phase is what distinguishes deep research from standard information retrieval. Rather than executing a single search and returning top results, the system creates a structured investigation strategy that guides every subsequent action.
The agent then executes its plan by issuing search queries, clicking results, scrolling pages, and reading documents. Modern deep research systems handle a wide variety of content types: HTML pages, PDFs, academic papers, spreadsheets, images, and sometimes video transcripts. During a single session these agents typically perform dozens of searches and read hundreds of distinct sources.
A key capability is multi-hop traversal. If an initial search reveals a promising lead, the agent can navigate to a referenced source, read it, and continue following references deeper into the topic. This mirrors how a human researcher would trace a footnote back to a primary source. Some implementations also use a parallel architecture, where a coordinator agent spawns multiple sub-agents that each investigate a different sub-question simultaneously. Anthropic's research blueprint describes exactly this pattern: a lead Claude Opus instance plans the work and dispatches several Claude Sonnet workers in parallel, each handling one branch of the question tree.
Deep research systems do not follow a fixed script. The agent reasons about what it has found, decides which additional sources to consult, and pivots when contradictory evidence appears. If two sources disagree on a number, the agent can formulate a new query specifically aimed at resolving the conflict, often by looking for a primary source such as a regulatory filing, an academic paper, or a press release.
This behavior depends on the underlying model's reasoning ability. Most deep research products are built on a reasoning-tuned base model: o3 and o4-mini for OpenAI, Gemini 2.0 Flash Thinking and later 2.5 Pro for Google, Sonar Reasoning for Perplexity, and Claude Opus 4 for Anthropic. These models were trained with reinforcement learning on browsing and reasoning tasks, learning to plan multi-step trajectories, backtrack when a path is unproductive, and stop when enough evidence has accumulated.
After gathering sufficient information, the agent synthesizes its findings into a structured document. Reports usually include an executive summary, organized sections addressing different aspects of the question, data tables where appropriate, and inline citations that link claims to specific sources. The citation mechanism lets users verify any factual statement by following the link back to the original material.
The full process, from prompt to delivered report, takes between 2 and 45 minutes depending on the platform and the complexity of the question. Perplexity's implementation is the fastest, often finishing in under three minutes. OpenAI's typical run is 5 to 30 minutes. Anthropic's Research can run as long as 45 minutes for complex multi-source investigations.
Google announced Gemini Deep Research on December 11, 2024, alongside the broader Gemini 2.0 launch event. It was the first commercial product to ship under the "Deep Research" label and was initially available exclusively to Gemini Advanced subscribers on desktop and mobile web, with English as the only supported language at launch.
At launch the feature was powered by Gemini 1.5 Pro. Google subsequently upgraded the underlying model to Gemini 2.0 Flash Thinking (experimental), a reasoning-capable variant that improved both quality and serving efficiency. On April 8, 2025, Google upgraded again, this time to Gemini 2.5 Pro Experimental. In Google's own internal evaluations, raters preferred reports from the 2.5 Pro version over OpenAI's Deep Research roughly 69.9% of the time, with the largest margins on instruction following (60.6% to 39.4%), comprehensiveness (76.9% to 23.1%), completeness (73.3% to 26.7%), and writing quality (58.2% to 41.8%). At Google I/O in May 2025 the company added a Gemini 2.5 Flash variant for faster, lower-cost runs and rolled the feature out to more than 45 languages globally.
When a user submits a query, Gemini Deep Research presents an editable research plan before execution. The system then browses the web, gathers sources, and compiles a structured report that can be exported directly to Google Docs. As of 2025, Google offers tiered access: free users receive a small monthly allotment, Google AI Pro subscribers ($19.99 per month in the United States) receive substantially higher limits, and Google AI Ultra subscribers receive the highest quota and earliest access to model upgrades.
OpenAI launched Deep Research on February 2, 2025, initially available only to ChatGPT Pro subscribers ($200 per month). It was described as a "next-generation" agentic capability powered by an agentic version of the o3 reasoning model, optimized for web browsing and data analysis.
The model was trained end-to-end with reinforcement learning on hard browsing and reasoning tasks across many domains. Through this training the model learned core browser actions (search, click, scroll, file interpretation), Python execution in a sandboxed environment for calculations and chart generation, and the higher-level skill of reasoning across many sources to find specific facts or write a comprehensive report. The system can also browse user-uploaded files (PDFs, images, spreadsheets), generate and iterate on charts using its Python tool, and embed both generated graphs and screenshots from websites in its responses.
A typical session takes between 5 and 30 minutes. OpenAI expanded access in stages over the following months. As of April 24, 2025, the rollout reached: ChatGPT Pro at 250 queries per month, Plus, Team, Enterprise, and Edu at 25 queries per month, and Free users at 5 queries per month. A lightweight version powered by an o4-mini variant was introduced for cost-efficient queries; once a user exhausts the full quota, queries automatically fall back to this lightweight model.
On June 26, 2025, OpenAI released the Deep Research API with two named models, o3-deep-research and o4-mini-deep-research, allowing developers to integrate the capability into their own applications. The API exposes both the plan and the tool calls so that developers can audit the agent's behavior or customize the source list.
Perplexity AI launched its Deep Research feature on February 14, 2025. The company took a freemium approach: 5 queries per day are available to anyone without an account, and Pro subscribers ($20 per month) receive 500 queries per day, the most generous quota among major providers.
Perplexity's headline benchmark result was 21.1% on Humanity's Last Exam, second only to OpenAI's Deep Research at the time of launch. On the SimpleQA factuality benchmark, Perplexity Deep Research scored 93.9%, a strong result for a production system. Sessions typically complete in under three minutes, considerably faster than competing products, in part because Perplexity's Sonar model family runs on Cerebras inference hardware that produces output at roughly 1,200 tokens per second.
The underlying model evolved over 2025. The original launch used a Sonar reasoning model with a Llama 3.3 70B base. Perplexity later integrated DeepSeek R1 weights for the reasoning step and added an Advanced Deep Research update with improved accuracy, expanded source coverage, and a redesigned report layout. In parallel the company shipped a Sonar Deep Research API for programmatic access. The deep research capability is also surfaced inside the Comet browser, which Perplexity launched on July 9, 2025 for Windows and macOS. Comet's built-in assistant can run a deep research investigation across the user's open tabs, a workflow particularly popular with graduate students conducting literature reviews.
Anthropic launched Research for Claude on April 15, 2025. The feature lets Claude conduct multi-step web investigations and pull in data from the user's connected Google Workspace (Gmail, Calendar, Drive, Docs) to deliver citation-backed answers.
When Research is activated, Claude operates as a small agent team. A lead agent (Claude Opus 4 in the original implementation) parses the prompt, drafts a strategy, and dispatches several sub-agents (Claude Sonnet 4) that each search a slice of the question space in parallel. Findings stream back to the lead agent, which decides what to investigate next and ultimately writes the final report. In Anthropic's internal evaluations, this multi-agent setup outperformed a standalone Opus 4 agent by roughly 90.2% on their internal research benchmark. The trade-off is cost: a parallel research run uses about 15 times more tokens than a normal Claude conversation. Sessions typically run from 5 to 45 minutes.
On May 1, 2025, Anthropic expanded the Research feature with the Integrations system, letting Claude pull information from connected third-party services including Atlassian Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. Research is available on the Claude Max, Team, and Enterprise plans. During the initial rollout it was limited to users in the United States, Japan, and Brazil; coverage has since expanded to most regions where Claude is sold.
xAI introduced DeepSearch as part of the Grok 3 launch on February 17, 2025. DeepSearch scans the open web and the X (formerly Twitter) platform to generate detailed summaries in response to complex queries. The integration with X gives DeepSearch a distinctive capability: it can analyze real-time social discourse, breaking news, and trending topics in a way that competitors with no native social data feed cannot.
DeepSearch is available to X Premium+ subscribers and SuperGrok subscribers ($30 per month). On March 19, 2025, xAI followed up with DeeperSearch, an upgraded mode that uses extended search and additional reasoning steps for more thorough investigations. xAI positions DeepSearch as both a research tool and a real-time intelligence feed, with use cases ranging from market sentiment tracking to tracking the development of a breaking news story across multiple sources and posts.
Microsoft announced two reasoning agents for Microsoft 365 Copilot, Researcher and Analyst, on March 25, 2025. The Researcher agent combines a version of OpenAI's deep research model with Microsoft 365's enterprise data graph, letting it pull from emails, meetings, files, chats, and connected business applications such as Salesforce, ServiceNow, and Confluence in addition to the public web. The Analyst agent, built on OpenAI's o3-mini, focuses on data analysis and can run Python on user-supplied spreadsheets to produce charts and statistical summaries.
Microsoft began rolling Researcher and Analyst out to Microsoft 365 Copilot license holders in April 2025 through a "Frontier" early-access program, and reached general availability on June 2, 2025. The enterprise positioning is the key differentiator: Microsoft markets the agents as a way to bring deep research into the regulatory, security, and compliance boundary of an organization's existing Microsoft 365 tenant, rather than sending corporate data to a separate consumer chatbot.
DeepSeek added a search mode to its chat interface that performs web retrieval before answering, and a separate DeepThink (R1) reasoning mode for extended chain-of-thought. The two features cannot run together in the consumer chat product, which means DeepSeek does not yet offer a fully unified deep research mode comparable to OpenAI's. DeepSeek-V4 Preview, released in early 2026, expanded the agent capabilities and brought them closer to the unified pattern seen in the Western products.
You.com launched ARI (Advanced Research and Insights) in February 2025, then released ARI Enterprise in May 2025 with a focus on financial analysts and management consultants. ARI's distinguishing claim is breadth: the product processes more than 400 sources in a single run, roughly an order of magnitude more than the typical OpenAI or Perplexity session. You.com reports that ARI Enterprise scored 80% on a deep research adaptation of the FRAMES benchmark and won 76% of head-to-head comparisons against OpenAI's Deep Research as judged by OpenAI's own o3-mini model.
Other notable entrants include Kagi's Assistant research mode, Mistral's Le Chat agentic search, Cohere's Command R+ research integrations, and several specialized vertical products such as Elicit (academic literature) and Consensus (research synthesis from peer-reviewed papers).
| Product | Provider | Launched | Underlying model | Typical run time | Free tier | Paid tier price | Paid quota | Distinct strength |
|---|---|---|---|---|---|---|---|---|
| Gemini Deep Research | Dec 11, 2024 | Gemini 2.5 Pro Experimental | 5 to 15 min | 5 reports/month | $19.99/mo (AI Pro) | Adaptive | Workspace integration, multilingual | |
| Deep Research | OpenAI | Feb 2, 2025 | o3 (browsing tuned) | 5 to 30 min | 5 queries/month | $20/mo (Plus), $200/mo (Pro) | 25/mo Plus, 250/mo Pro | Benchmark scores, file upload, charts |
| Deep Research | Perplexity | Feb 14, 2025 | Sonar (Llama 3.3 70B base) | Under 3 min | 5 queries/day | $20/mo (Pro) | 500/day | Speed, free tier, citation precision |
| DeepSearch | xAI | Feb 17, 2025 | Grok 3 | 1 to 10 min | Limited trial | $30/mo (SuperGrok) | Not disclosed | Real-time X data, breaking news |
| Researcher | Microsoft | Mar 25, 2025 | OpenAI deep research model | 3 to 20 min | None | M365 Copilot license | Enterprise | M365 graph, business connectors |
| Research | Anthropic | Apr 15, 2025 | Claude Opus 4 + Sonnet 4 | 5 to 45 min | None | Max/Team/Enterprise | Not disclosed | Parallel sub-agents, integrations |
| ARI / ARI Enterprise | You.com | Feb 2025 / May 2025 | Multi-model ensemble | 3 to 10 min | Limited | Custom enterprise pricing | Custom | 400+ source processing, consulting focus |
| DeepThink + Search | DeepSeek | 2025 | DeepSeek R1 / V4 | Varies | Free | $0 | Generous | Cost, open weights for R1 base |
Dates and prices reflect launch and subsequent confirmed updates through April 2026.
Deep research systems are evaluated on benchmarks designed to test complex reasoning, multi-step retrieval, and factual accuracy. Three benchmarks dominate the conversation as of early 2026.
Humanity's Last Exam (HLE), released by the Center for AI Safety in January 2025, contains roughly 2,500 expert-written questions across more than 100 academic subjects, from rocket science to analytic philosophy. It was deliberately constructed to be very hard for AI systems; baseline models scored under 10% at release. Deep research agents posted significant improvements:
| System | HLE accuracy |
|---|---|
| OpenAI Deep Research (o3-based) | 26.6% |
| Perplexity Deep Research | 21.1% |
| OpenAI o3-mini (high) | 13.0% |
| OpenAI o3-mini | 10.5% |
| OpenAI o1 | ~9% |
| DeepSeek R1 | ~9% |
OpenAI's Deep Research nearly tripled the previous best score. The largest gains appeared on questions related to chemistry, the humanities and social sciences, and mathematics.
BrowseComp is a benchmark released by OpenAI in April 2025, specifically designed to evaluate AI browsing agents. It contains 1,266 problems that require an agent to persistently navigate multiple websites to retrieve hard-to-find, entangled information. The benchmark was constructed by humans who could not solve most of the questions in a few hours of browsing.
| System | BrowseComp accuracy (single attempt) |
|---|---|
| OpenAI Deep Research | 51.5% |
| OpenAI o1 | 9.9% |
| GPT-4o with browsing | 1.9% |
| GPT-4.5 | ~0% |
Deep Research solved roughly half the problems on a single attempt and reached 78% with majority voting over 64 samples. The near-zero scores from GPT-4o and GPT-4.5 highlight that without strong reasoning and persistent tool use, models simply cannot retrieve the kinds of obscure, multi-hop facts BrowseComp targets.
GAIA (General AI Assistants) is a 450-question benchmark testing reasoning, multimodality, web browsing, and tool use. OpenAI's Deep Research scored 67.36% on the GAIA validation set, setting a new state of the art. By comparison, GPT-4 without an agentic harness scored below 7%. Hugging Face's open-source Open Deep Research, built on the smolagents framework, reached 55% pass-at-1 on GAIA validation, including 47.6% on the hardest level-3 questions, and held the top open-submission slot on the GAIA leaderboard for several months in 2025.
SimpleQA, an OpenAI-built factuality benchmark of several thousand short questions, is used as a check on a deep research system's tendency to hallucinate on simple lookups. Perplexity Deep Research scored 93.9%, substantially above the standalone leading models, which suggests that the iterative search and citation step recovers many facts that a non-retrieval model would miss or misremember.
Deep research agents are now applied across a wide range of professional and academic contexts. The most common uses fall into a handful of categories.
Businesses use deep research tools to map competitive landscapes, track industry trends, evaluate potential partners or acquisition targets, and compile market sizing estimates. The ability to pull dozens of sources into a single coherent report shortens analyst work that previously took days. Strategy consultancies are among the heaviest early adopters; You.com explicitly markets ARI Enterprise to firms in this segment.
Researchers use deep research to survey existing literature on a topic, identify key papers, map the relationships between research threads, and find gaps. The product does not replace careful academic review, but it shortens the orientation phase considerably. Graduate students using Perplexity's Comet browser report literature-review time reductions of 60% to 70% compared to manual reading.
Investors use deep research for company background work, sector analysis, patent-landscape reviews, and quick technical due diligence on early-stage companies. The structured report format with explicit citations is well suited for sharing inside investment teams and producing audit trails for regulators. Sell-side analysts use the same workflow for note generation.
Policy researchers and compliance teams use deep research to track regulatory changes across jurisdictions, summarize proposed legislation, and compile government guidance on specific topics. Multi-jurisdiction comparisons that once required a small team can now be drafted in a single 30-minute run, with humans reviewing the citations and editing the final language.
Consumers and procurement teams use deep research for detailed product comparisons. Specifications, reviews, pricing, and availability data get pulled into organized tables. Microsoft's Researcher agent is heavily used for vendor comparisons inside enterprise IT.
Scientists use deep research to explore interdisciplinary connections and gather methodological notes from adjacent fields. Anthropic's Claude for Life Sciences product, launched in October 2025, applies the deep research pattern specifically to biomedical literature. Patients and clinicians also use the consumer products to gather background information on conditions and treatments, although every major provider warns against using the output as medical advice.
Reporters use deep research to gather background on a person, company, or event before drafting an article. The citation trail is helpful for fact-checking. The flip side is that newsroom editors increasingly worry about reporters relying too heavily on the synthesized summary rather than reading primary sources directly, and most major outlets have published internal guidelines on appropriate use.
Deep research occupies a distinct position relative to other ways of getting answers from the web.
| Approach | What it returns | User effort | Source visibility | Latency |
|---|---|---|---|---|
| Web search | Ranked links | High; user reads each result | Full | Sub-second |
| Retrieval augmented generation | Single answer with retrieved context | Low | Often only top citations | Seconds |
| Standard chatbot answer | Single answer from training data | Low | None | Seconds |
| Deep research | Multi-section report with many citations | Low up front, moderate verification later | Full citation trail | Minutes |
The distinction from RAG matters because the two are sometimes confused. RAG retrieves a small number of context chunks for a single answer turn; deep research runs many sequential RAG-like loops, each one informed by the result of the previous step, and stitches the results into a long-form document. Deep research is essentially RAG with an outer planning loop and a final synthesis step.
The capabilities are real, but every deep research product has well-documented failure modes that users should understand before relying on the output for important work.
Like all systems built on large language models, deep research agents can generate factually incorrect content. OpenAI's Deep Research system card, published February 25, 2025, explicitly notes that the model may produce false statements and that its chain of thought sometimes hallucinates access to tools or capabilities the agent does not have. A more subtle and more common problem is misattribution: a sentence is correct but the citation does not actually support it. Most hallucinations can be caught by clicking the citations, but doing so for an entire 5,000-word report is itself a significant effort. The system card also warns about misinformation by omission: the report may miss a crucial fact simply because the agent's searches did not surface it.
OpenAI's own published research on hallucination argues that some hallucinations stem from fundamental properties of the training process, including epistemic uncertainty when information appears rarely in training data, and that they cannot be entirely eliminated through engineering alone.
Deep research agents are downstream of web search rankings. SEO-optimized content can crowd out more authoritative but less visible sources. Independent reviewers have noted that Google's Gemini Deep Research is more susceptible to this bias than OpenAI's, presumably because it leans on Google Search results, while OpenAI's agent uses a more constrained internal browser surface that includes a tighter set of sources. None of the products fully solves the problem, and a determined SEO operator can still influence what shows up in a deep research report.
Very recent information, paywalled content, and content not indexed by the underlying search backend may simply not be visible to the agent. Events that happened after the start of the search session will be missed entirely, and pages behind authentication walls are inaccessible unless an integration explicitly grants access. The result is a report that can confidently omit the most important fact about a topic if that fact lives in a place the agent could not see.
Deep research reports cover a wide range of subtopics but rarely match a domain expert's analysis on any single point. The text reads like a strong undergraduate research paper rather than a senior analyst's memo. For high-stakes decisions, the report works best as a starting brief that a human expert then refines.
The most capable products require paid subscriptions, and even paid users hit query limits. OpenAI's $200 per month Pro plan provides 250 queries; the $20 per month Plus plan offers 25. Anthropic's parallel architecture uses about 15 times the tokens of a normal chat, which the company manages via tiered pricing rather than by exposing per-query counts. Heavy users frequently combine multiple products to stretch their quotas.
A deep research run reveals what the user is investigating and what data they consider relevant. Some queries (medical conditions, legal exposure, M&A targets, salary ranges) are sensitive in ways a normal search query is not, because they include enough context to identify the user's situation. Enterprise products such as Microsoft Researcher and Anthropic's connected Workspace integration mitigate this by keeping the queries inside a controlled tenant, but the consumer products send queries through the provider's standard logging pipeline.
Citations look authoritative, but they shift the verification burden onto the user rather than removing it. The presence of a citation does not guarantee that the source supports the claim as written. For high-stakes work (legal, medical, financial), every cited claim still needs human review, which can negate much of the time saved by the automation. Several law firms and financial institutions have published internal guidance forbidding use of deep research output in client-facing materials without independent verification.
The agentic nature of deep research raises safety questions that go beyond those of standard chatbots. Because the agent browses the open web autonomously, it may encounter adversarial content explicitly designed to manipulate it.
Prompt injection is the most prominent concern. A hostile webpage can include text that instructs the agent to ignore the user's original task and instead leak data, click a malicious link, or fabricate a particular conclusion. OpenAI's Deep Research system card classifies the model as medium risk overall and details specific training measures aimed at prompt injection resistance. Anthropic's parallel agent architecture spreads the attack surface across multiple sub-agents, which complicates the attack but does not eliminate it.
Privacy protections are a second concern. Deep research agents sometimes surface personal information published online (residential addresses, family members, employment history) in contexts where that information should arguably not be aggregated. Most providers add filtering layers around personal data, with mixed results.
The third concern is content policy. Disallowed content (malware, weaponizable instructions, CBRN information) sometimes appears in source pages, and the agent must be trained to neither reproduce it nor act on it. Each major launch has been preceded by a published system card or model card describing the red-teaming work done before release.
The arrival of commercial deep research products triggered a wave of open-source clones, often built within days of the original launches.
Hugging Face released Open Deep Research, built on the smolagents framework, days after the OpenAI launch. It reached 55% pass-at-1 on GAIA validation and held the top spot among open-source submissions on the GAIA leaderboard for an extended period. Other notable projects include:
These projects are widening access to the capability, especially for researchers and developers who need to customize the pipeline for specialized domains or run it inside an air-gapped environment.
Deep research went from a single Google product in December 2024 to a mature category by mid-2026. Three trends stand out.
First, deep research has become a baseline feature on every paid AI tier. A consumer who pays $20 per month for any major chatbot expects deep research to be included alongside voice chat and image generation. Free-tier deep research is increasingly used as a customer acquisition tool, with Perplexity's generous five-per-day allowance pressuring competitors to expand their free quotas.
Second, the enterprise market has fragmented along data-access lines. Microsoft Researcher dominates organizations standardized on Microsoft 365 because it can read inside the tenant. Anthropic's Claude Research with Integrations targets organizations standardized on Atlassian, Linear, and similar SaaS tools. Google Gemini Deep Research is the natural choice for Google Workspace shops. The choice of deep research product is increasingly downstream of the choice of productivity stack rather than a standalone decision.
Third, white-collar workflows are reorganizing around deep research. Junior analyst tasks (background memos, competitive briefings, regulatory summaries, candidate profiles) are now routinely drafted by an agent and edited by the analyst. Time savings reported by users typically range from 30% to 70% on these tasks, with the wide range reflecting how much verification the use case actually requires. The labor market consequences are still unfolding, but several large consulting firms and investment banks have publicly slowed junior hiring in research-heavy practices, citing higher per-analyst productivity.
There is genuine uncertainty about how much further the pattern can go. Some observers argue that deep research is close to its ceiling because the bottleneck is now source quality and indexing rather than agent capability. Others argue that adding browser-controlled actions (filling forms, downloading datasets, running local code on retrieved files) will turn deep research into a much broader research-assistant category. Both views are defensible, and the gap between them is shrinking as the products converge on a similar shape.