| DeepSeek | |
|---|---|
| 杭州深度求索人工智能基础技术研究有限公司 (Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.) | |
| Type | Private |
| Industry | Artificial intelligence |
| Founded | July 17, 2023 |
| Founder | Liang Wenfeng |
| Headquarters | Hangzhou, Zhejiang, China |
| Key people | Liang Wenfeng (CEO) |
| Owner | High-Flyer Capital Management (Hangzhou Huanfang Technology) |
| Products | DeepSeek-V2, DeepSeek-V3, DeepSeek-V3.1, DeepSeek-V3.2, DeepSeek-Coder, DeepSeek-Coder-V2, DeepSeek-R1, DeepSeek-VL2, DeepSeek-OCR |
| Employees | ~200 (2025) |
| Website | https://www.deepseek.com/ |
DeepSeek (Chinese: 杭州深度求索人工智能基础技术研究有限公司; commonly DeepSeek AI or simply DeepSeek) is a Chinese artificial intelligence company known for developing large language models (LLMs) and releasing several prominent open-source and research models. Founded in 2023 by hedge fund entrepreneur Liang Wenfeng, the company has gained international recognition for achieving competitive performance with leading Western AI models at dramatically lower training costs.[1][2]
DeepSeek rose to global prominence in January 2025 when its mobile app briefly topped the Apple App Store's free charts in the United States, following the release of its reasoning-focused DeepSeek-R1 models. The company's claim of training competitive models for under $6 million using Nvidia H800 GPUs, compared to over $100 million for Western equivalents, caused significant market disruption, with Nvidia losing nearly $600 billion in market capitalization.[3][4] The event, widely called "AI's Sputnik moment," triggered a broad reassessment of whether frontier AI necessarily requires massive capital expenditure.[10]
By late 2025 and into 2026, DeepSeek continued to release increasingly capable models (V3.1, V3.2, and the anticipated V4), faced accusations of distilling Western models through fake accounts, and became a focal point in the U.S.-China AI chip export control debate after reports emerged that it had trained models on banned Nvidia Blackwell chips.[50][51][52]
DeepSeek's origins trace back to High-Flyer Capital Management, a Chinese quantitative hedge fund co-founded in February 2016 by Liang Wenfeng and two classmates from Zhejiang University.[1] High-Flyer began adopting deep learning models for stock trading on October 21, 2016, transitioning from CPU-based linear models to GPU-dependent systems. By 2021, the fund relied exclusively on AI for trading operations.[5]
High-Flyer grew rapidly through its AI-driven trading strategies. By 2019, the fund had accumulated at least $10 billion in assets under management.[53] That same year, High-Flyer built its first computing cluster, Fire-Flyer (萤火一号), at a cost of 200 million yuan, equipped with 1,100 GPUs. Anticipating U.S. export restrictions on advanced chips to China, Liang acquired 10,000 Nvidia A100 units before restrictions took effect. Construction of Fire-Flyer 2 (萤火二号) began in 2021 with a 1 billion yuan budget, incorporating 5,000 PCIe A100 GPUs across 625 nodes by 2022.[5][6]
The transition from hedge fund to AI lab was not accidental. Liang and his co-founders had been exploring algorithmic trading ideas since their university days during the 2008 financial crisis, and the infrastructure High-Flyer built for quantitative finance proved directly applicable to training large language models.[53]
On April 14, 2023, High-Flyer announced the establishment of an artificial general intelligence (AGI) research lab. This lab was formally incorporated as DeepSeek on July 17, 2023, with High-Flyer serving as the principal investor. Venture capital firms were initially reluctant to invest, citing the lack of short-term exit opportunities.[1][7]
The company released its first model, DeepSeek Coder, on November 2, 2023, followed by the DeepSeek-LLM series on November 29, 2023. Throughout 2024, DeepSeek continued releasing specialized models:
January 2024: DeepSeek-MoE models (Base and Chat variants)
April 2024: DeepSeek-Math models (Base, Instruct, and RL)
May 2024: DeepSeek-V2
June 2024: DeepSeek-Coder V2 series
September 2024: DeepSeek V2.5[5]
On December 26, 2024, DeepSeek released DeepSeek-V3, a Mixture of Experts model with 671 billion total parameters and 37 billion activated per token. The model was pre-trained on 14.8 trillion tokens using 2,048 Nvidia H800 GPUs. DeepSeek reported the full training run required 2.788 million H800 GPU hours, costing approximately $5.57 million, a figure that stunned the AI industry given the model's competitive performance with GPT-4o and Claude 3.5 Sonnet on standard benchmarks.[12][13]
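The headline figure follows from a simple rental-price assumption stated in the V3 technical report ($2 per H800 GPU hour); a one-line check in Python:

```python
gpu_hours = 2.788e6      # reported H800 GPU hours for the full training run
usd_per_gpu_hour = 2.0   # rental rate assumed in the V3 technical report
print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.3f}M")  # -> $5.576M
```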
V3 introduced several technical innovations: FP8 mixed precision training, an auxiliary-loss-free load balancing strategy for its MoE routing, a custom DualPipe algorithm for pipeline parallelism, and a multi-token prediction training objective that improved sample efficiency. These engineering choices collectively enabled frontier-level performance at roughly one-twentieth the estimated training cost of comparable Western models.[12]
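Of these, the auxiliary-loss-free load-balancing idea is simple enough to sketch: a per-expert bias is added to routing scores for top-k selection only, and after each step the bias is nudged down for overloaded experts and up for underloaded ones. The sketch below is a minimal NumPy illustration under assumed shapes and an illustrative update size (gamma); it is not DeepSeek's implementation.

```python
import numpy as np

def route_tokens(scores, bias, k=8):
    """Select top-k experts per token using biased scores.
    The bias steers selection only; the unbiased scores would still
    weight each expert's output in the real model."""
    biased = scores + bias                      # (tokens, experts)
    return np.argsort(-biased, axis=1)[:, :k]   # expert ids per token

def update_bias(bias, topk, n_experts, gamma=0.001):
    """Nudge biases after each step: down for overloaded experts, up
    for underloaded ones, balancing load without an auxiliary loss."""
    load = np.bincount(topk.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(load - load.mean())

# Toy usage: 16 tokens routed across 64 experts.
rng = np.random.default_rng(0)
scores = rng.standard_normal((16, 64))
bias = np.zeros(64)
topk = route_tokens(scores, bias)
bias = update_bias(bias, topk, n_experts=64)
```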
On January 20, 2025, the company announced DeepSeek-R1, a reasoning-centric model trained primarily with reinforcement learning that matched the performance of OpenAI's o1 family at significantly lower cost. R1 was released under the MIT license, making it one of the most capable fully open-source reasoning models available. DeepSeek also open-sourced DeepSeek-R1-Zero (trained entirely via RL without supervised fine-tuning), plus six distilled models based on Llama and Qwen ranging from 1.5B to 70B parameters.[8][9][25]
DeepSeek-R1-Zero was a particularly notable research contribution. It demonstrated that reasoning capabilities, including self-verification, reflection, and the generation of long chain-of-thought traces, could emerge from pure reinforcement learning without any supervised fine-tuning. This finding influenced subsequent research across the field and opened new avenues for training reasoning models.[8]
DeepSeek's mobile app reached #1 among free apps on the U.S. Apple App Store on January 27-28, 2025. The surge coincided with a 17% drop in Nvidia's share price and over $1 trillion erased from U.S. tech market capitalization; Nvidia's decline alone, roughly $589 billion, was the largest single-day loss of market value by any company in U.S. stock market history. Prominent tech investor Marc Andreessen described this as "AI's Sputnik moment."[3][10][4]
On January 27-28, 2025, DeepSeek reported large-scale malicious attacks on its services, temporarily restricting new sign-ups.[11]
Throughout 2025, DeepSeek maintained a rapid release cadence:
March 2025: DeepSeek-V3-0324
April 2025: DeepSeek-Prover-V2
May 2025: DeepSeek-R1-0528
August 2025: DeepSeek-V3.1
September 2025: DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp
October 2025: DeepSeek-OCR
November 2025: DeepSeekMath-V2
December 2025: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale
In January 2026, DeepSeek and Peking University co-published two foundational research papers that signaled the architectural direction for the next generation of models. The first, on Manifold-Constrained Hyper-Connections (mHC), proposes a rethink of residual connections in transformer training by constraining connection matrices onto the Birkhoff polytope using the Sinkhorn-Knopp algorithm. A team of 19 DeepSeek researchers, including founder Liang Wenfeng, tested mHC on models with 3 billion, 9 billion, and 27 billion parameters. The technique scaled without adding significant computational burden (roughly 6-7% training overhead) and produced models with lower loss and better performance on reasoning and language benchmarks.[40][59]
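The Sinkhorn-Knopp step at the core of mHC is a classical algorithm and easy to illustrate: alternately normalizing rows and columns drives a nonnegative matrix toward the Birkhoff polytope (the set of doubly stochastic matrices). The following minimal NumPy sketch shows only that projection, not how mHC wires it into residual-stream mixing during training.

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=20, eps=1e-9):
    """Drive a nonnegative matrix toward doubly stochastic form
    (rows and columns each summing to 1) by alternately
    normalizing rows and columns."""
    P = np.asarray(M, dtype=float) + eps
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P

A = np.abs(np.random.default_rng(1).standard_normal((4, 4)))
P = sinkhorn_knopp(A)
print(P.sum(axis=0), P.sum(axis=1))  # both close to 1
```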
The second paper introduced Engram conditional memory, a module that decouples static pattern storage from dynamic reasoning to achieve constant-time knowledge retrieval across contexts exceeding one million tokens.[41]
In February 2026, a senior U.S. government official revealed that DeepSeek had trained its upcoming AI model on Nvidia's Blackwell chips, the company's most advanced AI accelerator. The Blackwell chips fall under strict U.S. export controls that explicitly prohibit their shipment to China. According to the official, the chips are likely located in a DeepSeek data center in Inner Mongolia. How DeepSeek obtained the chips was not disclosed, though analysts suggested they may have entered China through intermediary countries where export restrictions are less stringent. The official also noted that DeepSeek was expected to remove technical indicators that might reveal its use of American chips.[50][60]
Neither Nvidia, the U.S. Commerce Department, nor DeepSeek commented on the allegations. China's government rejected the accusations as politicizing trade and technology issues.[60]
On February 23, 2026, Anthropic published a blog post alleging that DeepSeek, along with Chinese AI companies Moonshot AI and MiniMax, had used approximately 24,000 fake accounts to generate more than 16 million exchanges with Claude. The campaigns allegedly violated Anthropic's terms of service and regional access restrictions. According to Anthropic, the labs scripted long, high-token conversations designed to extract detailed, step-by-step answers that could be fed back into their own systems as training data. The extraction campaigns reportedly focused on areas that Anthropic considers key differentiators for Claude, including complex reasoning, coding assistance, and tool use.[51][61]
Of the three labs, MiniMax was alleged to have generated the most activity (over 13 million interactions), followed by Moonshot (over 3.4 million). DeepSeek's activity reportedly focused specifically on extracting step-by-step reasoning traces. In response, Anthropic announced that it had built classifiers and behavioral fingerprinting systems to identify suspicious distillation patterns in API traffic, strengthened account verification processes, and implemented safeguards to reduce the usefulness of model outputs for illicit distillation.[51][61]
DeepSeek-V4 is expected to be a trillion-parameter MoE model with approximately 32 billion active parameters per token, native multimodal capabilities (text, image, and video), and a context window exceeding one million tokens. Internal benchmarks reportedly show scores above 80% on SWE-bench Verified, though these claims remain unverified by independent evaluators as of March 2026.[42][43]
The release has been delayed multiple times. Initial reports pointed to a mid-February 2026 launch, but that window passed without a release. Subsequent projected dates around Lunar New Year, late February, and early March 2026 also came and went. As of March 2026, Chinese tech outlet Whale Lab and other sources report an expected April 2026 launch. DeepSeek has reportedly been working with Huawei and Cambricon to optimize V4 for Chinese-made AI accelerators.[43][44]
In a notable break with industry convention, DeepSeek withheld pre-release builds of V4 from Nvidia and AMD, denying the U.S. chipmakers the opportunity to optimize their hardware for the new model. Instead, the company granted early access to domestic suppliers, including Huawei Technologies, giving Chinese processors a weeks-long head start. The move was interpreted as a deliberate effort to strengthen China's local silicon ecosystem while disadvantaging U.S. hardware vendors.[62]
In mid-March 2026, a powerful anonymous model called "Hunter Alpha" appeared on the OpenRouter platform, sparking widespread speculation that it was a stealth test of DeepSeek V4. The model was later revealed to be from Xiaomi, though some developers noted similarities to DeepSeek's reasoning patterns.[45]
DeepSeek's models employ a Mixture of Experts architecture, which allows massive parameter counts while maintaining computational efficiency. The MoE framework in DeepSeek-V3 consists of:[12][13]
671 billion total parameters
37 billion activated parameters per forward pass
256 routed experts per layer (increased from 160 in V2)
1 shared expert per layer that is always activated
3 initial dense layers (all experts effectively activated)
The DeepSeekMoE component uses dynamic routing, with each token interacting with 8 specialized experts plus the single shared expert. This design carries forward through V3.1, V3.1-Terminus, and V3.2.[54]
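A toy sketch of this routing pattern, with one always-active shared expert and top-k routed experts, follows; the softmax gating, renormalization, and dimensions are illustrative assumptions (the V3 report, for instance, describes sigmoid-based affinity scores), not DeepSeek's exact formulation.

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, k=8):
    """Toy DeepSeekMoE-style forward pass for one token: top-k routed
    experts weighted by gating scores, plus one shared expert that is
    always active."""
    scores = x @ router_w                     # (n_experts,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    topk = np.argsort(-probs)[:k]
    gate = probs[topk] / probs[topk].sum()    # renormalize over selected
    out = shared_expert(x)                    # always-on shared expert
    for g, idx in zip(gate, topk):
        out = out + g * experts[idx](x)
    return out

# Toy usage: d=16 hidden size, 64 routed experts as linear maps.
d, n = 16, 64
rng = np.random.default_rng(2)
experts = [lambda v, W=rng.standard_normal((d, d)) / d: v @ W for _ in range(n)]
shared = lambda v, W=rng.standard_normal((d, d)) / d: v @ W
router_w = rng.standard_normal((d, n))
y = moe_forward(rng.standard_normal(d), experts, shared, router_w)
```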
DeepSeek-V2 and subsequent models incorporate Multi-head Latent Attention (MLA), a modified attention mechanism that compresses the key-value (KV) cache (a schematic sketch follows the list below). MLA achieves:[2][14]
KV cache reduced to 5-13% of the size required by conventional multi-head attention
Significant memory overhead reduction during inference
Support for 128K-164K token context windows
Lower computational cost for long-context processing
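The caching idea can be sketched in a few lines: project each token's hidden state down to a small latent, cache only the latent, and up-project keys and values from it at attention time. The dimensions below are illustrative assumptions, and the sketch omits details such as MLA's decoupled rotary-position keys.

```python
import numpy as np

d_model, d_latent, d_head = 1024, 64, 128   # illustrative sizes

rng = np.random.default_rng(3)
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)

def cache_token(x):
    """Only the small latent is stored in the KV cache."""
    return x @ W_down                # (d_latent,) instead of full K and V

def expand(latent):
    """Keys and values are reconstructed from the cached latent."""
    return latent @ W_uk, latent @ W_uv

# Per token, the cache holds d_latent floats instead of
# 2 * d_head floats per attention head, hence the large reduction.
```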
DeepSeek-R1 employs a distinctive multi-stage training pipeline:[8][15]
Cold-start supervised fine-tuning on a small curated set of long chain-of-thought examples
Large-scale reinforcement learning focused on reasoning tasks
Rejection sampling of RL checkpoints to build a new supervised dataset, mixed with non-reasoning data
A final reinforcement learning stage aligning the model across general scenarios
Notably, DeepSeek-R1-Zero demonstrated that reasoning capabilities can emerge from pure reinforcement learning without any supervised fine-tuning, a finding that influenced subsequent research across the field.[8]
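The reinforcement learning algorithm reported for R1 and R1-Zero is Group Relative Policy Optimization (GRPO), introduced earlier in DeepSeekMath: several responses are sampled per prompt, and each response's advantage is its reward standardized within its own group, so no learned value function is needed. A minimal sketch of the advantage computation (the policy-gradient and clipping machinery is omitted):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each sampled response's
    reward against the mean and standard deviation of its own group
    (one prompt, several samples)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# E.g. 8 sampled answers to one math prompt, reward 1 if correct.
print(grpo_advantages([1, 0, 0, 1, 1, 0, 0, 0]))
```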
Introduced in DeepSeek-V3.2-Exp (September 2025), DeepSeek Sparse Attention (DSA) is a fine-grained sparse attention mechanism optimized for long-context training and inference efficiency with minimal performance impact. DSA reduces the computational complexity of core attention from O(L²) to approximately O(L), where L is the context length, by having each query attend to only a fixed number of selected tokens, resulting in significant end-to-end speedups in long-context scenarios.[16][57]
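The selection step can be made concrete with a toy sketch: an indexer scores all cached tokens, and each query attends only to the top-k of them, so the expensive attention computation stops scaling with the full context length. The random indexer, single-head framing, and value of k below are illustrative assumptions, not the published mechanism's details.

```python
import numpy as np

def sparse_attention(q, K, V, index_scores, k=64):
    """Toy DSA-style step: attend only to the k highest-scoring cached
    tokens, so per-query cost depends on k rather than context length."""
    sel = np.argsort(-index_scores)[:k]           # top-k token ids
    logits = K[sel] @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[sel]

L_ctx, d = 4096, 64
rng = np.random.default_rng(4)
q = rng.standard_normal(d)
K = rng.standard_normal((L_ctx, d))
V = rng.standard_normal((L_ctx, d))
idx_scores = rng.standard_normal(L_ctx)           # stand-in for the indexer
out = sparse_attention(q, K, V, idx_scores)
```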
Starting with DeepSeek-V3.1 (August 2025), DeepSeek introduced a hybrid reasoning architecture that dynamically toggles between fast inference and deep chain-of-thought reasoning using token-controlled templates. Users can select non-thinking mode for tasks requiring fast responses or thinking mode for problems needing step-by-step analysis. In thinking mode, V3.1 achieves answer quality comparable to DeepSeek-R1 while responding significantly faster, making deep reasoning practical for production applications. DeepSeek-V3.2 extended this further by becoming the first model to integrate thinking directly into tool use, supporting tool calling in both thinking and non-thinking modes.[54][57]
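In practice, DeepSeek's OpenAI-compatible API exposes the two modes as separate model names; per DeepSeek's public documentation, deepseek-chat maps to non-thinking mode and deepseek-reasoner to thinking mode. A minimal usage sketch (prompts and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Non-thinking mode: fast responses via the "deepseek-chat" endpoint.
fast = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize MoE in one line."}],
)

# Thinking mode: step-by-step reasoning via "deepseek-reasoner".
deep = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove sqrt(2) is irrational."}],
)
print(deep.choices[0].message.content)
```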
Published in January 2026, mHC is a proposed replacement for conventional residual connections in transformer networks. The technique uses multiple information streams and constrains mixing steps using the Sinkhorn-Knopp algorithm so that the total information flow remains constant, preventing gradient explosion or vanishing. In experiments, mHC trained smoothly at 3B, 9B, and 27B parameter scales while unconstrained Hyper-Connections often became unstable. The resulting models achieved lower loss and better performance on reasoning and language benchmarks with only 6-7% training overhead.[40][59]
In October 2025, DeepSeek released DeepSeek-OCR, an open-source end-to-end document OCR and understanding system that explores "contexts optical compression": representing long text as images and decoding it back with a vision-language stack to save tokens for long-context LLM applications.[17][18]
Architecture: A ~380M-parameter DeepEncoder (SAM-base window attention, 16× token compression via a 2-layer convolution, CLIP-large global attention) feeds a 3B MoE decoder (DeepSeek-3B-MoE-A570M; ~570M active params at inference). Multiple resolution modes control vision-token budgets: Tiny (64 tokens, 512×512), Small (100 tokens, 640×640), Base (256 tokens, 1024×1024), Large (400 tokens, 1280×1280), plus tiled Gundam (n×100 + 256 tokens) and Gundam-M modes for ultra-high-resolution pages.[17]
Reported compression/accuracy: On a Fox benchmark subset (English pages with 600-1,300 text tokens), the paper reports approximately 97% decoding precision when the text-token count is less than 10× the vision-token count, and ~60% accuracy around 20× compression. On OmniDocBench (edit distance; lower is better), Small (100 tokens) outperforms GOT-OCR 2.0 (256 tokens), and Gundam (<~800 tokens) surpasses MinerU-2.0 (~6,790 tokens) in the reported setup.[17]
Throughput/uses: DeepSeek positions the system as a data-engine for LLM/VLM pretraining, claiming over 200k pages/day on a single A100-40G, scalable to tens of millions per day on clusters, plus "deep parsing" of charts, chemical structures (SMILES), and planar geometry into structured outputs (for example HTML tables or dictionaries).[18]
Availability/ecosystem: Source code and weights are hosted on GitHub and Hugging Face, with examples for Transformers/vLLM inference. Community walkthroughs (for example Simon Willison) documented running the 6.6-GB model on diverse hardware and shared setup notes.[19][20][21]
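A rough sketch of the Transformers flow described in those walkthroughs is below; the custom infer helper and prompt format are defined by the checkpoint's remote code and may differ between releases, so treat every call past from_pretrained as a hypothetical illustration.

```python
from transformers import AutoModel, AutoTokenizer

# The encoder/decoder stack ships as remote code with the checkpoint.
repo = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()

# Hypothetical call modeled on the repository's published example;
# the `infer` helper and prompt tokens come from the remote code.
result = model.infer(
    tokenizer,
    prompt="<image>\nConvert the document to markdown.",
    image_file="page.png",
)
print(result)
```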
DeepSeek operates two primary computing clusters:[5]
Fire-Flyer 1 (萤火一号): Built 2019, retired after 1.5 years
Fire-Flyer 2 (萤火二号): Operational since 2022, featuring:
Nvidia GPUs with 200 Gbps interconnects
Fat tree topology for high bisection bandwidth
3FS distributed file system with Direct I/O and RDMA
2,048 Nvidia H800 GPUs used for R1 training
In February 2026, reports indicated that DeepSeek also operates a data center in Inner Mongolia, where Nvidia Blackwell chips were allegedly used for training newer models.[50][60]
The following table compares benchmark results across DeepSeek's major model releases and selected competitors.
| Benchmark | DeepSeek-V3 | DeepSeek-R1 | DeepSeek-R1-0528 | DeepSeek-V3.2-Speciale | GPT-4o | GPT-5 | Description |
|---|---|---|---|---|---|---|---|
| MMLU | 88.5 | 91.8 | N/A | N/A | 87.2 | N/A | Massive Multitask Language Understanding |
| HumanEval | 82.6 | 85.4 | N/A | N/A | 80.5 | N/A | Code Generation |
| MATH-500 | 90.2 | 97.3 | N/A | N/A | 74.6 | N/A | Mathematical Problem-Solving |
| Codeforces | 51.6 | 57.2 | ~1,930 (Elo rating) | N/A | 23.6 | N/A | Competitive programming (percentile, except where noted as a rating) |
| GPQA | 59.1 | 72.3 | N/A | N/A | N/A | N/A | Graduate-Level Question Answering |
| AIME 2024 | N/A | 79.8% | 91.4% | N/A | N/A | N/A | American Invitational Mathematics Exam |
| AIME 2025 | N/A | 70.0% | 87.5% | 96.0% | N/A | 94.6% | American Invitational Mathematics Exam |
The following table tracks improvements across the V3 model family on key software engineering and agent benchmarks.
| Benchmark | V3-0324 | V3.1 | V3.1-Terminus | V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 45.4% | 66.0% | 68.4% | N/A |
| SimpleQA | N/A | 93.4 | 96.8 | N/A |
| BrowseComp | N/A | 30.0 | 38.5 | N/A |
| Aider Pass Rate | N/A | 71.6% | N/A | N/A |
| Terminal-bench | N/A | 31.3 | 36.7 | N/A |
The May 2025 update to R1 brought substantial gains over the original January 2025 release:[39]
| Benchmark | R1 (Jan 2025) | R1-0528 (May 2025) | Change |
|---|---|---|---|
| AIME 2024 | 79.8% | 91.4% | +11.6 pts |
| AIME 2025 | 70.0% | 87.5% | +17.5 pts |
| Codeforces Rating | ~1,530 | ~1,930 | +400 points |
| Model | Type/Focus | Parameters | Context Length | Release Date | Key Features |
|---|---|---|---|---|---|
| DeepSeek-V2 | General LLM (MoE) | 236B total; 21B active | 128K | May 2024 | MLA; DeepSeekMoE routing[2] |
| DeepSeek-V3 | General LLM (MoE) | 671B total; 37B active | 128K | Dec 2024 | Enhanced MoE; FP8 training; $5.6M training cost[12] |
| DeepSeek-V3-0324 | General LLM (MoE) | 671B total; 37B active | 128K | Mar 2025 | Top open-source non-reasoning model[37] |
| DeepSeek-V3.1 | Hybrid MoE | 671B total; 37B active | 128K | Aug 2025 | Hybrid thinking/non-thinking modes; 66.0% SWE-bench Verified[26][54] |
| DeepSeek-V3.1-Terminus | Hybrid MoE | 671B total; 37B active | 128K | Sep 2025 | Enhanced tool use; reduced language mixing; 68.4% SWE-bench[55] |
| DeepSeek-V3.2-Exp | Experimental | 671B total; 37B active | 128K+ | Sep 2025 | Sparse attention (DSA); Huawei Ascend support[16][27] |
| DeepSeek-V3.2 | Hybrid MoE | 671B total; 37B active | 128K+ | Dec 2025 | Thinking integrated into tool use; GPT-5-class performance[57] |
| DeepSeek-V3.2-Speciale | High-compute reasoning | 671B total; 37B active | 128K+ | Dec 2025 | IMO gold-level; 96.0% AIME 2025; surpasses GPT-5[57][58] |
| DeepSeek-Coder | Code LLMs | Various sizes | 16K | Nov 2023 | 2T tokens; 87% code / 13% NL; infilling[23] |
| DeepSeek-Coder-V2 | Code LLMs (MoE) | 236B total; 21B active | 128K | June 2024 | +6T tokens; GPT-4-Turbo comparable[24] |
| DeepSeek-R1 | Reasoning post-training | 671B total; 37B active | 164K | Jan 2025 | RL-centric training; MIT license; o1-level performance[8][25] |
| DeepSeek-R1-0528 | Reasoning (updated) | 671B total; 37B active | 164K | May 2025 | +17.5% AIME 2025; function calling; JSON output[39] |
| DeepSeek-Prover-V2 | Theorem Proving | 671B (also 7B variant) | 32K | Apr 2025 | 88.9% MiniF2F-test; Lean 4 formal proofs[38] |
| DeepSeekMath-V2 | Mathematics | N/A | N/A | Nov 2025 | 118/120 on Putnam Competition[56] |
| DeepSeek-VL2 | Vision-Language (MoE) | 27B total; 4.5B active | 4K | Dec 2024 | Multimodal understanding[28] |
| DeepSeek-OCR | Document OCR | ~380M encoder + 3B MoE decoder | N/A | Oct 2025 | 200k+ pages/day on single A100[17][18] |
DeepSeek has created smaller, efficient models through knowledge distillation:[29]
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-0528-Qwen3-8B (state-of-the-art among open-source 8B models on AIME 2024)
Various models from 1.5B to 70B parameters
Within days of R1's release under the MIT license, more than 700 open-source derivative models appeared on Hugging Face. By February 2026, DeepSeek's models had accumulated over 75 million downloads on the platform. Microsoft, AWS, and Nvidia AI platforms all onboarded DeepSeek models for their cloud inference services.[46][62]
DeepSeek provides API access through the DeepSeek Open Platform with competitive pricing (a cost comparison sketch follows this list):[30]
Input costs: $0.07-$0.27 per million tokens (vs $2.50 for GPT-4o)
Output costs: $1.10 per million tokens (vs $10.00 for GPT-4o)
50%+ price reduction following V3.2-Exp release (September 2025)
Pre-paid billing model
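A back-of-the-envelope comparison using the quoted per-million-token prices, for a hypothetical job of 2M input and 0.5M output tokens (the higher $0.27 input rate is assumed):

```python
in_tok, out_tok = 2.0, 0.5                   # millions of tokens

deepseek = in_tok * 0.27 + out_tok * 1.10    # upper-end input rate
gpt4o = in_tok * 2.50 + out_tok * 10.00

print(f"DeepSeek: ${deepseek:.2f}  GPT-4o: ${gpt4o:.2f}")
# DeepSeek: $1.09  GPT-4o: $10.00
```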
DeepSeek has claimed a theoretical 545% cost-profit ratio on its API services, suggesting the company generates significant revenue from inference despite its low prices. By mid-2025, the annual revenue run rate had reportedly reached $220 million, primarily from API and enterprise services.[47][63]
DeepSeek's commitment to open-source releases has been a defining characteristic of the company and a major factor in its influence. The company releases model weights, training code, and technical papers under permissive licenses, most notably the MIT license for DeepSeek-R1 and its derivatives.[8][25]
This approach has had measurable effects on the broader AI ecosystem. When R1 launched in January 2025, it became the top free app in U.S. app stores within days. Hundreds of derivative models were built on top of its open weights, and major cloud providers integrated it into their platforms. Stanford HAI faculty noted that DeepSeek's open releases represented "a significant step in democratizing AI," enabling smaller companies and individual developers to build on frontier-capable models without needing to spend hundreds of millions on training.[46]
DeepSeek's open-source strategy also forced competitive responses. Multiple Chinese AI companies cut API prices by up to 97% in the weeks following V3's release, while Western labs faced renewed pressure to justify the cost gap between their proprietary models and DeepSeek's freely available alternatives.[4][46]
The scale of DeepSeek's open-source impact continued to grow through 2025 and 2026. By early 2026, DeepSeek models had logged over 75 million downloads on Hugging Face, outpacing open-source releases from other countries on the platform. The V3.1-Terminus, V3.2, and associated models were all released under the MIT license, maintaining the company's pattern of permissive licensing even as geopolitical tensions around its models increased.[62]
Liang Wenfeng (梁文锋), born 1985 in Guangdong province, is the founder and CEO of DeepSeek. He grew up in a fifth-tier city where his father was a primary school teacher. He graduated from Zhejiang University with:[31][53]
Bachelor of Engineering in electronic information engineering (2007)
Master of Engineering in information and communication engineering (2010)
Liang and his co-founders began exploring algorithmic trading ideas as university students during the 2008 financial crisis. He co-founded High-Flyer in February 2016 and, in 2021, purchased 10,000 Nvidia A100 chips before U.S. export restrictions took effect. The computing infrastructure originally built for High-Flyer's quantitative trading operations proved directly transferable to training large language models, providing the foundation for DeepSeek's formation in 2023.[6][53]
84% owned by Liang Wenfeng through holding entities (1% of his stake held directly, the remaining 99% via Ningbo High-Flyer Quantitative Investment Management Partnership)
16% owned by High-Flyer affiliated individuals
No external venture capital funding as of 2025
Approximately 200 employees (2025)
Estimated valuation undisclosed; Liang's personal wealth: $4.5 billion (2025)[30][53]
DeepSeek operates with an unconventional structure:[6][5]
Bottom-up organization with natural division of labor
No preassigned roles or rigid hierarchy
Unrestricted computing resource access for researchers
Emphasis on fresh graduates and non-CS backgrounds
Recruitment from poetry, advanced mathematics, and other fields
The company's lean staffing stands in sharp contrast to competitors. OpenAI employs an estimated 3,500 people, while Google DeepMind has over 2,000. DeepSeek's ability to produce competitive models with roughly 200 employees has been cited as evidence that small, focused teams can rival much larger organizations in AI research when given sufficient compute access.[63]
DeepSeek experienced explosive growth following its January 2025 releases:[30]
30 million daily active users within weeks of launch
33.7 million monthly active users (4th largest AI application globally)
Briefly surpassed ChatGPT in daily users (21.6M vs 14.6M)
Geographic distribution (January 2025):
China: 30.7%
India: 13.6%
Indonesia: 6.9%
United States: 4.3%
France: 3.2%
The "DeepSeek shock" of January 27, 2025 was the single largest day of market capitalization destruction in U.S. stock market history:[4][10]
Over $1 trillion erased from U.S. tech market capitalization in one trading session
Nvidia stock declined 17% in a single day, losing approximately $589 billion in value
The Nasdaq fell 3.1%; the S&P 500 dropped 1.5%
Microsoft, Alphabet, and semiconductor suppliers also saw sharp declines
Micron and Arm Holdings dropped more than 11% and 10%, respectively
Broadcom and Advanced Micro Devices lost more than 17% and 6%, respectively
Triggered "Sputnik moment" discussions across the U.S. AI industry and in Congress
AI price war in China with competitors cutting prices up to 97%
The market reaction reflected a sudden reassessment of a core assumption in the AI investment thesis: that building frontier models required hundreds of millions or billions of dollars in compute. DeepSeek's demonstration that competitive results could be achieved for under $6 million raised questions about the return on investment for massive GPU buildouts by companies like Microsoft, Google, and Amazon.[4][46]
Nvidia CEO Jensen Huang later argued that investors had overreacted, contending that DeepSeek's efficiency gains would actually increase overall demand for AI compute by making it accessible to more users. Nvidia's stock price partially recovered in subsequent weeks, though the "DeepSeek shock" remained a reference point in debates about AI capital expenditure throughout 2025.[64]
DeepSeek's efficiency breakthroughs forced a rethinking of AI economics across the industry. Before V3's release, the prevailing view, grounded in empirical scaling laws, held that each new generation of frontier models would require substantially more compute. DeepSeek showed that architectural innovation (MoE routing, MLA, FP8 training) could partially substitute for raw compute.[12][46]
The practical effects were immediate. DeepSeek's V3.2-Exp model, released in September 2025, reportedly matched GPT-5-class performance at roughly one-tenth the cost per token. Smaller companies and startups began building on DeepSeek's open weights rather than training their own models from scratch, lowering the barrier to entry for AI product development. Research institutions that previously could not afford to work with frontier models gained access through the open-source releases.[46][47]
By December 2025, DeepSeek-V3.2-Speciale demonstrated that the cost-performance gap was continuing to close. The model surpassed GPT-5 on elite math benchmarks while remaining freely available under the MIT license. This pattern, where each DeepSeek release matched or exceeded the previous generation of proprietary Western models at a fraction of the cost, became a recurring theme throughout 2025.[57][58]
| Model | Training Cost | Hardware Used |
|---|---|---|
| DeepSeek-V3 | $5.6 million | 2,048 Nvidia H800 GPUs |
| DeepSeek-R1 | $5.6 million (base model) | 2,048 Nvidia H800 GPUs |
| GPT-4 | $100+ million (est.) | Unknown (likely H100s) |
| Claude 3 | $100+ million (est.) | Unknown |
| Gemini Ultra | $191 million (est.) | TPU v5p |
Multiple governments and organizations have restricted DeepSeek usage, citing concerns over data storage on servers in China and potential exposure to surveillance under Chinese data laws:[32]
Australian government agencies
India central government
South Korea industry ministry
Taiwan government agencies
Texas state government
U.S. Congress and Pentagon
NASA (memo issued January 31, 2025 prohibiting use on government devices)
New York, Virginia, and Tennessee state governments
Potential EU-wide ban under consideration
A third-party security analysis found that DeepSeek's chatbot application could capture login information and share data with China Mobile, a state-owned telecommunications firm. All data collected by DeepSeek is stored on servers in China, which under Chinese law could be subject to government access requests.[32][48]
In February 2025, Representatives Josh Gottheimer (NJ-5) and Darin LaHood (IL-16) introduced the bipartisan "No DeepSeek on Government Devices Act" (H.R. 1121 in the House, S. 765 in the Senate). The bill directs the Office of Management and Budget to require removal of DeepSeek from all federal agency information technology, including any successor applications developed by High-Flyer or its subsidiaries.[48]
The Fiscal Year 2026 National Defense Authorization Act (P.L. 119-60), signed on December 18, 2025, included provisions restricting DeepSeek usage within the Department of Defense and the Intelligence Community.[49]
Export control concerns have been a persistent issue surrounding DeepSeek:
February 2025: Arrests in Singapore over the alleged illegal re-export of Nvidia chips reportedly destined for DeepSeek
April 2025: Trump administration considered blocking DeepSeek from U.S. technology purchases
February 2026: U.S. officials alleged that DeepSeek trained its upcoming model on Nvidia Blackwell chips, the most advanced AI accelerator restricted from export to China. The chips were reportedly located in a DeepSeek data center in Inner Mongolia.[50][60]
March 2026: DeepSeek withheld V4 pre-release builds from Nvidia and AMD, granting early access exclusively to Chinese chip suppliers including Huawei and Cambricon.[62]
The broader geopolitical context adds complexity. The U.S. government has imposed increasingly strict export controls on advanced AI chips to China since 2022, yet DeepSeek's competitive results with H800 GPUs (a chip designed to comply with earlier export restrictions) demonstrated that Chinese AI labs could work around hardware limitations through software and architectural innovation. The Blackwell chip allegations in 2026 raised the stakes further, as they suggested that export controls were being circumvented entirely rather than merely worked around.[4][5][60]
A policy debate has emerged within the U.S. government. White House AI Czar David Sacks and Nvidia CEO Jensen Huang have argued for easing export controls, contending that restrictions push rivals like Huawei to accelerate their own chip development. National security hawks take the opposite view, arguing that tighter enforcement is needed.[60]
In February 2026, Anthropic accused DeepSeek, Moonshot AI, and MiniMax of systematically extracting capabilities from Claude through approximately 24,000 fake accounts and over 16 million exchanges. The campaigns allegedly violated Anthropic's terms of service and targeted areas including complex reasoning, coding assistance, and tool use. OpenAI had previously raised similar concerns about distillation from its models.[51][61]
These accusations added a new dimension to the competitive dynamics between Chinese and Western AI labs, raising questions about intellectual property protection in the context of large language models where the boundary between legitimate benchmarking and unauthorized distillation can be difficult to define.
DeepSeek-R1-0528 and later models have been noted for alignment with Chinese government policies and content restrictions. The models decline to discuss certain politically sensitive topics, including the Tiananmen Square protests and Taiwanese independence, consistent with China's regulations on generative AI services.[33]
A September 2025 NIST evaluation found:[33]
Performance shortcomings compared to U.S. models in certain safety categories
Security vulnerabilities in certain implementations
Cost calculations disputed by independent analysts
As of March 2026, DeepSeek occupies a unique position in the global AI industry. The company remains a team of roughly 200 employees with no external venture capital, yet its models compete with those produced by organizations spending billions of dollars annually. Its open-source releases have reshaped how the industry thinks about the cost and accessibility of frontier AI.
DeepSeek's model lineup at the start of 2026 includes DeepSeek-V3.2 for general use, DeepSeek-R1-0528 for reasoning tasks, and a range of specialized models for mathematics, coding, vision-language understanding, and document OCR. The company's API revenue run rate reportedly reached $220 million by mid-2025, and it claimed a 545% cost-profit ratio on inference services.[47][63]
The most anticipated near-term development is DeepSeek-V4, expected in April 2026. If the model's claimed performance on coding and software engineering benchmarks holds up to independent evaluation, it would represent another significant step forward. The model's optimization for Huawei Ascend and Cambricon chips, rather than exclusively Nvidia hardware, signals a shift in DeepSeek's infrastructure strategy as U.S. export controls continue to tighten.[42][43][44]
The company faces ongoing regulatory headwinds in Western markets, with government bans on its products expanding and legislative restrictions moving through multiple jurisdictions. The Blackwell chip training allegations and Anthropic's distillation accusations have added new dimensions to these challenges. At the same time, its influence on the open-source AI community continues to grow, with over 75 million downloads on Hugging Face and thousands of derivative models and integrations built on its publicly available weights.[50][51][62]
DeepSeek-V4: Trillion-parameter multimodal model with Engram memory and mHC architecture, targeting April 2026 release[42][43]
Huawei Ascend optimization: Reducing dependence on Nvidia hardware amid export restrictions[44]
DeepSeek Coder 2.0: Expanded language support for Rust, Swift, Kotlin, Go
Multimodal DeepSeek-VL 3.0: Integration of text, vision, and audio
Private Model Hosting: Enterprise deployment solutions
Edge AI Models: Sub-1B parameter models for edge devices
AI Agent Systems: Multi-step task completion capabilities[34]
AGI Research: Investment in consciousness-mapping and advanced reasoning research
Global Expansion: Operations in 50+ countries by 2028
AI Ethics Framework: Open-source accountability frameworks
Energy Efficiency: Reduction in training energy via optimized algorithms[35]
DeepSeek operates under comprehensive service agreements addressing:[36]
Content management per China's Interim Measures for the Management of Generative Artificial Intelligence Services
Data security compliance with China's Data Security Law and Personal Information Protection Law
Technical standards for AI-generated content identification
Developer responsibilities for content filtering and monitoring
High-Flyer Capital Management
Liang Wenfeng