| DeepSeek | |
|---|---|
| 杭州深度求索人工智能基础技术研究有限公司 (Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd.) | |
| Type | Private |
| Industry | Artificial intelligence |
| Founded | July 17, 2023 |
| Founder | Liang Wenfeng |
| Headquarters | Hangzhou, Zhejiang, China |
| Key people | Liang Wenfeng (CEO) |
| Owner | High-Flyer Capital Management (Hangzhou Huanfang Technology) |
| Products | DeepSeek-V2, DeepSeek-V3, DeepSeek-V3.1, DeepSeek-V3.2, DeepSeek-Coder, DeepSeek-Coder-V2, DeepSeek-R1, DeepSeek-VL2, DeepSeek-OCR |
| Employees | ~200 (2025) |
| Website | https://www.deepseek.com/ |
DeepSeek (Chinese: 杭州深度求索人工智能基础技术研究有限公司; commonly DeepSeek AI or simply DeepSeek) is a Chinese artificial intelligence company known for developing large language models (LLMs) and releasing several prominent open-source and research models. Founded in 2023 by hedge fund entrepreneur Liang Wenfeng, the company has gained international recognition for achieving competitive performance with leading Western AI models at dramatically lower training costs.[1][2]
DeepSeek rose to global prominence in January 2025 when its mobile app briefly topped the Apple App Store's free charts in the United States, following the release of its reasoning-focused DeepSeek-R1 models. The company's claim of training competitive models for under $6 million using Nvidia H800 GPUs, compared to over $100 million for Western equivalents, caused significant market disruption, with Nvidia losing nearly $600 billion in market capitalization.[3][4] The event, widely called "AI's Sputnik moment," triggered a broad reassessment of whether frontier AI necessarily requires massive capital expenditure.[10]
By late 2025 and into 2026, DeepSeek continued to release increasingly capable models (V3.1, V3.2, and the anticipated V4), faced accusations of distilling Western models through fake accounts, and became a focal point in the U.S.-China AI chip export control debate after reports emerged that it had trained models on banned Nvidia Blackwell chips.[50][51][52]
DeepSeek's origins trace back to High-Flyer Capital Management, a Chinese quantitative hedge fund co-founded in February 2016 by Liang Wenfeng and two classmates from Zhejiang University.[1] High-Flyer began adopting deep learning models for stock trading on October 21, 2016, transitioning from CPU-based linear models to GPU-dependent systems. By 2021, the fund relied exclusively on AI for trading operations.[5]
High-Flyer grew rapidly through its AI-driven trading strategies. By 2019, the fund had accumulated at least $10 billion in assets under management.[53] That same year, High-Flyer built its first computing cluster, Fire-Flyer (萤火一号), at a cost of 200 million yuan, equipped with 1,100 GPUs. Anticipating U.S. export restrictions on advanced chips to China, Liang acquired 10,000 Nvidia A100 units before restrictions took effect. Construction of Fire-Flyer 2 (萤火二号) began in 2021 with a 1 billion yuan budget, incorporating 5,000 PCIe A100 GPUs across 625 nodes by 2022.[5][6]
The transition from hedge fund to AI lab was not accidental. Liang and his co-founders had been exploring algorithmic trading ideas since their university days during the 2008 financial crisis, and the infrastructure High-Flyer built for quantitative finance proved directly applicable to training large language models.[53]
On April 14, 2023, High-Flyer announced the establishment of an artificial general intelligence (AGI) research lab. This lab was formally incorporated as DeepSeek on July 17, 2023, with High-Flyer serving as the principal investor. Venture capital firms were initially reluctant to invest, citing the lack of short-term exit opportunities.[1][7]
The company released its first model, DeepSeek Coder, on November 2, 2023, followed by the DeepSeek-LLM series on November 29, 2023. Throughout 2024, DeepSeek continued releasing specialized models:
January 2024: DeepSeek-MoE models (Base and Chat variants)
April 2024: DeepSeek-Math models (Base, Instruct, and RL)
May 2024: DeepSeek-V2
June 2024: DeepSeek-Coder V2 series
September 2024: DeepSeek V2.5[5]
On December 26, 2024, DeepSeek released DeepSeek-V3, a Mixture of Experts model with 671 billion total parameters and 37 billion activated per token. The model was pre-trained on 14.8 trillion tokens using 2,048 Nvidia H800 GPUs. DeepSeek reported the full training run required 2.788 million H800 GPU hours, costing approximately $5.57 million, a figure that stunned the AI industry given the model's competitive performance with GPT-4o and Claude 3.5 Sonnet on standard benchmarks.[12][13]
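The headline figure follows from a simple rental-price assumption stated in the V3 technical report ($2 per H800 GPU hour); a one-line check in Python:

```python
gpu_hours = 2.788e6      # reported H800 GPU hours for the full training run
usd_per_gpu_hour = 2.0   # rental rate assumed in the V3 technical report
print(f"${gpu_hours * usd_per_gpu_hour / 1e6:.3f}M")  # -> $5.576M
```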
V3 introduced several technical innovations: FP8 mixed precision training, an auxiliary-loss-free load balancing strategy for its MoE routing, a custom DualPipe algorithm for pipeline parallelism, and a multi-token prediction training objective that improved sample efficiency. These engineering choices collectively enabled frontier-level performance at roughly one-twentieth the estimated training cost of comparable Western models.[12]
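Of these, the auxiliary-loss-free load-balancing idea is simple enough to sketch: a per-expert bias is added to routing scores for top-k selection only, and after each step the bias is nudged down for overloaded experts and up for underloaded ones. The sketch below is a minimal NumPy illustration under assumed shapes and an illustrative update size (gamma); it is not DeepSeek's implementation.

```python
import numpy as np

def route_tokens(scores, bias, k=8):
    """Select top-k experts per token using biased scores.
    The bias steers selection only; the unbiased scores would still
    weight each expert's output in the real model."""
    biased = scores + bias                      # (tokens, experts)
    return np.argsort(-biased, axis=1)[:, :k]   # expert ids per token

def update_bias(bias, topk, n_experts, gamma=0.001):
    """Nudge biases after each step: down for overloaded experts, up
    for underloaded ones, balancing load without an auxiliary loss."""
    load = np.bincount(topk.ravel(), minlength=n_experts)
    return bias - gamma * np.sign(load - load.mean())

# Toy usage: 16 tokens routed across 64 experts.
rng = np.random.default_rng(0)
scores = rng.standard_normal((16, 64))
bias = np.zeros(64)
topk = route_tokens(scores, bias)
bias = update_bias(bias, topk, n_experts=64)
```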
On January 20, 2025, the company announced DeepSeek-R1, a reasoning-centric model trained primarily with reinforcement learning that matched the performance of OpenAI's o1 family at significantly lower cost. R1 was released under the MIT license, making it one of the most capable fully open-source reasoning models available. DeepSeek also open-sourced DeepSeek-R1-Zero (trained entirely via RL without supervised fine-tuning), plus six distilled models based on Llama and Qwen ranging from 1.5B to 70B parameters.[8][9][25]
DeepSeek-R1-Zero was a particularly notable research contribution. It demonstrated that reasoning capabilities, including self-verification, reflection, and the generation of long chain-of-thought traces, could emerge from pure reinforcement learning without any supervised fine-tuning. This finding influenced subsequent research across the field and opened new avenues for training reasoning models.[8]
DeepSeek's mobile app reached #1 among free apps on the U.S. Apple App Store on January 27-28, 2025. The surge coincided with a 17% drop in Nvidia's share price and over $1 trillion erased from U.S. tech market capitalization; Nvidia's decline alone, roughly $589 billion, was the largest single-day loss of market value by any company in U.S. stock market history. Prominent tech investor Marc Andreessen described this as "AI's Sputnik moment."[3][10][4]
On January 27-28, 2025, DeepSeek reported large-scale malicious attacks on its services, temporarily restricting new sign-ups.[11]
Throughout 2025, DeepSeek maintained a rapid release cadence:
March 2025: DeepSeek-V3-0324
April 2025: DeepSeek-Prover-V2
May 2025: DeepSeek-R1-0528
August 2025: DeepSeek-V3.1
September 2025: DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp
October 2025: DeepSeek-OCR
November 2025: DeepSeekMath-V2
December 2025: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale
In January 2026, DeepSeek and Peking University co-published two foundational research papers that signaled the architectural direction for the next generation of models. The first, on Manifold-Constrained Hyper-Connections (mHC), proposes a rethink of residual connections in transformer training by constraining connection matrices onto the Birkhoff polytope using the Sinkhorn-Knopp algorithm. A team of 19 DeepSeek researchers, including founder Liang Wenfeng, tested mHC on models with 3 billion, 9 billion, and 27 billion parameters. The technique scaled without adding significant computational burden (roughly 6-7% training overhead) and produced models with lower loss and better performance on reasoning and language benchmarks.[40][59]
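The Sinkhorn-Knopp step at the core of mHC is a classical algorithm and easy to illustrate: alternately normalizing rows and columns drives a nonnegative matrix toward the Birkhoff polytope (the set of doubly stochastic matrices). The following minimal NumPy sketch shows only that projection, not how mHC wires it into residual-stream mixing during training.

```python
import numpy as np

def sinkhorn_knopp(M, n_iters=20, eps=1e-9):
    """Drive a nonnegative matrix toward doubly stochastic form
    (rows and columns each summing to 1) by alternately
    normalizing rows and columns."""
    P = np.asarray(M, dtype=float) + eps
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # normalize rows
        P /= P.sum(axis=0, keepdims=True)  # normalize columns
    return P

A = np.abs(np.random.default_rng(1).standard_normal((4, 4)))
P = sinkhorn_knopp(A)
print(P.sum(axis=0), P.sum(axis=1))  # both close to 1
```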
The second paper introduced Engram conditional memory, a module that decouples static pattern storage from dynamic reasoning to achieve constant-time knowledge retrieval across contexts exceeding one million tokens.[41]
In February 2026, a senior U.S. government official revealed that DeepSeek had trained its upcoming AI model on Nvidia's Blackwell chips, the company's most advanced AI accelerator. The Blackwell chips fall under strict U.S. export controls that explicitly prohibit their shipment to China. According to the official, the chips are likely located in a DeepSeek data center in Inner Mongolia. How DeepSeek obtained the chips was not disclosed, though analysts suggested they may have entered China through intermediary countries where export restrictions are less stringent. The official also noted that DeepSeek was expected to remove technical indicators that might reveal its use of American chips.[50][60]
Neither Nvidia, the U.S. Commerce Department, nor DeepSeek commented on the allegations. China's government rejected the accusations as politicizing trade and technology issues.[60]
On February 23, 2026, Anthropic published a blog post alleging that DeepSeek, along with Chinese AI companies Moonshot AI and MiniMax, had used approximately 24,000 fake accounts to generate more than 16 million exchanges with Claude. The campaigns allegedly violated Anthropic's terms of service and regional access restrictions. According to Anthropic, the labs scripted long, high-token conversations designed to extract detailed, step-by-step answers that could be fed back into their own systems as training data. The extraction campaigns reportedly focused on areas that Anthropic considers key differentiators for Claude, including complex reasoning, coding assistance, and tool use.[51][61]
Of the three labs, MiniMax was alleged to have generated the most activity (over 13 million interactions), followed by Moonshot (over 3.4 million). DeepSeek's activity reportedly focused specifically on extracting step-by-step reasoning traces. In response, Anthropic announced that it had built classifiers and behavioral fingerprinting systems to identify suspicious distillation patterns in API traffic, strengthened account verification processes, and implemented safeguards to reduce the usefulness of model outputs for illicit distillation.[51][61]
DeepSeek-V4 is expected to be a trillion-parameter MoE model with approximately 32 billion active parameters per token, native multimodal capabilities (text, image, and video), and a context window exceeding one million tokens. Internal benchmarks reportedly show scores above 80% on SWE-bench Verified, though these claims remain unverified by independent evaluators as of March 2026.[42][43]
The release has been delayed multiple times. Initial reports pointed to a mid-February 2026 launch, but that window passed without a release. Subsequent projected dates around Lunar New Year, late February, and early March 2026 also came and went. As of March 2026, Chinese tech outlet Whale Lab and other sources report an expected April 2026 launch. DeepSeek has reportedly been working with Huawei and Cambricon to optimize V4 for Chinese-made AI accelerators.[43][44]
In a notable break with industry convention, DeepSeek withheld pre-release builds of V4 from Nvidia and AMD, denying the U.S. chipmakers the opportunity to optimize their hardware for the new model. Instead, the company granted early access to domestic suppliers, including Huawei Technologies, giving Chinese processors a weeks-long head start. The move was interpreted as a deliberate effort to strengthen China's local silicon ecosystem while disadvantaging U.S. hardware vendors.[62]
In mid-March 2026, a powerful anonymous model called "Hunter Alpha" appeared on the OpenRouter platform, sparking widespread speculation that it was a stealth test of DeepSeek V4. The model was later revealed to be from Xiaomi, though some developers noted similarities to DeepSeek's reasoning patterns.[45]
DeepSeek's models employ a Mixture of Experts architecture, which allows massive parameter counts while maintaining computational efficiency. The MoE framework in DeepSeek-V3 consists of:[12][13]
671 billion total parameters
37 billion activated parameters per forward pass
256 routed experts per layer (increased from 160 in V2)
1 shared expert per layer that is always activated
3 initial dense layers (all experts effectively activated)
The DeepSeekMoE component uses dynamic routing, with each token interacting with 8 specialized experts plus the single shared expert. This design carries forward through V3.1, V3.1-Terminus, and V3.2.[54]
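A toy sketch of this routing pattern, with one always-active shared expert and top-k routed experts, follows; the softmax gating, renormalization, and dimensions are illustrative assumptions (the V3 report, for instance, describes sigmoid-based affinity scores), not DeepSeek's exact formulation.

```python
import numpy as np

def moe_forward(x, experts, shared_expert, router_w, k=8):
    """Toy DeepSeekMoE-style forward pass for one token: top-k routed
    experts weighted by gating scores, plus one shared expert that is
    always active."""
    scores = x @ router_w                     # (n_experts,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    topk = np.argsort(-probs)[:k]
    gate = probs[topk] / probs[topk].sum()    # renormalize over selected
    out = shared_expert(x)                    # always-on shared expert
    for g, idx in zip(gate, topk):
        out = out + g * experts[idx](x)
    return out

# Toy usage: d=16 hidden size, 64 routed experts as linear maps.
d, n = 16, 64
rng = np.random.default_rng(2)
experts = [lambda v, W=rng.standard_normal((d, d)) / d: v @ W for _ in range(n)]
shared = lambda v, W=rng.standard_normal((d, d)) / d: v @ W
router_w = rng.standard_normal((d, n))
y = moe_forward(rng.standard_normal(d), experts, shared, router_w)
```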
DeepSeek-V2 and subsequent models incorporate Multi-head Latent Attention (MLA), a modified attention mechanism that compresses the key-value (KV) cache (a schematic sketch follows the list below). MLA achieves:[2][14]
KV cache reduced to 5-13% of the size required by conventional multi-head attention
Significant memory overhead reduction during inference
Support for 128K-164K token context windows
Lower computational cost for long-context processing
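The caching idea can be sketched in a few lines: project each token's hidden state down to a small latent, cache only the latent, and up-project keys and values from it at attention time. The dimensions below are illustrative assumptions, and the sketch omits details such as MLA's decoupled rotary-position keys.

```python
import numpy as np

d_model, d_latent, d_head = 1024, 64, 128   # illustrative sizes

rng = np.random.default_rng(3)
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_uk = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)
W_uv = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)

def cache_token(x):
    """Only the small latent is stored in the KV cache."""
    return x @ W_down                # (d_latent,) instead of full K and V

def expand(latent):
    """Keys and values are reconstructed from the cached latent."""
    return latent @ W_uk, latent @ W_uv

# Per token, the cache holds d_latent floats instead of
# 2 * d_head floats per attention head, hence the large reduction.
```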
DeepSeek-R1 employs a distinctive multi-stage training pipeline:[8][15]
Cold-start supervised fine-tuning on a small curated set of long chain-of-thought examples
Large-scale reinforcement learning focused on reasoning tasks
Rejection sampling of RL checkpoints to build a new supervised dataset, mixed with non-reasoning data
A final reinforcement learning stage aligning the model across general scenarios
Notably, DeepSeek-R1-Zero demonstrated that reasoning capabilities can emerge from pure reinforcement learning without any supervised fine-tuning, a finding that influenced subsequent research across the field.[8]
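The reinforcement learning algorithm reported for R1 and R1-Zero is Group Relative Policy Optimization (GRPO), introduced earlier in DeepSeekMath: several responses are sampled per prompt, and each response's advantage is its reward standardized within its own group, so no learned value function is needed. A minimal sketch of the advantage computation (the policy-gradient and clipping machinery is omitted):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each sampled response's
    reward against the mean and standard deviation of its own group
    (one prompt, several samples)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# E.g. 8 sampled answers to one math prompt, reward 1 if correct.
print(grpo_advantages([1, 0, 0, 1, 1, 0, 0, 0]))
```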
Introduced in DeepSeek-V3.2-Exp (September 2025), DeepSeek Sparse Attention (DSA) is a fine-grained sparse attention mechanism optimized for long-context training and inference efficiency with minimal performance impact. DSA reduces the computational complexity of core attention from O(L²) to approximately O(L), where L is the context length, by having each query attend to only a fixed number of selected tokens, resulting in significant end-to-end speedups in long-context scenarios.[16][57]
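The selection step can be made concrete with a toy sketch: an indexer scores all cached tokens, and each query attends only to the top-k of them, so the expensive attention computation stops scaling with the full context length. The random indexer, single-head framing, and value of k below are illustrative assumptions, not the published mechanism's details.

```python
import numpy as np

def sparse_attention(q, K, V, index_scores, k=64):
    """Toy DSA-style step: attend only to the k highest-scoring cached
    tokens, so per-query cost depends on k rather than context length."""
    sel = np.argsort(-index_scores)[:k]           # top-k token ids
    logits = K[sel] @ q / np.sqrt(q.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V[sel]

L_ctx, d = 4096, 64
rng = np.random.default_rng(4)
q = rng.standard_normal(d)
K = rng.standard_normal((L_ctx, d))
V = rng.standard_normal((L_ctx, d))
idx_scores = rng.standard_normal(L_ctx)           # stand-in for the indexer
out = sparse_attention(q, K, V, idx_scores)
```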
Starting with DeepSeek-V3.1 (August 2025), DeepSeek introduced a hybrid reasoning architecture that dynamically toggles between fast inference and deep chain-of-thought reasoning using token-controlled templates. Users can select non-thinking mode for tasks requiring fast responses or thinking mode for problems needing step-by-step analysis. In thinking mode, V3.1 achieves answer quality comparable to DeepSeek-R1 while responding significantly faster, making deep reasoning practical for production applications. DeepSeek-V3.2 extended this further by becoming the first model to integrate thinking directly into tool use, supporting tool calling in both thinking and non-thinking modes.[54][57]
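In practice, DeepSeek's OpenAI-compatible API exposes the two modes as separate model names; per DeepSeek's public documentation, deepseek-chat maps to non-thinking mode and deepseek-reasoner to thinking mode. A minimal usage sketch (prompts and key are placeholders):

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://api.deepseek.com")

# Non-thinking mode: fast responses via the "deepseek-chat" endpoint.
fast = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Summarize MoE in one line."}],
)

# Thinking mode: step-by-step reasoning via "deepseek-reasoner".
deep = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove sqrt(2) is irrational."}],
)
print(deep.choices[0].message.content)
```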
Published in January 2026, mHC is a proposed replacement for conventional residual connections in transformer networks. The technique uses multiple information streams and constrains mixing steps using the Sinkhorn-Knopp algorithm so that the total information flow remains constant, preventing gradient explosion or vanishing. In experiments, mHC trained smoothly at 3B, 9B, and 27B parameter scales while unconstrained Hyper-Connections often became unstable. The resulting models achieved lower loss and better performance on reasoning and language benchmarks with only 6-7% training overhead.[40][59]
In October 2025, DeepSeek released DeepSeek-OCR, an open-source end-to-end document OCR and understanding system that explores "contexts optical compression": representing long text as images and decoding it back with a vision-language stack to save tokens for long-context LLM applications.[17][18]
Architecture: A ~380M-parameter DeepEncoder (SAM-base window attention, 16× token compression via a 2-layer convolution, CLIP-large global attention) feeds a 3B MoE decoder (DeepSeek-3B-MoE-A570M; ~570M active params at inference). Multiple resolution modes control vision-token budgets: Tiny (64 tokens, 512×512), Small (100 tokens, 640×640), Base (256 tokens, 1024×1024), Large (400 tokens, 1280×1280), plus tiled Gundam (n×100 + 256 tokens) and Gundam-M modes for ultra-high-resolution pages.[17]
Reported compression/accuracy: On a Fox benchmark subset (English pages with 600-1,300 text tokens), the paper reports approximately 97% decoding precision when the text-token count is less than 10× the vision-token count, and ~60% accuracy around 20× compression. On OmniDocBench (edit distance; lower is better), Small (100 tokens) outperforms GOT-OCR 2.0 (256 tokens), and Gundam (<~800 tokens) surpasses MinerU-2.0 (~6,790 tokens) in the reported setup.[17]
Throughput/uses: DeepSeek positions the system as a data-engine for LLM/VLM pretraining, claiming over 200k pages/day on a single A100-40G, scalable to tens of millions per day on clusters, plus "deep parsing" of charts, chemical structures (SMILES), and planar geometry into structured outputs (for example HTML tables or dictionaries).[18]
Availability/ecosystem: Source code and weights are hosted on GitHub and Hugging Face, with examples for Transformers/vLLM inference. Community walkthroughs (for example Simon Willison) documented running the 6.6-GB model on diverse hardware and shared setup notes.[19][20][21]
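A rough sketch of the Transformers flow described in those walkthroughs is below; the custom infer helper and prompt format are defined by the checkpoint's remote code and may differ between releases, so treat every call past from_pretrained as a hypothetical illustration.

```python
from transformers import AutoModel, AutoTokenizer

# The encoder/decoder stack ships as remote code with the checkpoint.
repo = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()

# Hypothetical call modeled on the repository's published example;
# the `infer` helper and prompt tokens come from the remote code.
result = model.infer(
    tokenizer,
    prompt="<image>\nConvert the document to markdown.",
    image_file="page.png",
)
print(result)
```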
DeepSeek operates two primary computing clusters:[5]
Fire-Flyer 1 (萤火一号): Built 2019, retired after 1.5 years
Fire-Flyer 2 (萤火二号): Operational since 2022, featuring:
Nvidia GPUs with 200 Gbps interconnects
Fat tree topology for high bisection bandwidth
3FS distributed file system with Direct I/O and RDMA
2,048 Nvidia H800 GPUs used for R1 training
In February 2026, reports indicated that DeepSeek also operates a data center in Inner Mongolia, where Nvidia Blackwell chips were allegedly used for training newer models.[50][60]
The following table compares benchmark results across DeepSeek's major model releases and selected competitors.
| Benchmark | DeepSeek-V3 | DeepSeek-R1 | DeepSeek-R1-0528 | DeepSeek-V3.2-Speciale | GPT-4o | GPT-5 | Description |
|---|---|---|---|---|---|---|---|
| MMLU | 88.5 | 91.8 | N/A | N/A | 87.2 | N/A | Massive Multitask Language Understanding |
| HumanEval | 82.6 | 85.4 | N/A | N/A | 80.5 | N/A | Code Generation |
| MATH-500 | 90.2 | 97.3 | N/A | N/A | 74.6 | N/A | Mathematical Problem-Solving |
| Codeforces | 51.6 | 57.2 | ~1,930 (Elo rating) | N/A | 23.6 | N/A | Competitive programming (percentile, except where noted as a rating) |
| GPQA | 59.1 | 72.3 | N/A | N/A | N/A | N/A | Graduate-Level Question Answering |
| AIME 2024 | N/A | 79.8% | 91.4% | N/A | N/A | N/A | American Invitational Mathematics Exam |
| AIME 2025 | N/A | 70.0% | 87.5% | 96.0% | N/A | 94.6% | American Invitational Mathematics Exam |
The following table tracks improvements across the V3 model family on key software engineering and agent benchmarks.
| Benchmark | V3-0324 | V3.1 | V3.1-Terminus | V3.2 |
|---|---|---|---|---|
| SWE-bench Verified | 45.4% | 66.0% | 68.4% | N/A |
| SimpleQA | N/A | 93.4 | 96.8 | N/A |
| BrowseComp | N/A | 30.0 | 38.5 | N/A |
| Aider Pass Rate | N/A | 71.6% | N/A | N/A |
| Terminal-bench | N/A | 31.3 | 36.7 | N/A |
The May 2025 update to R1 brought substantial gains over the original January 2025 release:[39]
| Benchmark | R1 (Jan 2025) | R1-0528 (May 2025) | Change |
|---|---|---|---|
| AIME 2024 | 79.8% | 91.4% | +11.6 pts |
| AIME 2025 | 70.0% | 87.5% | +17.5 pts |
| Codeforces Rating | ~1,530 | ~1,930 | +400 points |
| Model | Type/Focus | Parameters | Context Length | Release Date | Key Features |
|---|---|---|---|---|---|
| DeepSeek-V2 | General LLM (MoE) | 236B total; 21B active | 128K | May 2024 | MLA; DeepSeekMoE routing[2] |
| DeepSeek-V3 | General LLM (MoE) | 671B total; 37B active | 128K | Dec 2024 | Enhanced MoE; FP8 training; $5.6M training cost[12] |
| DeepSeek-V3-0324 | General LLM (MoE) | 671B total; 37B active | 128K | Mar 2025 | Top open-source non-reasoning model[37] |
| DeepSeek-V3.1 | Hybrid MoE | 671B total; 37B active | 128K | Aug 2025 | Hybrid thinking/non-thinking modes; 66.0% SWE-bench Verified[26][54] |
| DeepSeek-V3.1-Terminus | Hybrid MoE | 671B total; 37B active | 128K | Sep 2025 | Enhanced tool use; reduced language mixing; 68.4% SWE-bench[55] |
| DeepSeek-V3.2-Exp | Experimental | 671B total; 37B active | 128K+ | Sep 2025 | Sparse attention (DSA); Huawei Ascend support[16][27] |
| DeepSeek-V3.2 | Hybrid MoE | 671B total; 37B active | 128K+ | Dec 2025 | Thinking integrated into tool use; GPT-5-class performance[57] |
| DeepSeek-V3.2-Speciale | High-compute reasoning | 671B total; 37B active | 128K+ | Dec 2025 | IMO gold-level; 96.0% AIME 2025; surpasses GPT-5[57][58] |
| DeepSeek-Coder | Code LLMs | Various sizes | 16K | Nov 2023 | 2T tokens; 87% code / 13% NL; infilling[23] |
| DeepSeek-Coder-V2 | Code LLMs (MoE) | 236B total; 21B active | 128K | June 2024 | +6T tokens; GPT-4-Turbo comparable[24] |
| DeepSeek-R1 | Reasoning post-training | 671B total; 37B active | 164K | Jan 2025 | RL-centric training; MIT license; o1-level performance[8][25] |
| DeepSeek-R1-0528 | Reasoning (updated) | 671B total; 37B active | 164K | May 2025 | +17.5% AIME 2025; function calling; JSON output[39] |
| DeepSeek-Prover-V2 | Theorem Proving | 671B (also 7B variant) | 32K | Apr 2025 | 88.9% MiniF2F-test; Lean 4 formal proofs[38] |
| DeepSeekMath-V2 | Mathematics | N/A | N/A | Nov 2025 | 118/120 on Putnam Competition[56] |
| DeepSeek-VL2 | Vision-Language (MoE) | 27B total; 4.5B active | 4K | Dec 2024 | Multimodal understanding[28] |
| DeepSeek-OCR | Document OCR | ~380M encoder + 3B MoE decoder | N/A | Oct 2025 | 200k+ pages/day on single A100[17][18] |
DeepSeek has created smaller, efficient models through knowledge distillation:[29]
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-0528-Qwen3-8B (state-of-the-art among open-source 8B models on AIME 2024)
Various models from 1.5B to 70B parameters
Within days of R1's release under the MIT license, more than 700 open-source derivative models appeared on Hugging Face. By February 2026, DeepSeek's models had accumulated over 75 million downloads on the platform. Microsoft, AWS, and Nvidia AI platforms all onboarded DeepSeek models for their cloud inference services.[46][62]
DeepSeek provides API access through the DeepSeek Open Platform with competitive pricing (a cost comparison sketch follows this list):[30]
Input costs: $0.07-$0.27 per million tokens (vs $2.50 for GPT-4o)
Output costs: $1.10 per million tokens (vs $10.00 for GPT-4o)
50%+ price reduction following V3.2-Exp release (September 2025)
Pre-paid billing model
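A back-of-the-envelope comparison using the quoted per-million-token prices, for a hypothetical job of 2M input and 0.5M output tokens (the higher $0.27 input rate is assumed):

```python
in_tok, out_tok = 2.0, 0.5                   # millions of tokens

deepseek = in_tok * 0.27 + out_tok * 1.10    # upper-end input rate
gpt4o = in_tok * 2.50 + out_tok * 10.00

print(f"DeepSeek: ${deepseek:.2f}  GPT-4o: ${gpt4o:.2f}")
# DeepSeek: $1.09  GPT-4o: $10.00
```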
DeepSeek has claimed a theoretical 545% cost-profit ratio on its API services, suggesting the company generates significant revenue from inference despite its low prices. By mid-2025, the annual revenue run rate had reportedly reached $220 million, primarily from API and enterprise services.[47][63]
DeepSeek's commitment to open-source releases has been a defining characteristic of the company and a major factor in its influence. The company releases model weights, training code, and technical papers under permissive licenses, most notably the MIT license for DeepSeek-R1 and its derivatives.[8][25]
This approach has had measurable effects on the broader AI ecosystem. When R1 launched in January 2025, it became the top free app in U.S. app stores within days. Hundreds of derivative models were built on top of its open weights, and major cloud providers integrated it into their platforms. Stanford HAI faculty noted that DeepSeek's open releases represented "a significant step in democratizing AI," enabling smaller companies and individual developers to build on frontier-capable models without needing to spend hundreds of millions on training.[46]
DeepSeek's open-source strategy also forced competitive responses. Multiple Chinese AI companies cut API prices by up to 97% in the weeks following V3's release, while Western labs faced renewed pressure to justify the cost gap between their proprietary models and DeepSeek's freely available alternatives.[4][46]
The scale of DeepSeek's open-source impact continued to grow through 2025 and 2026. By early 2026, DeepSeek models had logged over 75 million downloads on Hugging Face, outpacing open-source releases from other countries on the platform. The V3.1-Terminus, V3.2, and associated models were all released under the MIT license, maintaining the company's pattern of permissive licensing even as geopolitical tensions around its models increased.[62]
Liang Wenfeng (梁文锋), born 1985 in Guangdong province, is the founder and CEO of DeepSeek. He grew up in a fifth-tier city where his father was a primary school teacher. He graduated from Zhejiang University with:[31][53]
Bachelor of Engineering in electronic information engineering (2007)
Master of Engineering in information and communication engineering (2010)
Liang and his co-founders began exploring algorithmic trading ideas as university students during the 2008 financial crisis. He co-founded High-Flyer in February 2016 and, in 2021, purchased 10,000 Nvidia A100 chips before U.S. export restrictions took effect. The computing infrastructure originally built for High-Flyer's quantitative trading operations proved directly transferable to training large language models, providing the foundation for DeepSeek's formation in 2023.[6][53]
84% owned by Liang Wenfeng through holding entities (1% of his stake held directly, the remaining 99% via Ningbo High-Flyer Quantitative Investment Management Partnership)
16% owned by High-Flyer affiliated individuals
No external venture capital funding as of 2025
Approximately 200 employees (2025)
Estimated valuation undisclosed; Liang's personal wealth: $4.5 billion (2025)[30][53]
DeepSeek operates with an unconventional structure:[6][5]
Bottom-up organization with natural division of labor
No preassigned roles or rigid hierarchy
Unrestricted computing resource access for researchers
Emphasis on fresh graduates and non-CS backgrounds
Recruitment from poetry, advanced mathematics, and other fields
The company's lean staffing stands in sharp contrast to competitors. OpenAI employs an estimated 3,500 people, while Google DeepMind has over 2,000. DeepSeek's ability to produce competitive models with roughly 200 employees has been cited as evidence that small, focused teams can rival much larger organizations in AI research when given sufficient compute access.[63]
DeepSeek experienced explosive growth following its January 2025 releases:[30]
30 million daily active users within weeks of launch
33.7 million monthly active users (4th largest AI application globally)
Briefly surpassed ChatGPT in daily users (21.6M vs 14.6M)
Geographic distribution (January 2025):
China: 30.7%
India: 13.6%
Indonesia: 6.9%
United States: 4.3%
France: 3.2%
The "DeepSeek shock" of January 27, 2025 was the single largest day of market capitalization destruction in U.S. stock market history:[4][10]
Over $1 trillion erased from U.S. tech market capitalization in one trading session
Nvidia stock declined 17% in a single day, losing approximately $589 billion in value
The Nasdaq fell 3.1%; the S&P 500 dropped 1.5%
Microsoft, Alphabet, and semiconductor suppliers also saw sharp declines
Micron and Arm Holdings dropped more than 11% and 10%, respectively
Broadcom and Advanced Micro Devices lost more than 17% and 6%, respectively
Triggered "Sputnik moment" discussions across the U.S. AI industry and in Congress
AI price war in China with competitors cutting prices up to 97%
The market reaction reflected a sudden reassessment of a core assumption in the AI investment thesis: that building frontier models required hundreds of millions or billions of dollars in compute. DeepSeek's demonstration that competitive results could be achieved for under $6 million raised questions about the return on investment for massive GPU buildouts by companies like Microsoft, Google, and Amazon.[4][46]
Nvidia CEO Jensen Huang later argued that investors had overreacted, contending that DeepSeek's efficiency gains would actually increase overall demand for AI compute by making it accessible to more users. Nvidia's stock price partially recovered in subsequent weeks, though the "DeepSeek shock" remained a reference point in debates about AI capital expenditure throughout 2025.[64]
DeepSeek's efficiency breakthroughs forced a rethinking of AI economics across the industry. Before V3's release, the prevailing view, grounded in empirical scaling laws, held that each new generation of frontier models would require substantially more compute. DeepSeek showed that architectural innovation (MoE routing, MLA, FP8 training) could partially substitute for raw compute.[12][46]
The practical effects were immediate. DeepSeek's V3.2-Exp model, released in September 2025, reportedly matched GPT-5-class performance at roughly one-tenth the cost per token. Smaller companies and startups began building on DeepSeek's open weights rather than training their own models from scratch, lowering the barrier to entry for AI product development. Research institutions that previously could not afford to work with frontier models gained access through the open-source releases.[46][47]
By December 2025, DeepSeek-V3.2-Speciale demonstrated that the cost-performance gap was continuing to close. The model surpassed GPT-5 on elite math benchmarks while remaining freely available under the MIT license. This pattern, where each DeepSeek release matched or exceeded the previous generation of proprietary Western models at a fraction of the cost, became a recurring theme throughout 2025.[57][58]
| Model | Training Cost | Hardware Used |
|---|---|---|
| DeepSeek-V3 | $5.6 million | 2,048 Nvidia H800 GPUs |
| DeepSeek-R1 | $5.6 million (base model) | 2,048 Nvidia H800 GPUs |
| GPT-4 | $100+ million (est.) | Unknown (likely H100s) |
| Claude 3 | $100+ million (est.) | Unknown |
| Gemini Ultra | $191 million (est.) | TPU v5p |
Multiple governments and organizations have restricted DeepSeek usage, citing concerns over data storage on servers in China and potential exposure to surveillance under Chinese data laws:[32]
Australian government agencies
India central government
South Korea industry ministry
Taiwan government agencies
Texas state government
U.S. Congress and Pentagon
NASA (memo issued January 31, 2025 prohibiting use on government devices)
New York, Virginia, and Tennessee state governments
Potential EU-wide ban under consideration
A third-party security analysis found that DeepSeek's chatbot application could capture login information and share data with China Mobile, a state-owned telecommunications firm. All data collected by DeepSeek is stored on servers in China, which under Chinese law could be subject to government access requests.[32][48]
In February 2025, Representatives Josh Gottheimer (NJ-5) and Darin LaHood (IL-16) introduced the bipartisan "No DeepSeek on Government Devices Act" (H.R. 1121 in the House, S. 765 in the Senate). The bill directs the Office of Management and Budget to require removal of DeepSeek from all federal agency information technology, including any successor applications developed by High-Flyer or its subsidiaries.[48]
The Fiscal Year 2026 National Defense Authorization Act (P.L. 119-60), signed on December 18, 2025, included provisions restricting DeepSeek usage within the Department of Defense and the Intelligence Community.[49]
Export control concerns have been a persistent issue surrounding DeepSeek:
February 2025: Arrests in Singapore over the alleged illegal re-export of Nvidia chips reportedly destined for DeepSeek
April 2025: Trump administration considered blocking DeepSeek from U.S. technology purchases
February 2026: U.S. officials alleged that DeepSeek trained its upcoming model on Nvidia Blackwell chips, the most advanced AI accelerator restricted from export to China. The chips were reportedly located in a DeepSeek data center in Inner Mongolia.[50][60]
March 2026: DeepSeek withheld V4 pre-release builds from Nvidia and AMD, granting early access exclusively to Chinese chip suppliers including Huawei and Cambricon.[62]
The broader geopolitical context adds complexity. The U.S. government has imposed increasingly strict export controls on advanced AI chips to China since 2022, yet DeepSeek's competitive results with H800 GPUs (a chip designed to comply with earlier export restrictions) demonstrated that Chinese AI labs could work around hardware limitations through software and architectural innovation. The Blackwell chip allegations in 2026 raised the stakes further, as they suggested that export controls were being circumvented entirely rather than merely worked around.[4][5][60]
A policy debate has emerged within the U.S. government. White House AI Czar David Sacks and Nvidia CEO Jensen Huang have argued for easing export controls, contending that restrictions push rivals like Huawei to accelerate their own chip development. National security hawks take the opposite view, arguing that tighter enforcement is needed.[60]
In February 2026, Anthropic accused DeepSeek, Moonshot AI, and MiniMax of systematically extracting capabilities from Claude through approximately 24,000 fake accounts and over 16 million exchanges. The campaigns allegedly violated Anthropic's terms of service and targeted areas including complex reasoning, coding assistance, and tool use. OpenAI had previously raised similar concerns about distillation from its models.[51][61]
These accusations added a new dimension to the competitive dynamics between Chinese and Western AI labs, raising questions about intellectual property protection in the context of large language models where the boundary between legitimate benchmarking and unauthorized distillation can be difficult to define.
DeepSeek-R1-0528 and later models have been noted for alignment with Chinese government policies and content restrictions. The models decline to discuss certain politically sensitive topics, including the Tiananmen Square protests and Taiwanese independence, consistent with China's regulations on generative AI services.[33]
A September 2025 NIST evaluation found:[33]
Performance shortcomings compared to U.S. models in certain safety categories
Security vulnerabilities in certain implementations
Cost calculations disputed by independent analysts
As of March 2026, DeepSeek occupies a unique position in the global AI industry. The company remains a team of roughly 200 employees with no external venture capital, yet its models compete with those produced by organizations spending billions of dollars annually. Its open-source releases have reshaped how the industry thinks about the cost and accessibility of frontier AI.
DeepSeek's model lineup at the start of 2026 includes DeepSeek-V3.2 for general use, DeepSeek-R1-0528 for reasoning tasks, and a range of specialized models for mathematics, coding, vision-language understanding, and document OCR. The company's API revenue run rate reportedly reached $220 million by mid-2025, and it claimed a 545% cost-profit ratio on inference services.[47][63]
The most anticipated near-term development is DeepSeek-V4, expected in April 2026. If the model's claimed performance on coding and software engineering benchmarks holds up to independent evaluation, it would represent another significant step forward. The model's optimization for Huawei Ascend and Cambricon chips, rather than exclusively Nvidia hardware, signals a shift in DeepSeek's infrastructure strategy as U.S. export controls continue to tighten.[42][43][44]
The company faces ongoing regulatory headwinds in Western markets, with government bans on its products expanding and legislative restrictions moving through multiple jurisdictions. The Blackwell chip training allegations and Anthropic's distillation accusations have added new dimensions to these challenges. At the same time, its influence on the open-source AI community continues to grow, with over 75 million downloads on Hugging Face and thousands of derivative models and integrations built on its publicly available weights.[50][51][62]
DeepSeek-V4: Trillion-parameter multimodal model with Engram memory and mHC architecture, targeting April 2026 release[42][43]
Huawei Ascend optimization: Reducing dependence on Nvidia hardware amid export restrictions[44]
DeepSeek Coder 2.0: Expanded language support for Rust, Swift, Kotlin, Go
Multimodal DeepSeek-VL 3.0: Integration of text, vision, and audio
Private Model Hosting: Enterprise deployment solutions
Edge AI Models: Sub-1B parameter models for edge devices
AI Agent Systems: Multi-step task completion capabilities[34]
AGI Research: Investment in consciousness-mapping and advanced reasoning research
Global Expansion: Operations in 50+ countries by 2028
AI Ethics Framework: Open-source accountability frameworks
Energy Efficiency: Reduction in training energy via optimized algorithms[35]
DeepSeek operates under comprehensive service agreements addressing:[36]
Content management per China's Interim Measures for the Management of Generative Artificial Intelligence Services
Data security compliance with China's Data Security Law and Personal Information Protection Law
Technical standards for AI-generated content identification
Developer responsibilities for content filtering and monitoring
High-Flyer Capital Management
Liang Wenfeng