| DeepSeek-R1 | |
|---|---|
| Developer | DeepSeek |
| Release date | January 20, 2025 |
| Type | Reasoning-focused large language model |
| Architecture | Mixture of Experts transformer |
| Parameters | 671 billion total (37 billion activated per token) |
| Context length | 128,000 tokens |
| Training method | Multi-stage supervised fine-tuning and reinforcement learning (GRPO) |
| License | MIT |
| Updated version | R1-0528 (May 28, 2025) |
DeepSeek-R1 is an open-source reasoning-focused large language model developed by DeepSeek, a Chinese artificial intelligence company. Released on January 20, 2025, under the MIT license, R1 was one of the first open-weight models to match the reasoning performance of OpenAI's proprietary o1 model across mathematics, coding, and scientific reasoning tasks. The model uses a Mixture of Experts architecture with 671 billion total parameters, of which 37 billion are activated per forward pass, keeping computational costs manageable during inference.[1][2]
DeepSeek-R1's release triggered what became known as the "DeepSeek shock," a market event on January 27, 2025 that erased over $1 trillion from U.S. technology stocks in a single trading session. Nvidia alone lost approximately $589 billion in market capitalization, the largest single-day loss for any company in stock market history. The shock stemmed from the revelation that a small Chinese startup with roughly 160 employees had trained a reasoning model competitive with the world's most expensive AI systems, using an estimated $5.6 million in compute, a fraction of the hundreds of millions typically spent by Western labs.[3][4]
Beyond its market impact, DeepSeek-R1 was scientifically significant for demonstrating that complex reasoning behaviors could emerge from pure reinforcement learning without supervised fine-tuning. The companion model DeepSeek-R1-Zero, trained entirely through RL, developed chain-of-thought reasoning, self-reflection, and error correction spontaneously during training, a finding that challenged assumptions about how reasoning capabilities must be instilled in language models.[1][5]
DeepSeek had been building toward R1 throughout 2024. The company released DeepSeek-V2 in May 2024 and DeepSeek-V3 in December 2024, both using Mixture of Experts architectures that prioritized computational efficiency. V3 served as the base model for R1's training, providing a strong foundation of general language capabilities and world knowledge.[2][6]
The broader context for R1's development was the emergence of inference-time reasoning as a new paradigm in AI. OpenAI's o1, released in September 2024, had demonstrated that training models with reinforcement learning to "think before answering" could dramatically improve performance on difficult tasks. DeepSeek's contribution was to show that this approach could be replicated with open-source models at a fraction of the cost, and that the reasoning behaviors could emerge more naturally than previously assumed.[1]
DeepSeek-R1 is built on top of DeepSeek-V3, which uses a Mixture of Experts transformer architecture. The key architectural features include:[2][6]

- A Mixture of Experts (MoE) design with 671 billion total parameters, of which 37 billion are activated per token
- Multi-head Latent Attention (MLA), which compresses the key-value cache to reduce memory use during inference
- FP8 mixed-precision training, reducing compute and memory costs during pre-training
The MoE architecture is central to R1's efficiency story. By activating only 37 billion of its 671 billion parameters for each token, the model achieves inference costs comparable to a much smaller dense model while maintaining the knowledge capacity of its full parameter count.
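The sparse-activation idea can be illustrated with a toy top-k gating function. This is a hedged sketch, not DeepSeek's actual router (which adds shared experts and auxiliary-loss-free load balancing); all names, shapes, and values are illustrative:

```python
import numpy as np

def moe_route(x, gate_w, k=8):
    """Toy top-k routing: only the k selected experts' parameters are used
    for this token, the mechanism behind activating ~37B of 671B params."""
    logits = x @ gate_w                        # one gating score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[topk] - logits[topk].max())
    return topk, w / w.sum()                   # softmax over selected experts only

rng = np.random.default_rng(0)
d, n_experts = 16, 64
experts, w = moe_route(rng.normal(size=d), rng.normal(size=(d, n_experts)))
print(len(experts), round(w.sum(), 6))  # 8 experts active out of 64; weights sum to 1.0
```

Because the unselected experts contribute nothing to the forward pass, per-token compute scales with the active parameter count, not the total.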
DeepSeek-R1's training followed a four-stage pipeline that combined supervised learning and reinforcement learning:[1][5]
Stage 1: Cold Start. The DeepSeek-V3 base model was fine-tuned on a small set of curated chain-of-thought reasoning examples. This "cold start" data provided the model with initial examples of structured reasoning, addressing issues like repetitive loops and poor readability that occurred when applying RL directly to the base model.
Stage 2: Reasoning-Oriented Reinforcement Learning. Large-scale RL was applied using Group Relative Policy Optimization (GRPO), focused on tasks with verifiable answers (mathematics, coding, logic problems). The model learned to generate extended chains of thought and was rewarded based solely on the correctness of its final answers.
Stage 3: Rejection Sampling and Supervised Fine-Tuning. The RL-trained model generated a large set of reasoning traces. High-quality traces were selected through rejection sampling and combined with non-reasoning data (general conversation, writing, etc.) for an additional round of supervised fine-tuning.
Stage 4: Reinforcement Learning for All Scenarios. A final round of RL was applied across both reasoning and general tasks, optimizing for helpfulness and harmlessness using a combination of rule-based and model-based reward signals.
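The four stages can be sketched as a simple orchestration loop. Every function below is an illustrative stub that merely records which stage ran; none of the names correspond to DeepSeek's actual training code:

```python
# Toy sketch of the four-stage R1 pipeline; all functions are stubs.

def sft(model, data, tag):
    return model + [f"sft:{tag}"]              # supervised fine-tuning stub

def grpo_rl(model, tag):
    return model + [f"rl:{tag}"]               # GRPO reinforcement learning stub

def rejection_sample(model):
    return "high-quality reasoning traces"     # Stage 3 filtering stub

def train_r1(base):
    m = sft(base, "curated CoT examples", "cold_start")   # Stage 1: cold start
    m = grpo_rl(m, "verifiable reasoning tasks")          # Stage 2: reasoning RL
    traces = rejection_sample(m)                          # Stage 3: filter traces,
    m = sft(m, traces, "reasoning+general")               #   then SFT with general data
    m = grpo_rl(m, "all scenarios")                       # Stage 4: RL for all scenarios
    return m

stages = train_r1(["deepseek-v3-base"])
print(stages)
```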
GRPO is the reinforcement learning algorithm used to train both R1-Zero and R1. Originally proposed by DeepSeek for their earlier DeepSeek-Math model in a February 2024 paper, GRPO simplifies the RL training process compared to Proximal Policy Optimization (PPO), which had been the standard approach for language model RL training (as used in RLHF).[1][5][16]
The key innovation of GRPO is eliminating the need for a separate critic (value) model. In standard PPO-based RLHF, two models must be maintained during training: the policy model being optimized and a value model that estimates expected returns. The value model alone can be as large as the policy model, effectively doubling the computational requirements of training. GRPO removes this requirement by using a simpler baseline.[1][16]
The algorithm works as follows:[1][5][16]

1. For each prompt, sample a group of G candidate outputs from the current policy.
2. Score each output with the reward function (for example, correctness of the final answer).
3. Compute each output's advantage as its reward minus the group mean, divided by the group's standard deviation.
4. Update the policy with a PPO-style clipped objective, using these group-relative advantages in place of value-model estimates, together with a KL penalty against a reference model.
This group-relative approach has several advantages. By normalizing rewards within each group, GRPO reduces the impact of reward scale differences across different problem types. The elimination of the value model cuts training memory requirements roughly in half, allowing the same hardware to train larger models. The algorithm is also simpler to implement and tune than PPO with a learned value function.[16]
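The group-relative baseline is simple enough to state in a few lines. A minimal sketch of GRPO's advantage computation, assuming a binary correctness reward over one group of sampled completions:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: the group's own mean and standard deviation
    serve as the baseline, so no learned value model is needed."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four sampled completions for one prompt, scored 1.0 if the answer is correct
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)  # correct completions get positive advantage, incorrect negative
```

Because the advantages are normalized within each group, a hard problem where only one sample succeeds produces the same scale of learning signal as an easy one where most samples succeed.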
Since R1's release, GRPO has become widely adopted in the language model community. Hugging Face's TRL library added native GRPO support, and numerous research groups have used it to train their own reasoning models. The algorithm's combination of simplicity, efficiency, and effectiveness made it particularly attractive for smaller teams and academic researchers.[16]
The reward signals used in R1's training were deliberately simple. For math problems, the reward was based on whether the final numerical answer matched the ground truth. For coding tasks, it was based on whether the generated code passed test cases. This simplicity was part of the research insight: complex reasoning behaviors could emerge from optimizing for straightforward correctness metrics, without needing elaborate reward shaping.
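A hedged sketch of what such rule-based rewards look like in practice. The regex-based answer extraction and in-process `exec` below are simplifications of my own; real verifiers typically parse structured answer formats and run generated code in a sandbox:

```python
import re

def math_reward(completion, ground_truth):
    """1.0 if the last number in the completion matches the ground truth."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return 1.0 if nums and float(nums[-1]) == float(ground_truth) else 0.0

def code_reward(program_src, tests):
    """1.0 if the generated program runs and passes every test case."""
    ns = {}
    try:
        exec(program_src, ns)                  # caution: no sandbox in this sketch
        return 1.0 if all(t(ns) for t in tests) else 0.0
    except Exception:
        return 0.0

print(math_reward("... so the answer is 42", "42"))                   # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  [lambda ns: ns["add"](2, 3) == 5]))                 # 1.0
```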
Before training R1, DeepSeek conducted an experiment called R1-Zero that became one of the most discussed results in the paper. R1-Zero was trained by applying reinforcement learning directly to the DeepSeek-V3 base model, without any supervised fine-tuning or curated reasoning examples. The model was simply given problems and rewarded for producing correct answers.[1][5]
Despite receiving no explicit training on how to reason, R1-Zero spontaneously developed several sophisticated reasoning behaviors during RL training:[1][5]

- Extended chain-of-thought reasoning, breaking problems into intermediate steps
- Self-reflection, in which the model revisits and questions its earlier reasoning
- Self-verification and error correction, re-checking answers before finalizing them
- Exploration of alternative solution strategies when an initial approach fails
Researchers tracked the emergence of reflective reasoning behaviors across training by measuring the frequency of specific terms in the model's outputs. The results showed a clear phase transition:[1][5][17]
| Training Stage | Reflective Term Frequency | Behavior |
|---|---|---|
| Steps 0-4,000 | Virtually absent | Model generates linear, non-reflective solutions |
| Steps 4,000-7,000 | Sporadic appearance | Occasional use of "wait," "but," "however" |
| Steps 8,000+ | Marked increase | Systematic self-monitoring and error correction |
Specific reflective terms tracked included "wait," "mistake," "however," "but," "retry," "error," "verify," "wrong," "evaluate," and "check." These terms were virtually absent in the early stages of training, appeared sporadically in the middle stages, and showed a marked increase after step 8,000, suggesting the emergence of systematic self-monitoring and error-correction behavior.[17]
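A metric of this kind is straightforward to compute. The sketch below counts the listed reflective markers as a fraction of word tokens; the tokenization and exact counting rule are my assumptions, not the study's methodology:

```python
import re

REFLECTIVE_TERMS = {"wait", "mistake", "however", "but", "retry",
                    "error", "verify", "wrong", "evaluate", "check"}

def reflective_frequency(text):
    """Fraction of word tokens that are reflective markers."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for t in tokens if t in REFLECTIVE_TERMS)
    return hits / max(len(tokens), 1)

early = "the answer is 12 because 3 times 4 equals 12"
late = "3 times 4 is 12 but wait let me verify that again to check for a mistake"
print(reflective_frequency(early) < reflective_frequency(late))  # True
```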
The model also showed a clear increase in the length of its reasoning chains over the course of training. Early in RL training, the model generated short, direct answers. As training progressed, the average response length grew steadily, with the model learning to allocate more "thinking time" to harder problems. This adaptive allocation of compute was not explicitly trained but emerged naturally from the RL optimization process.[1][5]
DeepSeek's paper highlighted what they called an "aha moment" during R1-Zero's training. At a certain point in RL training, the model showed a sudden increase in the use of reflective language (particularly the word "wait") during its reasoning chains. This marked a qualitative shift in the model's reasoning patterns, where it began systematically re-evaluating and correcting its own work rather than simply proceeding linearly through a solution.[1][5]
The aha moment became widely discussed in the AI research community. DeepSeek described it as evidence of "the self-evolution process" of the model, suggesting that reinforcement learning could induce genuinely emergent cognitive strategies. However, subsequent research by other groups has debated whether these behaviors were truly emergent or whether traces of reflective reasoning were already present in the base model's pre-training data. A study by Sea AI Lab titled "There May Not be Aha Moment in R1-Zero-like Training" argued that the observed behaviors could be attributed to pre-existing patterns in the training data rather than genuine emergence.[5][7]
Despite its impressive emergent behaviors, R1-Zero had practical limitations that motivated the development of the full R1 model. Its outputs often suffered from poor readability, with reasoning chains that mixed languages, repeated phrases endlessly, or failed to clearly delineate the final answer. The model also struggled with tasks outside of mathematics and coding, where the lack of supervised fine-tuning left it without appropriate response formats. These issues were addressed in R1 through the cold-start data and multi-stage training pipeline.[1]
DeepSeek-R1 achieved performance competitive with OpenAI's o1 across major reasoning benchmarks.
| Benchmark | DeepSeek-R1 | OpenAI o1 | GPT-4o | Description |
|---|---|---|---|---|
| AIME 2024 | 79.8% (pass@1) | 79.2% (pass@1) | 13.4% | American Invitational Mathematics Exam |
| MATH-500 | 97.3% | 96.4% | 60.3% | Mathematical problem solving |
| GPQA Diamond | 71.5% | 75.7% | 53.6% | Graduate-level science questions |
| Codeforces Elo | 2,029 | 2,061 | - | Competitive programming rating |
| MMLU | 90.8% | 91.8% | 87.2% | Multitask language understanding |
| LiveCodeBench | 65.9% | - | - | Real-world coding tasks |
| SWE-bench Verified | 49.2% | 48.9% | 33.2% | Software engineering tasks |
| HumanEval | 85.4% | 92.4% | 90.2% | Code generation |
The results showed that R1 matched or exceeded o1 on most mathematical and coding benchmarks while trailing slightly on general knowledge and code generation tasks. The fact that an open-source model could achieve these results, trained at a fraction of the cost, was the central claim that drove both the scientific interest and the market reaction.
Alongside R1, DeepSeek released six smaller "distilled" models created through knowledge distillation, where R1's reasoning capabilities were transferred to smaller, more efficient base models. DeepSeek used R1 as a teacher model to generate approximately 800,000 high-quality reasoning traces, which were then used to fine-tune smaller models from the Qwen2.5 and Llama 3 families.[1][2]
| Distilled Model | Base Model | Parameters | AIME 2024 | MATH-500 | GPQA Diamond |
|---|---|---|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-1.5B | 1.5B | 28.9% | 83.9% | - |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B | 7B | 55.5% | 92.8% | 49.1% |
| DeepSeek-R1-Distill-Llama-8B | Llama 3.1-8B | 8B | 50.4% | 89.1% | 49.0% |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 14B | 69.7% | 93.9% | 59.1% |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 32B | 72.6% | 94.3% | 62.1% |
| DeepSeek-R1-Distill-Llama-70B | Llama 3.3-70B | 70B | 70.0% | 94.5% | 65.2% |
The distilled models were a major part of R1's impact. The smallest model, Qwen-1.5B, outperformed GPT-4o and Claude 3.5 Sonnet on math benchmarks despite being small enough to run on consumer hardware. The 32B and 70B distilled models set new state-of-the-art results among dense (non-MoE) open-source models on reasoning benchmarks, outperforming the contemporaneous QwQ-32B-Preview by substantial margins.[1]
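The distillation recipe (generate traces with the teacher, keep the verified ones, fine-tune the student on them) can be sketched with toy stand-ins. Every function and value below is illustrative; R1's actual recipe used roughly 800,000 curated traces and full SFT runs:

```python
import random

def distill(teacher, is_correct, finetune, student, problems, per_problem=4):
    """Keep only teacher traces whose final answer verifies, then fine-tune
    the student on them via plain SFT (no RL applied to the student)."""
    dataset = [(p, t)
               for p in problems
               for t in (teacher(p) for _ in range(per_problem))
               if is_correct(t, p)]
    return finetune(student, dataset)

# Toy stand-ins: problems are (a, b) pairs; the "teacher" sometimes errs.
random.seed(0)
teacher = lambda p: p[0] + p[1] + random.choice((0, 0, 0, 1))
verified = lambda t, p: t == p[0] + p[1]
finetune = lambda s, ds: {**s, "examples": len(ds)}

student = distill(teacher, verified, finetune,
                  {"name": "student-7B"}, [(1, 2), (3, 4)])
print(student)  # only traces that passed verification become training data
```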
The distilled models enabled local deployment on consumer hardware, which became a major driver of community adoption. The hardware requirements for running different distilled models are as follows:[18]
| Distilled Model | Minimum VRAM | Recommended GPU | Performance Notes |
|---|---|---|---|
| 1.5B / 7B / 8B | 8 GB | NVIDIA RTX 3060 12GB | Runs efficiently at standard quantization |
| 14B | 12-16 GB | NVIDIA RTX 4070 Ti 16GB | Fits in VRAM at 4-bit quantization |
| 32B | 20-24 GB | NVIDIA RTX 3090/4090 24GB | Smooth performance at 4-bit quantization |
| 70B | 40-48 GB | 2x NVIDIA RTX 3090 or A100 | Requires multi-GPU or offloading |
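The VRAM figures in the table are consistent with a simple back-of-envelope estimate: quantized weight size plus some headroom for KV cache and activations. The 20% overhead factor here is an assumption for illustration, not a measured value:

```python
def vram_estimate_gb(params_billions, bits=4, overhead_frac=0.2):
    """Rough VRAM estimate: quantized weights plus ~20% overhead for
    KV cache and activations. Illustrative only; real usage varies."""
    weights_gb = params_billions * bits / 8    # 1B params at 8-bit = 1 GB
    return weights_gb * (1 + overhead_frac)

for p in (7, 14, 32, 70):
    print(f"{p}B at 4-bit: ~{vram_estimate_gb(p):.1f} GB")
```

By this estimate a 32B model at 4-bit needs roughly 19 GB, which explains why it just fits a 24 GB consumer card while the 70B model does not.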
The 32B distilled model hit a particularly attractive sweet spot, offering performance comparable to OpenAI's o1-mini on several benchmarks while running on a single consumer-grade RTX 4090 GPU. Running locally eliminated API costs, kept data private, removed rate limits, and provided offline access to explicit chain-of-thought reasoning.[18]
The distilled models proved especially popular with the open-source community. Within weeks of release, hundreds of derivative models were created on Hugging Face, fine-tuned for specific use cases ranging from medical reasoning to financial analysis.
DeepSeek released R1, R1-Zero, and all six distilled models under the MIT license, one of the most permissive open-source licenses available. This meant that anyone could download, modify, redistribute, and commercially deploy the models without restriction. The full model weights, training code, and technical paper were all made publicly available on GitHub and Hugging Face.[1][2]
The open-source strategy was a deliberate choice that amplified R1's impact far beyond what a proprietary release would have achieved. Within a month of launch, over 700 community-built models derived from R1 appeared on Hugging Face, collectively downloaded more than 5 million times. Major cloud providers including Microsoft Azure, Amazon Web Services, and Nvidia's inference platforms quickly added support for R1, making it accessible through familiar enterprise interfaces.[2][9]
DeepSeek-R1 became the most-liked model on Hugging Face among nearly 1.5 million models on the platform, surpassing 10,000 likes. The variant versions collectively exceeded 10 million downloads. R1's release also catalyzed a broader shift in the open-source AI ecosystem: the number of competitive Chinese organizations releasing models increased dramatically, with Baidu going from zero releases on Hugging Face in 2024 to over 100 in 2025, and ByteDance and Tencent each increasing releases by eight to nine times.[19]
The MIT license also enabled a wave of academic research building on R1's approach. Researchers at universities and smaller labs could study the model's reasoning traces, replicate the RL training methodology, and test hypotheses about emergent reasoning that would have been impossible with a proprietary model.
The market reaction to R1's release became a defining financial event of early 2025. On January 27, 2025, one week after R1's public release, U.S. technology stocks experienced their steepest single-day decline in history.[3][4]
The sell-off was triggered by a sudden reassessment of the AI investment thesis. For years, the market had priced technology companies, especially chipmakers and cloud providers, on the assumption that building frontier AI required massive and growing capital expenditures. DeepSeek's demonstration that a 160-person Chinese startup could produce competitive results for $5.6 million in compute undermined that assumption.[3][4]
Nvidia's stock fell nearly 17% in a single session, losing approximately $589 billion in market capitalization. This was the largest single-day market value loss for any company in history. Other semiconductor companies including Broadcom, Marvell, Micron, and TSMC also fell sharply. The Nasdaq composite lost roughly $1 trillion in value by the end of the day. Meta and Alphabet (Google's parent company) also declined significantly.[3][4]
Marc Andreessen, the prominent technology investor, described the event as "AI's Sputnik moment," drawing a parallel to the 1957 Soviet satellite launch that shocked the United States into accelerating its space program. The comparison captured the sense that a competitor working with far fewer resources had achieved something that the established players, with their billions in investment, had assumed only they could do.[4][10]
The market impact extended beyond the immediate sell-off. Chinese AI companies entered an aggressive price war, with some cutting API prices by up to 97% in the weeks following R1's release. In the United States, the event forced a public debate about whether the hundreds of billions being invested in AI data centers and chip manufacturing were truly necessary, or whether architectural innovation could substitute for raw compute.[3][4]
Stanford HAI faculty noted that DeepSeek's open releases represented "a significant step in democratizing AI," enabling smaller companies and individual developers to build on frontier-capable models without massive compute budgets.[9]
On May 28, 2025, DeepSeek released a major update to R1 designated R1-0528. Despite being described as a "minor upgrade" in official communications, the update delivered substantial improvements across all major benchmarks.[11][12]
| Benchmark | R1 (Jan 2025) | R1-0528 (May 2025) | Change |
|---|---|---|---|
| AIME 2024 | 79.8% | 91.4% | +11.6 pp |
| AIME 2025 | 70.0% | 87.5% | +17.5 pp |
| Codeforces (Div. 1) Rating | ~1,530 | ~1,930 | +400 points |
The AIME 2025 improvement from 70% to 87.5% was particularly notable, bringing R1-0528 into competitive range with OpenAI's o3 (88.9% on AIME 2025). The Codeforces rating jump of approximately 400 points reflected dramatically improved code generation and problem-solving ability.[11]
R1-0528 demonstrated deeper chain-of-thought reasoning than its predecessor. On challenging problems, the model averaged approximately 23,000 thinking tokens per query, compared to roughly 12,000 for the original R1. This near-doubling of reasoning depth, enabled by additional algorithmic optimization during post-training, contributed to the accuracy improvements.[12]
DeepSeek also reported that the rate of hallucinations (false or misleading outputs) was reduced by approximately 45-50% in scenarios such as rewriting and summarization.[12]
The update added several capabilities requested by the developer community:[12]

- Support for system prompts, removing the need for special prompt templates
- JSON output mode for structured responses
- Function calling (tool use) support
DeepSeek also released a distilled model from R1-0528: DeepSeek-R1-0528-Qwen3-8B, which achieved state-of-the-art performance among open-source 8B models on AIME 2024, surpassing the base Qwen3 8B by 10 percentage points.[12]
DeepSeek offered R1 through its API at prices dramatically lower than competing reasoning models.
| Model | Input (per 1M tokens, cache miss) | Input (per 1M tokens, cache hit) | Output (per 1M tokens) |
|---|---|---|---|
| DeepSeek-R1 | $0.55 | $0.14 | $2.19 |
| OpenAI o1 | $15.00 | $7.50 | $60.00 |
| OpenAI o3 | $2.00 | $0.50 | $8.00 |
The pricing differential was stark: R1 was roughly 27 times cheaper than o1 for both input and output tokens. Even after OpenAI's June 2025 price cuts brought o3 down to $2/$8, R1 remained approximately 3-4 times cheaper. Combined with the MIT license allowing self-hosting (eliminating API costs entirely for organizations with their own compute), R1's economics were a core part of its appeal.
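The cost ratios can be checked with a few lines of arithmetic using the table's cache-miss prices. The workload sizes are arbitrary examples:

```python
# Prices in USD per 1M tokens, taken from the comparison table (cache miss)
prices = {
    "DeepSeek-R1": {"in": 0.55, "out": 2.19},
    "OpenAI o1":   {"in": 15.00, "out": 60.00},
    "OpenAI o3":   {"in": 2.00,  "out": 8.00},
}

def cost(model, input_mtok, output_mtok):
    """Total cost for a workload measured in millions of tokens."""
    p = prices[model]
    return p["in"] * input_mtok + p["out"] * output_mtok

# Example workload: 10M input tokens, 2M output tokens
r1, o1 = cost("DeepSeek-R1", 10, 2), cost("OpenAI o1", 10, 2)
print(f"R1: ${r1:.2f}  o1: ${o1:.2f}  ratio: {o1 / r1:.1f}x")  # ratio ~27x
```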
DeepSeek-R1's impact extended well beyond its benchmark scores and market disruption. The model fundamentally changed several assumptions about AI development.
Before R1, the prevailing assumption was that training frontier reasoning models required resources available only to a handful of well-funded Western labs. DeepSeek showed that a combination of architectural innovation (MoE, MLA), efficient training algorithms (GRPO, FP8 mixed precision), and clever engineering could produce competitive results at dramatically lower cost. This finding had practical consequences: smaller companies and research institutions began building on R1's open weights rather than training models from scratch.[1][9]
R1 demonstrated that open-source AI models could match proprietary frontier models in at least one important capability dimension (reasoning). This intensified the debate within the AI industry about the relative merits of open and closed development approaches. Labs that had been reluctant to open-source their models faced renewed pressure to justify keeping weights proprietary, while the open-source community gained a powerful new proof point for their approach.[9]
R1-Zero's spontaneous development of reasoning strategies through pure RL, without being shown any examples of reasoning, contributed to the ongoing scientific debate about emergent abilities in language models. The result suggested that reasoning was not something that needed to be explicitly taught through supervised learning but could arise naturally from optimization pressure on task performance. This finding influenced subsequent research across multiple labs exploring RL-based training for reasoning.[1][5]
R1's release accelerated development timelines across the industry. In the months following R1, multiple labs released improved reasoning models: OpenAI shipped o3-mini in January 2025 and o3 in April 2025; Google released Gemini 2.5 Pro with extended thinking; and Anthropic enhanced Claude's reasoning capabilities. The competitive dynamic R1 created pushed the entire field forward at a faster pace than might otherwise have occurred.
As a model developed by a Chinese company, DeepSeek-R1 faced regulatory scrutiny in multiple Western countries. Concerns centered on data privacy (DeepSeek's servers are located in China, subject to Chinese data laws), potential content alignment with Chinese government positions, and national security implications of a widely deployed Chinese AI model.[15]
The US government response to DeepSeek was swift and multi-pronged. On February 6, 2025, Representatives Josh Gottheimer and Darin LaHood introduced the bipartisan "No DeepSeek on Government Devices Act," which specifically targeted the DeepSeek mobile application and API for prohibition on federal government devices. The bill passed in August 2025, banning federal employees from using the app.[15][20]
Additional legislative efforts included Representative Mark Green's China Technology Transfer Control Act and Senator Josh Hawley's "Decoupling America's Artificial Intelligence Capabilities from China Act," introduced on January 29, 2025. Gottheimer and LaHood also wrote to all 50 US governors urging them to implement similar bans at the state level.[15][20]
Multiple government agencies independently restricted or banned use of DeepSeek products:[15]
| Agency / Government | Action | Date |
|---|---|---|
| U.S. Navy | Issued warnings against use | Late January 2025 |
| NASA | Reinforced security concerns | January 31, 2025 |
| Texas | First state ban on government systems | February 2025 |
| Virginia | Banned on government systems | February 2025 |
| New York | Banned on government systems | February 2025 |
| U.S. Congress | Restricted on congressional devices | February 2025 |
| Pentagon | Restricted usage | February 2025 |
| Australia | Government agency restrictions | February 2025 |
| South Korea | Government restrictions | February 2025 |
| Taiwan | Government restrictions | February 2025 |
| India | Government restrictions | February 2025 |
The Fiscal Year 2026 National Defense Authorization Act, signed in December 2025, included provisions restricting DeepSeek usage within the Department of Defense and Intelligence Community.[15]
Technical security analyses identified several concerns with DeepSeek's infrastructure, including reports of hidden code linking to China Mobile servers, the collection of keystroke data, data storage on Chinese servers subject to Chinese government access, and multiple cybersecurity test failures. Content analysis also documented instances of censorship related to politically sensitive topics, including Tiananmen Square and Taiwan, with DeepSeek consistently avoiding inquiries related to these subjects.[15][20]
These restrictions applied primarily to government use. Commercial and individual use of R1's open weights remained unrestricted in most jurisdictions, and the model continued to be widely deployed through cloud providers and self-hosted infrastructure. The open-source nature of the model weights meant that security concerns about DeepSeek's servers could be entirely mitigated by self-hosting.
As of March 2026, DeepSeek-R1 and its derivatives remain among the most widely used open-source reasoning models. The R1-0528 update brought the model closer to the performance of proprietary alternatives like OpenAI's o3, while maintaining its cost and accessibility advantages.
DeepSeek has continued releasing new models beyond R1, including DeepSeek-V3.1 (August 2025), DeepSeek-V3.2-Exp (September 2025), and DeepSeek-OCR (October 2025). The company's anticipated DeepSeek-V4 release, expected in April 2026, is likely to include a next-generation reasoning component that builds on R1's approach.
The model's legacy is perhaps best measured by its influence on the field. R1 proved that reasoning-capable language models could be built openly and cheaply, that reinforcement learning could induce genuine reasoning behaviors without supervised examples, and that a small team with limited resources could compete with the largest AI labs in the world. These demonstrations reshaped the economics, strategy, and research direction of the AI industry in ways that continue to play out.