# Nemotron 3

> Source: https://aiwiki.ai/wiki/nemotron_3
> Updated: 2026-06-25
> Categories: AI Models, Large Language Models, NVIDIA, Open Source AI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Nemotron 3** is a family of open-weights [large language model](/wiki/large_language_model) systems released by [NVIDIA](/wiki/nvidia) on December 15, 2025, built for agentic AI and consisting of three sparse mixture-of-experts variants named Nano, Super, and Ultra. [1] The family uses a hybrid [Mamba 2](/wiki/mamba_2) and Transformer architecture combined with a LatentMoE expert design on the larger models, and ships with context windows of up to 1 million tokens. [1][2] Nemotron 3 Nano launched at 30 billion total parameters with up to 3 billion active per token, while Nemotron 3 Super at roughly 100 billion total with up to 10 billion active and Nemotron 3 Ultra at roughly 500 billion total with up to 50 billion active were announced for release in the first half of 2026. [1] NVIDIA reported that Nemotron 3 Nano delivers up to 4 times higher token throughput than its predecessor Nemotron 2 Nano while reducing reasoning-token generation by up to 60 percent. [1] The release also included a separate multimodal variant, Nemotron 3 Nano Omni, which adds vision and audio encoders to the same base. [7]

At launch NVIDIA founder and chief executive Jensen Huang framed the family as an open platform, stating: "Open innovation is the foundation of AI progress. With Nemotron, we're transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale." [1] The accompanying white paper summarized the lineup as models that "deliver strong agentic, reasoning, and conversational capabilities" across the three sizes. [3]

A naming clarification matters here. NVIDIA has used the "Nemotron-3" label twice. The first use was Nemotron-3 8B, a dense 8 billion parameter family released in November 2023 for the NeMo framework, with a 4,096 token context window and training on 3.8 trillion tokens spanning 53 human languages and 37 programming languages. [21] That earlier model is a separate product and is not the subject of this article. The Nemotron 3 family discussed here is the December 2025 series of agentic [Open Source AI](/wiki/open_source_ai) models, which NVIDIA branded simply as "Nemotron 3" without any version suffix on the individual model names. NVIDIA's research site, press release, and technical report all use this convention. [1][2][4] The reuse of the version number across two unrelated product lines is unusual but consistent across NVIDIA's official communications for the 2025 release.

The family was positioned as NVIDIA's answer to the wave of open-weights reasoning and agent models that defined 2025, including [Llama 3.1](/wiki/llama_3_1), Qwen 3, GPT-OSS, and DeepSeek V3. NVIDIA emphasized three things at launch: throughput efficiency on its own Blackwell hardware, openness of weights and training data, and suitability for multi-agent workflows. [1][2] The Nano variant was released the same day as the announcement, with weights, a technical report, and complete training recipes published on Hugging Face and GitHub under the NVIDIA Open Model License. [4][5] Super and Ultra were promised in the months following, with the technical report covering all three.

## What is NVIDIA Nemotron 3?

Nemotron 3 is NVIDIA's December 2025 family of open-weights large language models designed for building agentic AI applications, released in Nano, Super, and Ultra sizes. [1] The models combine a Mixture-of-Experts feedforward design with a hybrid stack of Mamba 2 state-space layers and Transformer attention layers, an approach NVIDIA describes as delivering "best-in-class throughput while having better or on-par accuracy than standard Transformers." [2] NVIDIA released not only the model weights but also the training data, training recipes, evaluation harness, and reinforcement learning libraries, going substantially beyond a typical open-weights release. [3][9]

The defining capabilities are long context, throughput efficiency, and agent readiness. All three sizes support context windows of up to 1 million tokens, and every model is post-trained with multi-environment reinforcement learning for reasoning, multi-step tool use, and granular reasoning-budget control. [1][3] The smallest model, Nano, is positioned as outperforming comparable open models on accuracy while remaining highly cost-efficient for inference; Super is optimized for high-volume agent workloads such as IT ticket automation; and Ultra is positioned as a state-of-the-art reasoning engine for complex planning and research. [3]

## Background: how did the Nemotron line evolve?

The Nemotron name traces back to November 2023, when NVIDIA released Nemotron-3 8B for enterprise chatbot and copilot development through the NeMo framework. [21] That first Nemotron was a dense decoder-only Transformer family (base, chat, and question-answering variants) aimed at customer service and internal assistant deployments rather than at competing with OpenAI or Anthropic frontier systems. The base model carried 8 billion parameters, a 4,096 token context window, and was trained on 3.8 trillion tokens covering 53 human languages and 37 programming languages. [21] It received modest attention and was followed in February 2024 by Nemotron-4 15B, a 15 billion parameter decoder-only Transformer trained on 8 trillion tokens and designed to fit on a single A100 or H100 GPU. [22]

The family's profile changed in June 2024 with the release of Nemotron-4 340B, a triple of Base, Instruct, and Reward models with 340 billion parameters each. Nemotron-4 340B was specifically positioned as a synthetic data generator for training other models, and NVIDIA disclosed that more than 98 percent of the alignment data for the family had itself been synthetically generated. The release fit into a broader pattern in 2024 where labs began publishing models intended as building blocks for downstream training rather than as products in their own right.

In August 2025 NVIDIA shifted architectural direction with [Nemotron Nano 2](/wiki/nemotron_nano_2), a hybrid Mamba-Transformer model trained on roughly 20 trillion tokens. Nemotron Nano 2 was the first member of the family to use state-space layers alongside attention, an approach NVIDIA had been exploring through internal research and academic collaborations. The hybrid design draws on the [Mamba 2](/wiki/mamba_2) architecture introduced by Albert Gu and Tri Dao in 2024, which uses structured state-space models with selective state updates to achieve linear-time sequence processing where standard attention scales quadratically. Nemotron Nano 2 served as the direct architectural ancestor of the December 2025 Nemotron 3 release.

Four months later, on December 15, 2025, NVIDIA debuted what it called the Nemotron 3 family. [1] The launch was accompanied by a press release from NVIDIA's investor relations group, a research microsite at research.nvidia.com, an arXiv preprint, and a 3-trillion-token data release. [1][2][3] NVIDIA's framing emphasized that the models were built for agentic AI rather than chat, with explicit support for tool calling, structured output, and reasoning trace generation. Early enterprise partners named at launch included Accenture, Cadence, CrowdStrike, Cursor, Deloitte, EY, Oracle, Palantir, Perplexity, ServiceNow, Siemens, Synopsys, and Zoom. [1]

## What models are in the Nemotron 3 family?

The Nemotron 3 family at launch consisted of three text models plus one multimodal variant. The three sizes are differentiated by total parameter count, active parameter count, and intended use case rather than by architectural family. All three text models share the same hybrid Mamba-Transformer MoE skeleton and the same tokenizer.

| Model | Total parameters | Active parameters | Context window | Status at launch |
|---|---|---|---|---|
| Nemotron 3 Nano | 30B (31.6B with embeddings) | 3.5B (3.2B excluding embeddings) | 1M tokens | Released December 15, 2025 |
| Nemotron 3 Super | approximately 100B | approximately 10B | 1M tokens | Announced for H1 2026 release |
| Nemotron 3 Ultra | approximately 500B | approximately 50B | 1M tokens | Announced for H1 2026 release |
| Nemotron 3 Nano Omni | 30B with audio and vision encoders | 3B | not specified | Released December 15, 2025 |

Nemotron 3 Nano is the only fully text model that was available for download at the time of the announcement. [1] It launched in two precision formats on Hugging Face: BF16 weights at NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 and FP8 weights at NVIDIA-Nemotron-3-Nano-30B-A3B-FP8. [5][6] The naming convention "A3B" indicates 3 billion active parameters per forward pass.

Nemotron 3 Super and Nemotron 3 Ultra were described in the technical report but the weights were not released in December 2025. NVIDIA committed to publishing both later, with the Super model targeted at high-volume agent workloads such as IT ticket automation and Ultra positioned as a state-of-the-art reasoning engine for complex planning and research. [3] The two larger models also use the NVFP4 training format introduced with NVIDIA's Blackwell GPUs, which represents weights in 4-bit floating point during training rather than in a higher precision format that is later quantized. [3]

Nemotron 3 Nano Omni shares the 30B-A3B base with the text Nano model but adds a Parakeet audio encoder and a C-RADIOv4-H vision encoder, with 3D convolutions for capturing motion between video frames. [8] The Omni variant targets document intelligence, video understanding, and multimodal agent reasoning. NVIDIA reported best-in-class results for the Omni model on MMlongbench-Doc, OCRBenchV2, WorldSense, DailyOmni, and VoiceBench, claiming up to 9.2 times greater effective system capacity for video reasoning compared with other open omni models at matched interactivity thresholds. [7][8]

## How is Nemotron 3 built? Architecture

The core architectural choice across the Nemotron 3 family is a hybrid stack of state-space layers, attention layers, and mixture-of-experts feedforward layers. NVIDIA describes the design as using a "Mixture-of-Experts hybrid Mamba-Transformer architecture to provide best-in-class throughput and context lengths of up to 1M tokens." [3] The text Nano variant is the most thoroughly documented version of this design and has been described in detail in both the Hugging Face model card and the technical report. [4][5]

Nemotron 3 Nano has 52 layers in total. The composition includes 23 [Mamba 2](/wiki/mamba_2) layers handling most of the sequence mixing work, 6 attention layers using Grouped Query Attention with 2 groups, and 23 mixture-of-experts feedforward layers. [5] Each MoE layer contains 128 routed experts plus 1 shared expert, with 6 experts activated per token. [5] Total active parameters land at 3.5 billion out of 30 billion total, giving an activation ratio of roughly 12 percent. The model also includes Multi-Token Prediction layers to speed up generation, predicting several tokens ahead during inference rather than one at a time.

The hybrid approach traces to a research observation that pure attention models become inefficient at very long contexts because of the quadratic memory and compute cost of attention. State-space models like Mamba 2 process sequences in linear time and retain selective memory through structured updates, but they tend to underperform attention on tasks that require precise lookup over recent tokens. The Nemotron 3 family interleaves the two so that attention layers handle the precision-critical work while Mamba layers carry the sequence-mixing load. The same general pattern appears in other 2025 hybrid releases including [Jamba 2](/wiki/jamba_2) and [Jet-Nemotron](/wiki/jet_nemotron), though the specific layer ratios and routing strategies differ.

For the larger Super and Ultra variants, NVIDIA introduced a technique called LatentMoE, described in the research microsite as "a novel hardware-aware expert design for improved accuracy." [2] LatentMoE projects expert computation into a lower-dimensional latent space before applying the routed experts, which reduces the parameter count required for a given level of capacity. The technical report frames this as an extension of the Multi-head Latent Attention idea popularized by DeepSeek, applied to feedforward rather than attention layers. [3]

The tokenizer is shared across the family. Context length tops out at 1 million tokens, with a default deployment window of 256K tokens for most inference engines. NVIDIA reports retention of accuracy across context length using the RULER benchmark, with the Nano variant scoring 91.3 on RULER-100 at 512K tokens and 86.3 at the full 1 million token window. [5] Supported languages are English, Spanish, French, German, Italian, and Japanese for natural language, plus 43 programming languages. [5]

The Nano model is unified across reasoning and non-reasoning use cases. A single set of weights handles both modes, controlled by an enable_thinking flag in the chat template. With thinking enabled, the model emits a reasoning trace before its final answer; with thinking disabled, it responds directly. NVIDIA also exposes a budget control parameter for limiting the number of reasoning tokens generated, an interface choice that has become common across 2025 reasoning releases. [3]

## How was Nemotron 3 trained?

The Nano variant was pre-trained on 25 trillion tokens, with the training distribution broken down across multiple categories. [5] The total training mix listed on the Hugging Face model card sums to 13.34 trillion tokens, indicating that some sources contributed multiple epochs to the full 25 trillion token count. Synthetic data made up 3.53 trillion tokens of the mix, the largest single category. English Common Crawl contributed 3.46 trillion tokens, multilingual data 1.74 trillion, code 1.04 trillion, and STEM supervised fine-tuning data 359.8 billion. The pre-training cutoff date was June 25, 2025, with post-training data extending to November 28, 2025. [5]

NVIDIA released the underlying training data alongside the model. The main pretraining corpus is Nemotron-CC-v2.1, a 2.5 trillion token English web crawl derivative, paired with Nemotron-CC-Code-v1, a 428 billion token code corpus drawn primarily from GitHub. Additional sources include arXiv papers, Wikipedia and Wikimedia content, OpenStax textbooks, PubMed abstracts, NIH ExPorter records, and SEC EDGAR filings. NVIDIA also released three trillion tokens of new Nemotron pretraining, post-training, and reinforcement learning datasets as part of the launch package, along with the Nemotron Agentic Safety Dataset derived from real-world telemetry. [1]

The training schedule used Warmup-Stable-Decay learning rate scheduling with an 8 billion token warmup, peak learning rate of 1e-3, and minimum learning rate of 1e-5. Training batch size was 3,072. The Super and Ultra variants were trained using NVFP4, a 4-bit floating point format introduced with NVIDIA's Blackwell GPU architecture. [3] NVFP4 stores model weights and activations in 4-bit blocks during the training run itself, rather than in BF16 or FP8 and quantizing afterward, which reduces memory pressure and improves throughput on the supported hardware.

Post-training was structured around multi-environment reinforcement learning, covering reasoning, tool calling, instruction following, and multilingual response quality. [3] NVIDIA released the underlying RL libraries, called NeMo Gym and NeMo RL, as open-source projects on GitHub at the same time as the model launch. [1] NeMo Evaluator, a companion library for validating model performance and safety, was also released. [1] The post-training mix included synthetic code, math, and science data, along with tool calling traces and multilingual reasoning examples.

NVIDIA has not published a complete count of GPU hours used to train the family, but the Hugging Face model card and the technical report describe training infrastructure as Blackwell-based for the Super and Ultra runs. [3][5] The Nano model can run inference on a wide range of NVIDIA hardware including A100, H100, B200, RTX PRO 6000, Jetson Thor, and DGX Spark systems.

## How well does Nemotron 3 perform on benchmarks?

NVIDIA published a substantial benchmark table for Nemotron 3 Nano at launch, focusing on the standard reasoning, coding, math, and agentic suites that have become the de facto comparison points for open-weights models. The numbers below come from the official Hugging Face model card. [5]

| Benchmark | Category | Nemotron 3 Nano score |
|---|---|---|
| MMLU-Pro | General knowledge | 78.3 |
| AIME 2025 (with tools) | Math reasoning | 99.2 |
| LiveCodeBench v6 | Code generation | 68.3 |
| MiniF2F pass@32 | Formal math proofs | 79.9 |
| SWE-Bench | Software engineering | 38.8 |
| Arena-Hard-V2 average | Chat quality | 67.7 |
| RULER-100 @ 512K | Long context retention | 91.3 |
| RULER-100 @ 1M | Long context retention | 86.3 |

The AIME 2025 result of 99.2 percent is one of the highest published scores on that benchmark for any open or closed model, though the tools-enabled qualifier is important. [5] AIME with tools allows the model to use a calculator or code interpreter for arithmetic, which removes a class of failure modes that would otherwise hit any language model on hard math contests. Most labs report both tool-enabled and tool-free numbers; the tool-free AIME score for Nemotron 3 Nano was not the headline figure in NVIDIA's marketing.

The LiveCodeBench v6 score of 68.3 ranks Nano ahead of Qwen3-30B-A3B-Thinking-2507 at 66.0 and GPT-OSS-20B at 61.0, according to comparisons published by NVIDIA and reproduced by third-party trackers. [5][19] The Arena-Hard-V2 average of 67.7 also leads the same two models, which scored 57.8 and 48.5 respectively in NVIDIA's reported comparison. On MMLU-Pro the model trails Qwen3-30B's 80.9 by roughly 2.5 points, an interesting result given Nano's lead on most other benchmarks in the same comparison set.

The long-context numbers are particularly strong. RULER is a benchmark suite that tests retrieval, multi-hop reasoning, and aggregation across long contexts, with each task scored at multiple context lengths. Nemotron 3 Nano holds 91.3 percent at the 512K mark and 86.3 percent at the full 1 million token window, indicating that the hybrid Mamba-Transformer architecture maintains reasonable accuracy well past the point where pure attention models typically begin to degrade noticeably. [5] NVIDIA states that Nano outperforms both GPT-OSS-20B and Qwen3-30B-A3B-Thinking-2507 "on RULER across different context lengths." [2]

On throughput, NVIDIA reports that Nemotron 3 Nano runs 2.2 times faster than GPT-OSS-20B and 3.3 times faster than Qwen3-30B-A3B-Thinking-2507 on the same H200 hardware in an 8K input, 16K output configuration. [2] Compared with its own predecessor Nemotron 2 Nano, NVIDIA reports up to 4 times higher token throughput and a 60 percent reduction in reasoning-token generation for equivalent task completion. [1] The reasoning-token reduction is meaningful because it directly maps to inference cost in production deployments, where a model that produces fewer thinking tokens per task is cheaper to serve.

Nemotron 3 Nano Omni was reported separately on multimodal benchmarks. The Omni model claimed best-in-class document intelligence results on MMlongbench-Doc and OCRBenchV2, leading scores on video and audio benchmarks including WorldSense, DailyOmni, and VoiceBench, and the 9.2x effective capacity figure for video reasoning relative to other open omni models. [7][8] Detailed score tables for Omni were published alongside the announcement on the NVIDIA developer blog. [8]

## Is Nemotron 3 open source? Licensing

Nemotron 3 ships under the NVIDIA Nemotron Open Model License. [10] This license is part of NVIDIA's broader Open Model License framework introduced earlier with Nemotron-4 340B and refined across subsequent releases. The license permits both commercial and non-commercial use of the model weights and derivatives, including for synthetic data generation, fine-tuning, and deployment of fine-tuned versions. [10]

The license includes use restrictions that distinguish it from a pure permissive license such as MIT or Apache 2.0. NVIDIA retains the right to terminate the license if the model is used in ways that violate listed restrictions, which cover categories like critical infrastructure, weapons development, illegal activity, and certain regulated domains. Derivative models trained on outputs from Nemotron 3 are permitted but must maintain attribution and may not remove the license terms. [10]

NVIDIA explicitly committed to releasing the training data, training code, evaluation harness, and reinforcement learning libraries alongside the weights, stating in its white paper that it would "openly release the model weights, pre- and post-training software, recipes, and all data for which we hold redistribution rights." [3] The release package therefore goes substantially beyond a typical open weights release. NeMo Gym, NeMo RL, and NeMo Evaluator are all published as separate open-source repositories on GitHub. [1][11] The Nemotron-CC-v2.1 and Nemotron-CC-Code-v1 datasets are hosted on Hugging Face under their own data licenses, which generally follow the Creative Commons or similar permissive frameworks used by their underlying sources.

The model is listed as "Ready for commercial use" on the official Hugging Face card. [5] Commercial deployment requires acceptance of the Nemotron Open Model License terms and adherence to NVIDIA's Trustworthy AI framework, which covers content filtering, bias mitigation, explainability requirements, and privacy considerations.

## How does Nemotron 3 compare to peer models?

Nemotron 3 Nano competes most directly with the 20 to 30 billion parameter mixture-of-experts releases from other labs, including Qwen3-30B-A3B from Alibaba, GPT-OSS-20B from OpenAI, and the open variants of Llama 3.1 from Meta. The comparison table below uses NVIDIA's reported numbers for Nemotron 3 Nano and the developers' own reported numbers for the other models. Cross-lab benchmark comparison is always imperfect because labs use different prompts and evaluation harnesses, but the table gives a useful order-of-magnitude comparison.

| Model | Developer | Total / active params | Context | License | AIME 2025 | LiveCodeBench v6 |
|---|---|---|---|---|---|---|
| Nemotron 3 Nano | NVIDIA | 30B / 3.5B (MoE) | 1M | NVIDIA Open Model License | 99.2 (with tools) | 68.3 |
| Qwen3-30B-A3B | Alibaba | 30B / 3B (MoE) | 256K | Apache 2.0 | 70.9 | 66.0 |
| GPT-OSS-20B | OpenAI | 20B (dense) | 128K | Apache 2.0 | not reported | 61.0 |
| Llama 3.1 70B | Meta | 70B (dense) | 128K | Llama 3.1 Community | not reported | not reported |
| Nemotron 2 Nano | NVIDIA | similar scale | 128K | NVIDIA Open Model License | trails Nemotron 3 Nano | trails Nemotron 3 Nano |

Versus Qwen3-30B-A3B, Nemotron 3 Nano has the same total parameter scale and a comparable active parameter count, but extends the context window from 256K to 1M and adds the hybrid Mamba-Transformer architecture. Qwen3 holds the lead on MMLU-Pro by roughly 2.5 points but trails on AIME, LiveCodeBench, Arena-Hard, and throughput at NVIDIA's reported settings. [2][5] The Apache 2.0 license on Qwen3 is more permissive than NVIDIA's Open Model License for some use cases, particularly research where the use restrictions in NVIDIA's license could be a complication.

Versus GPT-OSS-20B, Nemotron 3 Nano is larger in total parameters but lower in active parameters and uses a less common architecture. OpenAI's open release is a pure dense Transformer with a much shorter 128K context. NVIDIA's reported benchmarks place Nemotron 3 Nano ahead on every shared benchmark, with the throughput gap of 2.2x being the headline efficiency claim. [2]

Versus [Llama 3.1](/wiki/llama_3_1) from Meta, the comparison is harder to make cleanly because Llama 3.1 is a year older and was not designed for the agentic and reasoning evaluation suites that dominate late-2025 model marketing. The 70 billion parameter Llama 3.1 Instruct remains a baseline reference point for the open-weights field, and NVIDIA does not include it in its primary Nemotron 3 benchmark tables. On the kinds of tasks that Llama 3.1 was tuned for, such as general chat and instruction following, the gap between it and Nemotron 3 Nano is narrower than the headline math and code benchmarks would suggest.

Versus its own predecessor Nemotron 2 Nano, Nemotron 3 Nano is the headline efficiency story for the family. NVIDIA reports up to 4 times higher token throughput and 60 percent fewer reasoning tokens generated per task, achieved through a combination of the new LatentMoE design, the Mamba-Transformer ratio adjustments, and the move to FP8 inference precision by default. [1] The context window extension from 128K to 1M tokens is a meaningful capability change, particularly for code agent applications where the model needs to hold large project contexts in working memory.

The Super and Ultra variants, when released, will compete in different weight classes. Super at roughly 100 billion total parameters falls into the same neighborhood as Llama 3.1 70B Instruct, Qwen2.5-72B, and Mistral Large 2. Ultra at roughly 500 billion total parameters competes with the frontier-scale open releases, particularly DeepSeek V3, Llama 3.1 405B, and the open variants of Mistral Large 3. Public benchmark numbers for Super and Ultra were limited at the December 2025 announcement; NVIDIA published preliminary internal results in the technical report but warned that final numbers might shift slightly with the full release. [3]

## How was Nemotron 3 received?

Reception of Nemotron 3 Nano was broadly positive among open-source AI developers and frontier-model trackers, with a focus on three things: the throughput claims, the data release, and the unusually complete openness of the package.

HyperFRAME Research described the release as "a meaningful escalation in NVIDIA's open-source posture," noting that the company had moved from publishing models primarily as building blocks for synthetic data generation to publishing them as competitive end-user models. [14] Several writers connected this shift to growing competitive pressure from Chinese open-weights labs, particularly Alibaba's Qwen team and DeepSeek, both of which had been publishing increasingly capable models on permissive licenses through 2024 and 2025.

The New Stack covered the launch under the headline "Nvidia Launches the Next Generation of Its Nemotron Models," emphasizing the data release and the agent-oriented framing rather than the raw benchmark numbers. [15] Constellation Research's coverage was more strategic, describing Nemotron as "a much-needed open-source model champion in the US," against the backdrop of a perception that the leading open-weights frontier had shifted to Chinese labs over 2025. [17]

Independent benchmark reproduction was generally consistent with NVIDIA's claims, though with the usual caveats. Medium technical reviewer Barnacle Goose wrote a detailed walk-through of the Nano model that confirmed the long-context retention claims on RULER and the throughput advantage over Qwen3 on H100 hardware, while noting that the SWE-Bench score of 38.8 lags behind specialized code models. [20] LLM-stats.com published side-by-side comparison tables placing Nemotron 3 Nano ahead of Qwen3 on most agentic and math benchmarks but behind on raw MMLU-Pro and on some coding tasks. [19]

The local-deployment community, which runs models on consumer or prosumer hardware, was particularly enthusiastic about the Nano variant. Compared with prior NVIDIA releases that targeted enterprise H100 deployments, Nano's 30B-A3B configuration runs well on the RTX PRO 6000 workstation cards and on DGX Spark units, both of which had been released earlier in 2025. The FP8 quantization variant in particular drew attention as fitting comfortably in 48 GB of VRAM with full 1M token context. [6]

The agentic-AI community focused on the tool calling and reasoning trace features. NVIDIA's bundled vllm reasoning parser plugin for the Nano model became a reference implementation that other labs began copying for their own agentic releases. Several developer surveys in early 2026 listed Nemotron 3 Nano alongside Qwen3 and DeepSeek as the most-used open-weights models for agent prototyping work.

Critical reactions clustered around two concerns. The first was the NVIDIA Open Model License itself. While the license permits commercial use, the use restrictions and the absence of a clean MIT or Apache 2.0 grant raised concerns among some open-source advocates who argued that the restrictions made the model less "open" than the marketing implied. [10] The second was the dependency on NVIDIA-specific tooling for optimal inference, with NVFP4 quantization on Super and Ultra effectively requiring Blackwell hardware to realize the full throughput claims. [3]

NVIDIA's investor and enterprise framing also drew some commentary. Several writers noted that the Nemotron release was timed to coincide with broader NVIDIA messaging around agentic AI infrastructure, with the models functioning partly as a demonstration of the company's hardware capabilities and partly as standalone products. The investor relations press release on December 15, 2025 explicitly framed the release as part of NVIDIA's enterprise AI software strategy. [1]

In the months following the launch, NVIDIA published incremental updates to the Nano model, including extended language support and improved tool calling. The Super and Ultra releases were tracked by the community against the announced H1 2026 timeline. As of May 2026 the Super model had been previewed but the Ultra model had not yet shipped publicly, though NVIDIA had repeatedly confirmed both were on the roadmap. [3]

## ELI5: what is Nemotron 3 in plain terms?

Think of Nemotron 3 as a free-to-use AI brain that NVIDIA built and then handed out, including the instructions for how it was made. It comes in three sizes: a small one called Nano you can run on a powerful desktop, a medium one called Super, and a very large one called Ultra. The clever trick inside is that it only switches on a small part of itself for each word it reads or writes, like a big office where only a few people work on any one task, so it stays fast and cheap to run. It can also read or write the equivalent of a very long book (up to a million words' worth of text) at once without forgetting the beginning, and it is good at doing step-by-step reasoning and using tools like a calculator or a code runner to get tasks done.

## See also

- [NVIDIA](/wiki/nvidia)
- [Mamba 2](/wiki/mamba_2)
- [Nemotron Nano 2](/wiki/nemotron_nano_2)
- [Llama 3.1](/wiki/llama_3_1)
- [Open Source AI](/wiki/open_source_ai)
- [Jamba 2](/wiki/jamba_2)
- [Jet-Nemotron](/wiki/jet_nemotron)
- [Large Language Model](/wiki/large_language_model)

## References

1. NVIDIA. "NVIDIA Debuts Nemotron 3 Family of Open Models." NVIDIA Newsroom, December 15, 2025. https://nvidianews.nvidia.com/news/nvidia-debuts-nemotron-3-family-of-open-models
2. NVIDIA Research. "NVIDIA Nemotron 3 Family of Models." https://research.nvidia.com/labs/nemotron/Nemotron-3/
3. NVIDIA. "NVIDIA Nemotron 3: Efficient and Open Intelligence." arXiv:2512.20856, December 2025. https://arxiv.org/abs/2512.20856
4. NVIDIA. "Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning." Technical Report. https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Nano-Technical-Report.pdf
5. NVIDIA. "NVIDIA-Nemotron-3-Nano-30B-A3B-BF16." Hugging Face model card. https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
6. NVIDIA. "NVIDIA-Nemotron-3-Nano-30B-A3B-FP8." Hugging Face model card. https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
7. NVIDIA. "NVIDIA Launches Nemotron 3 Nano Omni Model." NVIDIA Blog, December 15, 2025. https://blogs.nvidia.com/blog/nemotron-3-nano-omni-multimodal-ai-agents/
8. NVIDIA. "NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning." NVIDIA Technical Blog. https://developer.nvidia.com/blog/nvidia-nemotron-3-nano-omni-powers-multimodal-agent-reasoning-in-a-single-efficient-open-model/
9. NVIDIA. "Nemotron AI Models." NVIDIA Developer. https://developer.nvidia.com/nemotron
10. NVIDIA. "NVIDIA Nemotron Open Model License." https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/
11. GitHub. "NVIDIA-NeMo/Nemotron Developer Asset Hub." https://github.com/NVIDIA-NeMo/Nemotron
12. Wikipedia. "Nemotron." https://en.wikipedia.org/wiki/Nemotron
13. AIwire. "Nvidia Releases Nemotron 3, Expanding Its Open Models for Agentic AI." December 17, 2025. https://www.hpcwire.com/aiwire/2025/12/17/nvidia-releases-nemotron-3-expanding-its-open-models-for-agentic-ai/
14. HyperFRAME Research. "NVIDIA Releases Nemotron 3: A New Family of Open Models." December 18, 2025. https://hyperframeresearch.com/2025/12/18/nvidia-releases-nemotron-3-a-new-family-of-open-models/
15. The New Stack. "Nvidia Launches the Next Generation of Its Nemotron Models." https://thenewstack.io/nvidias-launches-the-next-generation-of-its-nemotron-models/
16. CIO Dive. "Nvidia Nemotron 3 seeks to power scalable, multi-agent systems." https://www.ciodive.com/news/nvidia-nemotron-3-power-multi-agent-systems/807952/
17. Constellation Research. "Nvidia Nemotron: Much needed open-source model champion in US." https://www.constellationr.com/insights/news/nvidia-nemotron-much-needed-open-source-model-champion-us
18. The Neuron. "NVIDIA Nemotron 3 Nano: An Open LLM With a 1-Million-Token Context Window for Agents." https://www.theneuron.ai/explainer-articles/nvidia-nemotron-3-nano-open-llm/
19. LLM-stats. "Nemotron 3 Nano (30B A3B) vs Qwen3 30B A3B Comparison." https://llm-stats.com/models/compare/nemotron-3-nano-30b-a3b-vs-qwen3-30b-a3b
20. Medium (Barnacle Goose). "A Technical Review of NVIDIA's Nemotron 3 Nano 30B A3B." https://medium.com/@leucopsis/a-technical-review-of-nvidias-nemotron-3-nano-30b-a3b-e91673f22df4
21. NVIDIA. "nemotron-3-8b-base-4k." Hugging Face model card. https://huggingface.co/nvidia/nemotron-3-8b-base-4k
22. NVIDIA. "Nemotron-4 15B Technical Report." arXiv:2402.16819, February 2024. https://arxiv.org/abs/2402.16819

