Nous Research is an open-source artificial intelligence research organization known for producing some of the most widely used fine-tuned large language models in the open-source AI community. The group's flagship Hermes series of models has been downloaded over 33 million times from Hugging Face, making Nous Research one of the most influential contributors to the open-weight model ecosystem. Headquartered in New York City, Nous Research focuses on model architecture, data synthesis, fine-tuning, and reasoning, with a growing emphasis on decentralized AI training infrastructure.
Nous Research traces its origins to 2022, when a group of AI researchers and engineers began collaborating informally through social media platforms, including Discord, GitHub, and Twitter (now X). The founding members met online and started experimenting with existing open-source language models, primarily Meta's LLaMA series. They began releasing fine-tuned versions of these models under the name "Hermes," which quickly gained traction in the open-source community.
The organization was formally incorporated in 2023 by four co-founders: Jeffrey Quesnelle, Karan Malhotra, Teknium, and Shivani Mitra.
Nous Research started as an all-volunteer project. Early contributors ranged from physics Ph.D. holders and mathematicians to biologists and software engineers, all collaborating without pay to advance open-source AI. External investment later enabled the organization to compensate its most dedicated members.
Nous Research has raised approximately $65 million in total funding across multiple rounds.
| Round | Date | Amount | Lead Investor | Notable Participants |
|---|---|---|---|---|
| Seed | January 2024 | $5.2M | Distributed Global, OSS Capital | Balaji Srinivasan |
| Additional Seed | June 2024 | ~$15M (cumulative seed total ~$20M) | Undisclosed | Together AI, North Island Ventures, Delphi Digital |
| Series A | April 2025 | $50M | Paradigm | Solana co-founder Raj Gokal |
The Series A round, led by crypto venture capital firm Paradigm, gave Nous Research a reported $1 billion token valuation. The investment reflected growing interest in the intersection of blockchain technology and AI, particularly around decentralized training infrastructure.
The Hermes series is the flagship product line of Nous Research. Each generation has been fine-tuned on top of a different base model, typically from Meta's LLaMA family or other prominent open-source architectures. The models are known for their strong instruction-following capabilities, low hallucination rates, and neutrally-aligned behavior that avoids excessive content filtering.
The original Nous-Hermes-13b was released in mid-2023 as a fine-tune of Meta's LLaMA 13B. It was trained on over 300,000 instructions, consisting primarily of synthetic GPT-4 outputs. Training data sources included GPTeacher, CodeAlpaca, Evol-Instruct Uncensored, GPT4-LLM, Unnatural Instructions, Camel-AI science datasets (Biology, Physics, Chemistry, Math), and Airoboros' GPT-4 dataset. The model was fine-tuned on 8x A100 80GB GPUs over more than 50 hours.
At the time of release, Nous-Hermes-13b ranked first on several popular benchmarks including ARC-Challenge, ARC-Easy, HellaSwag, and OpenBookQA when compared against the GPT4All benchmark suite. The model used a simple Alpaca-style instruction/response prompt format.
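The Alpaca-style format mentioned above can be sketched as follows. This reflects the widely used Alpaca template; the exact wording of the template Nous used may differ slightly:

```python
def alpaca_prompt(instruction: str, response: str = "") -> str:
    # Standard Alpaca-style single-turn template: a tagged instruction
    # section followed by a tagged response section.
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
        f"{response}"
    )
```

The model generates a completion after the `### Response:` tag, so inference code typically passes an empty `response` and lets the model fill it in.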
Following the release of Meta's Llama 2, Nous Research quickly produced Nous-Hermes-Llama2-7b and Nous-Hermes-Llama2-13b, updating the fine-tune to the newer base model.
The Hermes 2 generation represented a significant step forward. Nous Research adopted the ChatML prompt format, which provides a structured system for multi-turn chat dialogue with clear role delineation between system, user, and assistant messages. ChatML enables system prompts that allow users to guide the model's behavior, tone, and persona.
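A minimal sketch of ChatML serialization, assuming the standard `<|im_start|>`/`<|im_end|>` delimiter tokens:

```python
def chatml(messages):
    # ChatML wraps each turn in <|im_start|>ROLE ... <|im_end|> markers,
    # giving the model unambiguous boundaries between system, user,
    # and assistant messages in a multi-turn conversation.
    out = []
    for role, content in messages:
        out.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    return "\n".join(out)
```

For generation, the serialized history is typically followed by an opening `<|im_start|>assistant` marker so the model continues in the assistant role.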
Key Hermes 2 releases included Nous-Hermes-2-Mixtral-8x7B-DPO, Nous-Hermes-2-Yi-34B, and the Hermes 2 Pro fine-tunes of Mistral 7B and Llama 3 8B.
Nous Research also released the Hermes Function Calling V1 dataset, making public the data mix that gave Hermes 2 Pro its tool use and structured output capabilities.
Released on August 14, 2024, Hermes 3 was the first full-parameter fine-tune of Meta's Llama 3.1 405B model. The release was made in partnership with Lambda Labs, which provided compute through its 1-Click Cluster infrastructure. Hermes 3 was made available in three sizes: 8B, 70B, and 405B parameters.
The training methodology involved three stages, combining large-scale supervised fine-tuning on synthetic data with a subsequent RLHF phase.
For the 405B variant, training was performed on 16 HGX nodes (each containing 8 GPUs) with an effective batch size of 128. The team used the AdamW optimizer with a weight decay of 0.01; the learning rate rose over 300 warmup steps to a peak of 3.5 × 10⁻⁶ and then followed a cosine decay schedule, with training running for four epochs.
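The warmup-then-cosine-decay schedule described above can be sketched as follows; the linear warmup shape and the decay-to-zero floor are conventional assumptions, not details stated in the source:

```python
import math

def lr_at(step, total_steps, peak_lr=3.5e-6, warmup=300):
    # Linear warmup to the peak learning rate over `warmup` steps,
    # then cosine decay toward zero for the remaining steps.
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))
```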
Hermes 3 introduced several improvements over Hermes 2, including advanced agentic capabilities, improved roleplaying and internal monologue support, stronger multi-turn conversation coherence, enhanced long-context reasoning, and better overall instruction following. Benchmark results showed performance comparable to or exceeding Meta's official Llama 3.1 Instruct across standard evaluations.
Released in February 2025, DeepHermes 3 Preview was one of the first models to unify both traditional "intuitive" response mode and long chain-of-thought reasoning into a single model. Users could toggle between fast responses and deeper step-by-step reasoning through a system prompt. The model was built on Meta's Llama 3 8B and trained on 1 million non-chain-of-thought responses plus 150,000 chain-of-thought outputs, incorporating 390 million tokens across multiple domains.
Hermes 4, released on August 26, 2025, is a family of open-weight, hybrid-reasoning models. The family includes models at multiple parameter scales built on different base architectures:
| Model | Parameters | Base Model | Key Highlights |
|---|---|---|---|
| Hermes 4 14B | 14B | Qwen 3 14B | Smallest variant, optimized for reasoning |
| Hermes 4 70B | 70B | Llama 3.1 70B | Mid-range, strong general performance |
| Hermes 4 405B | 405B | Llama 3.1 405B | Flagship, frontier-level performance |
Hermes 4 introduced a toggleable hybrid reasoning mode via `<think>...</think>` tags, allowing the model to switch between fast responses and detailed chain-of-thought reasoning. The training dataset expanded dramatically compared to Hermes 3, incorporating 50 times more data tokens. Training was conducted on 192 NVIDIA B200 GPUs using a modified TorchTitan stack with flex attention and efficient packing that achieved over 99.9% batch efficiency.
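A minimal sketch of how client code might separate a reasoning trace from the final answer, assuming the `<think>...</think>` convention described above:

```python
import re

def split_reasoning(text):
    # Split a hybrid-reasoning completion into (reasoning, answer).
    # If no <think> block is present (fast-response mode), reasoning is None.
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not m:
        return None, text.strip()
    reasoning = m.group(1).strip()
    answer = (text[:m.start()] + text[m.end():]).strip()
    return reasoning, answer
```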
Benchmark results for the 405B model (reasoning mode):
| Benchmark | Score |
|---|---|
| MATH-500 | 96.3% |
| AIME 2024 | 81.9% |
| AIME 2025 | 78.1% |
| GPQA Diamond | 70.5% |
| LiveCodeBench | 61.3% |
| RefusalBench | 57.1% |
The RefusalBench score of 57.1% was the highest among all evaluated models, significantly outperforming GPT-4o (17.67%) and Claude Sonnet 4 (17%). The release was accompanied by a 94-page technical report.
Released on December 2, 2025, Hermes 4.3 36B is based on ByteDance's Seed 36B architecture. This model is notable for being the first Hermes model trained in a decentralized manner over the internet using the Psyche network. The post-training corpus was scaled from approximately 1 million samples and 1.2 billion tokens to roughly 5 million samples and 60 billion tokens. Hermes 4.3 supports an extended context length of up to 512K tokens and delivers performance roughly equivalent to Hermes 4 70B at half the parameter count.
The full Hermes release history is summarized below:

| Model | Release | Base Model | Parameters | Training Data |
|---|---|---|---|---|
| Nous-Hermes-13b | Mid-2023 | LLaMA 13B | 13B | ~300K instructions |
| Nous-Hermes-Llama2-7b | Mid-2023 | Llama 2 7B | 7B | ~300K instructions |
| Nous-Hermes-Llama2-13b | Mid-2023 | Llama 2 13B | 13B | ~300K instructions |
| Nous-Hermes-2-Mixtral-8x7B-DPO | Jan 2024 | Mixtral 8x7B | ~47B (MoE) | ~1M instructions |
| Nous-Hermes-2-Yi-34B | Early 2024 | Yi-34B | 34B | ~1M instructions |
| Hermes-2-Pro-Mistral-7B | Early 2024 | Mistral 7B | 7B | ~1M+ instructions |
| Hermes-2-Pro-Llama-3-8B | Mid-2024 | Llama 3 8B | 8B | ~1M+ instructions |
| Hermes 3 (8B, 70B, 405B) | Aug 2024 | Llama 3.1 | 8B/70B/405B | Synthetic SFT + RLHF |
| DeepHermes 3 Preview | Feb 2025 | Llama 3 8B | 8B | 1.15M samples (390M tokens) |
| Hermes 4 (14B, 70B, 405B) | Aug 2025 | Qwen 3 / Llama 3.1 | 14B/70B/405B | ~5M samples (19B tokens) |
| Hermes 4.3 36B | Dec 2025 | ByteDance Seed 36B | 36B | ~5M samples (60B tokens) |
Alongside the Hermes series, Nous Research developed the Capybara line of models, which took a distinct approach to training data. The Capybara series used a novel data synthesis technique called Amplify-Instruct, which combined several established data synthesis methods including Airoboros, Evol-Instruct, Orca, Vicuna, Know_Logic, Lamini, and FLASK.
The Capybara training dataset was notably compact, containing only 20,000 training examples, roughly 10 times smaller than datasets used for comparable models. Over 60% of the dataset consisted of multi-turn conversations averaging more than 1,000 tokens per example. Seed instructions were drawn from highly regarded datasets like Airoboros, Know Logic, EverythingLM, and GPTeacher, supplemented with new instructions derived from posts on the website LessWrong and in-house multi-turn datasets like Dove.
The most prominent release was Nous-Capybara-34B, fine-tuned on 01.AI's Yi-34B with 200K context length. Smaller versions at 3B and 7B parameters were also released.
Nous Research has developed and refined a distinctive training approach across its model releases, centered on several key principles.
From its earliest models, Nous Research has relied heavily on synthetic data generated by frontier models, particularly GPT-4. The original Hermes model used roughly 300,000 synthetic instructions. By the Hermes 2 generation, training sets grew to over 1,000,000 entries. For Hermes 4, the team employed a graph-based synthetic data generation pipeline that produced approximately 5 million samples totaling 19 billion tokens.
The organization has consistently released its training datasets publicly, including the OpenHermes dataset (242,000 GPT-4 generated entries) and the Hermes Function Calling V1 dataset, enabling the broader community to replicate and build upon their work.
Starting with Hermes 2, Nous Research standardized on the ChatML (Chat Markup Language) prompt format. ChatML uses special tokens to clearly delineate system instructions, user messages, and assistant responses in multi-turn conversations. This structured format enables robust system prompt adherence, clear role boundaries, and reliable parsing of complex interaction patterns including function calling and structured output.
The Hermes 2 Mixtral release marked Nous Research's first use of RLHF, specifically through Direct Preference Optimization (DPO). Hermes 3 further incorporated RLHF as a dedicated training stage following supervised fine-tuning. By the Hermes 4 generation, the training pipeline included rejection sampling and length control techniques to manage the tendency of reasoning models to produce excessively long outputs.
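The DPO objective mentioned above can be illustrated with a scalar sketch for a single preference pair; `beta` is the standard DPO temperature hyperparameter, and the specific value Nous used is not stated in the source:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Inputs are log-probabilities of the chosen/rejected completions
    # under the policy (pi_*) and a frozen reference model (ref_*).
    # DPO pushes the policy's margin on chosen vs. rejected responses
    # above the reference model's margin.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

When the policy matches the reference, the margin is zero and the loss is log 2; widening the policy's preference margin drives the loss toward zero.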
A defining characteristic of Hermes models is their "neutrally-aligned" positioning. Unlike many commercial models that implement aggressive content filtering and refusal behavior, Hermes models are trained to follow user instructions with minimal unnecessary refusal. This design philosophy has made Hermes models particularly popular among developers, researchers, and hobbyists who value flexibility and controllability.
Beyond fine-tuned models, Nous Research has published several notable research papers and technologies.
YaRN (Yet another RoPE extensioN method) is a compute-efficient technique for extending the context window of transformer-based language models that use Rotary Position Embeddings (RoPE). Published by Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole, the paper was first posted as an arXiv preprint in September 2023 and later presented at ICLR 2024.
YaRN integrates an "NTK-by-parts" interpolation scheme that selectively scales frequencies to preserve both high-frequency details and local relationships. The method requires only about 0.1% of the original pre-training data for fine-tuning, making it 10x more data-efficient and 2.5x more training-step-efficient than previous context extension methods. YaRN achieved state-of-the-art performance in context window extension and has been cited by over 100 academic papers. The technique has been adopted by major model providers including Meta and DeepSeek.
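A rough sketch of the NTK-by-parts idea: per-dimension interpolation factors are chosen by how many full rotations each RoPE frequency completes over the original context window. The ramp thresholds `alpha` and `beta` follow the YaRN paper's Llama configuration; the actual implementation applies these factors to the rotary embeddings inside the attention layers:

```python
import math

def yarn_scale_factors(dim, base=10000.0, scale=4.0, orig_ctx=2048,
                       alpha=1.0, beta=32.0):
    # For each RoPE frequency pair, compute a multiplier in [1/scale, 1]:
    #   - high-frequency dims (many rotations per window): untouched,
    #   - low-frequency dims (few rotations): fully interpolated by 1/scale,
    #   - in between: linear blend.
    factors = []
    for i in range(0, dim, 2):
        freq = base ** (-i / dim)
        rotations = orig_ctx * freq / (2 * math.pi)  # turns over the window
        if rotations > beta:
            ramp = 0.0              # preserve high-frequency detail
        elif rotations < alpha:
            ramp = 1.0              # full position interpolation
        else:
            ramp = (beta - rotations) / (beta - alpha)
        factors.append((1 - ramp) * 1.0 + ramp / scale)
    return factors
```

This selective scaling is what lets YaRN stretch the context window without washing out the high-frequency components that encode local token relationships.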
DeMo (Decoupled Momentum Optimization), published in November 2024 by Bowen Peng, Jeffrey Quesnelle, and Diederik P. Kingma (co-creator of the Adam optimizer and OpenAI co-founder), is an algorithm designed to reduce inter-GPU communication during distributed training. DeMo uses the Discrete Cosine Transform (DCT) to isolate and share only the most critical components of optimizer momentum, reducing communication overhead by several orders of magnitude while maintaining training performance comparable to AdamW.
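The core compression idea can be sketched as follows. The naive O(n²) DCT here is purely illustrative, and DeMo's actual chunking, fast transforms, and residual error feedback are omitted:

```python
import math

def dct(x):
    # Naive DCT-II; real implementations use fast O(n log n) transforms.
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                for j in range(n))
            for k in range(n)]

def compress_momentum(momentum, k):
    # Keep only the k largest-magnitude DCT coefficients (with indices);
    # only this sparse set would be communicated between GPUs.
    coeffs = dct(momentum)
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]),
                 reverse=True)[:k]
    return [(i, coeffs[i]) for i in sorted(top)]
```

Because optimizer momentum is dominated by a few low-frequency components, transmitting only the top coefficients preserves most of the useful signal at a small fraction of the bandwidth.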
DisTrO (Distributed Training Over-The-Internet) extends the DeMo algorithm into a full system-level stack for large-scale AI training over the internet. Released in August 2024, DisTrO reduces inter-GPU communication bandwidth requirements by up to 10,000x during pre-training, enabling model training on connections as slow as 100 Mbps download and 10 Mbps upload while maintaining competitive convergence rates.
DisTrO combines low-latency distributed optimizers with a runtime that applies DeMo's momentum-decomposition approach to real-world conditions, supporting compression, selective synchronization, and load balancing across geographically dispersed GPUs. The technology was tested in collaboration with Oracle, Lambda Labs, Northern Data Group, Crusoe Cloud, and the Andromeda Cluster.
Psyche is Nous Research's decentralized AI training orchestration platform, built on the Solana blockchain. The network applies DisTrO technology to enable collaborative model training across underutilized hardware distributed around the world.
Solana serves as the coordination hub for Psyche. Smart contracts on the blockchain store training metadata, participant lists, and random task assignments, providing transparency, tamper-proofing, and censorship resistance. The design creates a fault-tolerant system where individual nodes can join or leave training runs without disrupting the overall process.
In December 2024, Nous Research ran a successful test of the Psyche network, training a 15 billion parameter language model through 11,000 training steps using hardware distributed across multiple locations. Hermes 4.3 36B, released in December 2025, was the first Hermes model post-trained entirely on the Psyche network.
As of early 2025, Nous Research had not yet launched a dedicated NOUS token but was evaluating whether to reward participants with a proprietary token or with Solana's native cryptocurrency.
The Forge Reasoning API, introduced in November 2024, is a configurable step-based planning engine that combines multiple open and closed LLMs with Nous Research's reasoning techniques. The system employs Monte Carlo Tree Search (MCTS), Chain-of-Code (CoC) reasoning, and a Mixture of Agents (MoA) architecture to enable flexible, multi-step problem solving.
Forge is distinct from the fine-tuned model releases; it functions as a reasoning framework that orchestrates multiple models to tackle complex tasks that require planning and iterative refinement.
Nous Research maintains an active Discord community with over 76,000 members, where researchers, developers, and enthusiasts discuss open-source AI development. The community has been central to the organization's identity since its founding as a volunteer-driven project.
All Hermes models and most training datasets are released publicly through the NousResearch organization on Hugging Face. Models are also available through inference providers such as OpenRouter, Lambda Labs, Fireworks.AI, and Together AI, and are compatible with local inference tools like Ollama and LM Studio through GGUF quantized versions.
Nous Research also operates a fine-tuning subnet on the Bittensor network, where miners are rewarded for fine-tuning language models using continuously generated synthetic data. This subnet serves as an ongoing benchmark for fine-tuning quality.
Nous Research occupies a distinct niche in the open-source AI ecosystem. While organizations like Meta, Mistral AI, and 01.AI release base models and official instruction-tuned variants, Nous Research specializes in community-driven post-training that often pushes these base models beyond their official fine-tunes in specific capabilities. The Hermes series has frequently been among the top-performing models on the Hugging Face Open LLM Leaderboard and other community benchmarks.
The organization's approach differs from other fine-tuning groups in several ways. Its emphasis on synthetic data pipelines, ChatML standardization, function calling support, and neutrally-aligned behavior has created a recognizable "Hermes" style that many users prefer for agentic applications, creative writing, and unconstrained research tasks.
With its Series A funding and Psyche network development, Nous Research has also positioned itself at the intersection of AI and decentralized computing, pursuing a vision where model training is not concentrated in a handful of large data centers but distributed across globally contributed hardware.
Key people associated with Nous Research:

| Name | Role | Background |
|---|---|---|
| Jeffrey Quesnelle | Co-Founder, CEO | M.S. Computer Science, University of Michigan-Dearborn; former Director of Software Development at Intrepid Control Systems; Principal Engineer at Eden Network |
| Karan Malhotra | Co-Founder, Head of Behavior | Former ML Researcher at Stanford Brain Stimulation Lab; studied Religion and Philosophy at Emory University |
| Teknium | Co-Founder, Head of Post-Training | Pseudonymous AI researcher; former Stability AI engineer; creator of GPTeacher dataset |
| Shivani Mitra | Co-Founder | CEO from founding through Series A |
| Bowen Peng | Researcher | Lead author of YaRN and DeMo papers |