Reasoning Models

61 articles

Showing 1-60 of 61 articles

ARC-AGI 1

ARC-AGI 1 (Abstraction and Reasoning Corpus for Artificial General Intelligence, version 1) is a benchmark testing fluid intelligence and abstract reasoning...

AI Benchmarks, Artificial Intelligence

Adaptive thinking

Adaptive thinking is an inference-time reasoning mode in the Anthropic Messages API in which a Claude model decides, on a per-request basis, whether to use...

AI Inference, Anthropic

Agent planning

Agent planning refers to the process by which an AI agent determines a sequence of actions to accomplish a goal. In the context of large language models (LLMs)...

AI Agents, Artificial Intelligence

AlphaGeometry

AlphaGeometry is a neuro-symbolic artificial intelligence system developed by Google DeepMind that solves competition-level geometry problems at a standard...

AI Models, AI for Science

AlphaProof

AlphaProof is a reinforcement-learning system from Google DeepMind that finds and verifies formal mathematical proofs in the Lean 4 theorem prover, and in July...

AI Models, AI for Science

BIG-Bench Extra Hard

BIG-Bench Extra Hard (BBEH) is a reasoning benchmark released by Google DeepMind in February 2025 that replaces each of the 23 tasks in BIG-Bench Hard (BBH)...

AI Benchmarks, Google DeepMind

Best AI Models for Reasoning and Math

As of July 2026, the strongest general reasoning models are Anthropic's Claude Opus 4.8 and Claude Fable 5, OpenAI's GPT-5.5, and Google's Gemini 3.1 Pro and...

Large Language Models, Model Evaluation

Command A Reasoning

Command A Reasoning is an enterprise reasoning model released by Cohere on August 21, 2025, as part of the Command A family of large language models. It pairs...

AI Companies, Large Language Models

Commonsense reasoning

Commonsense reasoning is the ability to make the everyday, mostly tacit assumptions that ordinary humans take for granted, the implicit knowledge about how the...

Artificial Intelligence

DeepSeek V3.1

DeepSeek V3.1 is a large language model developed by DeepSeek, released on August 19, 2025 and made broadly available via the official API on August 21,...

AI Models, Chinese AI

DeepSeek-Prover

DeepSeek-Prover is a family of open-weight large language models developed by Chinese AI laboratory DeepSeek for formal theorem proving in the Lean 4 proof...

Chinese AI, Large Language Models

DeepSeek-R1

DeepSeek-R1 is a large language model (reasoning model) developed by DeepSeek, built on DeepSeek-V3-Base (671B / 37B MoE) with a context length of 128,000 tokens...

Chinese AI, Large Language Models

DeepSeek-R1-Distill

DeepSeek-R1-Distill is a family of six open-weight reasoning language models released by DeepSeek on January 20, 2025, alongside the flagship DeepSeek-R1...

AI Models, Chinese AI

DeepSeekMath

DeepSeekMath is a family of open-weight large language models specialized for mathematical reasoning, released by Chinese AI laboratory DeepSeek in February...

Chinese AI, Large Language Models

Extended thinking

Extended thinking is the product name Anthropic gives to the reasoning mode in its Claude family of large language models, in which the model produces a...

AI Tools & Products, Anthropic

GPQA

GPQA (Graduate-Level Google-Proof Q&A) is a benchmark of 448 expert-written multiple-choice questions in biology, physics, and chemistry, designed to test...

AI Benchmarks

GPT-5 Pro

GPT-5 Pro is a large language model developed by OpenAI and the highest-capability variant of the GPT-5 model family. It is positioned above the standard GPT-5...

Large Language Models, OpenAI

GRPO

Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm for fine-tuning large language models that eliminates the separate critic...

AI Inference, Chinese AI

GSM8K

GSM8K (Grade School Math 8K) is a benchmark dataset of 8,792 grade-school-level math word problems created by researchers at OpenAI to evaluate the multi-step...

AI Benchmarks, Large Language Models

Gemini 2.0 Flash Thinking

Gemini 2.0 Flash Thinking is an experimental reasoning model released by Google as part of the Gemini 2.0 family. First made available on December 19, 2024, it...

Google DeepMind, Large Language Models

Gemini 2.5 Deep Think

Gemini 2.5 Deep Think is Google DeepMind's enhanced reasoning mode for the Gemini 2.5 Pro model that uses a technique called "parallel thinking" to explore...

Google DeepMind, Large Language Models

Goedel-Prover

Goedel-Prover is an open-source large language model designed for automated formal theorem proving in Lean 4. Developed by researchers at Princeton Language...

Large Language Models, Open Source AI

Grok 4

Grok 4 is a large language model developed by xAI and released on July 9, 2025.[1] It is the fourth major generation of the Grok model family and was...

AI Companies, AI Models

Inference-time scaling

Inference-time scaling (also called test-time compute scaling) is the practice of improving an AI model's output quality by allocating more computational...

AI Inference, AI Research

Interleaved thinking

Interleaved thinking is a feature of the Anthropic Messages API that lets Claude produce extended reasoning blocks between tool calls inside a single...

AI Agents, Anthropic

Kimi K2 Thinking

Kimi K2 Thinking is a reasoning and agentic large language model released by the Chinese startup Moonshot AI on November 6, 2025. It is a thinking variant...

Chinese AI, Large Language Models

Llama Nemotron

Llama Nemotron is a family of open reasoning large language models built by Nvidia by post-training Meta's Llama models for math, coding, and agentic tasks....

Large Language Models, NVIDIA

MAI-Thinking-1

MAI-Thinking-1 is a reasoning model developed by Microsoft AI, unveiled at Microsoft Build 2026 on June 2, 2026 as the company's first in-house flagship...

Microsoft

MATH

MATH is a benchmark of 12,500 competition mathematics problems used to evaluate the mathematical problem-solving ability of machine learning systems,...

AI Benchmarks, Model Evaluation

Magistral

Magistral is the first family of reasoning models from Mistral AI, the French AI company, first released on June 10, 2025.[1] Mistral describes it as "the...

Large Language Models, Open Source AI

Marco-o1

Marco-o1 is an open reasoning model released in November 2024 by the MarcoPolo team at Alibaba International Digital Commerce (AIDC). It was introduced in the...

Chinese AI, Open Source AI

MathArena

MathArena ("MathArena: Evaluating LLMs on Uncontaminated Math Competitions") was introduced in March 2025 (USAMO 2025 evaluation), followed by a paper in May 2025. Mislav...

AI Benchmarks, Artificial Intelligence

MiniMax M1

MiniMax M1 (stylised MiniMax-M1) is an open-weight large language reasoning model released on 16 June 2025 by the Shanghai-based artificial-intelligence...

Chinese AI, Large Language Models

MiniMax M2.7

MiniMax M2.7 is a large language model released by the Chinese AI company MiniMax on March 18, 2026.[1] It is an agentic coding and reasoning model built on a...

Chinese AI, Large Language Models

MuSR

MuSR (Multistep Soft Reasoning) is a benchmark for evaluating multistep reasoning in large language models, built around long free-text narratives such as...

AI Benchmarks, Model Evaluation

Muse Spark

Muse Spark is a proprietary multimodal reasoning model developed by Meta Superintelligence Labs (MSL), the artificial intelligence division Meta reorganized in...

Meta AI, Multimodal AI

Natural language inference (NLI)

Natural language inference (NLI), also known as recognising textual entailment (RTE), is the natural language processing task of deciding whether a hypothesis...

AI Benchmarks, Natural Language Processing

OLMo 3

OLMo 3 is the third generation of fully open language models released by the Allen Institute for AI (Ai2). The family was announced on November 20, 2025 and...

AI Models, Large Language Models

OpenAI o-series

The OpenAI o-series is a family of large language models developed by OpenAI that are trained with reinforcement learning to reason through an internal...

Artificial Intelligence, Large Language Models

OpenAI o1

OpenAI o1 is a large language model (reasoning model) developed by OpenAI, first released on September 12, 2024 (o1-preview, o1-mini). Models in the family include o1-preview, o1-mini, o1,...

AI Models, Large Language Models

OpenAI o1-mini

OpenAI o1-mini is a smaller, faster, and cheaper reasoning model released by OpenAI on September 12, 2024, alongside o1-preview, and optimized for science,...

Large Language Models, OpenAI

OpenAI o1-pro

OpenAI o1-pro is the highest-compute variant of OpenAI's o1 reasoning model, designed to spend more inference-time compute so it "thinks harder" and returns...

Large Language Models, OpenAI

OpenAI o3

OpenAI o3 is a model family from OpenAI, released in stages: January 31, 2025 (o3-mini); April 16, 2025 (o3, o4-mini); June 10, 2025 (o3-pro); June 26, 2025 (o3-deep-research, o4-mini-deep-research)...

AI Models, Large Language Models

OpenAI o3-mini

OpenAI o3-mini is a reasoning-focused large language model released by OpenAI on January 31, 2025, the second commercial member of the o-series after OpenAI o1...

Large Language Models, OpenAI

OpenAI o3-pro

OpenAI o3-pro is a high-compute reasoning large language model released by OpenAI on June 10, 2025, designed as the professional, higher-reliability variant of...

Large Language Models, OpenAI

Phi-4 Reasoning

Phi-4-reasoning is a 14-billion-parameter open-weight reasoning model released by Microsoft Research on April 30, 2025. It was launched together with...

AI Models, Large Language Models

Phi-4-mini-flash-reasoning

Phi-4-mini-flash-reasoning is a 3.8-billion-parameter open-weight reasoning model released by Microsoft in July 2025.[1] It is a latency-optimized sibling of...

AI Models, Large Language Models

ProcessBench

ProcessBench is a benchmark for step-level verification of mathematical reasoning, built by the Qwen Team at Alibaba and released in December 2024.[1] It...

AI Benchmarks, Model Evaluation

QvQ

QvQ (styled QVQ) is a family of experimental visual reasoning models from the Qwen team at Alibaba. The models extend the large language model reasoning...

Chinese AI, Multimodal AI

QwQ

QwQ is a large language model (reasoning model) from the Alibaba Cloud Qwen team, first released on November 28, 2024 (QwQ-32B-Preview), with 32.5 billion...

Chinese AI, Large Language Models

RLVR

Reinforcement Learning with Verifiable Rewards (RLVR) is a post-training paradigm for large language models in which the reward signal comes from a...

AI Inference, Reinforcement Learning

ReAct (prompting)

ReAct (short for Reasoning and Acting) is a prompting paradigm for large language models that interleaves verbal reasoning traces ("Thoughts") with...

AI Agents, Prompt Engineering

Reflexion

Reflexion is a 2023 framework for reinforcing language agents through verbal self-reflection rather than weight updates: the agent reflects in natural language...

AI Agents, Machine Learning

Self-consistency

Self-consistency is a decoding strategy for large language models that samples multiple chain-of-thought reasoning paths for the same question and returns the...

Large Language Models, Prompt Engineering

SimpleBench

SimpleBench is a text-only benchmark for large language models created by Philip, the host of the AI Explained YouTube channel, with collaborator Hemang. It...

AI Benchmarks, Artificial Intelligence

Skywork-R1V

Skywork-R1V is an open-weight family of multimodal reasoning models released by Skywork AI, the AGI and AIGC division of Beijing Kunlun Tech Co., Ltd. (Kunlun...

Chinese AI, Multimodal AI

Strawberry (OpenAI codename)

Strawberry was the internal codename used at OpenAI for the research program that produced the o-series of reasoning models, most notably OpenAI o1. The...

AI History, OpenAI

Test-time compute

Test-time compute (also called inference-time compute scaling or test-time scaling) is the practice of allocating additional computation while a large language...

Artificial Intelligence, Large Language Models

Tree of Thoughts

Tree of Thoughts (ToT) is a prompting and inference-time search framework for large language models that lets the model explore multiple intermediate reasoning...

Artificial Intelligence, Prompt Engineering

ZAYA1-8B

ZAYA1-8B is an open-weight, reasoning-focused Mixture-of-Experts (MoE) large language model released by San Francisco-based AI research lab Zyphra on May 6,...

AI Models, Large Language Models