Reinforcement Learning

104 articles

Showing 1-60 of 104 articles

Action (Reinforcement Learning)

In reinforcement learning (RL), an action is a decision or move made by an agent that affects the state of the environment. At each discrete time step, the...

Machine Learning

AlphaChip

AlphaChip is a reinforcement-learning method developed by Google DeepMind for designing the physical layout of computer chips, specifically the placement of...

AI Hardware, Google DeepMind

AlphaDev

AlphaDev is an artificial intelligence system built by Google DeepMind that used deep reinforcement learning to discover faster algorithms for common computing...

Algorithms, Google DeepMind

AlphaGo

AlphaGo is a computer program developed by DeepMind that plays the board game Go, and it was the first artificial intelligence to defeat a professional human...

Artificial Intelligence, Google

AlphaGo Zero

AlphaGo Zero is a Go-playing computer program developed by DeepMind that reached a superhuman level entirely through self-play reinforcement learning, starting...

AI in Gaming, Google DeepMind

AlphaStar

AlphaStar is an artificial intelligence system built by Google DeepMind that in 2019 became the first AI to reach Grandmaster level in the real-time strategy...

AI in Gaming, Artificial Intelligence

AlphaTensor

AlphaTensor is an artificial-intelligence system from DeepMind that uses deep reinforcement learning to discover faster algorithms for matrix multiplication....

Google DeepMind, Mathematics

AlphaZero

AlphaZero is a general-purpose reinforcement learning algorithm developed by DeepMind that taught itself to play chess, shogi (Japanese chess), and Go at a...

AI in Gaming, Artificial Intelligence

Andrew Barto

Andrew Barto is an American computer scientist and one of the founders of modern reinforcement learning, the branch of machine learning in which an agent...

Machine Learning, People

Bellman Equation

See also: reinforcement learning, Markov decision process, Q-learning, dynamic programming, value function, Machine learning terms. The Bellman equation is a...

Machine Learning, Mathematics

Best-of-N sampling

Best-of-N sampling (BoN) is an inference-time method that improves a large language model's output by drawing N independent candidate responses to the same...

Machine Learning

Control theory

Control theory is the mathematical and engineering discipline concerned with designing and analysing systems that achieve desired behaviour through measurement...

Mathematics, Robotics

Critic

A critic in reinforcement learning (RL) is the component of an actor-critic system that estimates a value function, scoring how good the actor's chosen actions...

Deep Learning, Machine Learning

DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization)

DAPO, short for Decoupled Clip and Dynamic sAmpling Policy Optimization, is an open-source reinforcement learning algorithm and training system for large...

Machine Learning

DARE (Drop And REscale)

DARE (Drop And REscale) is a training-free preprocessing technique for model merging that sparsifies the parameter changes introduced by fine-tuning before...

Machine Learning

DDPG (Deep Deterministic Policy Gradient)

DDPG (Deep Deterministic Policy Gradient) is an off-policy, model-free actor-critic algorithm in deep reinforcement learning that learns continuous-control...

Deep Learning

DQN

The Deep Q-Network (DQN) is a model-free, off-policy reinforcement learning algorithm that combines Q-learning with a deep neural network function...

Deep Learning, Google DeepMind

Dactyl (OpenAI)

Dactyl was a robotics research project at OpenAI that used deep reinforcement learning to control a five-fingered, human-like robot hand and manipulate...

OpenAI, Robotics

David Silver

David Silver is a British computer scientist whose work has defined the modern field of deep reinforcement learning and computer game-playing. For more than a...

Google DeepMind, People

Deep Q-Network (DQN)

Deep Q-Network (DQN) is a reinforcement learning algorithm that uses a deep neural network to approximate the optimal action-value function (Q-function),...

Deep Learning, Machine Learning

Depth up-scaling (DUS)

Depth up-scaling (DUS) is a model-scaling method that builds a deeper large language model by duplicating and stacking the layers of an existing pretrained...

Machine Learning

Discount Factor

The discount factor, almost always written as the Greek letter γ (gamma), is a scalar hyperparameter in reinforcement learning that controls how much an agent...

Machine Learning

DoReMi

DoReMi (Domain Reweighting with Minimax Optimization) is a method for automatically choosing the proportions, or "domain weights," of each data source in a...

Machine Learning

Dreamer (reinforcement learning)

Dreamer is a family of model-based reinforcement learning agents that learn a compact world model of their environment and then improve their behavior by...

World Models

Embodied AI

Embodied AI is artificial intelligence that perceives, reasons about, and acts within physical or simulated environments through a body, using sensors to...

Artificial Intelligence, Deep Learning

Environment

See also: Machine learning terms, Environment ChatGPT Plugins. In reinforcement learning (RL), an environment is the external system that an agent interacts...

Machine Learning

Episode (Reinforcement Learning)

An episode in reinforcement learning is one complete sequence of interaction between an agent and its environment, starting from an initial state and ending...

Machine Learning

Epsilon Greedy Policy

See also: Machine learning terms, Reinforcement Learning, Q-Learning. The epsilon-greedy policy (also written as ε-greedy) is a simple action-selection rule for...

Machine Learning

Evol-Instruct

Evol-Instruct is a method for automatically generating large instruction tuning datasets by prompting a large language model to rewrite, or "evolve," existing...

Machine Learning

Experience Replay

See also: Reinforcement Learning, Deep Q-Network (DQN), Q-Learning, Replay Buffer. Experience replay is a reinforcement learning technique in which an agent...

Machine Learning

GRPO

Group Relative Policy Optimization (GRPO) is a reinforcement learning algorithm for fine-tuning large language models that eliminates the separate critic...

AI Inference, Chinese AI

Gato (DeepMind)

Gato is a single generalist AI agent built by DeepMind and described in the May 2022 paper "A Generalist Agent" (arXiv:2205.06175). It is one neural network of...

AI Models, Google DeepMind

Greedy Policy

In reinforcement learning, a greedy policy is a decision rule that, in every state, selects the action with the highest estimated value, formally the action...

Machine Learning

Group Sequence Policy Optimization (GSPO)

Group Sequence Policy Optimization (GSPO) is a reinforcement learning algorithm for training large language models, introduced by the Qwen team at Alibaba in...

Machine Learning

Gym (OpenAI Gym / Gymnasium)

Gym, often written as OpenAI Gym, is an open source Python toolkit for developing and comparing reinforcement learning algorithms, originally released by...

Developer Tools, OpenAI

HuggingFace TRL

TRL (Transformer Reinforcement Learning, now stylized as Transformers Reinforcement Learning) is an open-source Python library maintained by Hugging Face for...

Open Source AI, Training & Optimization

Imitation Learning

Imitation learning (IL), also called learning from demonstration (LfD), is a family of machine learning methods in which an agent learns to perform a task by...

Machine Learning, Robotics

Importance sampling

Importance sampling (often abbreviated IS) is a Monte Carlo method for estimating the expectation of a function under a target probability distribution by...

Statistics

Instruction backtranslation (Humpback)

Instruction backtranslation is a self-alignment method for generating instruction tuning data, introduced by researchers at Meta AI in the paper...

Machine Learning

Ioannis Antonoglou

Ioannis Antonoglou is a Greek artificial intelligence researcher known as a co-creator of several of the landmark reinforcement learning systems built at...

Google DeepMind, People

Jeff Clune

Jeff Clune is a computer scientist known for research on open-endedness, evolutionary algorithms, deep reinforcement learning, and what he calls "AI-generating...

AI Research, People

Joelle Pineau

Joelle Pineau (born 1974) is a Canadian computer scientist who is the first Chief AI Officer of Cohere, a professor and William Dawson Scholar at McGill...

Meta AI, People

John Schulman

John Schulman is an American artificial intelligence researcher, one of the eleven original co-founders of OpenAI, and the inventor of Proximal Policy...

OpenAI, People

KTO

KTO (Kahneman-Tversky Optimization) is a method for aligning large language models with human feedback using only a binary signal of whether a model output is...

AI Alignment, AI Inference

Kimi K1.5

Kimi K1.5 is a multimodal reasoning large language model developed by Moonshot AI, a Beijing-based artificial intelligence company. The model was announced on...

AI Models, Chinese AI

Machine learning terms/Reinforcement Learning

See also: Machine learning terms. Reinforcement learning (RL) is a branch of machine learning in which an agent learns to make sequential decisions by...

Machine Learning

Markov Decision Process (MDP)

See also: Machine learning terms. A Markov Decision Process (MDP) is a mathematical framework for modeling sequential decision-making in stochastic...

Machine Learning, Mathematics

Misha Laskin

Misha Laskin (also published as Michael Laskin) is an AI researcher and entrepreneur best known for his work on reinforcement learning and for co-founding...

AI Companies, People

Model soups

Model soups is a weight-averaging technique (a form of model merging) that combines several independently fine-tuned neural networks into a single model by...

Machine Learning

Monte Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a heuristic search algorithm for sequential decision-making that finds strong actions by running many simulated playthroughs...

AI in Gaming, Algorithms

MuJoCo

MuJoCo (short for Multi-Joint dynamics with Contact) is an open-source physics simulator designed for fast and accurate simulation of articulated mechanical...

Open Source AI, Robotics

MuZero

MuZero is a model-based reinforcement learning algorithm developed by DeepMind that masters Go, chess, shogi, and 57 Atari video games at superhuman or state...

AI Models, Google DeepMind

NVIDIA Isaac Lab

NVIDIA Isaac Lab is an open-source, GPU-accelerated framework for robot learning that trains robot control policies at scale by running thousands of physics...

NVIDIA, Robotics

Online learning

See also: Machine learning terms. Online learning is a machine learning paradigm in which a model receives data sequentially, one example or one mini-batch at a...

Machine Learning

OpenAI Baselines

OpenAI Baselines is a collection of open-source, high-quality reference implementations of reinforcement learning (RL) algorithms released by OpenAI....

Open Source AI, OpenAI

OpenAI Five

OpenAI Five was a reinforcement learning system developed by OpenAI to play the competitive multiplayer video game Dota 2 at a professional level. On April 13,...

AI in Gaming, Artificial Intelligence

Pieter Abbeel

Pieter Abbeel (born 1977) is a Belgian-American computer scientist and a professor of electrical engineering and computer sciences at the University of...

People, Robotics

Pluribus (poker AI)

Pluribus is an artificial intelligence program that defeated elite human professionals at six-player no-limit Texas hold'em, the most popular form of poker...

AI in Gaming, Meta AI

Policy

See also: Reinforcement learning, Q-learning, Markov decision process. In reinforcement learning (RL), a policy is the function that maps an agent's observed...

Machine Learning

Policy gradient methods

Policy gradient methods are a family of reinforcement learning algorithms that directly parameterise the agent's policy and optimise it by stochastic gradient...

Machine Learning, Training & Optimization