Meta Motivo
Last reviewed
Jun 3, 2026
Sources
9 citations
Review status
Source-backed
Revision
v1 · 1,487 words
Meta Motivo is a behavioral foundation model for controlling a simulated humanoid body, released by Meta AI's Fundamental AI Research (FAIR) group on December 12, 2024. Meta describes it as the first behavioral foundation model for whole-body humanoid control: a single pre-trained controller that performs a range of tasks zero-shot, without task-specific training or fine-tuning, including motion tracking, reaching goal poses, and optimizing user-specified rewards.[1][2] It was trained with an unsupervised reinforcement learning algorithm the team calls FB-CPR (Forward-Backward representations with Conditional-Policy Regularization), which grounds the learning objective using an unlabeled dataset of human motion capture.[2][3] Meta released the model weights and code openly under a non-commercial license.[4][5]
The paper documenting the work, "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models," is authored by Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, and Matteo Pirotta.[2][3]
A foundation model is trained once on broad data and then adapted to many downstream uses. In language and vision this adaptation is usually prompting or fine-tuning; the analogue for control is a single agent that can be steered to many behaviors at test time. Most prior work in physics-based humanoid animation trains a separate policy for each task or motion, which is expensive and does not transfer. Meta Motivo applies the foundation-model idea to control: one network is pre-trained on unlabeled motion data, and at inference the user supplies a prompt (a reward function, a target pose, or a reference motion) that the model solves without further optimization.[2][6]
The technical basis is the forward-backward (FB) representation, an unsupervised learning approach to reinforcement learning introduced in earlier FAIR research. An FB representation learns to embed states, rewards, and policies into a shared latent space so that, once trained, a near-optimal policy for any reward can be recovered by a simple inference step rather than by retraining.[2] Meta Motivo's contribution is adapting this idea to a high-dimensional humanoid and grounding it in human motion data so that the recovered behaviors look natural rather than merely high-scoring.
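The "simple inference step" can be made concrete with the notation commonly used in the forward-backward literature. The symbols below are not defined in this article and are introduced only for illustration: F is the forward map, B the backward map, z the latent vector, ρ a state distribution, and M the successor measure of policy π_z. This is a sketch of the general FB idea, up to normalization details, not a statement of the exact objective used for Meta Motivo.

```latex
\begin{align}
  % Successor measure factorized through the two learned maps
  M^{\pi_z}(s_0, a_0, \mathrm{d}s') &\approx F(s_0, a_0, z)^\top B(s')\, \rho(\mathrm{d}s') \\
  % Each latent z indexes a policy that is greedy with respect to F^T z
  \pi_z(s) &= \arg\max_{a} F(s, a, z)^\top z \\
  % A reward function r is embedded by averaging B weighted by r
  z_r &= \mathbb{E}_{s \sim \rho}\big[\, r(s)\, B(s) \,\big] \\
  % The resulting value estimate needs no further training
  Q_r^{\pi_{z_r}}(s, a) &\approx F(s, a, z_r)^\top z_r
\end{align}
```

Under this factorization, solving a new reward reduces to computing the expectation in the third line from reward-labeled samples and then running the policy indexed by the resulting vector.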
FB-CPR stands for Forward-Backward representations with Conditional-Policy Regularization. It is an online unsupervised reinforcement learning method that grounds policy learning using observation-only, unlabeled motion data (the dataset contains states but no actions or rewards).[3][6] The key idea is twofold. First, the forward-backward representation embeds the unlabeled trajectories into the same latent space used to represent states, rewards, and policies, so a recorded human motion can be mapped to a latent vector that conditions the policy. Second, a latent-conditional discriminator regularizes training by encouraging the learned policies to "cover" the states that appear in the unlabeled behavior dataset, pulling the agent's behavior toward the distribution of real human motion.[2][3] This distribution-matching regularization is what gives the model human-like movement while preserving the zero-shot generality of the underlying FB representation.
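The two ingredients can be written down schematically. The notation here is an assumption made for illustration and may differ from the paper's exact losses: B is the backward map, F the forward map, D a latent-conditional discriminator, 𝓜 the unlabeled motion dataset, ν the distribution of latents sampled during training, Q_D a critic trained on the discriminator-derived reward, and α a regularization weight.

```latex
\begin{align}
  % Embed an unlabeled trajectory by averaging backward embeddings of its states
  z_\tau &= \frac{1}{|\tau|} \sum_{s \in \tau} B(s) \\
  % GAN-style discriminator: dataset pairs (s, z_tau) versus policy-generated pairs (s, z)
  \mathcal{L}(D) &= -\,\mathbb{E}_{\tau \sim \mathcal{M},\, s \sim \tau}\big[\log D(s, z_\tau)\big]
                    -\,\mathbb{E}_{z \sim \nu,\, s \sim \rho^{\pi_z}}\big[\log\big(1 - D(s, z)\big)\big] \\
  % The discriminator induces a reward that favors dataset-like states
  r_D(s, z) &= \log \frac{D(s, z)}{1 - D(s, z)} \\
  % Actor objective: FB value term plus the regularization critic
  \mathcal{L}(\pi) &= -\,\mathbb{E}\big[\, F(s, a, z)^\top z + \alpha\, Q_D(s, a, z) \,\big]
\end{align}
```

Read this way, the first term of the actor objective preserves the zero-shot property of the FB representation, while the second term pulls each latent-conditioned policy toward states that the motion-capture data visits.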
The implementation uses five neural networks: a forward network and a backward network (which together form the FB representation), an actor (the policy), a critic (a value estimate), and the discriminator.[4][5] At test time the model does not run any additional learning; a task is converted into a latent context vector through one of three inference procedures, one for each prompt type, and the actor then acts conditioned on that vector.[4]
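The three inference procedures can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the function names, the toy linear backward map, the look-ahead `window`, and the rescaling of latents to norm sqrt(d) are all hypothetical choices made so the example runs on its own.

```python
import numpy as np

def project_to_sphere(z):
    """Rescale latents to norm sqrt(d). This normalization is an assumption."""
    d = z.shape[-1]
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return np.sqrt(d) * z / np.maximum(norm, 1e-8)

def reward_inference(backward_map, states, rewards):
    """Reward prompt: reward-weighted average of backward embeddings."""
    B = backward_map(states)                      # shape (n, d)
    return project_to_sphere((rewards[:, None] * B).mean(axis=0))

def goal_inference(backward_map, goal_state):
    """Goal prompt: the backward embedding of the single target state."""
    return project_to_sphere(backward_map(goal_state[None, :])[0])

def tracking_inference(backward_map, motion, window=3):
    """Tracking prompt: one latent per frame, averaged over upcoming frames."""
    B = backward_map(motion)                      # shape (T, d)
    T = B.shape[0]
    latents = []
    for t in range(T):
        start = min(t + 1, T - 1)                 # clamp at the final frame
        stop = min(t + 1 + window, T)
        latents.append(B[start:stop].mean(axis=0))
    return project_to_sphere(np.stack(latents))

# Toy stand-in for the learned backward network (hypothetical linear map).
rng = np.random.default_rng(0)
obs_dim, latent_dim = 358, 8
W = rng.normal(size=(obs_dim, latent_dim))

def toy_backward_map(states):
    return states @ W

states = rng.normal(size=(100, obs_dim))
rewards = rng.normal(size=100)

z_reward = reward_inference(toy_backward_map, states, rewards)
z_goal = goal_inference(toy_backward_map, states[0])
z_track = tracking_inference(toy_backward_map, states[:10])
print(z_reward.shape, z_goal.shape, z_track.shape)  # (8,) (8,) (10, 8)
```

In each case the output is only a context vector (or one per frame for tracking). The actor is unchanged and simply receives that vector as an extra input, which is why no gradient step is needed at test time.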
The agent is a SMPL-based humanoid simulated in the MuJoCo physics engine. The humanoid consists of 24 rigid bodies, of which 23 are actuated, and has a 358-dimensional observation (covering pose, joint rotations, and velocities) and a 69-dimensional continuous action space. Physics is simulated at 450 Hz, with control issued at 30 Hz.[6] SMPL (Skinned Multi-Person Linear model) is a standard parametric model of the human body, which makes the skeleton compatible with large human motion-capture corpora. Meta also released a companion environment, HumEnv, that defines this humanoid, the MuJoCo backend, and a benchmark covering reward, goal, and tracking tasks, so that results can be reproduced.[7]
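The figures above imply a few derived quantities that are easy to check. The short calculation below uses only the numbers quoted in this article. The constant names are illustrative, and reading the result as three action dimensions per actuated body is an inference from the arithmetic rather than a statement from the source.

```python
# Figures quoted in the article
PHYSICS_HZ = 450        # physics simulation rate
CONTROL_HZ = 30         # rate at which the policy issues actions
ACTUATED_BODIES = 23
ACTION_DIM = 69

# Each policy action is held for a whole number of physics steps
assert PHYSICS_HZ % CONTROL_HZ == 0
substeps_per_action = PHYSICS_HZ // CONTROL_HZ

physics_dt_ms = 1000 / PHYSICS_HZ
control_dt_ms = 1000 / CONTROL_HZ
action_dims_per_body = ACTION_DIM / ACTUATED_BODIES

print(substeps_per_action)                                # 15
print(round(physics_dt_ms, 2), round(control_dt_ms, 2))   # 2.22 33.33
print(action_dims_per_body)                               # 3.0
```

So each action chosen by the policy is applied for 15 physics steps, and the 69 action dimensions divide evenly across the 23 actuated bodies.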
FB-CPR is trained on unlabeled human motion capture from AMASS, a large archive that unifies many mocap datasets into the SMPL format. The training split used roughly 8,902 motion trajectories totaling about 29 hours, with a held-out test split of about 990 motions (around 3 hours).[6] The data is observation-only: the model sees the human poses but is never given the actions a person used to produce them, nor any reward signal, which is why the learning is unsupervised. Initial states for episodes can be sampled from these mocap frames as well as from static poses or random falls.[7]
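For a sense of scale, the split sizes quoted above can be turned into average clip lengths. This is back-of-the-envelope arithmetic on the approximate figures in this article, so the results are themselves approximate. The helper name is illustrative.

```python
# Approximate figures quoted in the article
TRAIN_MOTIONS, TRAIN_HOURS = 8902, 29
TEST_MOTIONS, TEST_HOURS = 990, 3

def mean_clip_seconds(num_motions, hours):
    """Average clip duration in seconds for a split."""
    return hours * 3600 / num_motions

train_mean = mean_clip_seconds(TRAIN_MOTIONS, TRAIN_HOURS)
test_mean = mean_clip_seconds(TEST_MOTIONS, TEST_HOURS)
train_share = TRAIN_MOTIONS / (TRAIN_MOTIONS + TEST_MOTIONS)

print(f"{train_mean:.1f} s, {test_mean:.1f} s, {train_share:.0%}")
# 11.7 s, 10.9 s, 90%
```

The typical motion is therefore a clip of roughly ten to twelve seconds, and about nine in ten motions fall in the training split.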
After pre-training, Meta Motivo is prompted to solve unseen tasks at test time without any additional learning or planning. The three prompt families and the way each is specified are summarized below.
| Task family | What the user provides | What the model does |
|---|---|---|
| Reward optimization | A reward function (for example, run forward, jump, raise an arm) | Infers a latent context that approximately maximizes that reward and acts on it |
| Goal / pose reaching | A single target body pose | Moves the humanoid to match the target pose |
| Motion tracking | A reference motion (sequence of poses, observation-only) | Imitates the reference trajectory frame by frame |
Because all three reduce to choosing a latent context for the same pre-trained policy, the model can switch between them without retraining.[2][4]
In the paper's benchmarks, Meta Motivo reaches a large fraction of the performance of methods trained specifically for each task while using none of that task-specific training. The evaluation suite includes 45 reward tasks, 50 goal poses, and a set of tracking motions.[6] Reported comparisons include:
| Comparison | Result for FB-CPR |
|---|---|
| Aggregate vs. top-line (task-specific) performance | About 73.4% of top-line performance, zero-shot[1][8] |
| vs. task-specific TD3 on reward tasks | About 61% of TD3's reward score[6] |
| Motion tracking success rate | About 83%[6] |
| vs. Diffuser (planning-based) on reward optimization | About 177% of Diffuser's performance[6] |
| vs. ASE-style zero-shot baseline (all categories) | Roughly 1.4x better[6] |
Meta reports that the model "achieves competitive performance compared to task-specific methods and outperforms state-of-the-art unsupervised reinforcement learning and model-based baselines, while exhibiting more human-like behaviors."[1] In a human-evaluation study, raters judged Meta Motivo's behavior to look more natural than a task-specific baseline in a majority of comparisons (roughly 83% of reward-based tasks and 69% of goal-reaching cases), even where the baseline scored higher on raw reward. The authors attribute this to the motion prior supplied by FB-CPR.[6][8] The model also proved robust to conditions it was never trained on: it kept producing sensible behavior under environment changes such as altered gravity, simulated wind, or direct physical perturbations.[1]
The authors note limitations. The model can struggle with tasks far from anything in the motion-capture data, and its movements are imperfect in cases such as falling and recovering to a stand.[8]
Meta released Meta Motivo openly alongside the announcement. The training and inference code is published in the facebookresearch/metamotivo repository, and pre-trained weights are hosted on Hugging Face under the facebook/metamotivo model collection.[4][5] The release includes several smaller variants of about 24.5 million parameters each (corresponding to the five training seeds evaluated in the paper, for example metamotivo-S-1) and a larger, more performant model, metamotivo-M-1, of about 288 million parameters.[4] The model and code are distributed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits research and other non-commercial use.[4][5] An interactive web demo lets users prompt the humanoid in a browser.[9]
Meta framed the work as a step toward embodied agents for virtual worlds, suggesting applications such as more lifelike non-player characters and more accessible character animation, while presenting it primarily as a research artifact rather than a product.[1]