Meta Motivo

Updated Jul 16, 2026

Meta Motivo is a behavioral foundation model for controlling a simulated humanoid body, released by Meta AI's Fundamental AI Research (FAIR) group on December 12, 2024. Meta describes it as the first behavioral foundation model for whole-body humanoid control: a single pre-trained controller that performs a range of tasks zero-shot, without task-specific training or fine-tuning, including motion tracking, reaching goal poses, and optimizing user-specified rewards.^[1]^[2] It was trained with an unsupervised reinforcement learning algorithm the team calls FB-CPR (Forward-Backward representations with Conditional-Policy Regularization), which grounds the learning objective using an unlabeled dataset of human motion capture.^[2]^[3] Meta released the model weights and code openly under a non-commercial license.^[4]^[5]

The work was led by Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, and Matteo Pirotta, and is documented in the paper "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models."^[2]^[3]

Background

A foundation model is trained once on broad data and then adapted to many downstream uses. In language and vision this adaptation is usually prompting or fine-tuning; the analogue for control is a single agent that can be steered to many behaviors at test time. Most prior work in physics-based humanoid animation trains a separate policy for each task or motion, which is expensive and does not transfer. Meta Motivo applies the foundation-model idea to control: one network is pre-trained on unlabeled motion data, and at inference the user supplies a prompt (a reward function, a target pose, or a reference motion) that the model solves without further optimization.^[2]^[6]

The technical basis is the forward-backward (FB) representation, an unsupervised reinforcement learning approach introduced in earlier FAIR research. An FB representation embeds states, rewards, and policies into a shared latent space so that, once trained, a near-optimal policy for any reward can be recovered by a simple inference step rather than by retraining.^[2] Meta Motivo's contribution is to adapt this idea to a high-dimensional humanoid and to ground it in human motion data, so that the recovered behaviors look natural rather than merely high-scoring.
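
For readers who want the formulas, the FB idea is commonly written as below. This follows the standard forward-backward formulation from the earlier literature; the symbols (F for the forward map, B for the backward map, ρ for the state distribution of the training data, z for the latent vector, M for the successor measure) are notation assumed here, not taken from this article.

```latex
% Successor measure of the policy \pi_z, factorized by a forward map F and a backward map B
M^{\pi_z}(s, a, X) \approx \int_{X} F(s, a, z)^{\top} B(s') \, \rho(ds')

% The policy conditioned on z is greedy with respect to F(s, a, z)^T z
\pi_z(s) = \arg\max_{a} F(s, a, z)^{\top} z

% Zero-shot inference for a reward r: one expectation over data, no retraining
z_r = \mathbb{E}_{s \sim \rho}\left[ r(s) \, B(s) \right], \qquad
Q_r^{\pi_{z_r}}(s, a) \approx F(s, a, z_r)^{\top} z_r
```

Read this way, the "simple inference step" is the computation of z_r: a reward-weighted average of backward embeddings, after which the already-trained policy π_z is used as is.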

FB-CPR algorithm

FB-CPR stands for Forward-Backward representations with Conditional-Policy Regularization. It is an online unsupervised reinforcement learning method that grounds policy learning using observation-only, unlabeled motion data (the dataset contains states but no actions or rewards).^[3]^[6] The key idea is twofold. First, the forward-backward representation embeds the unlabeled trajectories into the same latent space used to represent states, rewards, and policies, so a recorded human motion can be mapped to a latent vector that conditions the policy. Second, a latent-conditional discriminator regularizes training by encouraging the learned policies to "cover" the states that appear in the unlabeled behavior dataset, pulling the agent's behavior toward the distribution of real human motion.^[2]^[3] This distribution-matching regularization is what gives the model human-like movement while preserving the zero-shot generality of the underlying FB representation.
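
One common way to realize a discriminator-based regularizer of this kind is a GAN-style objective: the discriminator is trained to tell dataset (state, latent) pairs from policy-generated ones, and its output is turned into a reward that pulls the policy toward dataset states. The sketch below illustrates that general pattern only. The loss form, the log-ratio reward, and the names `logits_data` and `logits_policy` are assumptions for illustration, not a transcription of the released code.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Map a raw discriminator logit to a probability D in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def discriminator_loss(logits_data: Sequence[float],
                       logits_policy: Sequence[float]) -> float:
    """Binary cross-entropy: dataset pairs are labelled 1, policy pairs 0.

    Each logit is assumed to be the discriminator output for one
    (state, latent) pair; both arguments are hypothetical names.
    """
    loss_data = -sum(math.log(sigmoid(l)) for l in logits_data) / len(logits_data)
    loss_policy = -sum(math.log(1.0 - sigmoid(l)) for l in logits_policy) / len(logits_policy)
    return loss_data + loss_policy


def regularization_reward(logit: float) -> float:
    """Reward log(D / (1 - D)); with D = sigmoid(logit) this equals the logit."""
    d = sigmoid(logit)
    return math.log(d / (1.0 - d))


if __name__ == "__main__":
    # An undecided discriminator (all logits 0) versus a well-separated one.
    print(discriminator_loss([0.0], [0.0]))
    print(discriminator_loss([3.0], [-3.0]))
    print(regularization_reward(2.0))
```

The reward is high for states the discriminator believes came from the motion dataset, so a policy that maximizes it is drawn toward the states seen in real human motion.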

The implementation uses five neural networks: a forward network and a backward network (which together form the FB representation), an actor (the policy), a critic (a value estimate), and the discriminator.^[4]^[5] At test time the model does not run any additional learning; a task is converted into a latent context vector through one of three inference procedures, one for each prompt type, and the actor then acts conditioned on that vector.^[4]
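
The test-time pattern described above (infer a latent vector once, then act with a frozen policy conditioned on it) can be sketched with toy stand-ins. Everything here is hypothetical: `toy_actor` and `toy_step` are placeholders for the trained actor and the simulator, and the function names are not the released API.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def rollout(actor: Callable[[Sequence[float], Sequence[float]], Vector],
            step: Callable[[Sequence[float], Sequence[float]], Vector],
            obs: Vector, z: Vector, horizon: int) -> List[Vector]:
    """Run a frozen, latent-conditioned actor; no parameters are updated."""
    trajectory = [list(obs)]
    for _ in range(horizon):
        action = actor(obs, z)   # policy conditioned on the fixed latent z
        obs = step(obs, action)  # environment transition
        trajectory.append(list(obs))
    return trajectory


def toy_actor(obs: Sequence[float], z: Sequence[float]) -> Vector:
    """Hypothetical actor: move halfway toward the latent vector."""
    return [0.5 * (zi - oi) for oi, zi in zip(obs, z)]


def toy_step(obs: Sequence[float], action: Sequence[float]) -> Vector:
    """Hypothetical environment: add the action to the observation."""
    return [oi + ai for oi, ai in zip(obs, action)]


traj = rollout(toy_actor, toy_step, obs=[0.0, 0.0], z=[1.0, -1.0], horizon=10)
print(traj[-1])
```

Changing the task only changes `z`; the actor and the loop stay the same, which is the sense in which no additional learning happens at test time.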

Humanoid body and simulator

The agent is a SMPL-based humanoid simulated in the MuJoCo physics engine. The body consists of 24 rigid bodies, 23 of which are actuated. The agent receives a 358-dimensional observation (covering pose, joint rotations, and velocities) and acts in a 69-dimensional continuous action space. Physics is simulated at 450 Hz, with control issued at 30 Hz.^[6] SMPL (Skinned Multi-Person Linear model) is a standard parametric model of the human body, which makes the skeleton compatible with large human motion-capture corpora. Meta also released a companion environment, HumEnv, that defines this humanoid, the MuJoCo backend, and a benchmark covering reward, goal, and tracking tasks, so that results can be reproduced.^[7]
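
The figures above imply some simple relationships, worked out in the short sketch below. The constant names are illustrative. The reading of 69 action dimensions as three degrees of freedom per actuated body is an inference from the numbers, not something the sources above state.

```python
# Figures quoted in the text above.
PHYSICS_HZ = 450
CONTROL_HZ = 30
ACTUATED_BODIES = 23
ACTION_DIM = 69

# Number of physics steps simulated for every action the policy issues.
substeps_per_action = PHYSICS_HZ // CONTROL_HZ

# Time covered by one physics step and by one control step, in seconds.
physics_dt = 1.0 / PHYSICS_HZ
control_dt = 1.0 / CONTROL_HZ

# Action dimensions per actuated body (assumed interpretation: 3 rotational DoF each).
dof_per_body = ACTION_DIM / ACTUATED_BODIES

print(substeps_per_action, round(control_dt, 4), dof_per_body)
```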

Training data

FB-CPR is trained on unlabeled human motion capture from AMASS, a large archive that unifies many mocap datasets into the SMPL format. The training split contains 8,902 motion trajectories totaling about 29 hours, and the held-out test split contains 990 motions (about 3 hours).^[6] The data is observation-only: the model sees the human poses but is never given the actions a person used to produce them, nor any reward signal, which is why the learning is unsupervised. Initial states for episodes can be sampled from these mocap frames as well as from static poses or random falls.^[7]
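
A quick back-of-the-envelope check puts these dataset figures in perspective. The sketch below uses only the numbers quoted above; the frame count additionally assumes the motions are sampled at the 30 Hz control rate, which is an assumption rather than a stated fact.

```python
TRAIN_MOTIONS, TRAIN_HOURS = 8902, 29
TEST_MOTIONS, TEST_HOURS = 990, 3
ASSUMED_FRAME_RATE_HZ = 30  # assumption: data sampled at the control rate

# Average clip length in seconds for each split.
avg_train_seconds = TRAIN_HOURS * 3600 / TRAIN_MOTIONS
avg_test_seconds = TEST_HOURS * 3600 / TEST_MOTIONS

# Share of all motions that is held out for testing.
test_share = TEST_MOTIONS / (TRAIN_MOTIONS + TEST_MOTIONS)

# Approximate number of training frames under the assumed frame rate.
approx_train_frames = TRAIN_HOURS * 3600 * ASSUMED_FRAME_RATE_HZ

print(round(avg_train_seconds, 1), round(avg_test_seconds, 1),
      round(test_share, 2), approx_train_frames)
```

On these figures the average clip is on the order of ten seconds and about one motion in ten is held out.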

Zero-shot tasks

After pre-training, Meta Motivo can be prompted to solve unseen tasks at test time without additional learning or planning. The three prompt families, and how each is specified, are summarized below.

Task family	What the user provides	What the model does
Reward optimization	A reward function (for example, run forward, jump, raise an arm)	Infers a latent context that approximately maximizes that reward and acts on it
Goal / pose reaching	A single target body pose	Moves the humanoid to match the target pose
Motion tracking	A reference motion (sequence of poses, observation-only)	Imitates the reference trajectory frame by frame

Because all three reduce to choosing a latent context for the same pre-trained policy, the model can switch between them without retraining.^[2]^[4]
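
How the three prompt types can each reduce to a latent vector is sketched below, following the usual forward-backward recipe: a reward-weighted average of backward embeddings for rewards, the embedding of the target for goals, and an average over a short window of upcoming reference frames for tracking. This is a sketch under assumptions, not the released implementation: `toy_backward` stands in for the trained backward network, and the function names, the window length, and the rescaling of z to norm sqrt(d) are illustrative choices.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Backward = Callable[[Sequence[float]], Vector]


def project(z: Sequence[float]) -> Vector:
    """Rescale z to norm sqrt(d) (normalization assumed for this sketch)."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return list(z)
    scale = math.sqrt(len(z)) / norm
    return [v * scale for v in z]


def mean_embedding(backward: Backward, states: Sequence[Sequence[float]],
                   weights: Optional[Sequence[float]] = None) -> Vector:
    """Weighted average of backward embeddings over a batch of states."""
    if weights is None:
        weights = [1.0] * len(states)
    embs = [backward(s) for s in states]
    dim = len(embs[0])
    return [sum(w * e[i] for w, e in zip(weights, embs)) / len(states)
            for i in range(dim)]


def reward_inference(backward: Backward, states, rewards) -> Vector:
    """Reward prompt: average of B(s) weighted by the reward at s."""
    return project(mean_embedding(backward, states, rewards))


def goal_inference(backward: Backward, goal: Sequence[float]) -> Vector:
    """Goal prompt: embed the single target pose."""
    return project(backward(goal))


def tracking_inference(backward: Backward, motion, t: int, window: int) -> Vector:
    """Tracking prompt: average B over the next `window` reference frames."""
    future = motion[t + 1:t + 1 + window] or motion[-1:]
    return project(mean_embedding(backward, future))


def toy_backward(state: Sequence[float]) -> Vector:
    """Hypothetical backward map: identity on a 2-D state."""
    return [float(state[0]), float(state[1])]


z_goal = goal_inference(toy_backward, [3.0, 4.0])
z_reward = reward_inference(toy_backward, [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
z_track = tracking_inference(toy_backward, [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]], t=0, window=2)
print(z_goal, z_reward, z_track)
```

For tracking, the latent would be recomputed at each step `t` as the reference advances, while reward and goal prompts produce a single latent for the whole episode.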

Performance and evaluation

In the paper's benchmarks, Meta Motivo reaches a large fraction of the performance of methods trained specifically for each task while using none of that task-specific training. The evaluation suite includes 45 reward tasks, 50 goal poses, and a set of tracking motions.^[6] Reported comparisons include:

Comparison	Result for FB-CPR
Aggregate vs. top-line (task-specific) performance	About 73.4% of top-line performance, zero-shot^[1]^[8]
vs. task-specific TD3 on reward tasks	About 61% of TD3's reward score^[6]
Motion tracking success rate	About 83%^[6]
vs. Diffuser (planning-based) on reward optimization	About 177% of Diffuser's performance^[6]
vs. ASE-style zero-shot baseline (all categories)	Roughly 1.4x better^[6]
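
Percentages such as "of top-line performance" are relative scores. One plausible way to compute such an aggregate is to divide each task's score by the task-specific top-line score and average the ratios, as sketched below. Whether the paper averages per-task ratios in exactly this way is an assumption, and the numbers in the example are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def percent_of_topline(method_scores: Sequence[float],
                       topline_scores: Sequence[float]) -> float:
    """Average per-task ratio of a method's score to the top-line score, in percent."""
    if len(method_scores) != len(topline_scores) or not method_scores:
        raise ValueError("score lists must be non-empty and the same length")
    if any(t == 0 for t in topline_scores):
        raise ValueError("top-line scores must be non-zero")
    ratios = [m / t for m, t in zip(method_scores, topline_scores)]
    return 100.0 * sum(ratios) / len(ratios)


# Placeholder scores for three hypothetical tasks (not from the paper).
example = percent_of_topline([30.0, 45.0, 80.0], [50.0, 60.0, 100.0])
print(round(example, 2))
```

Averaging ratios per task, rather than dividing summed scores, keeps tasks with large raw reward scales from dominating the aggregate.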

Meta reports that the model "achieves competitive performance compared to task-specific methods and outperforms state-of-the-art unsupervised reinforcement learning and model-based baselines, while exhibiting more human-like behaviors."^[1] In a human-evaluation study, raters judged Meta Motivo's behavior to look more natural than a task-specific baseline in a majority of comparisons (roughly 83% of reward-based tasks and 69% of goal-reaching cases), even where the baseline scored higher on raw reward, which the authors attribute to the motion prior supplied by FB-CPR.^[6]^[8] The model also showed unplanned robustness: it kept producing sensible behavior under environment changes it was never trained for, such as altered gravity, simulated wind, or direct physical perturbations.^[1]

The authors note limitations. The model can struggle with tasks far from anything in the motion-capture data, and its movements are imperfect in cases such as falling and recovering to a stand.^[8]

Release and licensing

Meta released Meta Motivo openly alongside the announcement. The training and inference code is published in the facebookresearch/metamotivo repository, and pre-trained weights are hosted on Hugging Face under the facebook/metamotivo model collection.^[4]^[5] The release includes five smaller variants of about 24.5 million parameters each (one per training seed evaluated in the paper, for example metamotivo-S-1) and a larger, better-performing model, metamotivo-M-1, of about 288 million parameters.^[4] The model and code are distributed under the Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) license, which permits research and other non-commercial use.^[4]^[5] An interactive web demo lets users prompt the humanoid in a browser.^[9]

Meta framed the work as a step toward embodied agents for virtual worlds, suggesting applications such as more lifelike non-player characters and more accessible character animation, while presenting it primarily as a research artifact rather than a product.^[1]

References

[1] "Sharing new research, models, and datasets from Meta FAIR." Meta AI Blog, December 12, 2024. https://ai.meta.com/blog/meta-fair-updates-agents-robustness-safety-architecture/
[2] "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models." Research publication page, AI at Meta, December 12, 2024. https://ai.meta.com/research/publications/zero-shot-whole-body-humanoid-control-via-behavioral-foundation-models/
[3] Tirinzoni, Andrea; Touati, Ahmed; Farebrother, Jesse; Guzek, Mateusz; Kanervisto, Anssi; Xu, Yingchen; Lazaric, Alessandro; Pirotta, Matteo. "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models." arXiv:2504.11054. https://arxiv.org/abs/2504.11054
[4] "facebookresearch/metamotivo: The first behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks." GitHub. https://github.com/facebookresearch/metamotivo
[5] "facebook/metamotivo-S-1." Hugging Face model card. https://huggingface.co/facebook/metamotivo-S-1
[6] "Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models" (full text). arXiv HTML. https://arxiv.org/html/2504.11054v1
[7] "facebookresearch/HumEnv." GitHub. https://github.com/facebookresearch/HumEnv
[8] "Meta FAIR Releases Meta Motivo: A New Behavioral Foundation Model for Controlling Virtual Physics-based Humanoid Agents for a Wide Range of Complex Whole-Body Tasks." MarkTechPost, December 16, 2024. https://www.marktechpost.com/2024/12/16/meta-fair-releases-meta-motivo-a-new-behavioral-foundation-model-for-controlling-virtual-physics-based-humanoid-agents-for-a-wide-range-of-complex-whole-body-tasks/
[9] "Meta Motivo." Project page and demo, Meta Demo Lab. https://metamotivo.metademolab.com/
