Overview. TD-MPC2 compares favorably to existing model-free and model-based methods across 104 continuous control tasks spanning multiple domains, with a single set of hyperparameters (right). We further demonstrate the scalability of TD-MPC2 by training a single 317M parameter agent to perform 80 tasks across multiple domains, embodiments, and action spaces (left).
Abstract
TD-MPC is a model-based reinforcement learning (MBRL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results without any hyperparameter tuning. We further show that agent capabilities increase with model and data size, and successfully train a single agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents.
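The core planning idea, local trajectory optimization in the latent space of a learned world model, can be sketched as follows. This is an illustrative CEM/MPPI-style planner only: the toy `dynamics` and `reward` functions, the latent dimension, and all hyperparameters below are stand-ins for the learned TD-MPC2 networks, not the actual implementation.

```python
import numpy as np

# Hedged sketch of latent-space MPC: sample action sequences, roll them
# out through a (here: toy) latent dynamics model, score them with a
# (here: toy) reward model, and iteratively refit a Gaussian over the
# best sequences. TD-MPC2 additionally bootstraps returns with a learned
# value function; that term is omitted here for brevity.
LATENT_DIM, ACTION_DIM = 4, 2
HORIZON, N_SAMPLES, N_ITERS, N_ELITES = 5, 256, 3, 32
rng = np.random.default_rng(0)

def dynamics(z, a):
    """Toy latent dynamics; in TD-MPC2 this is a learned MLP."""
    return 0.9 * z + 0.1 * np.tanh(a.sum(axis=-1, keepdims=True))

def reward(z, a):
    """Toy reward: prefer latent states near the origin, small actions."""
    return -np.linalg.norm(z, axis=-1) - 0.01 * np.linalg.norm(a, axis=-1)

def plan(z0):
    """Return the first action of the best sequence found from latent z0."""
    mean = np.zeros((HORIZON, ACTION_DIM))
    std = np.ones((HORIZON, ACTION_DIM))
    for _ in range(N_ITERS):
        noise = rng.standard_normal((N_SAMPLES, HORIZON, ACTION_DIM))
        actions = np.clip(mean + std * noise, -1.0, 1.0)
        z = np.repeat(z0[None], N_SAMPLES, axis=0)
        returns = np.zeros(N_SAMPLES)
        for t in range(HORIZON):  # rollout entirely in latent space
            returns += reward(z, actions[:, t])
            z = dynamics(z, actions[:, t])
        elites = actions[np.argsort(returns)[-N_ELITES:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean[0]  # receding horizon: execute only the first action

action = plan(np.ones(LATENT_DIM))
print(action.shape)  # (ACTION_DIM,)
```

Because the model is decoder-free, nothing above ever reconstructs observations; all planning happens on latent vectors `z`.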
TD-MPC2 Learns Diverse Tasks
We evaluate TD-MPC2 on 104 control tasks across 4 task domains: DMControl, Meta-World, ManiSkill2, and MyoSuite.
Benchmarking
Massively Multitask World Models
Supporting Open-Source Science
We open-source a total of 324 TD-MPC2 model checkpoints, including 12 multi-task models (ranging from 1M to 317M parameters) trained on 80, 70, and 30 tasks, respectively.
We also release the two transition datasets (545M and 345M transitions, respectively) used to train our multi-task models. The datasets are sourced from the replay buffers of 240 single-task agents and thus contain a wide range of behaviors.
| Domains | Tasks | Embodiments | Episodes | Transitions | Size | Link |
|---|---|---|---|---|---|---|
| DMControl + Meta-World | 80 | 12 | 2.69M | 545M | 34GB | Download |
| DMControl | 30 | 11 | 690k | 345M | 20GB | Download |
We are excited to see what the community will do with these models and datasets, and hope that our release will encourage other research labs to open-source their checkpoints as well.
Paper
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang
arXiv preprint
Citation
If you find our work useful, please consider citing the paper as follows: