Overview. TD-MPC2 compares favorably to existing model-free and model-based methods across 104 continuous control tasks spanning multiple domains, with a single set of hyperparameters (right). We further demonstrate the scalability of TD-MPC2 by training a single 317M parameter agent to perform 80 tasks across multiple domains, embodiments, and action spaces (left).
Abstract
TD-MPC is a model-based reinforcement learning (MBRL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results without any hyperparameter tuning. We further show that agent capabilities increase with model and data size, and successfully train a single agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents.
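To make the abstract's core idea concrete, below is a minimal, hypothetical sketch of latent-space planning in the spirit of TD-MPC: sample candidate action sequences, roll them out entirely in the latent space of a learned decoder-free world model, and score them with predicted rewards plus a terminal value estimate, executing only the first action (MPC-style). All module names (encoder, dynamics, reward, value) and shapes are illustrative stand-ins, not the actual TD-MPC2 API, and the planner is simplified random shooting rather than the iterative sampling procedure used in the paper.

# Sketch of latent-space planning in the spirit of TD-MPC / TD-MPC2.
# All modules below are untrained stubs standing in for learned networks.
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM, ACT_DIM = 17, 32, 4   # hypothetical dimensions
HORIZON, N_SAMPLES, GAMMA = 5, 256, 0.99

# Stand-ins for the learned, decoder-free world model components.
encoder  = nn.Linear(OBS_DIM, LATENT_DIM)               # obs -> latent z
dynamics = nn.Linear(LATENT_DIM + ACT_DIM, LATENT_DIM)  # (z, a) -> next z
reward   = nn.Linear(LATENT_DIM + ACT_DIM, 1)           # (z, a) -> predicted reward
value    = nn.Linear(LATENT_DIM, 1)                     # z -> value estimate

@torch.no_grad()
def plan(obs: torch.Tensor) -> torch.Tensor:
    """Score sampled action sequences by latent rollout; return the best first action."""
    z = encoder(obs).expand(N_SAMPLES, -1)                     # (N, latent)
    actions = torch.randn(N_SAMPLES, HORIZON, ACT_DIM).tanh()  # candidate sequences
    returns = torch.zeros(N_SAMPLES)
    discount = 1.0
    for t in range(HORIZON):
        a = actions[:, t]
        returns += discount * reward(torch.cat([z, a], -1)).squeeze(-1)
        z = dynamics(torch.cat([z, a], -1))                    # rollout stays in latent space
        discount *= GAMMA
    returns += discount * value(z).squeeze(-1)                 # bootstrap beyond the horizon
    return actions[returns.argmax(), 0]                       # execute only the first action

action = plan(torch.randn(OBS_DIM))  # e.g., a proprioceptive observation
print(action)

Because rollouts never decode back to observations, planning cost scales with the latent dimension rather than the observation size, which is one reason this recipe extends to the multitask, multi-embodiment setting described above.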
TD-MPC2 Learns Diverse Tasks
Benchmarking
Massively Multitask World Models
Supporting Open-Source Science
Paper
TD-MPC2: Scalable, Robust World Models for Continuous Control
Nicklas Hansen, Hao Su, Xiaolong Wang
arXiv preprint
Citation
If you find our work useful, please consider citing the paper as follows:
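@article{hansen2023tdmpc2,
  title={TD-MPC2: Scalable, Robust World Models for Continuous Control},
  author={Nicklas Hansen and Hao Su and Xiaolong Wang},
  journal={arXiv preprint arXiv:2310.16828},
  year={2023}
}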