Reinforcement Learning-Guided MPC for Whole-Body Loco-Manipulation

Abstract

We introduce a scalable approach for precision- and safety-critical legged loco-manipulation. Our hierarchical motion planning and control framework exploits whole-body motions to reach objects in hard-to-access areas of cluttered scenes.

By combining RL for high-level planning with MPC for safe and precise low-level control, we leverage the strengths of both. To offset the cost of embedding expensive whole-body MPC optimization in sample-inefficient RL training, we present an efficient training pipeline that pre-trains with a simplified model.

Contributions

Overall, our contributions are the following:

We present a training pipeline that leverages pre-training with a simplified model for training efficiency.
We present a new approach to perceptive learning-based control where the exteroceptive observations are obtained through ray-casting in a voxel-based map, eliminating the need for memory modules in the neural network architecture.
We demonstrate the application of our framework on a whole-body navigation task and validate its real-world applicability by successfully testing it in physical simulation.
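
The ray-casting observation mentioned in the contributions can be illustrated with a minimal sketch. It assumes the voxel map is stored as a set of occupied integer indices and that rays are marched in fixed half-voxel steps. Both choices, along with the names `cast_ray` and `ray_observation` and all numeric values, are hypothetical and are not taken from the framework itself. Because the map accumulates scene geometry, each observation is a plain vector of ray distances that a memoryless network can consume.

```python
import math


def cast_ray(occupied, origin, direction, voxel_size=0.5, max_range=3.0):
    """Distance along the ray to the first occupied voxel, or max_range on a miss."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    step = 0.5 * voxel_size  # assumed marching resolution: half a voxel
    for i in range(1, int(max_range / step) + 1):
        dist = i * step
        point = [o + dist * u for o, u in zip(origin, unit)]
        # Map the sampled point to the integer index of the voxel containing it.
        index = tuple(math.floor(p / voxel_size) for p in point)
        if index in occupied:
            return dist
    return max_range


def ray_observation(occupied, origin, directions, **kwargs):
    """Stack one distance per ray into a flat observation vector."""
    return [cast_ray(occupied, origin, d, **kwargs) for d in directions]


# Hypothetical map: a single occupied voxel spanning x in [2.0, 2.5).
occupied_voxels = {(4, 0, 0)}
sensor_origin = (0.1, 0.1, 0.1)
ray_directions = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
observation = ray_observation(occupied_voxels, sensor_origin, ray_directions)
```

Fixed-step marching is simple but can miss voxels that a ray only clips at a corner. An exact grid traversal would avoid this at the cost of a slightly longer implementation.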

Framework

Our framework combines RL for high-level reasoning with MPC for precise and safe whole-body control, enabling navigation in cluttered scenes. The RL+MPC hierarchy lets us efficiently train a single policy that navigates 100+ scene types, while the safety and optimality guarantees of optimization-based motion planning keep low-level control safe and precise in each of them.

Training

We employ a two-stage training pipeline that pre-trains with a simplified model and then fine-tunes with MPC embedded in the training loop, which drastically improves training efficiency.
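
The two-stage schedule can be outlined as a short sketch. It assumes a generic `train_step(policy, env)` update function and two environment handles. All names here are hypothetical placeholders, and the stub at the bottom exists only so the example runs. It is not the actual RL update or MPC interface.

```python
def train_two_stage(policy, train_step, simplified_env, mpc_env,
                    pretrain_iters, finetune_iters):
    """Run cheap pre-training first, then fine-tune with the MPC in the loop."""
    schedule = [
        ("pretrain", simplified_env, pretrain_iters),  # fast simplified model
        ("finetune", mpc_env, finetune_iters),         # expensive MPC-embedded env
    ]
    log = []
    for stage, env, iters in schedule:
        for _ in range(iters):
            policy = train_step(policy, env)
            log.append(stage)
    return policy, log


# Stub update for illustration only: the "policy" just records which env it saw.
def dummy_train_step(policy, env):
    return policy + [env]


final_policy, stage_log = train_two_stage(
    [], dummy_train_step, "simplified", "mpc", pretrain_iters=3, finetune_iters=2
)
```

The point of the ordering is that most iterations are spent where each step is cheap, so the costly MPC-embedded stage only has to refine an already reasonable policy.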

For the pre-training stage, we use a simplified model parametrized by base pose and end-effector position that is fast to simulate. The embodiment of this model includes a floating cuboid for the base and a sphere for the end-effector.
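
A minimal sketch of such a simplified model is given here, assuming purely kinematic integration of commanded world-frame velocities. The class name `SimplifiedModel`, the yaw-only orientation, the cuboid and sphere dimensions, and the `step` signature are all illustrative assumptions, not the actual pre-training model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SimplifiedModel:
    base_pos: tuple   # (x, y, z) centre of the floating cuboid, metres
    base_yaw: float   # heading in radians (full orientation omitted for brevity)
    ee_pos: tuple     # (x, y, z) centre of the end-effector sphere, world frame
    base_half_extents: tuple = (0.35, 0.15, 0.10)  # hypothetical cuboid size
    ee_radius: float = 0.05                        # hypothetical sphere radius

    def step(self, base_vel, yaw_rate, ee_vel, dt):
        """Integrate commanded velocities kinematically: no dynamics, no contacts."""
        new_base = tuple(p + v * dt for p, v in zip(self.base_pos, base_vel))
        new_ee = tuple(p + v * dt for p, v in zip(self.ee_pos, ee_vel))
        return replace(
            self,
            base_pos=new_base,
            base_yaw=self.base_yaw + yaw_rate * dt,
            ee_pos=new_ee,
        )


state = SimplifiedModel(base_pos=(0.0, 0.0, 0.5), base_yaw=0.0, ee_pos=(0.5, 0.0, 0.5))
next_state = state.step(base_vel=(1.0, 0.0, 0.0), yaw_rate=0.5,
                        ee_vel=(0.0, 0.0, 1.0), dt=0.1)
```

Because a step is a handful of additions rather than a whole-body optimization, many more environment steps fit into the same training budget, which is what makes the pre-training stage cheap.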