We employ a two-stage training pipeline: pre-training with a simplified model, followed by fine-tuning
with an MPC embedded in training. This split drastically improves training
efficiency.
For the pre-training stage, we use a simplified model that is fast to simulate and is
parametrized by the base pose and the end-effector position. The embodiment of this model consists of a floating cuboid for the base and a sphere for the end-effector.
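
To make the parametrization concrete, the following is a minimal sketch of how such a simplified state could be represented and advanced in time. It is an illustration under stated assumptions, not the implementation used in this work: the names `SimplifiedState`, `Geometry`, and `step`, the quaternion convention (w, x, y, z), the purely kinematic Euler update, and all numeric dimensions are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple
import math

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z); convention is an assumption


@dataclass(frozen=True)
class Geometry:
    """Embodiment of the simplified model; dimensions are hypothetical."""
    base_half_extents: Vec3 = (0.3, 0.15, 0.1)  # floating cuboid (base)
    ee_radius: float = 0.05                     # sphere (end-effector)


@dataclass(frozen=True)
class SimplifiedState:
    base_pos: Vec3   # base position in the world frame
    base_quat: Quat  # base orientation in the world frame
    ee_pos: Vec3     # end-effector position in the world frame

    def as_vector(self):
        # 3 (position) + 4 (quaternion) + 3 (end-effector) = 10 numbers
        return [*self.base_pos, *self.base_quat, *self.ee_pos]


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def step(state: SimplifiedState, base_lin_vel: Vec3, base_ang_vel: Vec3,
         ee_vel: Vec3, dt: float) -> SimplifiedState:
    """One explicit-Euler kinematic step (assumed update rule).

    All velocities are expressed in the world frame. No contact or
    rigid-body dynamics are modeled, which keeps the step cheap.
    """
    base_pos = tuple(p + v * dt for p, v in zip(state.base_pos, base_lin_vel))
    ee_pos = tuple(p + v * dt for p, v in zip(state.ee_pos, ee_vel))
    # Quaternion kinematics for world-frame angular velocity:
    # q_dot = 0.5 * (0, omega) * q, followed by renormalization.
    dq = quat_mul((0.0, *base_ang_vel), state.base_quat)
    q = tuple(qi + 0.5 * dt * dqi for qi, dqi in zip(state.base_quat, dq))
    norm = math.sqrt(sum(c * c for c in q))
    base_quat = tuple(c / norm for c in q)
    return SimplifiedState(base_pos, base_quat, ee_pos)
```

Because the state holds only ten numbers and the update involves no contact resolution, many such models can be stepped cheaply, which is consistent with the stated goal of a model that is fast to simulate.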