Try to do this with three parallel-gripper arms, orienting objects in 3D space. Sim2real transfer.
sim robot: grasp sampling, RRT motion planning, trajectory optimization, contact planning
real robot: point cloud observations, segmentation, keypoints, task conditioning
- Since the task is fully 3D, there is more error and more exploration needed, so use RL for the real-world policy.
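The RRT motion-planning step mentioned above can be sketched as follows. This is a minimal sketch, not the actual planner: it assumes a workspace normalized to [-1, 1]^3 and a hypothetical `is_free` collision-check callable supplied by the user.

```python
import numpy as np

def rrt(start, goal, is_free, n_iters=2000, step=0.1, goal_tol=0.15, seed=0):
    """Minimal goal-biased RRT in 3D (sketch, not the real planner).

    start, goal: (3,) arrays; is_free: callable(point) -> bool.
    Returns a list of waypoints from start to goal, or None on failure.
    """
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, float)]
    parents = [-1]
    for _ in range(n_iters):
        # Goal-biased sampling: 10% of the time steer toward the goal.
        sample = np.asarray(goal, float) if rng.random() < 0.1 else rng.uniform(-1.0, 1.0, 3)
        near = min(range(len(nodes)), key=lambda i: np.linalg.norm(nodes[i] - sample))
        d = sample - nodes[near]
        new = nodes[near] + step * d / (np.linalg.norm(d) + 1e-9)
        if not is_free(new):
            continue
        nodes.append(new)
        parents.append(near)
        if np.linalg.norm(new - goal) < goal_tol:
            # Walk back up the tree to recover the path.
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None
```

In practice Drake or IsaacLab would supply the collision checker and the configuration space would be joint space, not Cartesian space.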
Simulator
- Use ManiSkill for RL
- Use Drake or IsaacLab for motion planning
Data Collection
- IL / BC collecting demos
- motion planning with traj opt
- teleop
- Teacher Student
    - does not collect demos; trains entirely in sim
    - Cons: supervision comes only from simulation, so sim2real gaps remain
    - Pros: teacher gets real-time feedback, enabling policy adaptation
- RL: optimize trajectories using rewards
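The "optimize trajectories using rewards" option can be illustrated with a cross-entropy method (CEM) over open-loop action sequences. This is a sketch of one common choice, not the method fixed by these notes; `reward_fn` is a hypothetical scoring function for a whole trajectory.

```python
import numpy as np

def cem_traj_opt(reward_fn, horizon=10, dim=3, iters=20, pop=64, n_elites=8, seed=0):
    """Cross-entropy method: fit a Gaussian over (horizon, dim) action
    sequences to the highest-reward samples, iteratively."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(iters):
        # Sample a population of candidate trajectories.
        cands = mean + std * rng.standard_normal((pop, horizon, dim))
        scores = np.array([reward_fn(c) for c in cands])
        # Refit the Gaussian to the elite (top-scoring) candidates.
        elites = cands[np.argsort(scores)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean
```

The same loop works whether the reward comes from a hand-written shaping term or a rollout in ManiSkill; only `reward_fn` changes.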
(sim) Train teacher using motion plans ⇒ distill student from teacher ⇒ (real) keypoint/point-cloud tracking. Could use RL here? Or online traj opt?
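For the keypoint/point-cloud tracking step, one standard way to recover the current-to-goal rigid transform from matched keypoints is the Kabsch (SVD/Procrustes) solve. A minimal sketch, assuming N matched 3D keypoints are already available from segmentation and tracking:

```python
import numpy as np

def kabsch(P, Q):
    """Estimate the rigid transform (R, t) mapping keypoints P onto Q.

    P, Q: (N, 3) arrays of matched keypoints (current and goal frames).
    Returns rotation R (3, 3) and translation t (3,) with Q ~= P @ R.T + t.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections (det = -1 solutions).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

The resulting (R, t) is exactly the kind of privileged goal conditioning a real-world policy could consume.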
Real Policy
- Diffusion policy
    - Cons: computationally expensive, slow inference leads to long reaction times
    - Pros: supports multimodal action distributions
    - Requires a source of privileged information for conditioning (e.g. the transformation from the current pose to the goal pose)
- RL policy
- Teacher Student Policy
- Teacher in sim
- student in real
    - Cons: the sim2real gap is wide because the teacher learned from privileged information unavailable in the real world
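The teacher-student distillation step can be sketched as behavior cloning: roll out (or sample) privileged states, query the teacher for actions, and fit the student on the observations the real robot would actually see. Below is a minimal stand-in with a linear least-squares student; `teacher_act` and `obs_of_state` are hypothetical callables, and a real student would be a neural network trained with SGD.

```python
import numpy as np

def distill_student(teacher_act, obs_of_state, n=500, state_dim=6, seed=0):
    """Distill a privileged-state teacher into an observation-space student.

    teacher_act: callable(state) -> action (uses privileged sim state).
    obs_of_state: callable(state) -> observation (what the real robot sees).
    Returns W such that obs @ W approximates the teacher's actions.
    """
    rng = np.random.default_rng(seed)
    # Sample privileged states and label them with teacher actions.
    states = rng.standard_normal((n, state_dim))
    obs = np.array([obs_of_state(s) for s in states])
    acts = np.array([teacher_act(s) for s in states])
    # Linear student: solve min_W ||obs @ W - acts||^2 in closed form.
    W, *_ = np.linalg.lstsq(obs, acts, rcond=None)
    return W
```

The sim2real-gap con above shows up here directly: whatever `obs_of_state` cannot expose about the privileged state, the student cannot recover, no matter how good the fit.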
Baselines