Try to do this with three parallel-gripper arms, orienting objects in 3D space. Sim2real transfer.
sim robot: grasp sampling, RRT motion planning, trajectory optimization, contact planning
real robot: point cloud observations, segmentation, keypoints, task conditioning
- Since the task is fully 3D, there is more error and more exploration needed, so use RL for the real-world policy.
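The RRT motion-planning step mentioned above can be sketched as follows. This is a minimal sketch, not the actual planner: it assumes a workspace normalized to [-1, 1]^3 and a hypothetical `is_free` collision-check callable supplied by the user.

```python
import numpy as np

def rrt(start, goal, is_free, n_iters=2000, step=0.1, goal_tol=0.15, seed=0):
    """Minimal goal-biased RRT in 3D (sketch, not the real planner).

    start, goal: (3,) arrays; is_free: callable(point) -> bool.
    Returns a list of waypoints from start to goal, or None on failure.
    """
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, float)]
    parents = [-1]
    for _ in range(n_iters):
        # Goal-biased sampling: 10% of the time steer toward the goal.
        sample = np.asarray(goal, float) if rng.random() < 0.1 else rng.uniform(-1.0, 1.0, 3)
        near = min(range(len(nodes)), key=lambda i: np.linalg.norm(nodes[i] - sample))
        d = sample - nodes[near]
        new = nodes[near] + step * d / (np.linalg.norm(d) + 1e-9)
        if not is_free(new):
            continue
        nodes.append(new)
        parents.append(near)
        if np.linalg.norm(new - goal) < goal_tol:
            # Walk back up the tree to recover the path.
            path, i = [], len(nodes) - 1
            while i != -1:
                path.append(nodes[i])
                i = parents[i]
            return path[::-1]
    return None
```

In practice Drake or IsaacLab would supply the collision checker and the configuration space would be joint space, not Cartesian space.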
Simulator
- Use ManiSkill for RL
- Use Drake or IsaacLab for motion planning
Data Collection
- IL / BC collecting demos
- motion planning with traj opt
- teleop
- Teacher Student
    - does not collect demos; trains entirely in sim
    - Cons: supervision comes only from simulation, so sim2real gaps remain
    - Pros: teacher gets real-time feedback, enabling policy adaptation
- RL: optimize trajectories using rewards
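The "optimize trajectories using rewards" option can be illustrated with a cross-entropy method (CEM) over open-loop action sequences. This is a sketch of one common choice, not the method fixed by these notes; `reward_fn` is a hypothetical scoring function for a whole trajectory.

```python
import numpy as np

def cem_traj_opt(reward_fn, horizon=10, dim=3, iters=20, pop=64, n_elites=8, seed=0):
    """Cross-entropy method: fit a Gaussian over (horizon, dim) action
    sequences to the highest-reward samples, iteratively."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, dim))
    std = np.ones((horizon, dim))
    for _ in range(iters):
        # Sample a population of candidate trajectories.
        cands = mean + std * rng.standard_normal((pop, horizon, dim))
        scores = np.array([reward_fn(c) for c in cands])
        # Refit the Gaussian to the elite (top-scoring) candidates.
        elites = cands[np.argsort(scores)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean
```

The same loop works whether the reward comes from a hand-written shaping term or a rollout in ManiSkill; only `reward_fn` changes.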
(sim) Train teacher using motion plans ⇒ distill student from teacher ⇒ (real) keypoint/point-cloud tracking. Could use RL here? Or online traj opt?
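For the keypoint/point-cloud tracking step, one standard way to recover the current-to-goal rigid transform from matched keypoints is the Kabsch (SVD/Procrustes) solve. A minimal sketch, assuming N matched 3D keypoints are already available from segmentation and tracking:

```python
import numpy as np

def kabsch(P, Q):
    """Estimate the rigid transform (R, t) mapping keypoints P onto Q.

    P, Q: (N, 3) arrays of matched keypoints (current and goal frames).
    Returns rotation R (3, 3) and translation t (3,) with Q ~= P @ R.T + t.
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections (det = -1 solutions).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t
```

The resulting (R, t) is exactly the kind of privileged goal conditioning a real-world policy could consume.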
Real Policy
- Diffusion policy
    - Cons: computationally expensive, slow inference leads to long reaction times
    - Pros: supports multimodal action distributions
    - Requires a source of privileged information for conditioning (e.g. the transformation from the current pose to the goal pose)
- RL policy
- Teacher Student Policy
- Teacher in sim
- student in real
    - Cons: the sim2real gap is wide because the teacher learned from privileged information unavailable in the real world
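The teacher-student distillation step can be sketched as behavior cloning: roll out (or sample) privileged states, query the teacher for actions, and fit the student on the observations the real robot would actually see. Below is a minimal stand-in with a linear least-squares student; `teacher_act` and `obs_of_state` are hypothetical callables, and a real student would be a neural network trained with SGD.

```python
import numpy as np

def distill_student(teacher_act, obs_of_state, n=500, state_dim=6, seed=0):
    """Distill a privileged-state teacher into an observation-space student.

    teacher_act: callable(state) -> action (uses privileged sim state).
    obs_of_state: callable(state) -> observation (what the real robot sees).
    Returns W such that obs @ W approximates the teacher's actions.
    """
    rng = np.random.default_rng(seed)
    # Sample privileged states and label them with teacher actions.
    states = rng.standard_normal((n, state_dim))
    obs = np.array([obs_of_state(s) for s in states])
    acts = np.array([teacher_act(s) for s in states])
    # Linear student: solve min_W ||obs @ W - acts||^2 in closed form.
    W, *_ = np.linalg.lstsq(obs, acts, rcond=None)
    return W
```

The sim2real-gap con above shows up here directly: whatever `obs_of_state` cannot expose about the privileged state, the student cannot recover, no matter how good the fit.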
Baselines