Open-TeleVision provides real-time stereoscopic video streaming from the robot's perspective to a VR headset. It can also control multi-fingered robot hands, enabling more complex and precise manipulation.
A stereo camera on the robot's head streams 3D video to the VR headset.
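These notes don't specify the transport, but left/right eye images are typically packed side by side before streaming to the headset. A minimal sketch of that packing step, assuming hypothetical 720p frames:

```python
import numpy as np

def pack_stereo_frame(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Concatenate the left/right eye images side by side for the headset."""
    assert left.shape == right.shape, "eye images must match in size"
    return np.concatenate([left, right], axis=1)

# Hypothetical 720p RGB frames from the robot's head-mounted stereo camera.
left = np.zeros((720, 1280, 3), dtype=np.uint8)
right = np.zeros((720, 1280, 3), dtype=np.uint8)
frame = pack_stereo_frame(left, right)
print(frame.shape)  # (720, 2560, 3)
```

The packed frame would then be compressed and sent over the network each cycle; the headset splits it back into per-eye views.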
When the VR headset user moves, their head and hand poses change. A central server handles communication between the VR device and the robot, processing poses and video streams at 60 Hz, and retargets these human poses to the robot's joint positions.
The retargeted joint positions are sent directly to the robot for real-time control.
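A highly simplified sketch of the retarget-and-send step. The real system runs an inverse-kinematics solver on the tracked wrist/hand poses; here only the final clamping/safety step before commanding the robot is shown, with an illustrative 7-DoF arm and made-up joint limits:

```python
import numpy as np

# Illustrative joint limits for a hypothetical 7-DoF arm (radians).
JOINT_LOW = np.full(7, -2.5)
JOINT_HIGH = np.full(7, 2.5)

def retarget(ik_solution: np.ndarray) -> np.ndarray:
    """Clamp an IK solution for the tracked human pose to the robot's joint limits.

    In the real pipeline, an inverse-kinematics solver first converts the
    human wrist/hand pose into candidate joint angles; this shows only the
    limit-clamping applied before the command is sent to the robot.
    """
    return np.clip(ik_solution, JOINT_LOW, JOINT_HIGH)

raw = np.array([0.1, 3.0, -0.4, -2.9, 0.0, 1.2, 2.6])
cmd = retarget(raw)
print(cmd)  # out-of-range joints 3.0, -2.9, 2.6 are clamped to +/-2.5
```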
During demonstrations, the system records:
a) the stereo video from the robot's perspective
b) the joint positions of the robot (which correspond to the retargeted human poses)
This recorded data becomes the "Demonstration Dataset" shown in the right part of the image.
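Each recorded step pairs the observation with the action label. A minimal sketch of the episode buffer, assuming a simple list-of-dicts layout (field names are hypothetical):

```python
import time

def record_step(episode, stereo_frame_id, joint_positions):
    """Append one synchronized (observation, action) pair to the episode."""
    episode.append({
        "frame": stereo_frame_id,         # reference to the stored stereo image
        "joints": list(joint_positions),  # retargeted joint positions (action label)
        "t": time.time(),                 # capture timestamp
    })

episode = []
for step in range(3):                     # in practice this runs at 60 Hz
    record_step(episode, stereo_frame_id=step, joint_positions=[0.0] * 7)
print(len(episode))  # 3
```

A full demonstration dataset is then a collection of such episodes, one per teleoperated task attempt.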
Training the Imitation Policy:
The demonstration dataset is used to train a policy that maps the robot's observations (stereo images and joint positions) to actions. Once trained, the policy can be deployed on the robot.
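"Training" here is ordinary supervised learning on the demonstrations: regress the recorded actions from the recorded observations. A toy behavior-cloning sketch with a linear model standing in for the real network (all data below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy behavior cloning: learn a linear map from observation features to actions.
# In the real system the observation is a stereo image plus proprioception and
# the model is a neural network; this only illustrates the supervised objective.
obs = rng.normal(size=(256, 16))       # demonstration observations (features)
true_W = rng.normal(size=(16, 7))
actions = obs @ true_W                 # demonstrated joint-position targets

W = np.zeros((16, 7))
for _ in range(200):
    pred = obs @ W
    grad = obs.T @ (pred - actions) / len(obs)  # gradient of the MSE loss
    W -= 0.1 * grad                             # gradient-descent step

mse = float(np.mean((obs @ W - actions) ** 2))
print(mse)  # near zero: the policy has fit the demonstrations
```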
During deployment, the robot uses its own stereo camera and joint position data as input to the trained model, which then outputs the actions for the robot to perform.
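The deployment loop can be sketched as follows. `DummyPolicy` is a stand-in for the trained model, and the shapes are illustrative:

```python
import numpy as np

class DummyPolicy:
    """Stand-in for the trained imitation policy (a neural network in practice)."""
    def __call__(self, stereo_image, joint_positions):
        # A real policy would run a forward pass on the image and joints;
        # here we echo the current joints so the loop structure is runnable.
        return np.asarray(joint_positions)

def control_step(policy, camera_frame, joints):
    """One cycle: observe, run the policy, return joint targets for the robot."""
    action = policy(camera_frame, joints)
    return action

frame = np.zeros((720, 2560, 3), dtype=np.uint8)  # packed stereo observation
joints = np.zeros(7)                              # current joint positions
action = control_step(DummyPolicy(), frame, joints)
print(action.shape)  # (7,)
```

On the robot this step repeats continuously, with each action sent to the joint controllers.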
One open question is how the teleoperator can feel the objects they are manipulating, to get a more intuitive sense of the interaction.