Vision System - Overview
Vision skills teach robots how to move arbitrary objects into target poses. We develop a pre-trained model that performs a wide variety of such tasks; this model can later be fine-tuned on data for a particular task of interest.
The tasks involve placing one object (which we call the "action" object) at a semantically meaningful location relative to another object (which we call the "anchor" object).
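To make the action/anchor terminology concrete, here is a minimal sketch of the task structure. The arrays, the `transform_points` helper, and `T_goal` are illustrative placeholders, not part of the actual system.

```python
import numpy as np

# Illustrative sketch: pointclouds are (N, 3) arrays of XYZ points.
action_points = np.random.rand(1024, 3)  # e.g. a connector plug
anchor_points = np.random.rand(1024, 3)  # e.g. the mating socket

def transform_points(T: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform to an (N, 3) pointcloud."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# The vision system's job is to estimate the SE(3) transform T_goal
# that moves the action object from its current pose to a semantically
# meaningful target pose relative to the anchor (identity here only
# as a placeholder).
T_goal = np.eye(4)
action_at_target = transform_points(T_goal, action_points)
```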

TEACH: Data Collection
The data collection procedure involves manually jogging the robot through an assembly/disassembly task while camera and robot data are recorded.
It is assumed that system setup (e.g., camera calibration) has been completed before the data collection procedure.
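As a rough illustration of the teach phase, the sketch below records synchronized camera frames and robot poses at a fixed rate while the operator jogs the robot. The `camera.capture_pointcloud()` and `robot.get_tcp_pose()` methods are assumed placeholder interfaces, not real driver APIs.

```python
import time

def record_demonstration(camera, robot, duration_s=30.0, rate_hz=10.0):
    """Record synchronized camera and robot data while the operator
    manually jogs the robot through the assembly/disassembly task."""
    frames, poses, stamps = [], [], []
    period = 1.0 / rate_hz
    start = time.time()
    while time.time() - start < duration_s:
        t = time.time()
        frames.append(camera.capture_pointcloud())  # assumed interface
        poses.append(robot.get_tcp_pose())          # assumed interface
        stamps.append(t)
        # Sleep off the remainder of the sampling period.
        time.sleep(max(0.0, period - (time.time() - t)))
    return {"frames": frames, "poses": poses, "timestamps": stamps}
```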
LEARN: Model Training
A trained model is used to estimate the cross-pose between the action and anchor pointclouds. Details of the model can be found here: TAX-Pose Paper
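Conceptually, the model consumes an (action, anchor) pointcloud pair and outputs a single SE(3) cross-pose transform, as in the hedged sketch below. The tensor shapes and the call signature are assumptions for illustration, not the released TAX-Pose API.

```python
import torch

def estimate_cross_pose(model: torch.nn.Module,
                        action_pts: torch.Tensor,
                        anchor_pts: torch.Tensor) -> torch.Tensor:
    """Run the trained model on a (B, N, 3) action pointcloud and a
    (B, M, 3) anchor pointcloud, returning a (B, 4, 4) batch of SE(3)
    transforms that place the action object at its target pose
    relative to the anchor."""
    model.eval()
    with torch.no_grad():
        return model(action_pts, anchor_pts)
```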

Data input for training:
Action pointclouds captured at the target position, inserted into the anchor pointclouds (examples below; a sketch of how these become training pairs follows the table).
| Waterproof Connector | D-Sub Connector |
|---|---|
| *(pointcloud images)* | *(pointcloud images)* |
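One common way to turn such demonstrations into supervised training pairs is to perturb the demonstrated action cloud with a random SE(3) transform; the ground-truth cross-pose is then the inverse of that perturbation. This is a hedged sketch in the spirit of the TAX-Pose paper, not the exact pipeline.

```python
import numpy as np

def transform_points(T, points):
    """Apply a 4x4 homogeneous transform to an (N, 3) pointcloud."""
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

def random_se3(max_rot_rad=np.pi, max_trans_m=0.1, rng=None):
    """Sample a random SE(3) perturbation via axis-angle (Rodrigues)."""
    rng = rng or np.random.default_rng()
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_rot_rad, max_rot_rad)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    T = np.eye(4)
    T[:3, :3] = np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * K @ K
    T[:3, 3] = rng.uniform(-max_trans_m, max_trans_m, size=3)
    return T

def make_training_example(action_at_target, anchor_points):
    """Perturb the demonstrated (inserted) action cloud; the model is
    supervised to recover the transform that undoes the perturbation."""
    T_perturb = random_se3()
    action_input = transform_points(T_perturb, action_at_target)
    T_gt_cross_pose = np.linalg.inv(T_perturb)
    return action_input, anchor_points, T_gt_cross_pose
```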
EXECUTE: Model Inference
Execution is similar to the data collection process, except that the robot moves autonomously using poses saved earlier and uses the trained model to infer the correct target pose for the action object.
Here is a flow diagram of the execution steps involved in the insertion placement task:
*(flow diagram image)*
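The same steps can also be sketched in code. Every interface below (`camera`, `robot`, `segment_scene`, the saved grasp transform) is a hypothetical placeholder standing in for the real components, not the actual system API.

```python
def execute_placement(camera, robot, model, segment_scene, T_grasp):
    """Hedged sketch of one insertion placement: observe, infer the
    cross-pose, and move the grasped action object to its target."""
    # 1. Move to the observation pose saved during data collection
    #    and capture the scene (assumed interfaces).
    robot.move_to_saved_pose("observe")
    scene = camera.capture_pointcloud()

    # 2. Split the scene into action and anchor pointclouds
    #    (segment_scene is an assumed helper).
    action_pts, anchor_pts = segment_scene(scene)

    # 3. Infer the cross-pose that places the action object at its
    #    target pose relative to the anchor.
    T_cross = model(action_pts, anchor_pts)

    # 4. Compose with the saved grasp transform to obtain the
    #    end-effector target, then command the motion.
    robot.move_to_pose(T_cross @ T_grasp)
```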