Authors: John Oberlin, Stefanie Tellex
Robust robotic perception and manipulation of household objects requires the ability to detect, localize, and manipulate a wide variety of objects, which may be mirror-reflective like polished metal, glossy like smooth plastic, or transparent like glass; examples include picking a metal fork out of a sink full of running water and screwing a metal nut onto a bolt. Existing perceptual approaches based on photographs take into account only the average intensity of light arriving at each pixel from one direction, which limits their ability to handle such non-Lambertian scenes. To address this problem, we demonstrate time-lapse light-field photography with the eye-in-hand camera of a manipulator robot. A robot with an eye-in-hand camera can capture both the intensity of rays, as in a conventional photograph, and the direction of those rays. We present a formal model for robotic light-field photography that fits into a probabilistic robotics framework. Using this model, we can synthesize orthographic photographs, remove specular highlights from those photographs, and perform 3D reconstruction with a monocular camera by finding approximate maximum-likelihood estimates. This information can be used to detect, localize, and manipulate non-Lambertian objects in non-Lambertian scenes: our approach enables the Baxter robot to pick a shiny metal fork out of a sink filled with running water in 24 of 25 attempts, and to localize objects well enough to screw a nut onto a quarter-inch bolt. The techniques in this paper point the way toward new approaches to robotic perception that leverage a robot’s ability to move its camera to infer the state of the external world.
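To make the idea of per-cell estimation from many viewing directions concrete, the following is a minimal illustrative sketch, not the model presented in the paper. It assumes the rays collected from different camera poses have already been registered to a common grid of scene cells, so that each cell has one intensity sample per view. It also assumes an independent Gaussian per cell, for which the sample mean and variance are the maximum-likelihood estimates. The per-cell median is a swapped-in robust statistic used here only to show how a view-dependent highlight can be suppressed. The function name `estimate_cells` and the array layout are hypothetical.

```python
import numpy as np


def estimate_cells(ray_samples):
    """Aggregate per-cell intensity samples gathered from several camera poses.

    ray_samples: array of shape (n_views, height, width). Entry [v, i, j] is
    the intensity of the ray that hit scene cell (i, j) in view v.
    (Hypothetical layout: registration of rays to cells is assumed done.)
    """
    samples = np.asarray(ray_samples, dtype=float)
    # Under an assumed independent Gaussian per cell, these are the
    # maximum-likelihood estimates of the mean and variance.
    mean = samples.mean(axis=0)
    variance = samples.var(axis=0)
    # Robust alternative (not the Gaussian ML estimate): the median ignores
    # a highlight that appears in only a minority of viewing directions.
    median = np.median(samples, axis=0)
    return mean, variance, median


# Five views of a 1x2 grid; the second cell shows a highlight in one view.
views = np.array([
    [[0.2, 0.3]],
    [[0.2, 0.3]],
    [[0.2, 1.0]],  # specular highlight seen from this direction only
    [[0.2, 0.3]],
    [[0.2, 0.3]],
])
mean, variance, median = estimate_cells(views)
```

In this toy input the highlight pulls the mean of the second cell upward and gives it a nonzero variance, while the median stays at the intensity seen in the other four views. A high per-cell variance is one simple cue that a surface looks different from different directions.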