Event-Driven Visual-Tactile Sensing and Learning for Robots


Tasbolat Taunyazov, Weicong Sng, Brian Lim, Hian Hian See, Jethro Kuan, Abdul Fatir Ansari, Benjamin Tee, Harold Soh

Abstract

This work contributes an event-driven visual-tactile perception system, comprising a novel biologically-inspired tactile sensor and multi-modal spike-based learning. Our neuromorphic fingertip tactile sensor, NeuTouch, scales well with the number of taxels thanks to its event-based nature. Likewise, our Visual-Tactile Spiking Neural Network (VT-SNN) enables fast perception when coupled with event sensors. We evaluate our visual-tactile system (using NeuTouch and the Prophesee event camera) on two robot tasks: container classification and rotational slip detection. On both tasks, we observe good accuracies relative to standard deep learning methods. We have made our visual-tactile datasets freely available to encourage research on multi-modal event-driven robot perception, which we believe is a promising approach towards intelligent, power-efficient robot systems.
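To give a concrete picture of the two-branch architecture described above, the snippet below is a minimal sketch of a visual-tactile spiking classifier in PyTorch. The neuron model (a simple leaky integrate-and-fire unit with a rectangular surrogate gradient), the layer sizes, and the spike-count readout are all illustrative assumptions; they do not reproduce the authors' VT-SNN implementation.

```python
# Minimal sketch of a two-branch visual-tactile spiking classifier in the
# spirit of VT-SNN. Layer sizes, time steps, and the surrogate-gradient LIF
# neuron below are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a rectangular surrogate gradient."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()


class LIFLayer(nn.Module):
    """Linear projection followed by leaky integrate-and-fire dynamics."""

    def __init__(self, in_dim, out_dim, decay=0.8, threshold=1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.decay, self.threshold = decay, threshold

    def forward(self, x):                        # x: (batch, time, in_dim)
        batch, steps, _ = x.shape
        v = torch.zeros(batch, self.fc.out_features, device=x.device)
        spikes = []
        for t in range(steps):
            v = self.decay * v + self.fc(x[:, t])
            s = SpikeFn.apply(v - self.threshold)
            v = v * (1.0 - s)                    # reset membrane after a spike
            spikes.append(s)
        return torch.stack(spikes, dim=1)        # (batch, time, out_dim)


class VTSNNSketch(nn.Module):
    """Tactile and vision spike trains are encoded separately, then fused."""

    def __init__(self, tact_dim=78, vis_dim=1024, hidden=64, classes=20):
        super().__init__()
        self.tactile = LIFLayer(tact_dim, hidden)
        self.vision = LIFLayer(vis_dim, hidden)
        self.head = LIFLayer(2 * hidden, classes)

    def forward(self, tact, vis):
        fused = torch.cat([self.tactile(tact), self.vision(vis)], dim=-1)
        return self.head(fused).sum(dim=1)       # spike counts per class


# Example usage: 8 samples, 150 time bins of binned tactile/vision events.
# model = VTSNNSketch()
# scores = model(torch.rand(8, 150, 78), torch.rand(8, 150, 1024))
```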

Live Paper Discussion Information

Start Time: 07/14 15:00 UTC
End Time:   07/14 17:00 UTC

Virtual Conference Presentation

Paper Reviews

Review 1

The idea of using event-based tactile sensing is very interesting, and the combination with event-based vision is also new. However, the paper should improve on the following aspects to better justify the contribution:

- The authors consider the design of the event-based tactile sensor to be a major contribution of the paper, so they should provide more details on its design and evaluation. There is no evaluation of the sensor in the paper. In addition, it is unclear why the authors consider the sensor 'event-based'. Does it mean the sensor can hardly measure static pressure? Or does it mean the sensor can hardly measure the magnitude of the contact force/pressure or its derivative?
- It is unclear why the authors built this event-based sensory system. What are the advantages over an RGB(D) camera and tactile sensors that measure force/pressure continuously? In the classification task especially, RGB(D) vision is expected to perform well.
- The authors need to report more about their experimental data, especially since they are training a deep neural network model. How many data points are there in the dataset? How did the authors divide the training, validation, and test sets? What did the authors do to ensure their dataset has enough variance? In particular, for the rotational slip detection task the authors experimented with only one object; they need to justify that the model could detect rotational slip in more common cases.
- In the classification task there are two variables: the type of object and the weight of the object. It would be helpful if the authors reported how well the model can differentiate object types, and how well it can differentiate object weights.
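As one concrete answer to the data-split question raised above, the sketch below shows a stratified train/validation/test split over object-class labels. The arrays, class count, and the 80/10/10 ratio are hypothetical placeholders, not taken from the paper.

```python
# Hypothetical stratified 80/10/10 split over object-class labels.
# `samples` and `labels` are placeholder arrays, not the paper's data.
import numpy as np
from sklearn.model_selection import train_test_split

samples = np.random.rand(1000, 78)            # e.g. pooled tactile features
labels = np.random.randint(0, 20, size=1000)  # e.g. 20 object classes

# Hold out 20% first, then split the hold-out half-and-half into validation
# and test, stratifying by class so each split preserves the label balance.
x_train, x_hold, y_train, y_hold = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=0)
x_val, x_test, y_val, y_test = train_test_split(
    x_hold, y_hold, test_size=0.5, stratify=y_hold, random_state=0)
```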

Review 3

There are inconsistent descriptions of the number of object classes in the paper: “a third data-collection experiment that expands the number of grasped items to 36 different objects”, “Visual-tactile event sensor datasets comprising more than 50 different object classes”, and “36 object classes with various visual and tactile profiles”. With 36 (or 50?) classes available, the first experiment is conducted using only 4 kinds of containers. How does your VT-SNN model perform on all object classes?

The second experiment (binary classification) seems too easy for the visual modality (100% success rate); it does not show any improvement attributable to the tactile sensor.

For Equation 2, instead of using hand-crafted regression targets, why not use a classification loss (e.g. cross-entropy)?

The visual model is an SNN operating on pixel differences, whereas today's state-of-the-art models are mostly convolutional networks. What would the performance be with a light-weight CNN? How does the SNN model scale as the number of taxels increases? A denser taxel arrangement would have stronger local correlations between nearby taxel signals; how does that affect your model?
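To make the loss question above concrete, here is a hedged sketch of the two options the reviewer contrasts, for an SNN whose output is a spike train of shape (batch, time, classes). The target spike counts (30 for the correct class, 5 otherwise) are made-up illustrative values, not the ones used in the paper's Equation 2.

```python
# Sketch contrasting spike-count regression targets with a cross-entropy loss
# on spike counts, for output spike trains of shape (batch, time, classes).
import torch
import torch.nn.functional as F


def spike_count_regression_loss(out_spikes, labels, pos_count=30.0, neg_count=5.0):
    """MSE against hand-crafted per-class target spike counts (illustrative values)."""
    counts = out_spikes.sum(dim=1)                        # (batch, classes)
    targets = torch.full_like(counts, neg_count)
    targets.scatter_(1, labels.unsqueeze(1), pos_count)   # true class gets pos_count
    return F.mse_loss(counts, targets)


def spike_count_cross_entropy_loss(out_spikes, labels):
    """Treat total spike counts as class logits and apply cross-entropy."""
    counts = out_spikes.sum(dim=1)                         # (batch, classes)
    return F.cross_entropy(counts, labels)
```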