Abstract: Recent years have seen astonishing progress in the capabilities of generative AI techniques, particularly in language and visual understanding and generation. Key to the success of these models is the use of image and text datasets of unprecedented scale, along with models able to digest such large datasets. We are now seeing the first examples of leveraging such models to equip robots with open-world visual understanding and reasoning capabilities. Unfortunately, however, we have not yet reached the RobotGPT moment: these models still struggle to reason about geometry and physical interactions in the real world, resulting in brittle performance on seemingly simple tasks such as manipulating objects in the open world. A crucial reason for this is the lack of data suitable for training powerful, general models for robot decision making and control. In this talk, I will discuss approaches to generating large datasets for training robot manipulation capabilities, with a focus on the role simulation can play in this context. I will show some of our prior work demonstrating robust sim-to-real transfer of manipulation skills trained in simulation, and then present a path toward generating large-scale demonstration sets that could help train robust, open-world robot manipulation models.
Bio: Dieter Fox is Senior Director of Robotics Research at NVIDIA and Professor in the Allen School of Computer Science & Engineering at the University of Washington, where he heads the UW Robotics and State Estimation Lab. Dieter’s research is in robotics and artificial intelligence, with a focus on learning and perception applied to problems such as robot manipulation, mapping, and object detection and tracking. He has published more than 200 technical papers and is the co-author of the textbook “Probabilistic Robotics”. He is a Fellow of the IEEE, AAAI, and ACM, and recipient of the 2020 IEEE Pioneer in Robotics and Automation Award and the 2023 IJCAI John McCarthy Award. He was an editor of the IEEE Transactions on Robotics, program co-chair of the 2008 AAAI Conference on Artificial Intelligence, and program chair of the 2013 Robotics: Science and Systems conference.
Abstract: Our senses, and those of any organism or agent operating in the same environment as us, are inundated with myriad diverse signals. How does the brain transform this sensory cacophony into a coherent percept of the world? Why does perception sometimes falter, resulting in illusions? This talk will explore how sensory integration can trick the human brain into believing that a puppet is speaking, that a rubber hand belongs to our own body, or that ‘da’ is heard instead of ‘ba’. I will demonstrate that both accurate and illusory perceptions result from the brain’s strategy of integrating sensory information based on Bayesian principles.
Specifically, I will show that the brain near-optimally integrates sensory signals arising from common causes into more precise percepts by weighting them according to their momentary uncertainties. Moving to more complex scenarios, I will discuss situations where signals may originate from common or independent sources. In these cases, the brain needs to solve the binding problem: determining whether signals come from common causes and should therefore be integrated, or whether they should be processed independently. I will show that the brain arbitrates between sensory integration and segregation in a manner consistent with the principles of hierarchical Bayesian causal inference. Finally, I will explore how the brain may employ attentional mechanisms to compute approximate solutions to the binding problem in realistic environments, where numerous signals and sources make optimal Bayesian inference infeasible given the brain’s limited computational capacity.
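To make these principles concrete, here is a minimal numerical sketch (my own illustration, not material from the talk): it implements reliability-weighted fusion of two Gaussian cues and the posterior probability of a common cause, in the spirit of standard Bayesian causal inference models of multisensory perception. The cue values, noise levels, and the width of the Gaussian prior over source locations are illustrative placeholders.

```python
import numpy as np

def fuse_common_cause(x_v, x_a, sigma_v, sigma_a):
    """Inverse-variance (reliability) weighted fusion of two Gaussian cues,
    assuming they arise from a common cause."""
    w_v = sigma_a**2 / (sigma_v**2 + sigma_a**2)        # weight on the visual cue
    mu = w_v * x_v + (1.0 - w_v) * x_a                  # fused estimate
    sigma = np.sqrt(sigma_v**2 * sigma_a**2 / (sigma_v**2 + sigma_a**2))
    return mu, sigma                                    # fused std is smaller than either cue's

def p_common_cause(x_v, x_a, sigma_v, sigma_a, sigma_p, prior_common=0.5):
    """Posterior probability that the two measurements share a common cause,
    with a zero-mean Gaussian prior (std sigma_p) over source locations."""
    v2, a2, p2 = sigma_v**2, sigma_a**2, sigma_p**2
    # Marginal likelihood of (x_v, x_a) under a single shared source
    d_c = v2 * a2 + v2 * p2 + a2 * p2
    like_c = np.exp(-0.5 * ((x_v - x_a)**2 * p2 + x_v**2 * a2 + x_a**2 * v2) / d_c) \
        / (2 * np.pi * np.sqrt(d_c))
    # Marginal likelihood under two independent sources
    like_i = np.exp(-0.5 * (x_v**2 / (v2 + p2) + x_a**2 / (a2 + p2))) \
        / (2 * np.pi * np.sqrt((v2 + p2) * (a2 + p2)))
    return like_c * prior_common / (like_c * prior_common + like_i * (1.0 - prior_common))

# Illustrative numbers: a reliable visual cue at 0 deg and a noisier auditory cue at 6 deg.
mu, sigma = fuse_common_cause(0.0, 6.0, sigma_v=1.0, sigma_a=3.0)
print(f"fused estimate: {mu:.2f} deg, fused std: {sigma:.2f}")     # pulled toward vision
print(f"P(common cause): {p_common_cause(0.0, 6.0, 1.0, 3.0, sigma_p=10.0):.2f}")
```

In a full causal inference model, the final estimate would then weight the fused and segregated estimates by this common-cause posterior, so that widely discrepant cues are integrated less.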
Overall, our research highlights the critical role of sensory integration based on Bayesian principles in enabling the brain to resolve perceptual ambiguities and reduce its uncertainties about the world. Since robots need to operate in similar sensory environments and thus face comparable challenges, these insights may have important implications for robotic perception and decision making.
Bio: Uta Noppeney is Professor of Systems Neuroscience in the Neurophysics Department and a Principal Investigator at the Donders Centres for Cognitive Neuroimaging and Neuroscience within the Donders Institute for Brain, Cognition and Behaviour. Previously, she was Professor of Computational Neuroscience and director of the Computational Neuroscience and Cognitive Robotics Centre at the University of Birmingham (UK), and an independent research group leader at the Max Planck Institute for Biological Cybernetics, Tuebingen (Germany). Her research investigates the computational and neural mechanisms of perceptual inference, learning and attention in dynamic multisensory environments. She uses a multidisciplinary approach integrating psychophysics, computational modelling (Bayesian, neural network) and advanced neuroimaging techniques (fMRI, MEG, EEG, TMS). She received a Young Investigator Award from the Cognitive Neuroscience Society in 2013, a Turing Fellowship in 2018, an ERC Starting Grant in 2013, and an ERC Advanced Grant in 2023. She is also a member of the Academia Europaea and an academic editor of PLOS Biology and Multisensory Research.