Spatial AI
Abstract: In this talk we’ll discuss how to build rich 3D representations of the environment that help people and robots perform tasks. We’ll first discuss how to build visual 3D maps of environments and use them for visual (re)localization, spatial data access and navigation. We’ll cover recent methods based on geometry, on learning, and on combinations of the two. One of the questions we will consider is what is best learned and where we should use explicit geometric concepts. We’ll also discuss how to build rich 3D semantic representations that enable queries and interactions with the scene. Our approach allows open-vocabulary queries by leveraging foundation models. While these models are very powerful at recognizing arbitrary objects, some aspects are still missing to enable robotic interactions. We’ll also briefly cover some of our work on action recognition, which is key to building AI assistants and could also help robots learn from examples.
Bio: Marc Pollefeys is a Professor of Computer Science at ETH Zurich and the Director of the Microsoft Spatial AI Lab in Zurich. He is a Fellow of IEEE, ACM, AAIA and ELLIS, as well as a member of the Academia Europaea. His work has received several prizes and awards, including the Marr Prize and several best paper awards. He obtained his PhD from KU Leuven in 1999 and was a professor at UNC Chapel Hill before joining ETH Zurich. He is best known for his work in 3D computer vision, having been the first to develop a software pipeline to automatically turn photographs into 3D models, but he also works on robotics, graphics and machine learning problems. Other noteworthy projects he has worked on are real-time 3D scanning with mobile devices (2013), a real-time pipeline for 3D reconstruction of cities from vehicle-mounted cameras (2007), camera-based self-driving cars and the first fully autonomous vision-based drone (2012). More recently his academic research has focused on combining 3D reconstruction with semantic scene understanding. He served as the program chair for CVPR 2009 and general chair for ECCV 2014 and ICCV 2019, and was the founding president of the European Computer Vision Foundation.
Abstract:
Bio: Professor Philip Torr did his PhD (DPhil) at the Robotics Research Group of the University of Oxford under Professor David Murray of the Active Vision Group. He worked for another three years at Oxford as a research fellow, and still maintains close contact as a visiting fellow there. He left Oxford to work for six years as a research scientist for Microsoft Research, first in Redmond, USA, in the Vision Technology Group, then in Cambridge, founding the vision side of the Machine Learning and Perception Group. He then became a Professor in Computer Vision and Machine Learning at Oxford Brookes University. In 2013, Philip returned to Oxford as a full professor, where he has established the Torr Vision Group. He has won several awards, including the Marr Prize (the highest honour in vision) in 1998. He is a Royal Society Wolfson Research Merit Award Holder. Together with members of his group, he has won several other awards, including an honourable mention at the NIPS 2007 conference for the paper 'P. Kumar, V. Kolmogorov, and P.H.S. Torr, An Analysis of Convex Relaxations for MAP Estimation', in NIPS 21, Neural Information Processing Conference, and (oral) Best Paper at Conference for 'O. Woodford, P.H.S. Torr, I. Reid, and A.W. Fitzgibbon, Global Stereo Reconstruction under Second Order Smoothness Priors', in Proceedings IEEE Conference of Computer Vision and Pattern Recognition, 2008. More recently he has been awarded best science paper at BMVC 2010 and ECCV 2010. He was involved in the algorithm design for Boujou, released by 2D3. Boujou has won a clutch of industry awards, including the Computer Graphics World Innovation Award, the IABM Peter Wayne Award, the CATS Award for Innovation, and a technical Emmy. He then worked closely with this Oxford-based company, as well as with other companies such as Sony on the Wonderbook project.
He has been involved in numerous spin-outs as founder or advisor, including FiveAI, Onfido, Oxsight, Eigent, DreamTech, Visionary Machines and CamelAI, and has worked closely with big tech companies such as Google, Meta, Apple, Microsoft, and Sony. He was elected Fellow of the Royal Academy of Engineering (FREng) in 2019, and Fellow of the Royal Society (FRS) in 2021 for contributions to computer vision. In 2021 he was made a Turing AI World-Leading Researcher Fellow.