A 2D + 3D Rich Data Approach to Scene Understanding
Abstract
On your one-minute walk from the coffee machine to your desk each morning, you pass by dozens of scenes – a kitchen, an elevator, your office – and you effortlessly recognize them and perceive their 3D structure. But this one-minute scene-understanding problem has been an open challenge in computer vision since the field was first established 50 years ago. In this dissertation, we aim to rethink the path researchers have taken over these years, challenge the standard practices and implicit assumptions in current research, and redefine several basic principles in computational scene understanding. The key idea of this dissertation is that learning from rich data under natural settings is crucial for finding the right representation for scene understanding. First of all, to overcome the limitations of object-centric datasets, we built the Scene Understanding (SUN) Database, a large collection of real-world images that exhaustively spans all scene categories. This scene-centric dataset provides a more natural sample of the human visual world and establishes a realistic benchmark for standard 2D recognition tasks. However, while an image is a 2D array, the world is 3D and our eyes see it from a particular viewpoint, and neither fact is traditionally modeled. To obtain a 3D understanding at a high level, we reintroduce geometric figures using modern machinery. To model scene viewpoint, we propose a panoramic place representation that goes beyond aperture computer vision and uses data close to the natural input of the human visual system. This paradigm shift toward rich representation also opens up new challenges that require a new kind of big data – data with extra descriptions, namely rich data. Specifically, we focus on a highly valuable kind of rich data – multiple viewpoints in 3D – and we build the SUN3D database to obtain an integrated place-centric representation of scenes.
We argue for the great importance of modeling the computer’s role as an agent in a 3D scene, and demonstrate the power of place-centric scene representation.
Thesis Supervisor: Antonio Torralba
Title: Associate Professor
Similar resources
A generalised framework for saliency-based point feature detection
Here we present a novel, histogram-based salient point feature detector that may naturally be applied to both images and 3D data. Existing point feature detectors are often modality specific, with 2D and 3D feature detectors typically constructed in separate ways. As such, their applicability in a 2D-3D context is very limited, particularly where the 3D data is obtained by a LiDAR scanner. By c...
Data-Driven Scene Understanding from 3D Models
In this paper, we propose a data-driven approach to leverage repositories of 3D models for scene understanding. Our ability to relate what we see in an image to a large collection of 3D models allows us to transfer information from these models, creating a rich understanding of the scene. We develop a framework for auto-calibrating a camera, rendering 3D models from the viewpoint an image was t...
Robust Recovery of 3D Ellipse Data
This paper is concerned with robust, accurate and computationally tractable methods for the automatic recovery of 3D ellipse data from edge based stereo. The processing paradigm relies heavily on the 2D image as a rich and robust source of scene feature hypotheses (in this case ellipses). Rather than attempt to recover 3D scene descriptions by grouping unstructured estimates of disparity and/or...
Developing a BIM-based Spatial Ontology for Semantic Querying of 3D Property Information
With the growing dominance of complex and multi-level urban structures, current cadastral systems, which are often developed based on 2D representations, are not capable of providing unambiguous spatial information about urban properties. Therefore, the concept of 3D cadastre is proposed to support 3D digital representation of land and properties and facilitate the communication of legal owners...
Interactive Manipulation of 3D Scene Projections
Linear perspective is a good approximation to the format in which the human visual system conveys 3D scene information to the brain. Artists expressing 3D scenes, however, create nonlinear projections that balance their linear perspective view of a scene with elements of aesthetic style, layout and relative importance of scene objects. Manipulating the many parameters of a linear perspective ca...