Integrating Function, Geometry, Appearance for Scene Parsing

Authors

Yibiao Zhao, University of California, Los Angeles (UCLA), USA (www.yibiaozhao.com)

Song-Chun Zhu, University of California, Los Angeles (UCLA), USA (http://www.stat.ucla.edu/~sczhu)

Abstract

In this paper, we present a Stochastic Scene Grammar (SSG) for parsing 2D indoor images into 3D scene layouts. Our grammar model integrates object functionality, 3D object geometry, and 2D image appearance in a Function-Geometry-Appearance (FGA) hierarchy. In contrast to the prevailing approach in the literature, which recognizes scenes and detects objects through appearance-based classification using machine learning techniques, our method takes a different perspective on scene understanding and recognizes objects and scenes by reasoning about their functionality. Functionality is an essential property that often defines the categories of objects and scenes and determines the design of geometry and scene layout. For example, a sofa is for people to sit on comfortably, and a kitchen is a space for people to prepare food with various objects. Our SSG formulates object functionality and the contextual relations between objects and imagined human poses as a joint probability distribution over the FGA hierarchy. The hierarchy includes both functional concepts (the scene category, functional groups, functional objects, functional parts) and geometric entities (3D/2D/1D shape primitives). The decomposition of the grammar terminates at lines and regions detected bottom-up. We use a Markov chain Monte Carlo (MCMC) algorithm to maximize the Bayesian posterior probability; the output parse tree includes a 3D description of the 2D image in the FGA hierarchy. Experimental results on two challenging indoor datasets demonstrate that the proposed approach not only significantly widens the scope of indoor scene parsing from traditional scene segmentation, labeling, and 3D reconstruction to functional object recognition, but also yields improved overall performance.

Similar resources

Scene-centric Joint Parsing of Cross-view Videos

Cross-view video understanding is an important yet underexplored area in computer vision. In this paper, we introduce a joint parsing framework that integrates view-centric proposals into scene-centric parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of views embed rich appearance and geometry correlations ...

Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs

Geometry and Illumination Modelling for Scene Understanding

Project Summary: The goal of this proposal is to develop a unified framework for reasoning about objects, scenes, and lighting from single and multiple views of indoor and outdoor environments. We propose computational models for semantic parsing of scenes which incorporate information about the lighting and illumination to resolve the ambiguities of purely appearance-based methods and develop class...

Pedestrians Tracking in a Camera Network

As the number of cameras installed across a video surveillance network increases, the ability of security staff to attentively scan all the video feeds decreases. An intelligent tracking system is therefore vital for security personnel to do their jobs well. Tracking people as they move through a camera network with non-overlapping field...