Learning to Look around Objects for Top-View Representations of Outdoor Scenes
Abstract
Given a single RGB image of a complex outdoor road scene in the perspective view, we address the novel problem of estimating an occlusion-reasoned semantic scene layout in the top-view. This challenging problem requires an accurate understanding of the 3D geometry and semantics not only of the visible scene, but also of occluded areas. We propose a convolutional neural network that learns to predict occluded portions of the scene layout by looking around foreground objects such as cars or pedestrians. Instead of hallucinating RGB values, we show that directly predicting the semantics and depths in the occluded areas enables a better transformation into the top-view. We further show that this initial top-view representation can be significantly enhanced by learning priors and rules about typical road layouts from simulated or, if available, map data. Crucially, training our model does not require costly or subjective human annotations for occluded areas or the top-view, but rather uses readily available annotations for standard semantic segmentation. We extensively evaluate and analyze our approach on the KITTI and Cityscapes datasets.
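The abstract states that per-pixel semantics and depths are transformed into the top-view, but it does not spell out the mapping. The following is only a minimal sketch of one common way to do such a transformation, assuming a pinhole camera with known focal length and principal point; the function name `to_top_view`, its parameters, and the grid extents are hypothetical and are not taken from the paper.

```python
import numpy as np

def to_top_view(sem, depth, fx, cx, x_range=(-10.0, 10.0),
                z_range=(0.0, 40.0), cell=0.5, unknown=-1):
    """Rasterize per-pixel labels into a top-view grid (illustrative only).

    sem, depth: arrays of shape (H, W); depth is forward distance in metres.
    fx, cx: assumed pinhole focal length and principal point (pixels).
    """
    h, w = depth.shape
    u = np.tile(np.arange(w), (h, 1))          # column index of every pixel
    z = depth                                  # forward distance
    x = (u - cx) * z / fx                      # lateral offset via back-projection

    # Height is dropped: each 3D point is projected straight down onto the grid.
    cols = np.floor((x - x_range[0]) / cell).astype(int)
    rows = np.floor((z - z_range[0]) / cell).astype(int)
    n_rows = int(round((z_range[1] - z_range[0]) / cell))
    n_cols = int(round((x_range[1] - x_range[0]) / cell))

    grid = np.full((n_rows, n_cols), unknown, dtype=int)
    valid = (z > 0) & (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)
    # If several pixels fall into the same cell, a single one of them is kept.
    grid[rows[valid], cols[valid]] = sem[valid]
    return grid

# Tiny usage example with two pixels, both 10 m ahead of the camera.
sem = np.array([[1, 2]])
depth = np.array([[10.0, 10.0]])
grid = to_top_view(sem, depth, fx=100.0, cx=0.5)
```

Cells that no pixel maps to stay at `unknown`. In this simplified picture, a cell can also stay empty because its ground position is hidden behind a foreground object, which is the situation the occlusion-reasoned prediction in the abstract is meant to address.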
Similar resources
Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet
Neurophysiological evidence for invariant representations of objects and faces in the primate inferior temporal visual cortex is described. Then a computational approach to how invariant representations are formed in the brain is described that builds on the neurophysiology. A feature hierarchy model in which invariant representations can be built by self-organizing learning based on the tempor...
Invariant visual object recognition: a model, with lighting invariance.
How are invariant representations of objects formed in the visual cortex? We describe a neurophysiological and computational approach which focusses on a feature hierarchy model in which invariant representations can be built by self-organizing learning based on the statistics of the visual input. The model can use temporal continuity in an associative synaptic learning rule with a short term m...
Learning Invariant Visual Shape Representations from Physics
3D shape determines an object’s physical properties to a large degree. In this article, we introduce an autonomous learning system for categorizing 3D shape of simulated objects from single views. The system extends an unsupervised bottom-up learning architecture based on the slowness principle with top-down information derived from the physical behavior of objects. The unsupervised bottom-up l...
Top-down control of visual perception: attention in natural vision.
Top-down perceptual influences can bias (or pre-empt) perception. In natural scenes, the receptive fields of neurons in the inferior temporal visual cortex (IT) shrink to become close to the size of objects. This facilitates the read-out of information from the ventral visual system, because the information is primarily about the object at the fovea. Top-down attentional influences are much les...
Top down saliency estimation via superpixel-based discriminative dictionaries
Predicting where humans look in images has gained significant popularity in recent years. In this work, we present a novel method for learning top-down visual saliency, which is well-suited to locate objects of interest in complex scenes. During training, we jointly learn a superpixel based class-specific dictionary and a Conditional Random Field (CRF). While using such a discriminative diction...
Publication date: 2018