Multimodal Recurrent Neural Networks with Information Transfer Layers for Indoor Scene Labeling
نویسندگان
چکیده
This paper proposes a new method called Multimodal RNNs for RGB-D scene semantic segmentation. It is optimized to classify image pixels given two input sources: RGB color channels and Depth maps. It simultaneously performs training of two recurrent neural networks (RNNs) that are crossly connected through information transfer layers, which are learnt to adaptively extract relevant cross-modality features. Each RNN model learns its representations from its own previous hidden states and transferred patterns from the other RNNs previous hidden states; thus, both model-specific and crossmodality features are retained. We exploit the structure of quad-directional 2D-RNNs to model the short and long range contextual information in the 2D input image. We carefully designed various baselines to efficiently examine our proposed model structure. We test our Multimodal RNNs method on popular RGB-D benchmarks and show how it outperforms previous methods significantly and achieves competitive results with other state-of-the-art works.
منابع مشابه
Convolutional Neural Networks with Intra-Layer Recurrent Connections for Scene Labeling
Scene labeling is a challenging computer vision task. It requires the use of both local discriminative features and global context information. We adopt a deep recurrent convolutional neural network (RCNN) for this task, which is originally proposed for object recognition. Different from traditional convolutional neural networks (CNN), this model has intra-layer recurrent connections in the con...
متن کاملDA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks
3D scene understanding is important for robots to interact with the 3D world in a meaningful way. Most previous works on 3D scene understanding focus on recognizing geometrical or semantic properties of a scene independently. In this work, we introduce Data Associated Recurrent Neural Networks (DA-RNNs), a novel framework for joint 3D scene mapping and semantic labeling. DA-RNNs use a new recur...
متن کاملMulti-level Contextual RNNs with Attention Model for Scene Labeling
Context in image is crucial for scene labeling while existing methods only exploit local context generated from a small surrounding area of an image patch or a pixel, by contrast long-range and global contextual information is ignored. To handle this issue, we in this work propose a novel approach for scene labeling by exploring multi-level contextual recurrent neural networks (ML-CRNNs). Speci...
متن کاملSemantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks
This work investigates the use of deep fully convolutional neural networks (DFCNN) for pixel-wise scene labeling of Earth Observation images. Especially, we train a variant of the SegNet architecture on remote sensing data over an urban area and study different strategies for performing accurate semantic segmentation. Our contributions are the following: 1) we transfer efficiently a DFCNN from ...
متن کاملMulti-Path Feedback Recurrent Neural Network for Scene Parsing
In this paper, we consider the scene parsing problem. We propose a novel Multi-Path Feedback recurrent neural network (MPF-RNN) to enhance the capability of RNNs on modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse in pixel-wise classification. In contrast to CNNs without feedback and RNNs with only a single feedback path, MPFRNN p...
متن کامل