CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
Authors
Abstract
Current RGB-D scene recognition approaches often train two standalone backbones for the RGB and depth modalities, both initialized with the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models, which may result in a suboptimal solution. In this paper, we present a single-model self-supervised hybrid pre-training framework for RGB and depth modalities, termed CoMAE. CoMAE uses a curriculum learning strategy to unify two popular self-supervised representation learning algorithms: contrastive learning and masked image modeling. Specifically, we first build a patch-level alignment task to pre-train a single encoder shared by the two modalities via cross-modal contrastive learning. Then, the pre-trained contrastive encoder is passed to a multi-modal masked autoencoder to capture finer context features from a generative perspective. In addition, our single-model design requires no fusion module, so it is flexible and robust enough to generalize to the unimodal scenario in both the training and testing phases. Extensive experiments on the SUN RGB-D and NYUDv2 datasets demonstrate the effectiveness of CoMAE for RGB and depth representation learning. Our experimental results also reveal that CoMAE is a data-efficient representation learner. Although we only use the small-scale, unlabeled training set for pre-training, our CoMAE pre-trained models are competitive with state-of-the-art methods that rely on extra large-scale, supervised RGB-D dataset pre-training. Code will be released at https://github.com/MCG-NJU/CoMAE.
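The two pre-training stages described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract gives neither the loss form nor the masking scheme, so the symmetric InfoNCE loss, the temperature of 0.07, the 0.75 mask ratio, and the function names `patch_infonce` and `random_mask` are all assumptions chosen for illustration.

```python
import numpy as np


def patch_infonce(rgb, depth, temperature=0.07):
    """Symmetric InfoNCE loss over patch embeddings (assumed loss form).

    rgb, depth: (N, D) arrays; row i of each comes from the same patch
    location, so the positives lie on the diagonal of the similarity matrix.
    """
    rgb = rgb / np.linalg.norm(rgb, axis=1, keepdims=True)
    depth = depth / np.linalg.norm(depth, axis=1, keepdims=True)
    logits = rgb @ depth.T / temperature  # (N, N) cosine similarities

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the RGB-to-depth and depth-to-RGB directions.
    return 0.5 * (diagonal_cross_entropy(logits)
                  + diagonal_cross_entropy(logits.T))


def random_mask(num_patches, mask_ratio, rng):
    """Boolean mask for masked image modeling; True marks a hidden patch."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


rng = np.random.default_rng(0)
rgb_patches = rng.normal(size=(16, 8))
depth_aligned = rgb_patches + 0.01 * rng.normal(size=(16, 8))
depth_shuffled = depth_aligned[::-1]  # breaks the patch correspondence

loss_aligned = patch_infonce(rgb_patches, depth_aligned)
loss_shuffled = patch_infonce(rgb_patches, depth_shuffled)
mask = random_mask(num_patches=196, mask_ratio=0.75, rng=rng)
```

In this toy setup the loss is lower when RGB and depth patches are spatially aligned than when the correspondence is broken, which is the signal a shared encoder would be trained on in the first stage. The mask stands in for the second stage, where a masked autoencoder would reconstruct the hidden patches.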
Similar resources
A Pre-Trained Ensemble Model for Breast Cancer Grade Detection Based on Small Datasets
Background and Purpose: Nowadays, breast cancer is reported as one of the most common cancers amongst women. Early detection of the cancer type is essential to aid in informing subsequent treatments. The newest proposed breast cancer detectors are based on deep learning. Most of these works focus on large datasets and are not developed for small datasets. Although the large datasets might lead ...
RGB-D-based action recognition datasets: A survey
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation ag...
The role of RGB-D benchmark datasets: an overview
The advent of the Microsoft Kinect three years ago stimulated not only the computer vision community for new algorithms and setups to address well-known problems in the community but also sparked the launch of several new benchmark datasets to which future algorithms can be compared to. This review of the literature and industry developments concludes that the current RGB-D benchmark datasets c...
A Portable Immersive Surgery Training System Using RGB-D Sensors
Surgical training plays an important role in assisting residents to develop critical skills. Providing effective surgical training, however, remains as a challenging task. Existing videotaped training instructions can only show imagery from a fixed viewpoint that lacks both depth perception and interactivity. We present a new portable immersive surgical training system that is capable of acquir...
Training-Based Spectral Reconstruction from a Single RGB Image
This paper focuses on a training-based method to reconstruct a scene’s spectral reflectance from a single RGB image captured by a camera with known spectral response. In particular, we explore a new strategy to use training images to model the mapping between camera-specific RGB values and scene reflectance spectra. Our method is based on a radial basis function network that leverages RGB white-...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i3.25419