Multimodal contrastive learning for unsupervised video representation learning

Authors

Abstract

In this paper, we propose a multimodal unsupervised video learning algorithm designed to incorporate information from any number of modalities present in the data. We cooperatively train a network corresponding to each modality: at each stage of training, one of these networks is selected to be trained using the output of the other networks. To verify our algorithm, we train a model using RGB, optical flow, and audio. We then evaluate the effectiveness of our unsupervised learning model by performing action classification and nearest neighbor retrieval on a supervised dataset. We compare this triple modality model to contrastive learning models using one or two modalities, and find that using all three modalities in tandem provides a 1.5% improvement in UCF101 classification accuracy, a 1.4% improvement in R@1 retrieval recall, a 3.5% improvement in R@5 retrieval recall, and a 2.4% improvement in R@10 retrieval recall as compared to using only RGB, demonstrating the merit of utilizing as many modalities as possible in a cooperative learning model.
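The following is a minimal sketch of the ideas named in the abstract, not the authors' implementation. It assumes an InfoNCE-style contrastive loss, a round-robin choice of which modality's network is trained at each stage, and cosine-similarity nearest neighbor retrieval for R@k. The abstract states none of these details, and all function names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, targets, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and targets[i] are the positive pair.

    Assumed objective: the abstract only says "contrastive".
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / temperature for t in targets]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def cooperative_schedule(modalities, num_stages):
    """Yield (trained, helpers) per stage; round-robin order is an assumption."""
    for stage in range(num_stages):
        trained = modalities[stage % len(modalities)]
        helpers = [m for m in modalities if m != trained]
        yield trained, helpers


def recall_at_k(queries, query_labels, gallery, gallery_labels, k):
    """Fraction of queries whose k nearest gallery items include a same-label item."""
    hits = 0
    for query, label in zip(queries, query_labels):
        ranked = sorted(
            range(len(gallery)),
            key=lambda j: cosine(query, gallery[j]),
            reverse=True,
        )
        if any(gallery_labels[j] == label for j in ranked[:k]):
            hits += 1
    return hits / len(queries)
```

In a training loop built on this sketch, each stage would update only the `trained` network, using embeddings from the `helpers` networks as the targets in `info_nce`. After training, `recall_at_k` would be called with embeddings of labeled clips to obtain R@1, R@5, and R@10.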

Similar resources

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequately labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...

Unsupervised Model-Free Representation Learning

Numerous control and learning problems face the situation where sequences of high-dimensional highly dependent data are available, but no or little feedback is provided to the learner. To address this issue, we formulate the following problem. Given a series of observations X0, ..., Xn coming from a large (high-dimensional) space X, find a representation function f mapping X to a finite spa...

Unsupervised Learning Layers for Video Analysis

This paper presents two unsupervised learning layers (UL layers) for label-free video analysis: one for fully connected layers, and the other for convolutional ones. The proposed UL layers can play two roles: they can be the cost function layer for providing global training signal; meanwhile they can be added to any regular neural network layers for providing local training signals and combined...

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation wh...

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

Journal

Journal title: IS&T International Symposium on Electronic Imaging Science and Technology

Year: 2023

ISSN: 2470-1173

DOI: https://doi.org/10.2352/ei.2023.35.14.coimg-173