CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation
Authors
Abstract
In 3D action recognition, there exists rich complementary information between skeleton modalities. Nevertheless, how to model and utilize this information remains a challenging problem for self-supervised 3D action representation learning. In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem. Different from classic distillation solutions that transfer the knowledge of a fixed and pre-trained teacher to the student, in this work, the knowledge is continuously updated and bidirectionally distilled between modalities. To this end, we propose a new Cross-modal Mutual Distillation (CMD) framework with the following designs. On the one hand, the neighboring similarity distribution is introduced to model the knowledge learned in each modality, where the relational information is naturally suitable for contrastive frameworks. On the other hand, asymmetrical configurations are used for teacher and student to stabilize the distillation process and to transfer high-confidence information between modalities. By derivation, we find that the cross-modal positive mining in previous works can be regarded as a degenerated version of our CMD. We perform extensive experiments on the NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD II datasets. Our approach outperforms existing methods and sets a series of new records. The code is available at: https://github.com/maoyunyao/CMD.
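The mutual distillation described in the abstract can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the temperature values, and the assumption that the two memory banks hold embeddings of the same anchor samples in each modality are all hypothetical; see https://github.com/maoyunyao/CMD for the official code.

```python
import torch
import torch.nn.functional as F


def neighbor_distribution(z, bank, tau):
    """Neighboring similarity distribution: softmax over the cosine
    similarities between a batch of embeddings (B, D) and a bank of
    anchor embeddings (N, D), scaled by a temperature tau."""
    z = F.normalize(z, dim=1)
    bank = F.normalize(bank, dim=1)
    return F.softmax(z @ bank.t() / tau, dim=1)


def cmd_loss(z_a, z_b, bank_a, bank_b, tau_teacher=0.05, tau_student=0.1):
    """Bidirectional KL divergence between the neighboring similarity
    distributions of two skeleton modalities. The teacher side is detached
    and uses a sharper temperature than the student (asymmetric setup).
    bank_a and bank_b are assumed to store embeddings of the same anchor
    samples, encoded in modality A and modality B respectively."""
    # Modality A teaches modality B.
    p_a = neighbor_distribution(z_a.detach(), bank_a, tau_teacher)
    q_b = neighbor_distribution(z_b, bank_b, tau_student)
    loss_ab = F.kl_div(q_b.log(), p_a, reduction="batchmean")
    # Modality B teaches modality A.
    p_b = neighbor_distribution(z_b.detach(), bank_b, tau_teacher)
    q_a = neighbor_distribution(z_a, bank_a, tau_student)
    loss_ba = F.kl_div(q_a.log(), p_b, reduction="batchmean")
    return loss_ab + loss_ba


if __name__ == "__main__":
    # Toy shapes: batch of 8 embeddings, 128-dim features, 256 anchors per modality.
    z_joint, z_motion = torch.randn(8, 128), torch.randn(8, 128)
    bank_joint, bank_motion = torch.randn(256, 128), torch.randn(256, 128)
    print(cmd_loss(z_joint, z_motion, bank_joint, bank_motion))
```

In practice this term would be added to each modality's own contrastive objective, with the banks refreshed from momentum (key) encoders as in MoCo-style frameworks.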
Similar resources
Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
Thanks to the success of deep learning, cross-modal retrieval has made significant progress recently. However, there still remains a crucial bottleneck: how to bridge the modality gap to further enhance the retrieval accuracy. In this paper, we propose a self-supervised adversarial hashing (SSAH) approach, which lies among the early attempts to incorporate adversarial learning into cross-modal ...
Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval
Motivated by the fact that both the relevancy of class labels and unlabeled data can help to strengthen multi-modal correlation, this paper proposes a novel method for cross-modal retrieval. To make each sample move toward the direction of its relevant label while staying far away from those of its irrelevant ones, a novel dragging technique is fused into a unified linear regression model. In this way, not on...
A Semi-supervised Human Action Learning
Exploiting multimodal information like acceleration and heart rate is a promising method to achieve human action recognition. A semi-supervised action recognition approach AUCC (action understanding with combinational classifier), using the diversity of base classifiers to create a high-quality ensemble for multimodal human action recognition, is proposed in this paper. Furthermore, both labeled ...
Data Distillation: Towards Omni-Supervised Learning
We investigate omni-supervised learning, a special regime of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Omni-supervised learning is lowerbounded by performance on existing labeled datasets, offering the potential to surpass state-of-the-art fully supervised methods. To exploit the omni-supervised setting, we p...
Cross-modal Common Representation Learning by Hybrid Transfer Network
DNN-based cross-modal retrieval is a research hotspot to retrieve across different modalities as image and text, but existing methods often face the challenge of insufficient cross-modal training data. In single-modal scenario, similar problem is usually relieved by transferring knowledge from largescale auxiliary datasets (as ImageNet). Knowledge from such single-modal datasets is also very us...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-20062-5_42