Balanced knowledge distillation for long-tailed learning

Authors

Abstract

Deep models trained on long-tailed datasets exhibit unsatisfactory performance on tail classes. Existing methods usually modify the classification loss to increase the learning focus on tail classes, which unexpectedly sacrifices performance on head classes. In fact, this scheme leads to a contradiction between the two goals of long-tailed learning, i.e., learning generalizable representations and facilitating learning for tail classes. In this work, we explore knowledge distillation in long-tailed scenarios and propose a novel framework, named Balanced Knowledge Distillation (BKD), to disentangle the two goals and achieve both simultaneously. Specifically, given a teacher model, we train the student model by minimizing the combination of an instance-balanced classification loss and a class-balanced distillation loss. The former benefits from sample diversity and learns a generalizable representation, while the latter considers the class priors and facilitates learning for tail classes. We conduct extensive experiments on several benchmark datasets and demonstrate that the proposed BKD is an effective knowledge distillation framework in long-tailed scenarios, as well as a competitive method for long-tailed learning. Our source code is available at: https://github.com/EricZsy/BalancedKnowledgeDistillation.
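The abstract describes the student objective as the sum of an instance-balanced classification loss and a class-balanced distillation loss. The sketch below illustrates how such a combined objective could look in PyTorch; the function name bkd_loss, the inverse-frequency class weighting, and the temperature/alpha hyperparameters are illustrative assumptions rather than the authors' exact implementation (see the linked repository for that).

import torch
import torch.nn.functional as F

def bkd_loss(student_logits, teacher_logits, labels, class_counts,
             temperature=2.0, alpha=1.0):
    """Sketch of a balanced-distillation objective: an instance-balanced
    cross-entropy term plus a class-weighted distillation term.

    class_counts is a 1-D tensor with the number of training samples per
    class; the inverse-frequency weighting below is an assumption made for
    illustration, not necessarily the paper's exact weighting scheme.
    """
    # Instance-balanced classification loss: plain cross-entropy over the batch.
    ce = F.cross_entropy(student_logits, labels)

    # Class-balanced weights: rarer classes receive larger (normalized) weights.
    weights = class_counts.sum() / (class_counts.float() * len(class_counts))

    # Softened teacher and student distributions for distillation.
    t_prob = F.softmax(teacher_logits / temperature, dim=1)
    s_logprob = F.log_softmax(student_logits / temperature, dim=1)

    # Cross-entropy between teacher and student distributions, with each
    # class term re-weighted by its class prior; T^2 restores gradient scale.
    kd = -(weights * t_prob * s_logprob).sum(dim=1).mean() * temperature ** 2

    return ce + alpha * kd

if __name__ == "__main__":
    # Toy usage: batch of 8 samples over a 10-class long-tailed label space.
    student_logits = torch.randn(8, 10)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    class_counts = torch.tensor([1000, 600, 350, 200, 120, 70, 40, 25, 15, 10])
    print(bkd_loss(student_logits, teacher_logits, labels, class_counts))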


Related Articles

Learning Loss for Knowledge Distillation with Conditional Adversarial Networks

There is an increasing interest in accelerating neural networks for real-time applications. We study the student-teacher strategy, in which a small and fast student network is trained with the auxiliary information provided by a large and accurate teacher network. We use conditional adversarial networks to learn the loss function to transfer knowledge from teacher to student. The proposed method...


Learning Efficient Object Detection Models with Knowledge Distillation

Despite significant accuracy improvements in convolutional neural network (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer parameters, but with much ...


Unsupervised Learning for Information Distillation

Current document archives are enormously large and constantly growing, which makes it practically impossible to use them efficiently. To analyze and interpret the large volumes of speech and text in these archives across multiple languages and produce structured information of interest to the user, information distillation techniques are used. In order to access the key information in resp...


Sequence-Level Knowledge Distillation

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large. In this paper, we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...


Knowledge Distillation for Bilingual Dictionary Induction

Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...



Journal

Journal: Neurocomputing

Year: 2023

ISSN: 0925-2312, 1872-8286

DOI: https://doi.org/10.1016/j.neucom.2023.01.063