Harmonized Dense Knowledge Distillation Training for Multi-Exit Architectures
Authors
Abstract
Multi-exit architectures, in which a sequence of intermediate classifiers are introduced at different depths of the feature layers, perform adaptive computation by early exiting "easy" samples to speed up inference. In this paper, a novel Harmonized Dense Knowledge Distillation (HDKD) training method is designed for multi-exit architectures to encourage each exit to flexibly learn from all of its later exits. In particular, a general dense knowledge distillation objective is proposed to incorporate all possible beneficial supervision information into learning, where the harmonized weighting scheme is formulated as a multi-objective optimization problem consisting of the classification loss and the distillation loss. A bilevel optimization algorithm alternately updates the weights of the multiple objectives and the network parameters. Specifically, the weight parameters are optimized with respect to the performance on a validation set by gradient descent. Experiments on CIFAR100 and ImageNet show that the HDKD strategy harmoniously improves the performance of state-of-the-art multi-exit neural networks. Moreover, the method does not require within-architecture modifications and can be effectively combined with other previously-proposed training techniques to further boost performance.
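The abstract outlines three ingredients: a dense distillation objective in which every exit learns from all deeper exits, a harmonized weighting of the resulting classification and distillation losses, and a bilevel scheme that tunes those weights by gradient descent on validation performance. The following is a minimal PyTorch-style sketch of how these pieces could fit together; it is an illustration under stated assumptions, not the authors' implementation. The helper names (dense_kd_losses, harmonized_loss, update_harmonized_weights), the softmax parameterization of the weights, and the one-step-unrolled (first-order) hypergradient are all assumptions, and the multi-exit model is assumed to return a list of per-exit logits ordered from shallowest to deepest.

import torch
import torch.nn.functional as F
from torch.func import functional_call  # requires a recent PyTorch

def dense_kd_losses(exit_logits, targets, temperature=4.0):
    # Classification loss for every exit, plus a dense set of distillation
    # losses in which each exit i distills from every deeper exit j > i.
    ce = [F.cross_entropy(z, targets) for z in exit_logits]
    kd = []
    for i in range(len(exit_logits)):
        for j in range(i + 1, len(exit_logits)):
            student = F.log_softmax(exit_logits[i] / temperature, dim=1)
            teacher = F.softmax(exit_logits[j].detach() / temperature, dim=1)
            kd.append(F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2)
    return ce, kd

def harmonized_loss(ce, kd, w_logits):
    # Combine all objectives with learnable, softmax-normalized weights.
    losses = torch.stack(ce + kd)
    return (torch.softmax(w_logits, dim=0) * losses).sum()

def update_harmonized_weights(model, w_logits, train_batch, val_batch, lr_model=0.1, lr_w=1e-3):
    # One first-order bilevel step: take a virtual SGD step on the weighted
    # training loss, then descend the validation loss with respect to the
    # objective weights (the hypergradient flows through the virtual step).
    x_tr, y_tr = train_batch
    x_va, y_va = val_batch
    names, params = zip(*model.named_parameters())

    ce, kd = dense_kd_losses(model(x_tr), y_tr)
    l_train = harmonized_loss(ce, kd, w_logits)
    grads = torch.autograd.grad(l_train, params, create_graph=True)
    virtual = {n: p - lr_model * g for n, p, g in zip(names, params, grads)}

    # Validation classification performance of the virtually updated model.
    val_logits = functional_call(model, virtual, (x_va,))
    l_val = sum(F.cross_entropy(z, y_va) for z in val_logits)
    (g_w,) = torch.autograd.grad(l_val, [w_logits])
    with torch.no_grad():
        w_logits -= lr_w * g_w

In a full training loop one would alternate this weight update with ordinary optimizer steps on the weighted training loss, with w_logits initialized as a zero tensor with requires_grad=True so that all objectives start with equal weight.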
Similar resources
Solving Dense Generalized Eigenproblems on Multi-threaded Architectures
We compare two approaches to compute a fraction of the spectrum of dense symmetric definite generalized eigenproblems: one is based on the reduction to tridiagonal form, and the other on the Krylov-subspace iteration. Two large-scale applications, arising in molecular dynamics and material science, are employed to investigate the contributions of the application, architecture, and parallelism o...
Deep Learning in Multi-Layer Architectures of Dense Nuclei
In dense clusters of neurons in nuclei, cells may interconnect via soma-to-soma interactions, in addition to conventional synaptic connections. We illustrate this idea with a multi-layer architecture (MLA) composed of multiple clusters of recurrent sub-networks of spiking Random Neural Networks (RNN) with dense soma-to-soma interactions. We use this RNN-MLA architecture for deep learning. The i...
Sequence-Level Knowledge Distillation
Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However to reach competitive performance, NMT models need to be exceedingly large. In this paper we consider applying knowledge distillation approaches (Bucila et al., 2006; Hinton et al., 2015) that have proven successful for reducing the size of neura...
HamleDT: Harmonized multi-language dependency treebank
We present HamleDT – a HArmonized Multi-LanguagE Dependency Treebank. HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. In the present article, we provide a thorough investigation and discussion of a number of phenomena that are comparable across languages, though their ann...
Knowledge Distillation for Bilingual Dictionary Induction
Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2021
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v35i11.17225