Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation
Authors
Abstract
As applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed for devices with restricted computing resources, such as smartphones, edge, and IoT devices. Knowledge distillation (KD) leverages soft labels derived from a teacher model so that a less parameterized student model achieves high accuracy with a reduced computational burden. Moreover, online KD provides parallel training through collaborative learning between student networks, thus enhancing training speed. A binarized neural network (BNN) offers an intriguing opportunity to facilitate aggressive compression at the expense of drastically degraded accuracy. In this study, two performance improvements are proposed for applying a BNN as a student network in online KD: 1) parameterized weight clipping (PWC) to reduce dead weights and 2) quantization gap-aware adaptive temperature scheduling between networks. In contrast to constant weight clipping (CWC), PWC with trainable clipping parameters demonstrates a 3.78% top-1 test accuracy enhancement by decreasing gradient mismatch on the CIFAR-10 dataset. Furthermore, adaptive temperature scheduling increases top-1 accuracy by 0.08% over a constant temperature. By aggregating both methodologies, top-1 accuracy on the CIFAR-10 dataset reached 94.60%, and that on Tiny-ImageNet was comparable to a 32-bit full-precision network.
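The two ideas in the abstract can be illustrated with a short, self-contained sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`binarize_pwc`, `pwc_ste_mask`, `adaptive_temperature`) and the specific gap-to-temperature mapping are hypothetical; only the general mechanisms (trainable clipping range with a straight-through gradient mask, and softening distillation targets when the quantization gap is large) follow the abstract.

```python
import numpy as np

def binarize_pwc(w, alpha):
    # Parameterized weight clipping (PWC): clip latent real-valued weights
    # to the trainable range [-alpha, alpha], then binarize to +/-alpha.
    w_clipped = np.clip(w, -alpha, alpha)
    return np.where(w_clipped >= 0, alpha, -alpha)

def pwc_ste_mask(w, alpha):
    # Straight-through estimator mask: gradients pass only where the latent
    # weight lies inside the clipping range. Because alpha is trainable, the
    # range can adapt so fewer weights become "dead" (stuck outside the range
    # with zero gradient) than under constant weight clipping (CWC).
    return (np.abs(w) <= alpha).astype(w.dtype)

def soft_targets(logits, temperature):
    # Temperature-scaled softmax used for the distillation soft labels.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_temperature(w_real, w_binary, t_base=4.0, scale=1.0):
    # Hypothetical quantization-gap-aware schedule: measure the mean squared
    # gap between real-valued and binarized weights and raise the temperature
    # (softer targets) when the student's quantization error is large.
    gap = np.mean((w_real - w_binary) ** 2)
    return t_base + scale * gap
```

For example, with `w = [-1.5, -0.2, 0.3, 2.0]` and `alpha = 1.0`, the binarized weights are `[-1, -1, 1, 1]` and the STE mask is `[0, 1, 1, 0]`: the two weights outside the clipping range receive no gradient, which is exactly what a trainable `alpha` is meant to mitigate.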
Similar Resources
Performance Comparison of Binarized Neural Network with Convolutional Neural Network
Deep learning is a trending topic widely studied by researchers due to the increasing abundance of data and the meaningful results obtained from it. Convolutional Neural Networks (CNNs) are among the most popular architectures used in deep learning. A Binarized Neural Network (BNN) is a neural network that consists of binary weights and activations. Neural networks have a large number of paramete...
Distillation Column Identification Using Artificial Neural Network
Abstract: In this paper, an Artificial Neural Network (ANN) was used for modeling the nonlinear structure of a debutanizer column in a refinery gas process plant. The actual input-output data of the system were measured in order to be used for system identification based on a root mean square error (RMSE) minimization approach. It was shown that the designed recurrent neural network is able to pr...
Large Scale Distributed Neural Network Training through Online Distillation
Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper we explore a variant of distillation which is relatively straightforwar...
FP-BNN: Binarized neural network on FPGA
Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy especially in areas such as computer vision and artificial intelligence. To enhance their performance, technologies for their hardware acceleration are being studied. FPGA technology is a promising choice for hardware acceleration, given its low power consumption and high flexibility which makes it sui...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3238715