Binarized Neural Network With Parameterized Weight Clipping and Quantization Gap Minimization for Online Knowledge Distillation

Authors

Abstract

As the applications of artificial intelligence grow rapidly, numerous network compression algorithms have been developed for devices with restricted computing resources, such as smartphones, edge, and IoT devices. Knowledge distillation (KD) leverages soft labels derived from a teacher model so that a less parameterized model can achieve high accuracy with a reduced computational burden. Moreover, online KD enables parallel training through collaborative learning between student networks, thus enhancing training speed. A binarized neural network (BNN) offers an intriguing opportunity to facilitate aggressive compression, at the expense of drastically degraded accuracy. In this study, two performance improvements are proposed for applying a BNN as a student network: 1) parameterized weight clipping (PWC) to reduce dead weights in the student network, and 2) quantization gap-aware adaptive temperature scheduling between networks. In contrast to constant weight clipping (CWC), PWC demonstrates a 3.78% top-1 test accuracy enhancement with trainable clipping parameters by decreasing gradient mismatch on the CIFAR-10 dataset. Furthermore, the adaptive temperature scheduling increases accuracy by 0.08% over a constant temperature. By aggregating both methodologies, the top-1 test accuracy on the CIFAR-10 dataset was 94.60%, and that on Tiny-ImageNet was comparable to a 32-bit full-precision network.
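The abstract combines two ideas: a trainable clipping range for binarized weights and a temperature applied to the soft labels exchanged between peer networks in online KD. The sketch below is a rough illustration only, not the authors' code: it shows a PyTorch-style weight binarizer with a learnable clipping parameter alpha (straight-through gradients are passed only inside the clipping range, which is the mechanism PWC uses to avoid dead weights) and a standard temperature-scaled soft-label loss. The names BinarizeSTE, PWCLinear, and kd_loss are hypothetical, and the quantization gap-aware temperature schedule itself is not reproduced here.

```python
# Rough sketch, not the paper's implementation. Assumes PyTorch is available.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """Sign-binarize weights with a straight-through estimator (STE).

    Gradients pass through only where |w| <= alpha, so weights pushed
    outside the clipping range stop learning ("dead weights"); making
    alpha trainable lets training widen or narrow that range.
    """

    @staticmethod
    def forward(ctx, w, alpha):
        ctx.save_for_backward(w, alpha)
        return alpha * torch.sign(w)

    @staticmethod
    def backward(ctx, grad_out):
        w, alpha = ctx.saved_tensors
        inside = (w.abs() <= alpha).to(grad_out.dtype)
        grad_w = grad_out * inside                     # clipped STE for the weights
        grad_alpha = (grad_out * torch.sign(w)).sum()  # scalar gradient for the clip scale
        return grad_w, grad_alpha


class PWCLinear(nn.Module):
    """Linear layer whose weights are binarized with a learnable clip value."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.alpha = nn.Parameter(torch.tensor(1.0))   # trainable clipping parameter

    def forward(self, x):
        w_bin = BinarizeSTE.apply(self.weight, self.alpha)
        return x @ w_bin.t()


def kd_loss(student_logits, peer_logits, temperature):
    """Temperature-scaled soft-label loss between two peer networks (online KD).

    The paper schedules the temperature adaptively from a quantization gap;
    here it is simply passed in as an argument.
    """
    t = temperature
    soft_targets = F.softmax(peer_logits.detach() / t, dim=1)
    log_probs = F.log_softmax(student_logits / t, dim=1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * (t * t)
```

In an online KD training loop, each peer would forward its batch through layers such as PWCLinear and add kd_loss against the other peer's logits to its cross-entropy loss; the temperature argument is where a gap-aware schedule would plug in.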

Related articles

Performance Comparison of Binarized Neural Network with Convolutional Neural Network

Deep learning is a trending topic widely studied by researchers due to the increasing abundance of data and the meaningful results obtained from it. Convolutional Neural Networks (CNNs) are among the most popular architectures used in deep learning. A Binarized Neural Network (BNN) is a neural network that consists of binary weights and activations. Neural networks have a large number of paramete...

Full text

Distillation Column Identification Using Artificial Neural Network

Abstract: In this paper, an Artificial Neural Network (ANN) was used to model the nonlinear structure of a debutanizer column in a refinery gas process plant. The actual input-output data of the system were measured for use in system identification based on a root mean square error (RMSE) minimization approach. It was shown that the designed recurrent neural network is able to pr...

Full text

Large Scale Distributed Neural Network Training through Online Distillation

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for distillation), these techniques are challenging to use in industrial settings. In this paper, we explore a variant of distillation which is relatively straightforwar...

Full text

FP-BNN: Binarized neural network on FPGA

Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy, especially in areas such as computer vision and artificial intelligence. To enhance their performance, technologies for their hardware acceleration are being studied. FPGA technology is a promising choice for hardware acceleration, given its low power consumption and high flexibility, which makes it sui...

Full text

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3238715