Dropout distillation
Authors
Abstract
Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process implicitly trains an ensemble of exponentially many networks sharing the same parametrization, whose predictions should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers that undergo dropout randomization. This simple rule, called “standard dropout”, is efficient, but it might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined “dropout distillation”, that allows us to train a predictor that better approximates the intractable, but preferable, averaging process while keeping its computational cost under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, yet more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.
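To make the three ingredients of the abstract concrete (the standard-dropout scaling rule, the intractable ensemble average, and the distilled predictor), here is a minimal PyTorch sketch. It is an illustration only, not the authors' implementation: the network shapes, the 100-sample Monte Carlo estimate of the ensemble average, the separate student network, and the KL-divergence distillation objective are all assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # A toy network with dropout, purely illustrative.
    net = nn.Sequential(
        nn.Linear(784, 256), nn.ReLU(), nn.Dropout(p=0.5),
        nn.Linear(256, 10),
    )
    x = torch.randn(32, 784)  # a hypothetical mini-batch

    with torch.no_grad():
        # "Standard dropout": a single deterministic pass with scaled
        # units. PyTorch applies inverted scaling at train time, so
        # eval mode already realizes this approximation.
        net.eval()
        standard_pred = F.softmax(net(x), dim=1)

        # Monte Carlo estimate of the intractable ensemble average:
        # keep dropout active and average many stochastic passes.
        net.train()
        mc_pred = torch.stack(
            [F.softmax(net(x), dim=1) for _ in range(100)]
        ).mean(dim=0)

    # Dropout distillation (sketch): fit a deterministic student to
    # the averaged teacher, so one pass approximates the ensemble.
    student = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                            nn.Linear(256, 10))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    loss = F.kl_div(F.log_softmax(student(x), dim=1), mc_pred,
                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()

At test time the student replaces the scaled network, costing a single forward pass (or less, if a smaller student architecture is chosen) while tracking the Monte Carlo average more closely than the standard-dropout rule.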
Similar works
Active Bias: Training a More Accurate Neural Network by Emphasizing High Variance Samples
Self-paced learning and hard example mining re-weight training instances to improve learning accuracy. This paper presents two improved alternatives based on lightweight estimates of sample uncertainty in stochastic gradient descent (SGD): the variance in predicted probability of the correct class across iterations of minibatch SGD, and the proximity of the correct class probability to the deci...
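A rough sketch of the first estimate, inferred from this truncated snippet (the record-keeping and the normalized standard-deviation weighting below are assumptions, not the paper's exact scheme):

    import numpy as np

    # Hypothetical per-sample history of the predicted probability of
    # the correct class, one entry per minibatch SGD iteration in
    # which the sample appeared.
    prob_history = {
        0: [0.41, 0.62, 0.48, 0.70],  # uncertain sample: high variance
        1: [0.97, 0.96, 0.98, 0.97],  # easy sample: low variance
    }

    # Emphasize high-variance samples: weight each instance by the
    # standard deviation of its history, normalized to sum to one.
    raw = {i: np.std(h) for i, h in prob_history.items()}
    total = sum(raw.values())
    weights = {i: w / total for i, w in raw.items()}
    print(weights)  # sample 0 receives almost all of the emphasis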
Analysis of Deep Neural Networks with Extended Data Jacobian Matrix
Deep neural networks have achieved great success on a variety of machine learning tasks. There are many fundamental and open questions yet to be answered, however. We introduce the Extended Data Jacobian Matrix (EDJM) as an architecture-independent tool to analyze neural networks at the manifold of interest. The spectrum of the EDJM is found to be highly correlated with the complexity of the le...
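One plausible reading of the construction, sketched in PyTorch (the stacking of per-sample data Jacobians and the normalized spectrum shown here are inferred from the snippet, not guaranteed to match the paper's exact definition):

    import torch
    from torch.autograd.functional import jacobian

    # A trained network would normally go here; a random one keeps
    # the sketch self-contained.
    net = torch.nn.Sequential(torch.nn.Linear(784, 64),
                              torch.nn.ReLU(),
                              torch.nn.Linear(64, 10))
    xs = torch.randn(8, 784)  # points near the manifold of interest

    # Per-sample data Jacobian d(output)/d(input), stacked into one
    # extended matrix whose singular-value spectrum is inspected.
    jacs = [jacobian(net, x) for x in xs]  # each of shape (10, 784)
    edjm = torch.cat(jacs, dim=0)          # shape (8 * 10, 784)
    spectrum = torch.linalg.svdvals(edjm)
    print(spectrum / spectrum[0])          # normalized spectrum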
A Non-Random Dropout Model for Analyzing Longitudinal Skew-Normal Response
In this paper, multivariate skew-normal distribution is employed for analyzing an outcome based dropout model for repeated measurements with non-random dropout in skew regression data sets. A probit regression is considered as the conditional probability of an observation to be missing given outcomes. A simulation study of using the proposed methodology and comparing it with a semi-parame...
A Comparative Review of Selection Models in Longitudinal Continuous Response Data with Dropout
Missing values occur in studies of various disciplines such as social sciences, medicine, and economics. The missing mechanism in these studies should be investigated more carefully. In this article, some models proposed in the literature on longitudinal data with dropout are reviewed and compared. In an applied example it is shown that the selection model of Hausman and Wise (1979, Econometri...
Building Robust Deep Neural Networks for Road Sign Detection
Deep neural networks are built with generalization beyond the training set in mind, using techniques such as regularization, early stopping, and dropout. But measures to make them more resilient to adversarial examples are rarely taken. As deep neural networks become more prevalent in mission-critical and real-time systems, miscreants start to attack them by intentionally making deep neural ne...