Enabling Sparse Winograd Convolution by Native Pruning
Authors
Abstract
Sparse methods and the use of Winograd convolutions are two orthogonal approaches, each of which significantly accelerates convolution computations in modern CNNs. Sparse Winograd merges the two and thus has the potential to offer a combined performance benefit. Nevertheless, training convolution layers so that the resulting Winograd kernels are sparse has not hitherto been very successful. By introducing a Winograd layer in place of a standard convolution layer, we can learn and prune Winograd coefficients “natively” and obtain a sparsity level beyond 90% with only 0.1% accuracy loss for AlexNet on the ImageNet dataset. Furthermore, we present a sparse Winograd convolution algorithm and implementation that exploits this sparsity, achieving up to 31.7 effective TFLOP/s in 32-bit precision on a recent Intel Xeon CPU, which corresponds to a 5.4× speedup over a state-of-the-art dense convolution implementation.
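As a concrete illustration of the idea, the sketch below applies the standard F(2x2, 3x3) Winograd transform matrices from Lavin's minimal filtering algorithm to a 3x3 filter and zeroes coefficients directly in the Winograd domain. The one-shot magnitude threshold here is only a stand-in for the learned, training-time pruning the paper describes; none of this code is from the authors' implementation.

```python
# Minimal sketch of "native" Winograd-domain pruning for F(2x2, 3x3).
# Illustrative only; not the paper's implementation.
import numpy as np

# Standard F(2x2, 3x3) transform matrices (Lavin).
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=np.float32)

def winograd_tile(d, U):
    """2x2 output tile from a 4x4 input tile d and a (possibly pruned)
    4x4 Winograd-domain kernel U: Y = A^T [U * (B^T d B)] A."""
    V = B_T @ d @ B_T.T           # input transform
    return A_T @ (U * V) @ A_T.T  # elementwise multiply, output transform

rng = np.random.default_rng(0)
g = rng.standard_normal((3, 3)).astype(np.float32)  # spatial 3x3 filter
d = rng.standard_normal((4, 4)).astype(np.float32)  # one 4x4 input tile

U = G @ g @ G.T  # kernel in the Winograd domain (4x4)
# "Native" pruning: zero small coefficients in the Winograd domain.
# In the paper this sparsity is learned during training.
mask = np.abs(U) >= 0.3
U_sparse = U * mask
print(f"Winograd-domain sparsity: {(~mask).mean():.0%}")

# Reference: direct 'valid' correlation. The unpruned Winograd result
# matches it up to float rounding; pruning trades a small error for
# skipped elementwise multiplies.
y_ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                  for i in range(2)])
print("max error (unpruned):", np.max(np.abs(winograd_tile(d, U) - y_ref)))
print("pruned tile:\n", winograd_tile(d, U_sparse))
```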
Similar Resources
Efficient Sparse-Winograd Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are compute-intensive, which limits their application on mobile devices. Their energy is dominated by the number of multiplies needed to perform the convolutions. Winograd's minimal filtering algorithm (Lavin (2015)) and network pruning (Han et al. (2015)) reduce the operation count. Unfortunately, these two methods cannot be combined, because applying the W...
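The incompatibility is easy to see numerically: transforming a heavily pruned spatial filter into the Winograd domain fills most of the zeros back in. The snippet below is an illustrative check, not code from the paper.

```python
# Spatial pruning and Winograd do not compose naively: the G g G^T
# transform of a pruned 3x3 filter is nearly dense.
import numpy as np

G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])

g = np.array([[1.0, 0.0, 0.0],   # heavily pruned spatial filter:
              [0.0, 0.0, 0.0],   # 7 of 9 weights are zero
              [0.0, 0.0, 2.0]])
U = G @ g @ G.T                  # the same filter in the Winograd domain
print(f"spatial zeros: {(g == 0).mean():.0%}, "
      f"Winograd-domain zeros: {(U == 0).mean():.0%}")
# spatial zeros: 78%, Winograd-domain zeros: 12% -- most sparsity is lost.
```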
Pruning of Winograd and FFT Based Convolution Algorithm
Winograd- and FFT-based convolution are two efficient convolution algorithms targeting high-performance inference. Their efficiency comes from the reduction in the number of multiplication operations due to linear and Fourier transforms. However, the two existing approaches cannot handle efficient compression of the neural network, which might contribute a significant improvement in computation and...
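For reference, here is a minimal 1-D sketch of FFT-based convolution showing where the multiplication savings come from: the filter is transformed once and applied with pointwise frequency-domain products, and it is these transformed coefficients that a compression scheme along these lines would prune. This is a generic illustration, not the paper's method.

```python
# FFT-based convolution: pointwise products in the frequency domain
# replace the O(n*k) multiplies of direct convolution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)      # input signal
h = rng.standard_normal(9)       # filter

n = len(x) + len(h) - 1          # linear-convolution output length
X = np.fft.rfft(x, n)            # transform zero-padded input
H = np.fft.rfft(h, n)            # transform filter (the prunable coefficients)
y_fft = np.fft.irfft(X * H, n)   # elementwise multiply, inverse transform

assert np.allclose(y_fft, np.convolve(x, h))  # matches direct convolution
```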
Escort: Efficient Sparse Convolutional Neural Networks on GPUs
Deep neural networks have achieved remarkable accuracy in many artificial intelligence applications, e.g. computer vision, at the cost of a large number of parameters and high computational complexity. Weight pruning can compress DNN models by removing redundant parameters in the networks, but it brings sparsity in the weight matrix, and therefore makes the computation inefficient on GPUs. Alth...
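For context, the sketch below shows the common baseline such GPU work starts from: lowering convolution to matrix multiplication (im2col) and storing the pruned weights in CSR form. It is a generic illustration of sparse convolution, not Escort's own GPU algorithm.

```python
# Baseline sparse convolution: im2col lowering + CSR sparse-dense matmul.
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
C_in, C_out, K, H, W = 8, 16, 3, 10, 10
x = rng.standard_normal((C_in, H, W))

# Pruned weights: ~90% of entries zeroed, stored in CSR form.
w = rng.standard_normal((C_out, C_in * K * K))
w[rng.random(w.shape) < 0.9] = 0.0
w_csr = csr_matrix(w)

# im2col: gather every KxK patch into one column of a dense matrix.
H_out, W_out = H - K + 1, W - K + 1
cols = np.empty((C_in * K * K, H_out * W_out))
idx = 0
for i in range(H_out):
    for j in range(W_out):
        cols[:, idx] = x[:, i:i+K, j:j+K].ravel()
        idx += 1

y = (w_csr @ cols).reshape(C_out, H_out, W_out)  # sparse-dense matmul
print(y.shape, f"weights {1 - w_csr.nnz / w.size:.0%} zero")
```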
Pruning Filters for Efficient ConvNets
Convolutional Neural Networks (CNNs) are extensively used in image and video recognition, natural language processing and other machine learning applications. The success of CNNs in these areas corresponds with a significant increase in the number of parameters and computation costs. Recent approaches towards reducing these overheads involve pruning and compressing the weights of various layers...
Compact Deep Convolutional Neural Networks With Coarse Pruning
The learning capability of a neural network improves with increasing depth at higher computational costs. Wider layers with dense kernel connectivity patterns further increase this cost and may hinder real-time inference. We propose feature map and kernel level pruning for reducing the computational complexity of a deep convolutional neural network. Pruning feature maps reduces the width of a l...
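A generic sketch of feature-map (channel) pruning follows, using a simple L1-norm magnitude criterion; the paper's exact ranking criteria may differ. Dropping output channels narrows the layer, and the matching input channels must be removed from the next layer's weights.

```python
# Feature-map (channel) pruning by L1-norm ranking. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((32, 16, 3, 3))   # (C_out, C_in, K, K) conv weights

keep = 24                                 # output channels to keep
scores = np.abs(w).sum(axis=(1, 2, 3))    # L1 norm per output feature map
kept = np.sort(np.argsort(scores)[-keep:])

w_pruned = w[kept]                        # narrower layer: 24 feature maps
# The next layer loses the matching input channels too:
w_next = rng.standard_normal((64, 32, 3, 3))
w_next_pruned = w_next[:, kept]
print(w_pruned.shape, w_next_pruned.shape)  # (24, 16, 3, 3) (64, 24, 3, 3)
```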
Journal: CoRR
Volume: abs/1702.08597
Year: 2017