Optimizing DNN Compilation for Distributed Training With Joint OP and Tensor Fusion
Authors
Abstract
This article proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for distributed training over multiple GPU machines. Existing single-device optimization strategies do not work well in distributed training, due mainly to the communication inefficiency they incur. DisCo generates optimized, joint computation operator and communication tensor fusion strategies to enable highly efficient distributed training. A GNN-based simulator is built to effectively estimate the per-iteration training time achieved by operator/tensor fusion candidates. A backtracking search algorithm is driven by the simulator, navigating the large strategy space efficiently to identify good fusion strategies that minimize the per-iteration training time. We compare DisCo with existing DL fusion schemes and show that it achieves training speed-up close to the ideal, full computation-communication overlap case.
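To make the search procedure described above concrete, below is a minimal Python sketch of a simulator-guided backtracking search over joint operator/tensor fusion decisions. The fusion group names, the cost numbers, and the simulate() function are illustrative assumptions standing in for DisCo's GNN-based simulator and its real strategy space; they are not the paper's actual implementation or API.

# Hypothetical sketch: simulator-guided backtracking search over fuse/no-fuse
# decisions for a handful of operator and tensor (gradient) fusion groups.
from typing import Dict, List, Tuple

# Candidate fusion groups and the simulated per-iteration cost (ms) of each
# choice: fusing the group (True) vs. leaving its ops/tensors unfused (False).
# All names and numbers are made up for illustration.
FUSION_GROUPS: List[Tuple[str, Dict[bool, float]]] = [
    ("conv_bn_relu_block", {True: 3.1, False: 4.0}),   # operator fusion
    ("elementwise_tail",   {True: 1.2, False: 1.9}),   # operator fusion
    ("grad_buckets_0_3",   {True: 2.4, False: 3.5}),   # tensor (gradient) fusion
    ("grad_buckets_4_7",   {True: 2.6, False: 2.5}),   # fusing may even hurt
]


def simulate(partial: List[bool]) -> float:
    """Stand-in for the GNN-based simulator: sums the simulated cost of the
    decisions made so far. Because costs are nonnegative, a partial plan's
    time is a valid lower bound for any of its completions."""
    return sum(FUSION_GROUPS[i][1][choice] for i, choice in enumerate(partial))


def backtrack(partial: List[bool], best: dict) -> None:
    """Depth-first search over fuse/no-fuse decisions, pruning any branch
    whose lower-bound time already exceeds the best complete plan found."""
    t = simulate(partial)
    if t >= best["time"]:
        return  # prune: no completion of this branch can do better
    if len(partial) == len(FUSION_GROUPS):
        best["time"], best["plan"] = t, list(partial)
        return
    for choice in (True, False):       # try fusing first, then not fusing
        partial.append(choice)
        backtrack(partial, best)
        partial.pop()                  # undo the decision and backtrack


if __name__ == "__main__":
    best = {"time": float("inf"), "plan": None}
    backtrack([], best)
    names = [name for name, _ in FUSION_GROUPS]
    print("best simulated iteration time (ms):", best["time"])
    print("fusion decisions:", dict(zip(names, best["plan"])))

The pruning step relies on a partial plan's simulated time being a lower bound on the time of any of its completions, which holds here because the placeholder costs are nonnegative; the simulator-driven pruning in the paper is analogous in spirit but operates on fused computation/communication graphs rather than a toy additive cost model.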
Similar resources
Optimizing Dendritic Cell Preparation for Fusion with Melanoma Cells
Background: Fusion of dendritic cells (DCs) with melanoma cells could reinforce the antigenicity of tumors as a strategy for the treatment of malignant melanoma. However, the insufficient quantity of DCs and the low fusion efficiency limit the development of such an approach. Objective: To define the dosage of the stimulating factors as well as the induction condition for the optimal DCs prepara...
DNN-Train: Benchmarking and Analyzing DNN Training
We aim to build a new benchmark pool for deep neural network training and to analyze how efficient existing frameworks are in performing this training. We will provide our methodology and develop proper profiling tools to perform this analysis.
Scalable distributed DNN training using commodity GPU cloud computing
We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD) training of Deep Neural Networks (DNN). The method solves the well-known communication bottleneck problem that arises for data-parallel SGD because compute nodes frequently need to synchronize a replica of the model. We solve it by purposefully controlling the rate of weight-update per individual weight, whic...
GMM-free DNN Training
While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture models (GMMs) for alignments both for supervised training and for context dependent (CD) tree building. Here we explore bootstrapping DNN AM training without GMM AMs and show that CD trees can be built with DNN alignments which are better ma...
Optimizing fusion architectures for limited training data sets
A method is described to improve the performance of sensor fusion algorithms. Data sets available for training fusion algorithms are often smaller than desired, since the sensor suite used for data acquisition is always limited by the slowest, least reliable sensor. In addition, the fusion process expands the dimension of the data, which increases the requirement for training data. By using str...
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2022
ISSN: 1045-9219, 1558-2183, 2161-9883
DOI: https://doi.org/10.1109/tpds.2022.3201531