A combined priority scheduling method for distributed machine learning
Abstract
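Algorithms and frameworks for distributed machine learning have been widely used in numerous artificial intelligence engineering applications. A cloud platform provides a large number of resources at a lower cost and is a more convenient method for such applications. With the rapid development of containerization, cloud-native combinations based on Docker and Kubernetes have provided effective resource support for distributed machine learning. However, native Kubernetes does not provide efficient priority or fair resource scheduling strategies for computationally intensive and time-consuming distributed machine learning jobs, which easily leads to resource deadlock, resource waste, and low job execution efficiency. Therefore, to utilize the execution order between multiple jobs as well as the dependencies between tasks of the same job, and considering intra- and inter-group scheduling priorities, a combined priority scheduling method for distributed machine learning is proposed, based on Kubernetes and Volcano. Considering user priority, task priority, longest wait time, task parallelism, and the affinity and non-affinity between parameter server and worker nodes, a combined priority scheduling model of inter- and intra-job priority is proposed and mapped into a scheduling strategy of inter- and intra-group priorities of pods, enabling the efficient scheduling and training of distributed machine learning jobs. The experiment results show that the proposed method achieves preferential resource allocation for urgent, high-parallelism, and high-priority jobs with high-priority users and improves job execution efficiency. The affinity and anti-affinity settings among pods reduce the time of information interaction between the parameter server and worker nodes to a certain extent, thereby improving job completion efficiency. This group scheduling strategy alleviates the problems of resource deadlock and resource waste caused by insufficient resources in cloud computing.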
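To make the idea of a combined inter-job priority concrete, the following is a minimal Python sketch, not the authors' model. The abstract names the factors (user priority, task priority, wait time, parallelism) but not how they are combined, so the weighted sum, the `WEIGHTS` values, the `Job` fields, and the `schedule` function are all assumptions introduced for illustration. The all-or-nothing admission loop loosely imitates group scheduling, in which a job's parameter server and worker pods are placed together or not at all, so that partially placed jobs cannot hold resources and deadlock each other.

```python
from dataclasses import dataclass

# Hypothetical weights: the abstract lists these factors but does not
# say how they are combined, so a simple weighted sum is assumed here.
WEIGHTS = {"user": 0.4, "task": 0.3, "wait": 0.2, "parallelism": 0.1}


@dataclass
class Job:
    name: str
    user_priority: float  # assumed normalized to [0, 1]
    task_priority: float  # assumed normalized to [0, 1]
    wait_seconds: float   # time the job has spent in the queue
    workers: int          # number of worker pods requested
    ps: int = 1           # number of parameter-server pods requested

    @property
    def pods(self) -> int:
        # Total pods that must be placed together for the job to run.
        return self.workers + self.ps


def combined_priority(job: Job, max_wait: float, max_pods: int) -> float:
    """Weighted sum of the four factors, each scaled to [0, 1]."""
    wait = job.wait_seconds / max_wait if max_wait else 0.0
    parallelism = job.pods / max_pods if max_pods else 0.0
    return (
        WEIGHTS["user"] * job.user_priority
        + WEIGHTS["task"] * job.task_priority
        + WEIGHTS["wait"] * wait
        + WEIGHTS["parallelism"] * parallelism
    )


def schedule(jobs: list[Job], free_slots: int) -> list[str]:
    """Admit jobs in descending priority, all pods of a job or none."""
    if not jobs:
        return []
    max_wait = max(j.wait_seconds for j in jobs)
    max_pods = max(j.pods for j in jobs)
    ranked = sorted(
        jobs,
        key=lambda j: combined_priority(j, max_wait, max_pods),
        reverse=True,
    )
    admitted = []
    for job in ranked:
        # Group-style admission: never place only part of a job.
        if job.pods <= free_slots:
            free_slots -= job.pods
            admitted.append(job.name)
    return admitted
```

In this sketch a lower-priority job may still be admitted when a higher-priority job does not fit in the remaining slots. Whether the proposed method allows such backfilling is not stated in the abstract.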
Similar resources
A Hybrid Machine Learning Method for Intrusion Detection
Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...
On Model Parallelization and Scheduling Strategies for Distributed Machine Learning
Distributed machine learning has typically been approached from a data parallel perspective, where big data are partitioned to multiple workers and an algorithm is executed concurrently over different data subsets under various synchronization schemes to ensure speed-up and/or correctness. A sibling problem that has received relatively less attention is how to ensure efficient and correct model...
Machine scheduling for multitask machining
Multitasking is an important part of today’s manufacturing plants. Multitask machine tools are capable of processing multiple operations at the same time by applying a different set of part and tool holding devices. Mill-turns are multitask machines with the ability to perform a variety of operations with considerable accuracy and agility. One critical factor in simultaneous machining is to cre...
Online Job Scheduling in Distributed Machine Learning Clusters
Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural network, multiple workers are run in parallel to train partitions of the input dataset, and update shared model parameters. In a shared cluster handling multiple tr...
Journal
Journal title: EURASIP Journal on Wireless Communications and Networking
Year: 2023
ISSN: 1687-1499, 1687-1472
DOI: https://doi.org/10.1186/s13638-023-02253-4