Improved heuristic job scheduling method to enhance throughput for big data analytics
Abstract
Data-parallel computing platforms, such as Hadoop and Spark, are deployed in clusters for big data analytics. Multiple users increasingly share the same cluster, so scheduling their jobs becomes a serious challenge. For a long time, the Shortest-Job-First (SJF) method has been considered the optimal solution for minimizing the average job completion time. However, SJF leads to low system throughput in the case where a small number of short jobs consume a large amount of resources. This factor prolongs the average job completion time. We propose an improved heuristic job scheduling method, called the Densest-Job-Set-First (DJSF) method. DJSF schedules jobs by maximizing the number of jobs completed per unit time, aiming to decrease the average Job Completion Time (JCT) and improve system throughput. We perform extensive simulations based on Google cluster data. Compared with SJF, DJSF decreases the average JCT by 23.19% and enhances system throughput by 42.19%. Compared with Tetris, the packing method improves efficiency by 55.4%, so that the platforms complete more jobs in a given time span.
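The weakness of SJF described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's DJSF algorithm, whose details the abstract does not give. It assumes a single resource type, jobs that all arrive at time 0, non-preemptive execution, and a hypothetical "density" ordering that ranks jobs by duration times resource demand. The names `Job`, `simulate`, `sjf_key`, and `density_key` are illustrative only.

```python
import heapq
from typing import Callable, Dict, List, NamedTuple


class Job(NamedTuple):
    name: str
    duration: float  # run time once started
    demand: int      # resource units held while running


def simulate(jobs: List[Job], capacity: int,
             key: Callable[[Job], float]) -> Dict[str, float]:
    """Greedy non-preemptive scheduling; returns each job's completion time."""
    pending = sorted(jobs, key=key)   # priority order chosen by the policy
    running: list = []                # min-heap of (finish_time, name, demand)
    free, now = capacity, 0.0
    finish: Dict[str, float] = {}
    while pending or running:
        waiting = []
        for job in pending:           # start every job that fits, in priority order
            if job.demand <= free:
                free -= job.demand
                heapq.heappush(running, (now + job.duration, job.name, job.demand))
            else:
                waiting.append(job)
        pending = waiting
        if not running:
            raise ValueError("a job demands more than the cluster capacity")
        now, name, demand = heapq.heappop(running)  # advance to next completion
        free += demand
        finish[name] = now
    return finish


def sjf_key(job: Job) -> float:
    return job.duration                # Shortest-Job-First


def density_key(job: Job) -> float:
    return job.duration * job.demand   # assumed stand-in: small resource-time first


def average(values) -> float:
    values = list(values)
    return sum(values) / len(values)


# One short but resource-hungry job plus five light jobs on a 10-unit cluster.
jobs = [Job("big", 1, 10)] + [Job(f"small{i}", 2, 2) for i in range(5)]
sjf = simulate(jobs, capacity=10, key=sjf_key)
dense = simulate(jobs, capacity=10, key=density_key)
avg_sjf = average(sjf.values())
avg_dense = average(dense.values())
done_by_2_sjf = sum(t <= 2 for t in sjf.values())
done_by_2_dense = sum(t <= 2 for t in dense.values())
print(avg_sjf, avg_dense, done_by_2_sjf, done_by_2_dense)
```

In this contrived workload, SJF runs the short job first even though it occupies the whole cluster, so the five light jobs wait. The density-style ordering finishes five jobs in the first two time units and ends with a lower average completion time. The numbers say nothing about the gains reported in the abstract, which come from simulations on Google cluster data.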
Similar resources
A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
A genetic algorithm-based job scheduling model for big data analytics
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and...
Hadoop performance modeling and job optimization for big data analytics
Big data has received a momentum from both academia and industry. The MapReduce model has emerged into a major computing model in support of big data analytics. Hadoop, which is an open source implementation of the MapReduce model, has been widely taken up by the community. Cloud service providers such as Amazon EC2 cloud have now supported Hadoop user applications. However, a key challenge is ...
A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling
We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics compo...
Application of Big Data Analytics in Power Distribution Network
Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...
Journal
Journal title: Tsinghua Science & Technology
Year: 2022
ISSN: 1878-7606, 1007-0214
DOI: https://doi.org/10.26599/tst.2020.9010047