HetSpark: A Framework that Provides Heterogeneous Executors to Apache Spark
نویسندگان
چکیده
منابع مشابه
Ddup - towards a deduplication framework utilising apache spark
This paper is about a new framework called DeduPlication (DduP). DduP aims to solve large scale deduplication problems on arbitrary data tuples. DduP tries to bridge the gap between big data, high performance and duplicate detection. At the moment a first prototype exists but the overall project status is work in progress. DduP utilises the promising successor of Apache Hadoop MapReduce [Had14]...
متن کاملFlare: Native Compilation for Heterogeneous Workloads in Apache Spark
The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While S...
متن کاملApproximate Stream Analytics in Apache Flink and Apache Spark Streaming
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
متن کاملMLlib: Machine Learning in Apache Spark
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Procedia Computer Science
سال: 2018
ISSN: 1877-0509
DOI: 10.1016/j.procs.2018.08.244