apache spark

Search results for: apache spark

Number of results: 18089

SPARQLGX in Action: Efficient Distributed Evaluation of SPARQL with Apache Spark

2016

Damien Graux, Louis Jachiet, Pierre Genevès, Nabil Layaïda

We demonstrate SPARQLGX, our implementation of a distributed SPARQL evaluator. We show that SPARQLGX makes it possible to evaluate SPARQL queries on billions of triples distributed across multiple nodes, while providing attractive performance figures.
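At its core, a SPARQL query is a basic graph pattern: each triple pattern selects matching triples, and the per-pattern results are joined on their shared variables. The following is a minimal single-machine sketch of that idea in plain Python, assuming an in-memory list of `(subject, predicate, object)` triples and treating terms that start with `?` as variables. It is not SPARQLGX's implementation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Binding = Dict[str, str]        # variable name -> bound value

def match_pattern(pattern: Triple, triples: List[Triple]) -> List[Binding]:
    """Return one binding per triple matching `pattern`.
    Terms starting with '?' are variables; other terms must match exactly."""
    results: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in one pattern must bind to the same value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

def join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Natural join: combine bindings that agree on every shared variable."""
    out: List[Binding] = []
    for lb in left:
        for rb in right:
            if all(lb[v] == rb[v] for v in lb.keys() & rb.keys()):
                out.append({**lb, **rb})
    return out

def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining per-pattern matches left to right."""
    result: List[Binding] = [{}]  # the empty binding is the identity for join
    for pattern in patterns:
        result = join(result, match_pattern(pattern, triples))
    return result

if __name__ == "__main__":
    data = [("alice", "knows", "bob"), ("bob", "knows", "carol"), ("alice", "age", "30")]
    query = [("?x", "knows", "?y"), ("?y", "knows", "?z")]
    print(evaluate_bgp(query, data))
```

Run as a script, this prints a single binding (`?x = alice`, `?y = bob`, `?z = carol`). Joining in a fixed left-to-right order is the simplest strategy; a distributed evaluator would instead spread the triples across nodes, run these joins in parallel, and usually choose the join order more carefully.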


MLlib: Machine Learning in Apache Spark

Journal: Journal of Machine Learning Research, 2016

Xiangrui Meng, Joseph K. Bradley, Burak Yavuz, Evan R. Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, D. B. Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Bosagh Zadeh, Matei Zaharia, Ameet Talwalkar

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...


Big Spatial Data Processing Frameworks: Feature and Performance Evaluation

2017

Stefan Hagedorn, Philipp Götze, Kai-Uwe Sattler

Nowadays, a vast amount of data is generated and collected every moment, and this data often has a spatial and/or temporal aspect. To analyze such massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions that take the spatial characteristics into account were created for them. In this paper, we analyze and compare existing solutions for spatial d...


Cluster Computing Paradigms– A Comparative study of Evolving Frameworks

2016

N. Anila Sundar, Vijay Krishna Menon, P. N. Kumar

Cluster computing is an approach for storing and processing the huge amounts of data being generated. Hadoop and Spark are the two most prominent cluster computing platforms today. Hadoop incorporates the MapReduce concept and is scalable as well as fault-tolerant. However, the limitations of Hadoop paved the way for another cluster computing framework named Spark. It is faster and can also mana...
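The MapReduce concept mentioned above splits a job into a map phase that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce phase that combines each group. The sketch below illustrates that programming model with the classic word-count example in a single Python process; it is not Hadoop's or Spark's API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    return [(word, 1) for line in lines for word in line.lower().split()]

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))

if __name__ == "__main__":
    print(word_count(["Spark and Hadoop", "spark"]))
```

In a real cluster, the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network. In PySpark, the same job is commonly written as a chain of `flatMap`, `map` and `reduceByKey` on an RDD.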


Quartet: Harmonizing Task Scheduling and Caching for Cluster Computing

2016

Francis Deslauriers, Peter McCormick, George Amvrosiadis, Ashvin Goel, Angela Demke Brown

Cluster computing frameworks such as Apache Hadoop and Apache Spark are commonly used to analyze large data sets. The analysis often involves running multiple, similar queries on the same data sets. This data reuse should improve query performance, but we find that these frameworks schedule query tasks independently of each other and are thus unable to exploit the data sharing across these task...
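The data-reuse point above can be illustrated with a toy simulation. This is a hypothetical sketch, not Quartet's algorithm: it assumes each task reads exactly one data block, models memory as an LRU cache holding a fixed number of blocks, and compares the submitted task order with a greedy reordering that prefers tasks whose block is already cached. All names are illustrative.

```python
from collections import OrderedDict
from typing import List

def touch(cache: "OrderedDict[str, None]", block: str, capacity: int) -> bool:
    """Access `block` in an LRU cache of `capacity` blocks; return True on a hit."""
    if block in cache:
        cache.move_to_end(block)  # mark as most recently used
        return True
    cache[block] = None
    if len(cache) > capacity:
        cache.popitem(last=False)  # evict the least recently used block
    return False

def count_cache_hits(order: List[str], capacity: int) -> int:
    """Run tasks in the given order and count how many find their block cached."""
    cache: "OrderedDict[str, None]" = OrderedDict()
    return sum(touch(cache, block, capacity) for block in order)

def cache_aware_order(pending: List[str], capacity: int) -> List[str]:
    """Greedy reordering: run a pending task whose block is cached if one exists,
    otherwise fall back to the oldest pending task."""
    pending = list(pending)
    cache: "OrderedDict[str, None]" = OrderedDict()
    order: List[str] = []
    while pending:
        idx = next((i for i, b in enumerate(pending) if b in cache), 0)
        block = pending.pop(idx)
        touch(cache, block, capacity)
        order.append(block)
    return order

if __name__ == "__main__":
    # Two similar queries, each scanning blocks A, B, C, submitted back to back.
    submitted = ["A", "B", "C", "A", "B", "C"]
    print(count_cache_hits(submitted, capacity=2))
    print(count_cache_hits(cache_aware_order(submitted, capacity=2), capacity=2))
```

With room for only two blocks, the submitted order evicts each block just before the second query needs it (zero hits), while the reordered schedule `A, A, B, B, C, C` gets three hits. Scheduling each task independently of what is cached misses exactly this kind of sharing.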


DNA short read alignment on Apache Spark

Journal: Applied Computing and Informatics, 2020


Processing large-scale data with Apache Spark

Journal: Korean Journal of Applied Statistics, 2016


Modeling and Simulating Apache Spark Streaming Applications

Journal: Softwaretechnik-Trends, 2016

Johannes Kroß, Helmut Krcmar

Stream processing systems are used to analyze big data streams with low latency. Performance in terms of response time and throughput is crucial to ensure that all arriving data are processed in time. This depends on various factors, such as the complexity of the algorithms used and the configuration of such distributed systems and applications. To ensure a desired system behavior, performance evaluatio...


Balanced Graph Partitioning with Apache Spark

2014

Emanuele Carlini, Patrizio Dazzi, Andrea Esposito, Alessandro Lulli, Laura Ricci

A significant part of the data produced every day by online services is structured as a graph. Therefore, there is a need for efficient processing and analysis solutions for large-scale graphs. Among others, balanced graph partitioning is a well-known NP-complete problem with a wide range of applications. Several solutions have been proposed so far, however most of the existing state-...
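To make the partitioning objective concrete, the sketch below assumes the common formulation of balanced graph partitioning: assign each vertex to one of k parts so that few edges cross between parts (the edge cut) while part sizes stay close to n/k. This illustrates the problem, not the method of the paper. The function names are hypothetical, and the exhaustive `best_bisection` search is only feasible for tiny graphs, which is why heuristics are needed at scale.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]

def edge_cut(edges: List[Edge], part: Dict[int, int]) -> int:
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])

def imbalance(part: Dict[int, int], k: int) -> float:
    """Largest part size divided by the ideal size n/k (1.0 = perfectly balanced)."""
    sizes = Counter(part.values())
    return max(sizes.values()) / (len(part) / k)

def best_bisection(nodes: List[int], edges: List[Edge]) -> Tuple[int, Dict[int, int]]:
    """Try every split into two (near-)equal halves; exponential in len(nodes)."""
    best: Optional[Tuple[int, Dict[int, int]]] = None
    for side in combinations(nodes, len(nodes) // 2):
        chosen = set(side)
        part = {n: 0 if n in chosen else 1 for n in nodes}
        cut = edge_cut(edges, part)
        if best is None or cut < best[0]:
            best = (cut, part)
    assert best is not None  # nodes is assumed non-empty
    return best

if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    cut, part = best_bisection(list(range(6)), edges)
    print(cut, imbalance(part, 2))
```

For the two-triangle graph, the best balanced split keeps each triangle together, cutting only the bridging edge. An alternating assignment is just as balanced but cuts five edges.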


Real-time News Recommendations using Apache Spark

2016

Jaschar Domann, Jens Meiners, Lea Helmers, Andreas Lommatzsch

Recommending news articles is a challenging task due to the continuous changes in the set of available news articles and the context-dependent preferences of users. Traditional recommender approaches are optimized for analyzing static data sets. In news recommendation scenarios, characterized by continuous changes, high volume of messages, and tight time constraints, alternative approaches are n...

