Search results for: apache spark

Number of results: 18089

2016
Damien Graux Louis Jachiet Pierre Genevès Nabil Layaïda

We demonstrate SPARQLGX: our implementation of a distributed SPARQL evaluator. We show that SPARQLGX makes it possible to evaluate SPARQL queries on billions of triples distributed across multiple nodes, while providing attractive performance figures.

Journal: :Journal of Machine Learning Research 2016
Xiangrui Meng Joseph K. Bradley Burak Yavuz Evan R. Sparks Shivaram Venkataraman Davies Liu Jeremy Freeman D. B. Tsai Manish Amde Sean Owen Doris Xin Reynold Xin Michael J. Franklin Reza Bosagh Zadeh Matei Zaharia Ameet Talwalkar

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...

2017
Stefan Hagedorn Philipp Götze Kai-Uwe Sattler

Nowadays, a vast amount of data is generated and collected every moment and often, this data has a spatial and/or temporal aspect. To analyze the massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged and extensions that take the spatial characteristics into account were created for them. In this paper, we analyze and compare existing solutions for spatial d...

2016
N. Anila Sundar Vijay Krishna Menon P. N. Kumar

Cluster computing is an approach for storing and processing the huge amounts of data being generated. Hadoop and Spark are the two cluster computing platforms that are prominent today. Hadoop incorporates the MapReduce concept and is scalable as well as fault-tolerant, but its limitations paved the way for another cluster computing framework named Spark. It is faster and can also mana...

2016
Francis Deslauriers Peter McCormick George Amvrosiadis Ashvin Goel Angela Demke Brown

Cluster computing frameworks such as Apache Hadoop and Apache Spark are commonly used to analyze large data sets. The analysis often involves running multiple, similar queries on the same data sets. This data reuse should improve query performance, but we find that these frameworks schedule query tasks independently of each other and are thus unable to exploit the data sharing across these task...

Journal: :Applied Computing and Informatics 2020

Journal: :Korean Journal of Applied Statistics 2016

Journal: :Softwaretechnik-Trends 2016
Johannes Kroß Helmut Krcmar

Stream processing systems are used to analyze big data streams with low latency. The performance in terms of response time and throughput is crucial to ensure all arriving data are processed in time. This depends on various factors such as the complexity of used algorithms and configurations of such distributed systems and applications. To ensure a desired system behavior, performance evaluatio...

2014
Emanuele Carlini Patrizio Dazzi Andrea Esposito Alessandro Lulli Laura Ricci

A significant part of the data produced every day by online services is structured as a graph. Therefore, there is a need for efficient processing and analysis solutions for large-scale graphs. Among others, balanced graph partitioning is a well-known NP-complete problem with a wide range of applications. Several solutions have been proposed so far; however, most of the existing state-...

2016
Jaschar Domann Jens Meiners Lea Helmers Andreas Lommatzsch

Recommending news articles is a challenging task due to the continuous changes in the set of available news articles and the context-dependent preferences of users. Traditional recommender approaches are optimized for analyzing static data sets. In news recommendation scenarios, characterized by continuous changes, high volume of messages, and tight time constraints, alternative approaches are n...
