Search results for: apache spark

Number of results: 18089

Journal: CoRR 2017
Oliver Gutsche Luca Canali Illia Cremer Matteo Cremonesi Peter Elmer Ian Fisk Maria Girone Bo Jayatilaka Jim Kowalkowski Viktor Khristenko Evangelos Motesnitsalis Jim Pivarski Saba Sehrish Kacper Surdy Alexey Svyatkovskiy

Experimental Particle Physics has been at the forefront of analyzing the world’s largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called “Big Data” technologies, have emerged from industry and open source projects to support th...

2015
Massimiliano Bertolucci Emanuele Carlini Patrizio Dazzi Alessandro Lulli Laura Ricci

Many of today’s large datasets are organized as a graph. Due to their size, it is often infeasible to process these graphs using a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this ...

2014
Myroslava Stavnycha

Recently, Spark has gained huge popularity as a data processing engine because of its better performance in terms of speed. The developers of Spark claim that it may outperform Hadoop MapReduce by a factor of 100 in memory and 10 on disk [1]. This paper outlines which innovations improved speed and how. In order to investigate improvements, I analysed technical documentation, which is available, since b...

Journal: Computer Science (AGH) 2016
Wlodzimierz Funika Pawel Koperek

Organizations across the globe gather more and more data, encouraged by easy-to-use and cheap cloud storage services. Large datasets require new approaches to analysis and processing, which include methods based on machine learning. In particular, symbolic regression can provide many useful insights. Unfortunately, due to high resource requirements, use of this method for large-scale dataset ana...

2018
Laeeq Ahmed Valentin Georgiev Marco Capuccini Salman Zubair Toor Wesley Schaal Erwin Laure Ola Spjuth

BACKGROUND: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION: In this study we propose a strategy that is b...

2016
Meenakshi Sharma Vaishali Chauhan Keshav Kishore

In this paper we discuss the various challenges of Big Data and the problems that arise from the continuous explosion of data resulting from the likes of social media and other online sources to gain access to deeper analysis of their data. This paper compares Hadoop MapReduce and the recently introduced Apache Spark – both of which provide a processing model for analyzing big d...

2015
Stefan Hagedorn Kai-Uwe Sattler Michael Gertz

Spatio-temporal event data do not only arise from sensor readings, but also in information retrieval and text analysis. However, such events extracted from a text corpus may be imprecise in both dimensions. In this paper we focus on the task of event correlation, i.e., finding events that are similar in terms of space and time. We present a framework for Apache Spark that provides correlation o...

Journal: KIPS Transactions on Software and Data Engineering 2017
