apache spark

Search results for: apache spark

Number of results: 18089

CMS Analysis and Data Reduction with Apache Spark

Journal: CoRR 2017

Oliver Gutsche, Luca Canali, Illia Cremer, Matteo Cremonesi, Peter Elmer, Ian Fisk, Maria Girone, Bo Jayatilaka, Jim Kowalkowski, Viktor Khristenko, Evangelos Motesnitsalis, Jim Pivarski, Saba Sehrish, Kacper Surdy, Alexey Svyatkovskiy

Experimental Particle Physics has been at the forefront of analyzing the world’s largest datasets for decades. The HEP community was among the first to develop suitable software and computing tools for this task. In recent times, new toolkits and systems for distributed data processing, collectively called “Big Data” technologies, have emerged from industry and open source projects to support th...


Static and Dynamic Big Data Partitioning on Apache Spark

2015

Massimiliano Bertolucci, Emanuele Carlini, Patrizio Dazzi, Alessandro Lulli, Laura Ricci

Many of today’s large datasets are organized as a graph. Due to their size it is often infeasible to process these graphs using a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this ...
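The abstract's point about generic decomposition strategies can be illustrated on a toy graph. The following is a minimal sketch in plain Python (not Spark code and not the paper's method) that compares a generic modulo-based vertex assignment, similar in spirit to hash partitioning, with a contiguous range assignment, and counts the edges each one cuts. The function names and the path-graph input are hypothetical.

```python
def hash_assignment(vertices, num_partitions):
    # Generic strategy: partition = vertex ID modulo the partition count.
    # It ignores graph structure entirely.
    return {v: v % num_partitions for v in vertices}


def range_assignment(vertices, num_partitions):
    # Structure-aware alternative for this toy case: contiguous ID ranges.
    ordered = sorted(vertices)
    size = -(-len(ordered) // num_partitions)  # ceiling division
    return {v: i // size for i, v in enumerate(ordered)}


def edge_cut(edges, assignment):
    # Edges whose endpoints sit in different partitions need cross-worker communication.
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


vertices = range(6)
edges = [(i, i + 1) for i in range(5)]  # hypothetical path graph 0-1-2-3-4-5

print(edge_cut(edges, hash_assignment(vertices, 2)))   # 5
print(edge_cut(edges, range_assignment(vertices, 2)))  # 1
```

On this path graph the modulo assignment alternates partitions, so all 5 edges cross a partition boundary, while the range assignment cuts only the edge between vertices 2 and 3. Fewer cut edges generally means less communication for neighbour-based graph algorithms.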


Good parallel software development practices. Apache Spark case

2014

Myroslava Stavnycha

Recently, Spark, as a data processing engine, has gained huge popularity because of its better performance in terms of speed. Spark's developers claim that it can outperform Hadoop MapReduce by 100 times in memory and 10 times on disk [1]. This paper outlines which innovations improved speed and how. In order to investigate the improvements, I analysed technical documentation, which is available, since b...


Scaling Evolutionary Programming with the Use of Apache Spark

Journal: Computer Science (AGH) 2016

Wlodzimierz Funika, Pawel Koperek

Organizations across the globe gather more and more data, encouraged by easy-to-use and cheap cloud storage services. Large datasets require new approaches to analysis and processing, which include methods based on machine learning. In particular, symbolic regression can provide many useful insights. Unfortunately, due to high resource requirements, use of this method for large-scale dataset ana...


Efficient iterative virtual screening with Apache Spark and conformal prediction

2018

Laeeq Ahmed, Valentin Georgiev, Marco Capuccini, Salman Zubair Toor, Wesley Schaal, Erwin Laure, Ola Spjuth

BACKGROUND Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION In this study we propose a strategy that is b...
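The excerpt does not show how conformal prediction is used in the paper. As a generic illustration of the technique named in the title, the sketch below computes inductive conformal p-values from a set of calibration nonconformity scores and returns every candidate label whose p-value exceeds a significance level epsilon. The score values, the label names `active`/`inactive`, and the function names are hypothetical; how nonconformity scores would be derived from a docking model is left out.

```python
def conformal_p_value(calibration_scores, test_score):
    # Count calibration examples at least as nonconforming as the test example;
    # the +1 terms account for the test example itself.
    n = len(calibration_scores)
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (n + 1)


def prediction_set(calibration_scores, test_scores_by_label, epsilon):
    # Keep every candidate label that is not rejected at significance level epsilon.
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calibration_scores, score) > epsilon
    }


# Hypothetical nonconformity scores (higher means less typical).
calibration = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
test_scores = {"active": 0.15, "inactive": 0.85}

print(prediction_set(calibration, test_scores, epsilon=0.25))  # {'active'}
print(prediction_set(calibration, test_scores, epsilon=0.1))   # both labels
```

With epsilon = 0.25 only one label survives, giving a confident single-label prediction; lowering epsilon to 0.1 makes the predictor more conservative and both labels are kept. Under the usual exchangeability assumption, the long-run rate at which the true label is excluded from such sets is at most epsilon.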


A Review: Mapreduce and Spark for Big Data Analytics

2016

Meenakshi Sharma, Vaishali Chauhan, Keshav Kishore

In this paper we discuss the various challenges of Big Data and the problems that arise from the continuous explosion of data from social media and other online sources when trying to gain deeper analysis of that data. This paper compares Hadoop MapReduce and the recently introduced Apache Spark, both of which provide a processing model for analyzing big d...
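The MapReduce processing model that both systems build on can be sketched in a few lines. The following plain-Python word count is a single-machine illustration (not Hadoop or Spark code) of the three stages: map emits key-value pairs, the shuffle groups values by key, and reduce combines each group. The function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    # Emit (word, 1) for every word in every input record.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine each key's list of values into a single result.
    return {key: sum(values) for key, values in groups.items()}


lines = ["Spark and MapReduce", "Spark keeps data in memory"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["spark"])  # 2
```

In Spark's RDD API the same pipeline is usually written with `flatMap`, `map`, and `reduceByKey`. One commonly cited difference is that Hadoop MapReduce writes each job's output to HDFS before the next job can read it, whereas Spark can keep intermediate datasets cached in memory across stages.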


A Framework for Scalable Correlation of Spatio-temporal Event Data

2015

Stefan Hagedorn, Kai-Uwe Sattler, Michael Gertz

Spatio-temporal event data arise not only from sensor readings but also in information retrieval and text analysis. However, such events extracted from a text corpus may be imprecise in both dimensions. In this paper we focus on the task of event correlation, i.e., finding events that are similar in terms of space and time. We present a framework for Apache Spark that provides correlation o...
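The excerpt cuts off before describing the framework itself. As a minimal single-machine sketch of the correlation task it defines (not the framework's API), the code below represents each event's time as an interval to reflect imprecise dates, and reports pairs whose spatial distance and temporal gap both fall under hypothetical thresholds. The `Event` type, the planar coordinates, and the threshold values are assumptions for illustration.

```python
import math
from itertools import combinations
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    x: float    # assumption: coordinates already projected to a plane
    y: float
    start: int  # imprecise time as an interval of days
    end: int


def temporal_gap(a, b):
    # Zero when the intervals overlap, otherwise the distance between them.
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def spatial_distance(a, b):
    return math.hypot(a.x - b.x, a.y - b.y)


def correlated_pairs(events, max_distance, max_gap):
    # Naive all-pairs comparison of every event against every other event.
    return [
        (a.name, b.name)
        for a, b in combinations(events, 2)
        if spatial_distance(a, b) <= max_distance and temporal_gap(a, b) <= max_gap
    ]


events = [
    Event("e1", 0.0, 0.0, 10, 12),
    Event("e2", 1.0, 1.0, 11, 11),
    Event("e3", 50.0, 50.0, 11, 11),
    Event("e4", 0.5, 0.0, 30, 31),
]
print(correlated_pairs(events, max_distance=2.0, max_gap=3))  # [('e1', 'e2')]
```

Here `e4` is close to `e1` in space but 18 days away in time, and `e3` overlaps `e2` in time but is far away in space, so only `e1` and `e2` correlate. The all-pairs loop is quadratic in the number of events; at scale, a distributed version would typically partition events by space or time first so that only nearby candidates are compared.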




Using Apache Spark on genome assembly for scalable overlap-graph reduction

SSQUSAR : A Large-Scale Qualitative Spatial Reasoner Using Apache Spark SQL

Parallel particle swarm optimization classification algorithm variant implemented with Apache Spark