Apache Spark

Search results for: Apache Spark

Number of results: 18089

A Reference Architecture and Road map for Enabling E-commerce on Apache Spark

2015

Mohit Sewak, Sachchidanand Singh

Apache Spark is an execution engine that, besides working as an isolated distributed in-memory computing engine, also offers close integration with Hadoop’s distributed file system (HDFS). Apache Spark's underlying appeal is in providing a unified framework to create sophisticated applications involving workloads. It unifies multiple workloads, handles unstructured data very well and has easy-to...

Identifying the potential of Near Data Computing for Apache Spark

Journal: CoRR 2017

Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguadé

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. There is also a renewed interest in Near Data Computing (NDC) due to technological advancement in the last decade. However, it is not known if ...

TRANSMUT‐Spark: Transformation mutation for Apache Spark

Journal: Software Testing, Verification & Reliability 2022

Summary: This paper proposes TRANSMUT‐Spark for automating mutation testing of big data processing code within Spark programs. Apache Spark is an engine for big data analytics/processing that hides the inherent complexity of parallel programming. Nonetheless, programmers must cleverly combine built‐in functions within programs and guide the engine to use the right data management strategies to exploit the computational resources required by big data processing and avoid sub...

On the usability of Hadoop MapReduce, Apache Spark & Apache Flink for data science

2017

Bilal Akil, Ying Zhou, Uwe Röhm

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...

BigData Analysis in Healthcare: Apache Hadoop, Apache Spark and Apache Flink

Journal: Frontiers in Health Informatics 2019

Implementing Apache Spark jobs execution and Apache Spark cluster creation for OpenStack Sahara

Journal: Proceedings of the Institute for System Programming of RAS 2015

Getting Started with Apache Spark

2015

James A. Scott

Bioinformatics applications on Apache Spark

Journal: GigaScience 2018

Change Detection of Mobile Lidar Data Using Cloud Computing

2016

Kun Liu, Jan Boehm, Christian Alis

Change detection has long been a challenging problem although a lot of research has been conducted in different fields such as remote sensing and photogrammetry, computer vision, and robotics. In this paper, we blend voxel grid and Apache Spark together to propose an efficient method to address the problem in the context of big data. Voxel grid is a regular geometry representation consisting of...

PRoST: Distributed Execution of SPARQL Queries Using Mixed Partitioning Strategies

2018

Matteo Cossu, Michael Färber, Georg Lausen

The rapidly growing size of RDF graphs in recent years necessitates distributed storage and parallel processing strategies. To obtain efficient query processing using computer clusters a wide variety of different approaches have been proposed. Related to the approach presented in the current paper are systems built on top of Hadoop HDFS, for example using Apache Accumulo or using Apache Spark. ...
