Search results for: apache spark

Number of results: 18089

Journal: Scalable Computing: Practice and Experience 2016
Lukas Forer, Enis Afgan, Hansi Weißensteiner, Davor Davidovic, Günther Specht, Florian Kronenberg, Sebastian Schönherr

For many years Apache Hadoop has been used as a synonym for processing data in the MapReduce fashion. However, due to the complexity of developing MapReduce applications, adoption of this paradigm in genetics has been limited. To alleviate some of these issues, we have previously developed Cloudflow, a high-level pipeline framework that allows users to create sophisticated biomedical pipelines usi...
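To make the MapReduce paradigm the abstract refers to concrete, here is a minimal sketch of its map / shuffle / reduce phases in plain Python, using the canonical word-count example. This is an illustration of the programming model only, not Cloudflow's or Hadoop's implementation.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to each input record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# Word count: the canonical MapReduce example.
def mapper(line):
    for word in line.split():
        yield word, 1

def reducer(word, counts):
    return sum(counts)

lines = ["spark hadoop spark", "hadoop mapreduce"]
result = reduce_phase(shuffle(map_phase(lines, mapper)), reducer)
print(result)  # {'spark': 2, 'hadoop': 2, 'mapreduce': 1}
```

The boilerplate around the two tiny user functions (`mapper`, `reducer`) hints at why higher-level pipeline frameworks such as Cloudflow exist.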

Journal: JCS 2016
Adai Shomanov, Madina Mansurova

Corresponding Author: Aday Shomanov, Department of Computer Science, al-Farabi Kazakh National University, Almaty, Kazakhstan. Email: [email protected] Abstract: Parallel computations are an essential tool in solving large-scale, computationally demanding problems. Due to the large diversity and heterogeneity of the currently available parallel processing techniques and paradigms, it is usually diff...

2015
Zaid Al-Ars, Hamid Mushtaq

This paper analyzes the scalability potential of embarrassingly parallel genomics applications using the Apache Spark big data framework and compares their performance with native implementations as well as with Apache Hadoop scalability. The paper uses the BWA DNA mapping algorithm as an example due to its good scalability characteristics and due to the large data files it uses as input. Resul...

2015
Jia Yu, Jinxuan Wu, Mohamed Sarwat

This paper introduces GeoSpark, an in-memory cluster computing framework for processing large-scale spatial data. GeoSpark consists of three layers: the Apache Spark Layer, the Spatial RDD Layer, and the Spatial Query Processing Layer. The Apache Spark Layer provides basic Spark functionality, including loading/storing data to disk as well as regular RDD operations. The Spatial RDD Layer consists of three nove...
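The layering the abstract describes rests on a simple idea: partition points spatially so a query only touches the partitions it overlaps. A minimal sketch of that idea in plain Python, using a hypothetical uniform grid partitioner (GeoSpark's actual partitioners and query API may differ):

```python
from collections import namedtuple

Point = namedtuple("Point", "x y")
Rect = namedtuple("Rect", "xmin ymin xmax ymax")

def grid_partition(points, cell_size):
    """Assign each point to a grid cell, mimicking a spatially-aware partitioner."""
    partitions = {}
    for p in points:
        cell = (int(p.x // cell_size), int(p.y // cell_size))
        partitions.setdefault(cell, []).append(p)
    return partitions

def range_query(partitions, rect, cell_size):
    """Answer a range query by scanning only the grid cells the rectangle overlaps."""
    cx0, cx1 = int(rect.xmin // cell_size), int(rect.xmax // cell_size)
    cy0, cy1 = int(rect.ymin // cell_size), int(rect.ymax // cell_size)
    hits = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            for p in partitions.get((cx, cy), []):
                if rect.xmin <= p.x <= rect.xmax and rect.ymin <= p.y <= rect.ymax:
                    hits.append(p)
    return hits

points = [Point(1, 1), Point(5, 5), Point(9, 9), Point(2, 8)]
parts = grid_partition(points, cell_size=4)
print(range_query(parts, Rect(0, 0, 4, 4), cell_size=4))  # [Point(x=1, y=1)]
```

In a cluster setting, each grid cell would correspond to an RDD partition, so the query prunes whole partitions before any per-point work happens.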

Journal: CoRR 2017
Do Le Quoc, Ruichuan Chen, Pramod Bhatotia, Christof Fetzer, Volker Hilt, Thorsten Strufe

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
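The sampling trade-off the abstract describes can be shown in a few lines: estimate an aggregate from a random sample instead of the full dataset, accepting some error in exchange for touching far fewer records. A minimal sketch in plain Python (not the authors' system; the estimator and sample size here are illustrative):

```python
import random

def approximate_mean(data, sample_size, seed=42):
    """Estimate the mean of `data` from a uniform random sample of `sample_size` items."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    return sum(sample) / sample_size

data = list(range(1, 10001))          # exact mean is 5000.5
exact = sum(data) / len(data)
# Only 5% of the records are read, yet the estimate lands close to the truth.
approx = approximate_mean(data, sample_size=500)
print(abs(approx - exact) / exact)    # small relative error
```

Increasing `sample_size` tightens the estimate at proportionally higher cost, which is exactly the systematic accuracy/efficiency trade-off the paper refers to.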

Journal: CoRR 2016
Shelan Perera, Ashansa Perera, Kamal Hakimzadeh

Big data processing is a hot topic in today's computer science world. There is significant demand for analysing big data to satisfy the requirements of many industries. The emergence of the Kappa architecture created a strong requirement for a highly capable and efficient data processing engine. Therefore, data processing engines such as Apache Flink and Apache Spark emerged in the open source world ...

Journal: DEStech Transactions on Engineering and Technology Research 2018

Thesis: Alzahra University 1393

Given the rapid growth of data in recent years, techniques are needed to manage this data. Various companies have therefore proposed frameworks for this purpose, MapReduce and Apache Spark among them. These frameworks handle the complexities of parallel programming, such as data distribution and scheduling. At the same time, querying data at this scale is also very important. Therefore, in this research, a method ...

2014
Bo Liu

Maritime traffic pattern extraction is an essential part of maritime security and surveillance, and DBSCANSD is a density-based clustering algorithm that extracts the arbitrary shapes of normal lanes from AIS data. This paper presents a parallel DBSCANSD algorithm on top of Apache Spark. The project is experimental research work and the results shown in this paper are preliminary. The exper...
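For readers unfamiliar with the density-based clustering family the abstract builds on, here is a naive single-machine DBSCAN sketch in plain Python. It illustrates the core-point/neighborhood expansion idea only; DBSCANSD's speed/direction extensions and the paper's Spark parallelization are not reproduced here.

```python
import math

def dbscan(points, eps, min_pts):
    """Naive DBSCAN: label each point with a cluster id, or -1 for noise."""
    labels = [None] * len(points)
    cluster = 0

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (may later be claimed by a cluster)
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: expand the cluster
        cluster += 1
    return labels

# Two dense groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

The all-pairs neighborhood search is quadratic, which is precisely why running such an algorithm over large AIS datasets motivates a partitioned Spark implementation.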

Journal: International Journal of Computer Applications 2017

[Chart: number of search results per year]
