Processing large-scale data with Apache Spark
Similar resources
Towards Large Scale Environmental Data Processing with Apache Spark
Currently available environmental datasets are either manually constructed by professionals or automatically generated from the observations provided by sensing devices. Usually, the former are modelled and recorded with traditional general-purpose relational technologies, whereas the latter require more specific scientific array formats and tools. Declarative data processing technologies are a...
SPARQL query processing with Apache Spark
The number and size of linked open data graphs keep growing at a fast pace, confronting semantic RDF services with problems characterized as Big Data. Distributed query processing is one of them and needs to be addressed efficiently, with execution that guarantees scalability, high availability and fault tolerance. RDF data management systems requiring these properties are rarely built from sc...
Large Scale Distributed Data Science from scratch using Apache Spark 2.0
Apache Spark is an open-source cluster computing framework. It has emerged as the next generation big data processing engine, overtaking Hadoop MapReduce which helped ignite the big data revolution. Spark maintains MapReduce’s linear scalability and fault tolerance, but extends it in a few important ways: it is much faster (100 times faster for certain applications), much easier to program in d...
A comparison on scalability for batch big data processing on Apache Spark and Apache Flink
The large amounts of data have created a need for new fram...
Journal
Journal title: Korean Journal of Applied Statistics
Year: 2016
ISSN: 1225-066X
DOI: 10.5351/kjas.2016.29.6.1077