Choosing a Data Storage Format in the Apache Hadoop System Based on Experimental Evaluation Using Apache Spark

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

On the usability of Hadoop MapReduce, Apache Spark & Apache flink for data science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...

متن کامل

A comparison on scalability for batch big data processing on Apache Spark and Apache Flink

*Correspondence: [email protected] 1Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center on Information and Communications Technology), University of Granada, Calle Periodista Daniel Saucedo Aranda, 18071 Granada, Spain Full list of author information is available at the end of the article Abstract The large amounts of data have created a need for new fram...

متن کامل

Approximate Stream Analytics in Apache Flink and Apache Spark Streaming

Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...

متن کامل

Large Scale Citation Matching Using Apache Hadoop

During the process of citation matching links from bibliography entries to referenced publications are created. Such links are indicators of topical similarity between linked texts, are used in assessing the impact of the referenced document and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to hand...

متن کامل

TPM-Based Authentication Mechanism for Apache Hadoop

Hadoop is an open source distributed system for data storage and parallel computations that is widely used. It is essential to ensure the security, authenticity, and integrity of all Hadoop’s entities. The current secure implementations of Hadoop rely on Kerberos, which suffers from many security and performance issues including single point of failure, online availability requirement, and conc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Symmetry

سال: 2021

ISSN: 2073-8994

DOI: 10.3390/sym13020195