Performance Evaluation of Stream Log Collection Using HADOOP Distributed File System
Abstract
Stream logging has recently been widely referred to by web-based and product-based companies, and it is one of the most important agenda items in business re-engineering. Business re-engineering is carried out to improve the effectiveness and productivity of a particular product or service. Stream logging can be achieved at minimum cost using a transaction-based model over a distributed environment such as the HADOOP Distributed File System (HDFS). HDFS is a distributed file system that helps to improve performance, scalability and reliability. Here, a single-master, multiple-slave model is employed over HADOOP. The proposed model is based on the analytics performed by Google for web pages. We present a macroscopic analysis of the workload, characterized by popularity and arrival process. Although numerous transaction models such as Valor and Ameno have been proposed, this model helps to achieve better utilization and execution time under constrained resources.
Keywords: thread scheduling, multi-core, kernel, BST-tree, block scheduling
Similar resources
TidyFS: A Simple and Small Distributed File System
In recent years, there has been an explosion of interest in computing using clusters of commodity, shared-nothing computers. In this paper, we describe the design of TidyFS, a simple and small distributed file system that provides the abstractions necessary for data parallel computations on clusters. Similar to other large-scale distributed file systems such as the Google File System (GFS) and ...
An Efficient Design and Implementation of an MdbULPS in a Cloud-Computing Environment
Flexibly expanding the storage capacity required to process a large amount of rapidly increasing unstructured log data is difficult in a conventional computing environment. In addition, implementing a log processing system providing features that categorize and analyze unstructured log data is extremely difficult. To overcome such limitations, we propose and design a MongoDB-based unstructured ...
Hadoop Scalability and Performance Testing in Heterogeneous Clusters
This paper aims to evaluate cluster configurations using Hadoop in order to check parallelization performance and scalability in information retrieval. This evaluation will establish the necessary capabilities that should be taken into account specifically on a Distributed File System (HDFS: Hadoop Distributed File System), from the perspective of storage and indexing techniques, and query dis...
Distributed Metadata Management Scheme in HDFS
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably and to stream those data sets at high bandwidth to user applications. Metadata management is critical to a distributed file system. In the HDFS architecture, a single master server manages all metadata, while a number of data servers store file data. This architecture can’t meet the exponentially increased stor...
Towards Efficient Design and Implementation of a Hadoop-based Distributed Video Transcoding System in Cloud Computing Environment
In this paper, we propose a Hadoop-based Distributed Video Transcoding System in a cloud computing environment that transcodes various video codec formats into the MPEG-4 video format. This system provides various types of video content to heterogeneous devices such as smart phones, personal computers, television, and pads. We design and implement the system using the MapReduce framework, which...