Parallel K-Means Clustering of Remote Sensing Images Based on MapReduce
Authors
Abstract
K-Means clustering is a basic method for analyzing RS (remote sensing) images, and it gives a direct overview of the objects they contain. This work is usually done with desktop software (e.g. ENVI, ERDAS IMAGINE) on personal computers. On PCs, however, limited hardware resources and long processing times become a bottleneck when processing large volumes of RS images. Parallel computing and distributed systems are clearly suitable choices for this problem. Unlike traditional approaches, this paper parallelizes the algorithm on Hadoop, an open-source system that implements the MapReduce programming model. The paper first describes the color representation of RS images: pixels are translated into the CIELAB color space, which is better suited to distinguishing colors. It also gives an overview of traditional K-Means. The MapReduce programming model and the Hadoop platform are then briefly introduced. This model requires user-defined map and reduce functions, which let processing be parallelized in two stages. In addition, the paper details the map and reduce functions in pseudocode and reports performance results from experiments. The results are shown to be acceptable, and the approach may inspire other ways of tackling similar problems in remote sensing applications.
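The pipeline described in the abstract (convert pixels to CIELAB, then run K-Means as repeated map/reduce passes) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's pseudocode: it assumes 8-bit sRGB input and a D65 white point for the CIELAB conversion, and the functions `kmeans_map`, `kmeans_reduce`, and `run_iteration` are hypothetical names that simulate one Hadoop job in-process (map, group-by-key shuffle, reduce) rather than calling any Hadoop API.

```python
import math
from collections import defaultdict

def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L*, a*, b*), assuming a D65 white point."""
    def linearize(c):
        # Undo the sRGB gamma curve.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ.
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white (assumption)
    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29
    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def kmeans_map(pixel, centers):
    """Hypothetical map step: emit (index of nearest center, (point, count=1))."""
    dists = [sum((p - c) ** 2 for p, c in zip(pixel, center)) for center in centers]
    nearest = min(range(len(centers)), key=dists.__getitem__)
    return nearest, (pixel, 1)

def kmeans_reduce(key, values):
    """Hypothetical reduce step: average all points assigned to one center.

    Values are (coordinate sum, count) pairs, so a combiner could pre-sum
    partial groups on each mapper without changing the result.
    """
    dim = len(values[0][0])
    sums = [0.0] * dim
    count = 0
    for point, n in values:
        for i in range(dim):
            sums[i] += point[i]
        count += n
    return key, tuple(s / count for s in sums)

def run_iteration(pixels, centers):
    """Simulate one MapReduce job in-process: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for px in pixels:
        key, value = kmeans_map(px, centers)
        groups[key].append(value)
    new_centers = list(centers)  # a center that receives no points keeps its position
    for key, values in groups.items():
        k, c = kmeans_reduce(key, values)
        new_centers[k] = c
    return new_centers

def kmeans(pixels, centers, max_iter=20, tol=1e-6):
    """Repeat MapReduce iterations until the centers stop moving."""
    for _ in range(max_iter):
        new_centers = run_iteration(pixels, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers

if __name__ == "__main__":
    rgb_pixels = [(250, 10, 10), (240, 20, 15), (10, 10, 240), (20, 15, 250)]
    lab = [srgb_to_lab(p) for p in rgb_pixels]
    print(kmeans(lab, [lab[0], lab[2]]))  # one reddish and one bluish center
```

In a real Hadoop deployment each iteration would be a separate job, with the current centers distributed to every mapper; here the driver loop in `kmeans` stands in for that job chaining.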
Similar resources
Parallel K-Means Clustering Based on MapReduce (PDF)
Weizhong Zhao, Huifang Ma, and Qing He. The Key Laboratory of Intelligent Information.
MapReduce-based Parallel Learning for Large-scale Remote Sensing Images
Machine learning applied to large-scale remote sensing images shows inadequacies in computational capability and storage space. To solve this problem, we propose a cloud computing-based scheme for learning remote sensing images in a parallel manner: (1) a hull vector-based hybrid parallel support vector machine model (HHB-PSVM) is proposed. It can substantially improve the efficiency of trainin...
International Journal of Advanced Studies in Computer Science and Engineering (IJASCSE), Volume 5, Issue 4, 2016
In today’s era of big data, with the introduction of high-resolution systems, remote sensing imagery is one of the fastest-growing fields, resulting in a rapid increase in the volume of data generated day by day. To handle such massive volumes of data, high processing speed has become an indispensable requirement. This is possible with the help of big data platforms such as Hadoop. Clusterin...
Comparing k-means clusters on parallel Persian-English corpus
This paper compares clusters of aligned Persian and English texts obtained with the k-means method. Text clustering has many applications in various fields of natural language processing. So far, much research has been done on clustering English documents. This raises the question: are those results extendable to other languages? Since the goal of document clustering is grouping of docum...
A Hadoop-based Distributed Framework for Efficient Managing and Processing Big Remote Sensing Images
Various sensors on airborne and satellite platforms are producing large volumes of remote sensing images for mapping, environmental monitoring, disaster management, military intelligence, and other uses. However, it is challenging to efficiently store, query, and process such big data because it is both data- and computing-intensive. In this paper, a Hadoop-based framework is proposed to manage and ...