A genetic algorithm (GA) based automated classifier for remote sensing imagery
Author: M.-D. Yang
Abstract
Conventional unsupervised classification divides all pixels within an image into corresponding classes based on the distance between pixels and the cluster centres. The number of classes must be selected a priori, but it can seldom be determined when little information is available. Analyzing a large dataset, such as a remote sensing dataset, requires an automatic unsupervised classifier that needs no human effort during image clustering. A genetic algorithm (GA) is adopted to search for the cluster centres and to choose a suitable cluster number for digital images, overcoming the disadvantages of the conventional unsupervised classifier. The GA-based automated classifier was executed on several test images for validation and on SPOT satellite imagery for practical application. The satellite images classified by the GA-based classifier and by the iterative self-organizing data analysis technique (ISODATA) were compared with a result obtained through supervised classification. According to the classification accuracy estimated with error matrices and the kappa (K) statistic, the GA-based classifier performed better than unsupervised ISODATA and as well as a supervised classifier, even without manipulation by an analyst. A modified GA-based classifier using maximum likelihood (represented by the z score) as the clustering criterion was also proposed and shown to perform, automatically, as well as a supervised classifier.
Introduction
Image classification, including supervised and unsupervised classification, is a major analytical procedure in digital image processing (Lillesand and Kiefer, 2000). Supervised classification procedures require the analyst to provide training areas, which are groups of pixels with known identities, so that groups of similar pixels can be assembled into the proper class (Avery and Berlin, 1992).
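The excerpt does not say which supervised algorithm produced the reference classification. The sketch below illustrates only the training-area idea, under stated assumptions:

- It learns a per-band mean and standard deviation for each class from labelled pixels.
- It assigns a new pixel to the class with the smallest sum of squared per-band z scores. This is a minimum-distance rule scaled per band, a simplified diagonal-covariance relative of maximum likelihood, not the paper's method.
- The function names, the dictionary layout, and the sample band values are hypothetical.

```python
from statistics import mean, stdev


def train(training_areas):
    """Per-class, per-band mean and standard deviation from training areas.

    training_areas: dict mapping a class name to a list of pixels, each pixel a
    tuple of band values (hypothetical layout). Each class needs at least two
    training pixels so that stdev() is defined.
    """
    stats = {}
    for name, pixels in training_areas.items():
        bands = list(zip(*pixels))  # transpose: one tuple of values per band
        means = [mean(b) for b in bands]
        stds = [max(stdev(b), 1e-9) for b in bands]  # guard against zero spread
        stats[name] = (means, stds)
    return stats


def z_distance(pixel, means, stds):
    """Sum of squared per-band z scores of a pixel relative to one class."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(pixel, means, stds))


def classify(pixel, stats):
    """Assign the pixel to the class whose training statistics are closest."""
    return min(stats, key=lambda name: z_distance(pixel, *stats[name]))


if __name__ == "__main__":
    areas = {  # hypothetical two-band training pixels
        "water": [(10, 5), (12, 6), (11, 4)],
        "vegetation": [(40, 80), (42, 85), (38, 78)],
    }
    stats = train(areas)
    for px in [(11, 5), (41, 82), (30, 60)]:
        print(px, classify(px, stats))
```

Dividing by each class's per-band spread keeps a band with a large numeric range from dominating the distance. This is the same motivation behind likelihood-based rules.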
In comparison, unsupervised classification assigns every pixel in an image to a class on a pixel-by-pixel basis and requires less interaction with the analyst. Unsupervised clustering techniques are widely used for exploratory data analysis. Unsupervised classification of remote sensing imagery can be defined as the identification of natural groups within multidimensional data, and it is an essential step in automatic pattern recognition. A typical unsupervised classification requires the analyst to specify the number of classes based on knowledge of the scene. However, the analyst seldom has sufficient information to decide on a suitable cluster number. In many cases, the given cluster number results in an improper classification, and either new runs must be performed from scratch or clusters with high similarity must be merged based on the analyst's experience. Recently, clustering techniques have been applied to vast digital datasets, such as (i) medical images, for diagnosing tumors as benign or malignant in mammograms (Guliato et al., 2003a; 2003b), segmenting bone and soft tissue in radiographs (Pakin et al., 2003), and discriminating myocardial heart disease in echocardiograms (Tsai et al., 2004); and (ii) remote sensing images, for land use analysis (Miller et al., 1995; Mohanty and Majumdar, 1996; Bandyopadhyay and Maulik, 2002; Maulik and Bandyopadhyay, 2003), agricultural monitoring (Rydberg and Borgefors, 2001; Murthy et al., 2003), and natural hazard investigation and management (Ostir et al., 2003; van der Sande et al., 2003; Yang et al., 2004; 2007).
For civil and environmental engineers, clustering techniques in practical applications are expected to detect terrain features in remote sensing images automatically. The spectral properties of specific informational classes in remote sensing imagery change over time. As a result, the relationships between informational classes and spectral classes are not constant, and relationships defined for one image cannot be extended to others. In addition, in most cases the analyst has very limited knowledge of which classes are present and what their specific identities are. When the number of clusters is unknown a priori, both the computational process and the clustering accuracy of unsupervised classification leave room for improvement. The aim of this research is to develop a repeatable, accurate, and time-effective method to classify remote sensing imagery automatically. A genetic algorithm (GA) based classifier was established to solve a multidimensional unsupervised classification problem. It yields the best partition without prior knowledge of the number of clusters. The GA classifier was implemented and tested on two artificial datasets with known cluster numbers and cluster centres, and on a real image with an unknown cluster number and unknown cluster centres. The GA classifier was then applied to a satellite image to identify a landslide area in central Taiwan.
Can. J. Remote Sensing, Vol. 33, No. 3, pp. 203–213, 2007. © 2007 CASI. Received 22 September 2005. Accepted 19 April 2007. Published on the Canadian Journal of Remote Sensing Web site at http://pubs.nrc-cnrc.gc.ca/cjrs on 19 July 2007.
M.-D. Yang, Department of Civil Engineering, National Chung Hsing University, 250 Kuo-Kuang Road, Taichung, Taiwan, Republic of China.
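The excerpt does not specify the GA classifier's chromosome encoding, fitness function, or genetic operators. The sketch below shows one common way a GA can search both the cluster centres and the cluster number, under stated assumptions:

- Each chromosome is a variable-length list of centres.
- Fitness is the negative Davies-Bouldin index. This is a swapped-in cluster-validity criterion and not necessarily the paper's; its modified classifier uses a maximum-likelihood (z score) criterion instead.
- Every offspring receives a few k-means steps.
- All function names and parameter values are hypothetical.

```python
import math
import random


def assign(points, centres):
    """Index of the nearest centre (Euclidean distance) for every point."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]


def refine(points, centres, iters=3):
    """A few k-means (Lloyd) steps: move each centre to the mean of its members.
    A centre that attracts no points is left where it is."""
    centres = [tuple(c) for c in centres]
    for _ in range(iters):
        labels = assign(points, centres)
        updated = []
        for j, c in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else c)
        centres = updated
    return centres


def davies_bouldin(points, centres):
    """Davies-Bouldin index of the partition induced by centres (lower = better)."""
    labels = assign(points, centres)
    scatter = []
    for j, c in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            return math.inf  # an empty cluster makes the partition invalid
        scatter.append(sum(math.dist(p, c) for p in members) / len(members))
    worst = []
    for i, ci in enumerate(centres):
        ratios = [(scatter[i] + scatter[j]) / max(math.dist(ci, cj), 1e-12)
                  for j, cj in enumerate(centres) if j != i]
        worst.append(max(ratios))
    return sum(worst) / len(centres)


def ga_cluster(points, k_min=2, k_max=6, pop_size=20, generations=40,
               mutation_rate=0.3, seed=0):
    """Evolve variable-length lists of centres; fitness is -DB (higher = better)."""
    rng = random.Random(seed)

    def fitness(ch):
        return -davies_bouldin(points, ch)

    def repair(ch):
        ch = list(ch)[:k_max]  # keep the cluster number within [k_min, k_max]
        while len(ch) < k_min:
            ch.append(rng.choice(points))
        return ch

    def crossover(a, b):
        # one-point crossover for variable length: head of a + tail of b
        return repair(a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b) - 1):])

    def mutate(ch):
        ch, r = list(ch), rng.random()
        if r < 1 / 3 and len(ch) < k_max:
            ch.append(rng.choice(points))  # add a cluster
        elif r < 2 / 3 and len(ch) > k_min:
            ch.pop(rng.randrange(len(ch)))  # remove a cluster
        else:
            ch[rng.randrange(len(ch))] = rng.choice(points)  # relocate a centre
        return ch

    def tournament(scored):
        (fa, a), (fb, b) = rng.sample(scored, 2)
        return a if fa >= fb else b

    population = [refine(points, rng.sample(points, rng.randint(k_min, k_max)))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(ch), ch) for ch in population]
        children = [best]  # elitism: the best partition so far always survives
        while len(children) < pop_size:
            child = crossover(tournament(scored), tournament(scored))
            if rng.random() < mutation_rate:
                child = mutate(child)
            children.append(refine(points, child))
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    rng = random.Random(7)
    blobs = [(0.0, 0.0), (8.0, 0.0), (0.0, 8.0)]  # synthetic two-band "pixels"
    pixels = [(x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5))
              for x, y in blobs for _ in range(30)]
    centres = ga_cluster(pixels, seed=1)
    print(len(centres), "clusters:", [tuple(round(v, 2) for v in c) for c in centres])
```

The cluster number is never fixed in advance. It emerges from the add/remove mutations and the variable-length crossover, while elitism ensures the best fitness found so far never decreases between generations.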