Integrated Feature Selection and Clustering for Taxonomic Problems within Fish Species Complexes
Authors
Abstract
As computer and database technologies advance rapidly, biologists all over the world can share biologically meaningful data derived from images of specimens and use those data to classify the specimens taxonomically. Accurate shape analysis of a specimen from multiple 2D views is crucial for finding diagnostic features using geometric morphometric techniques. We propose an integrated feature selection and clustering framework that automatically identifies a set of feature variables and uses them to group specimens into a binary cluster tree. The candidate features are generated from the reconstructed 3D shape and from local saliency characteristics in the 2D images of the specimens. A Gaussian mixture model is used to estimate the significance value of each feature and to control the false discovery rate in the feature selection process, so that the clustering algorithm can efficiently partition the specimen samples into clusters that may correspond to different species. Experiments on a taxonomic problem involving species of suckers in the genus Carpiodes demonstrate promising results for the proposed framework, even with a small sample size.
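The abstract does not spell out how the Gaussian mixture model is applied, so the following is only a minimal sketch of one common mixture-based approach, not the authors' procedure. It assumes each candidate feature has already been reduced to a single numeric score (a hypothetical input), fits a two-component 1-D Gaussian mixture by EM, treats the lower-mean component as the "uninformative" null (an assumption), and keeps the largest set of features whose average posterior null probability stays at or below a target false discovery rate `q`. The function names `fit_two_component_gmm` and `select_features` are illustrative.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(scores, n_iter=200):
    """Fit a 1-D mixture of two Gaussians by EM; returns (weights, means, stds)."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]                   # start the components at the extremes
    spread = (hi - lo) / 4.0 or 1.0    # fall back to 1.0 if all scores are equal
    stds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)
    return weights, means, stds


def select_features(scores, q=0.05):
    """Return indices of features kept at an estimated false discovery rate <= q."""
    weights, means, stds = fit_two_component_gmm(scores)
    null = 0 if means[0] <= means[1] else 1  # assumption: lower-mean component is null
    null_prob = []
    for x in scores:
        p = [weights[k] * normal_pdf(x, means[k], stds[k]) for k in range(2)]
        total = sum(p) or 1e-300
        null_prob.append(p[null] / total)
    # Add features from most to least significant while the running mean of
    # the null probabilities (the estimated FDR of the kept set) stays <= q.
    order = sorted(range(len(scores)), key=lambda i: null_prob[i])
    selected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += null_prob[i]
        if running / rank > q:
            break
        selected.append(i)
    return sorted(selected)


if __name__ == "__main__":
    # Synthetic scores: 41 uninformative features and 10 informative ones.
    demo = [-2.0 + 0.1 * i for i in range(41)] + [8.0 + 0.2 * i for i in range(10)]
    print(select_features(demo, q=0.05))
```

The selected feature columns could then be passed to any recursive two-way clustering routine to build a binary cluster tree; that step is omitted here because the abstract does not describe it.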
Related resources
Optimal Feature Selection for Data Classification and Clustering: Techniques and Guidelines
In this paper, principles and existing feature selection methods for classifying and clustering data are introduced. To that end, categorizing frameworks for finding selected subsets, namely, search-based and non-search based procedures as well as evaluation criteria and data mining tasks are discussed. In the following, a platform is developed as an intermediate step toward developing an intell...
NUMERICAL TAXONOMIC STUDY OF THE IRANIAN SPECIES OF ALYSSUM L. BASED ON MORPHOLOGICAL CHARACTERS
The genus Alyssum L. belongs to the subtribe Alyssinae, tribe Alysseae and family Cruciferae (Brassicaceae). This genus is one of the largest genera of the family of Cruciferae in Iran, and seems to be the most problematic genus in which the boundary of certain species is not completely clear due to the polymorphism of morphological characters. The main objective of this research is to stud...
Steel Consumption Forecasting Using Nonlinear Pattern Recognition Model Based on Self-Organizing Maps
Steel consumption is a critical factor affecting pricing decisions and a key element to achieve sustainable industrial development. Forecasting future trends of steel consumption based on analysis of nonlinear patterns using artificial intelligence (AI) techniques is the main purpose of this paper. Because there are several features affecting target variable which make the analysis of relations...
Evolving Ensembles of Feature Subsets towards Optimal Feature Selection for Unsupervised and Semi-supervised Clustering
The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background informatio...
Journal title:
- Journal of Multimedia
Volume 3, Issue
Pages -
Publication date 2008