An efficient incremental protein sequence clustering algorithm - TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region

Author

  • K. Subramanian
Abstract

Clustering is the division of data into groups of similar objects. The main objective of this unsupervised learning technique is to find a natural grouping or meaningful partition by using a distance or similarity function. Clustering is mainly used for dimensionality reduction, prototype selection/abstractions for pattern classification, data reorganization and indexing, and for detecting outliers and noisy patterns. Clustering techniques are applied in pattern classification schemes, bioinformatics, data mining, web mining, biometrics, document processing, remotely sensed data analysis, biomedical data analysis, etc., in which the data size is very large. In this paper, an efficient incremental clustering algorithm, 'Leaders-Subleaders', an extension of the leader algorithm suitable for protein sequences in bioinformatics, is proposed for effective clustering and prototype selection for pattern classification. It is a simple and efficient technique for generating a hierarchical structure that exposes the subgroups/subclusters within each cluster, which may be used to find the superfamily, family and subfamily relationships of protein sequences. The experimental results (classification accuracy using the prototypes obtained, and computation time) of the proposed algorithm are compared with those of the leader-based and nearest neighbour classifier (NNC) methods. It is found to be computationally efficient when compared to NNC. Classification accuracy obtained using the representatives generated by the Leaders-Subleaders method is found to be better than that obtained using leaders as representatives, and it approaches that of NNC if sequential search is used on the sequences from the selected subcluster.
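The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a two-level leader scheme of this kind could look. It assumes the standard single-pass leader rule (a pattern joins the first leader within a threshold, otherwise it becomes a new leader) applied once over all sequences and then again, with a tighter threshold, inside each cluster. The function names, both thresholds, and the toy `mismatch_distance` are illustrative assumptions, not taken from the paper; a real protein application would use an alignment-based score instead.

```python
from typing import Callable, List, Tuple

Cluster = Tuple[str, List[str]]  # (leader, members)


def mismatch_distance(a: str, b: str) -> int:
    """Toy distance (assumption): positional mismatches plus length gap."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def leader_cluster(items: List[str], threshold: int,
                   dist: Callable[[str, str], int]) -> List[Cluster]:
    """Single pass: join the first leader within threshold, else start a cluster."""
    clusters: List[Cluster] = []
    for item in items:
        for leader, members in clusters:
            if dist(item, leader) <= threshold:
                members.append(item)
                break
        else:  # no existing leader was close enough
            clusters.append((item, [item]))
    return clusters


def leaders_subleaders(items: List[str], threshold: int, sub_threshold: int,
                       dist: Callable[[str, str], int] = mismatch_distance):
    """Two-level hierarchy: leaders first, then subleaders inside each cluster."""
    return [(leader, leader_cluster(members, sub_threshold, dist))
            for leader, members in leader_cluster(items, threshold, dist)]


def nearest_subcluster(query: str, hierarchy,
                       dist: Callable[[str, str], int] = mismatch_distance):
    """Pick the nearest leader, then the nearest subleader under it."""
    leader, subclusters = min(hierarchy, key=lambda c: dist(query, c[0]))
    subleader, members = min(subclusters, key=lambda s: dist(query, s[0]))
    return leader, subleader, members


# Hypothetical toy sequences, not real protein data.
sequences = ["AAAA", "AAAT", "AATT", "CCCC", "CCCG", "GGGG"]
hierarchy = leaders_subleaders(sequences, threshold=2, sub_threshold=1)
print(nearest_subcluster("ATTT", hierarchy))
```

Under these assumptions, a query is compared only with the leaders and with the subleaders of one cluster, rather than with every stored sequence as in NNC. The `members` list returned for the selected subcluster is what a sequential search, as mentioned in the abstract, would scan.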

Similar resources

Abnormal activity detection in video sequences using learnt probability densities - TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region

Abstract: Video surveillance is concerned with identifying abnormal or unusual activity at a scene. In this paper, we develop stochastic models to characterize the normal activities in a scene. Given video sequences of normal activity, probabilistic models are learnt to describe the normal motion in the scene. For any new video sequences motion trajectories are ...

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use a boosting ensemble of weak classifiers to implement the misuse intrusion detection system. It can identify new types of intrusions that do not exist in the training dataset for incremental misuse detection. As...

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving minimum sum-of-squares clustering problems using their difference of convex (DC) representations. The proposed algorithm is based on an incremental approach and applies the well-known DC algorithm at each iteration. It is tested and compared with other clustering algorithms using large real-world data sets.

A Comparative Appraisal of Roadway Accident for Asia-Pacific Countries

This paper describes an attempt to shed some light on road safety in the Asia-Pacific region by characterizing and assessing its road accidents. The relevant national road accident data were extracted from centralized data sources of international agencies. Due to data incompleteness and missing values, 21 Asia-Pacific countries, representing more than half of the world’s population, were selected fo...

An L1-norm method for generating all of efficient solutions of multi-objective integer linear programming problem

This paper extends the method proposed by Jahanshahloo et al. (2004) (a method for generating all the efficient solutions of a 0–1 multi-objective linear programming problem, Asia-Pacific Journal of Operational Research). This paper considers the recession direction for a multi-objective integer linear programming (MOILP) problem and presents necessary and sufficient conditions to have unbounde...

Publication date: 2004