Stochastic Motif Extraction Using Hidden Markov Model
Abstract
In this paper, we study the application of a hidden Markov model (HMM) to the problem of representing protein sequences by a stochastic motif. A stochastic protein motif represents the short segments of protein sequences that have a particular function or structure. Represented as an HMM, the motif carries conditional probabilities that capture its stochastic nature. The HMM directly reflects characteristics of the motif, such as periodic structure or grouping in the protein. To obtain the optimal HMM, we developed the "iterative duplication method" for HMM topology learning. It starts from a small fully-connected network and iterates network generation and parameter optimization until it achieves sufficient discrimination accuracy. Using this method, we obtained an HMM for a leucine zipper motif. The HMM achieved 79.3 percent prediction accuracy, compared with 14.8 percent for a symbolic pattern representation. Additionally, the method can obtain HMMs for various types of zinc finger motifs, and it might separate mixed data. We demonstrated that this approach is applicable to the validation of a protein database: a constructed HMM indicated that one protein sequence annotated as "leucine-zipper like sequence" in the database differs markedly from other leucine-zipper sequences in terms of likelihood, and we found this discrimination plausible.
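The abstract compares sequences "in terms of likelihood" but does not say how the likelihood is computed. A standard way to score a sequence under an HMM is the forward algorithm, sketched below in log space. This is a minimal illustration, not the authors' implementation: the two-state model, the two-letter alphabet (`L`, `A`), and all probabilities are hypothetical values chosen only to mimic a periodic motif.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm.

    start[i]   : probability of starting in state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][s] : probability that state i emits symbol s
    """
    n = len(start)
    # Initialization: first symbol emitted from each possible start state.
    alpha = [safe_log(start[i]) + safe_log(emit[i].get(seq[0], 0.0))
             for i in range(n)]
    # Recursion: sum over all predecessor states for each new symbol.
    for sym in seq[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j].get(sym, 0.0))
            for j in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


# Hypothetical toy model: two states that tend to alternate,
# one preferring 'L' and the other preferring 'A'.
START = [0.5, 0.5]
TRANS = [[0.1, 0.9],
         [0.9, 0.1]]
EMIT = [{"L": 0.9, "A": 0.1},
        {"L": 0.1, "A": 0.9}]

periodic = forward_log_likelihood("LALALA", START, TRANS, EMIT)
uniform = forward_log_likelihood("LLLLLL", START, TRANS, EMIT)
print(periodic > uniform)
```

Under this toy model the alternating sequence receives a higher log-likelihood than the uniform one, which is the kind of likelihood gap that can flag a sequence as unlike the rest of a motif family.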
Similar resources
Motif Extraction: Normalization of Scores
This paper examines a method to normalize the score of a stochastic motif represented by a hidden Markov model (HMM). The accuracy of the Z score method, one of the score normalization methods, is compared with that of the whole search method.
Parallel Characteristic Extraction from Protein Sequence Database
An adaptive massively parallel system for flexible information processing has been investigated. This research requires feedback from a real application. In this paper, parallel characteristic extraction from the protein sequence database is described. Since the protein sequence database is huge and its sequences vary, an adaptive massively parallel system is mandatory. An HMM (hidden...
Spatio-Temporal Modeling of Winter Rainfall Occurrence and Amount across Iran Using a Hidden Markov Model
Multi-site modeling of rainfall is one of the most important issues in environmental sciences, especially in watershed management. For this purpose, different statistical models have been developed which involve spatial approaches in simulation and modeling of daily rainfall values. The hidden Markov model is one of the multi-site daily rainfall models which in addition to simulation of daily rainfall...
Javanese Character Recognition Using Hidden Markov Model
The hidden Markov model (HMM) is a stochastic method that has been used in various signal processing and character recognition tasks. This study proposes using an HMM to recognize Javanese characters from a number of different handwritings, whereby the HMM is used to optimize the number of states and feature extraction. The best result, an accuracy of 85.7%, is obtained with a 16-state vertical model using pure HM...
Hidden Markov Models for Information Extraction
Compared to many other techniques used in natural language processing, hidden Markov models (HMMs) are an extremely flexible tool and have been successfully applied to a wide variety of stochastic modeling tasks. This paper uses a machine learning approach to examine the effectiveness of HMMs in extracting information of varying levels of structure. A stochastic optimization procedure is used...
Journal title:
- Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume 2, Issue
Pages -
Publication date 1994