Evolving Regular Expression-Based Sequence Classifiers for Protein Nuclear Localisation
Abstract
A number of bioinformatics tools use regular expression (RE) matching to locate protein or DNA sequence motifs that have been discovered by researchers in the laboratory. For example, patterns representing nuclear localisation signals (NLSs) are used to predict nuclear localisation. NLSs are not yet well understood, and so the set of currently known NLSs may be incomplete. Here we use genetic programming (GP) to generate RE-based classifiers for nuclear localisation. While the approach is a supervised one (with respect to protein location), it is unsupervised with respect to already known NLSs. It therefore has the potential to discover new NLS motifs. We apply both tree-based and linear GP to the problem. The inclusion of predicted secondary structure in the input does not improve performance. Benchmarking shows that our majority classifiers are competitive with existing tools. The evolved REs are usually “NLS-like”, and work is underway to analyse these for novelty.
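As a rough illustration of what an RE-based majority classifier could look like, the Python sketch below lets a handful of patterns vote on whether a sequence is nuclear. The three patterns are hand-written assumptions chosen for illustration (a classic monopartite consensus, a run of basic residues, and a loosely bipartite-like pattern); they are not the REs evolved in the paper, and the names `PATTERNS`, `votes`, and `predict_nuclear` are hypothetical.

```python
import re

# Illustrative, hand-written patterns (assumptions for this sketch).
# They are NOT the regular expressions evolved by GP in the paper.
PATTERNS = [
    re.compile(r"K[KR].[KR]"),              # monopartite consensus K-(K/R)-X-(K/R)
    re.compile(r"[KR]{4,}"),                # run of four or more basic residues
    re.compile(r"[KR]{2}.{10,12}[KR]{3}"),  # two basic clusters separated by a spacer
]


def votes(sequence):
    """Return one boolean per pattern: True if the pattern occurs in the sequence."""
    return [bool(pattern.search(sequence)) for pattern in PATTERNS]


def predict_nuclear(sequence):
    """Predict nuclear localisation if a strict majority of patterns match."""
    ballot = votes(sequence)
    return sum(ballot) * 2 > len(ballot)


if __name__ == "__main__":
    # Short fragment containing the SV40 large T antigen NLS (PKKKRKV).
    print(predict_nuclear("MAPKKKRKVEDP"))
    # Fragment with no basic residues at all.
    print(predict_nuclear("MASGLLEEDAVTG"))
```

In a GP setting, the fixed `PATTERNS` list would be replaced by expressions produced by evolution and scored against proteins of known location; the voting step stays the same.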
Similar resources
The Effect of 8-Weeks of Low-Intensity Swimming Training on Promyelocytic Leukemia Zinc Finger Protein and Spermatid Transition Nuclear Protein Gene Expression in Azoospermic Rats Model
Aims: One of the causes of infertility in men is the azoospermia disease, which is attributed to the lack of sperm in each sperm. The primary function of spermatogenesis is the maintenance, proliferation, and differentiation of spermatogonial cells. Thus, the present study aimed to investigate the changes in Promyelocytic Leukemia Zinc Finger (PLZF) and spermatid Transition Nuclear Protein (TNP...
Evolving Classifiers for Protein Nuclear Localisation using Genetic Programming
Being able to predict the location of a protein in the cell is one of the steps toward knowing its role and activity. With that information one could also conclude possible effects on the organism carrying those proteins. The number of putative, unclassified proteins is constantly growing due to the continuous genome sequencing projects. Hence, the need for fast and cheap methods to classify pr...
NucPred - Predicting nuclear localization of proteins
NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression...
Cloning, Expression, Purification and Immunoreactivity Analysis of Gag Derived Protein p17 from HIV-1 CRF35 in Fusion with Thioredoxin from Human Subjects
So far, recombinant antigens of HIV-1, the etiologic cause of Acquired Immunodeficiency Syndrome (AIDS), have been widely used for the diagnosis and vaccine development. P17 or the matrix protein formed by the proteolytic cleavage of gag is strongly antigenic and is as conserved and immunogenic as p24. In some cases, antibodies to p17 are more prevalent than antibodies to p24 and the decline in...
Evolving Protein Motifs Using a Stochastic Regular Language with Codon-Level Probabilities
Experiments involving the evolution of protein motifs using genetic programming are presented. The motifs use a stochastic regular expression language that uses codon-level probabilities within conserved sets (masks). Experiments compared basic genetic programming with Lamarckian evolution, as well as the use of “natural” probability distributions for masks obtained from the sequence database. ...