Protein motifs retrieval by SS terns occurrences
Authors
Abstract
This paper describes a new approach to the analysis of protein 3D structure based on the Secondary Structure (SS) representation. The focus here is on structural motif retrieval. The strategy is derived from the Generalized Hough Transform (GHT), but takes the triplet of SSs as the structural primitive element. The triplet identity is evaluated on the triangle whose vertices lie on the SS midpoints, and is represented by the three distances between those midpoints. The motif is characterized by its complete set of triplets, so the Reference Table (RT) has one tuple per triplet. Besides the discriminant component (the three edge lengths), each tuple contains the mapping rule, i.e. the location of the Reference Point (RP) relative to the triplet. In the macromolecule to be analyzed, each possible triplet is searched for in the RT, and every match contributes a vote to a candidate location of the RP. The presence and location of the searched motif are certified when the number of contributions collected at one location equals the RT cardinality, i.e. the number of motif triplets (in the absence of noise and ambiguities). The approach is tested on twenty proteins selected randomly from the PDB, with the number of SSs ranging from 14 to 46. All possible structural blocks composed of three, four and five SSs (from very compact to completely distributed) have been retrieved. The results show valuable performance in terms of precision and computation time.
© 2012 Elsevier B.V. All rights reserved. DOI: http://dx.doi.org/10.1016/j.patrec.2012.12.003
Contact details name A. Ferone and A. Petrosino (uniparthenope.it); corresponding author tel.: +39 0382 985358.
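The voting scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of the motif centroid as RP, the local triangle frame used as the mapping rule, and the quantization step `tol` are all assumptions made for illustration. It also assumes every motif triangle is scalene and non-collinear, so that the vertex ordering is unambiguous.

```python
from collections import defaultdict
from itertools import combinations
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def _frame(tri):
    """Local right-handed frame of a triangle (hypothetical mapping rule).

    Vertices are ordered by the length of the opposite edge, which is
    unambiguous only for scalene triangles (assumption).
    """
    def opposite(i):
        j, k = [m for m in range(3) if m != i]
        return math.dist(tri[j], tri[k])
    p0, p1, p2 = (tri[i] for i in sorted(range(3), key=opposite))
    x = _unit(_sub(p1, p0))
    z = _unit(_cross(x, _sub(p2, p0)))  # fails for collinear points
    y = _cross(z, x)
    return p0, (x, y, z)

def _key(tri, tol):
    """Discriminant component: the three edge lengths, sorted and quantized."""
    lengths = sorted(math.dist(a, b) for a, b in combinations(tri, 2))
    return tuple(round(d / tol) for d in lengths)

def build_reference_table(motif, tol=0.5):
    """One RT tuple per motif triplet: edge-length key -> RP in the local frame."""
    rp = tuple(sum(c) / len(motif) for c in zip(*motif))  # centroid as RP (assumption)
    table = defaultdict(list)
    for tri in combinations(motif, 3):
        origin, axes = _frame(tri)
        rel = _sub(rp, origin)
        table[_key(tri, tol)].append(tuple(_dot(rel, ax) for ax in axes))
    return table

def vote(target, table, tol=0.5):
    """Each target triplet that matches an RT key votes for an RP location."""
    acc = defaultdict(int)
    for tri in combinations(target, 3):
        for local in table.get(_key(tri, tol), ()):
            origin, axes = _frame(tri)
            rp = tuple(origin[i] + sum(local[k] * axes[k][i] for k in range(3))
                       for i in range(3))
            acc[tuple(round(c / tol) for c in rp)] += 1
    return acc

# Usage: SS midpoints of a 4-element motif, then the same motif rotated and
# shifted inside a larger set of midpoints (all coordinates are made up).
motif = [(0, 0, 0), (4, 0, 0), (1, 5, 0), (3, 7, 4)]
target = [(-y + 10, x + 20, z + 30) for x, y, z in motif] + [(100, 0, 0), (0, 200, 0)]
table = build_reference_table(motif)
accumulator = vote(target, table)
best_bin = max(accumulator, key=accumulator.get)
print(best_bin, accumulator[best_bin], "votes out of", sum(map(len, table.values())))
```

In this sketch a motif is considered found when some accumulator bin collects as many votes as there are RT tuples. Enumerating all triplets costs O(n^3) in the number of SSs, which stays small for the 14 to 46 SSs mentioned above.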
Similar resources
The SLiMDisc server: short, linear motif discovery in proteins
Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. Overrepresentation of convergent occurrences of motifs in proteins with a common attribute (such as similar subcellular location or a shared interaction partner) provides a feasible means to discover novel occurrences computationally. The SLiMDisc (Short, Linear Motif Di...
Fitting a mixture model by expectation maximization to discover motifs in biopolymers
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successi...
Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer
The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find...
SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks
Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent...
Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining.
Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the C(alpha) atoms of the two residues and (2) a boolean indicating whether the two re...
Journal title:
- Pattern Recognition Letters
Volume 34, issue
Pages -
Publication date 2013