Protein motifs retrieval by SS terns occurrences

Authors

  • Virginio Cantoni
  • Alessio Ferone
  • Ozlem Ozbudak
  • Alfredo Petrosino
Abstract

This paper describes a new approach to the analysis of protein 3D structure based on the Secondary Structure (SS) representation. The focus here is on structural motif retrieval. The strategy is derived from the Generalized Hough Transform (GHT), but takes the triplet of SSs as the structural primitive element. Triplet identity is evaluated on the triangle whose vertices lie on the SS midpoints, and is represented by the three distances between those midpoints. The motif is characterized by its complete set of triplets, so the Reference Table (RT) has one tuple per triplet. Besides the discriminant component (the three edge lengths), each tuple contains the mapping rule, i.e. the location of the Reference Point (RP) relative to the triplet. In the macromolecule to be analyzed, each possible triplet is looked up in the RT, and every match contributes to a candidate location of the RP. The presence and location of the searched motif are certified when the number of contributions collected equals the RT cardinality (i.e. the number of motif triplets), which holds exactly only in the absence of noise and ambiguities. The approach is tested on twenty proteins selected randomly from the PDB, with different numbers of SSs, ranging from 14 to 46. The retrieval of all possible structural blocks composed of three, four and five SSs (very compact and completely distributed) has been conducted. The results show valuable performance in terms of precision and computation time.

DOI: http://dx.doi.org/10.1016/j.patrec.2012.12.003

© 2012 Elsevier B.V. All rights reserved.
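The voting scheme summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the local coordinate frame used as the "mapping rule", the quantization parameters `tol` and `cell`, and all coordinates are assumptions made for illustration. It also assumes that no triplet is collinear and that each triangle has three distinct edge lengths, so that the vertex ordering is unambiguous.

```python
from collections import Counter, defaultdict
from itertools import combinations

import numpy as np


def canonical(tri, tol):
    """Order vertices by the length of their opposite edge and
    return them with a quantized (ascending) edge-length key."""
    opp = np.array([np.linalg.norm(tri[(i + 1) % 3] - tri[(i + 2) % 3])
                    for i in range(3)])
    order = np.argsort(opp)
    key = tuple(int(round(float(d) / tol)) for d in opp[order])
    return tri[order], key


def frame(p):
    """Right-handed orthonormal frame attached to an ordered triangle
    (assumes the three points are not collinear)."""
    e1 = (p[1] - p[0]) / np.linalg.norm(p[1] - p[0])
    n = np.cross(p[1] - p[0], p[2] - p[0])
    e3 = n / np.linalg.norm(n)
    e2 = np.cross(e3, e1)
    return p[0], np.column_stack([e1, e2, e3])


def build_reference_table(midpoints, rp, tol):
    """One RT tuple per motif triplet: key -> RP in the triplet's frame."""
    rt = defaultdict(list)
    for idx in combinations(range(len(midpoints)), 3):
        ordered, key = canonical(midpoints[list(idx)], tol)
        origin, axes = frame(ordered)
        rt[key].append(axes.T @ (rp - origin))
    return rt


def vote(midpoints, rt, tol, cell):
    """Every target triplet found in the RT votes for a candidate RP bin."""
    acc = Counter()
    for idx in combinations(range(len(midpoints)), 3):
        ordered, key = canonical(midpoints[list(idx)], tol)
        if key not in rt:
            continue
        origin, axes = frame(ordered)
        for local in rt[key]:
            cand = origin + axes @ local
            acc[tuple(int(v) for v in np.round(cand / cell))] += 1
    return acc


# Hypothetical motif of four SS midpoints; the RP is taken as their centroid.
motif = np.array([[0, 0, 0], [4, 1, 0], [1, 5, 2], [2, 2, 7]], dtype=float)
rp = motif.mean(axis=0)
rt = build_reference_table(motif, rp, tol=0.25)

# Hypothetical target: the motif rotated 90 degrees about z and shifted,
# plus two unrelated SS midpoints.
rot = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
shift = np.array([10.0, 20.0, 30.0])
decoys = np.array([[3, 25, 40], [15, 18, 28]], dtype=float)
target = np.vstack([motif @ rot.T + shift, decoys])

acc = vote(target, rt, tol=0.25, cell=1.0)
peak_bin, peak_votes = acc.most_common(1)[0]
n_triplets = sum(len(v) for v in rt.values())
print(peak_bin, peak_votes, n_triplets)
```

With these made-up coordinates the script prints `(8, 22, 32) 4 4`: the accumulator peak collects as many votes as there are motif triplets, and its bin contains the transformed RP at (8, 21.75, 32.25). On real data, noise and near-equal edge lengths would call for tolerant matching and a vote threshold below the RT cardinality.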


Similar resources

The SLiMDisc server: short, linear motif discovery in proteins

Short, linear motifs (SLiMs) play a critical role in many biological processes, particularly in protein-protein interactions. Overrepresentation of convergent occurrences of motifs in proteins with a common attribute (such as similar subcellular location or a shared interaction partner) provides a feasible means to discover novel occurrences computationally. The SLiMDisc (Short, Linear Motif Di...


Fitting a mixture model by expectation maximization to discover motifs in biopolymers

The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences. Multiple motifs are found by fitting a mixture model to the data, probabilistically erasing the occurrences of the motif thus found, and repeating the process to find successi...



SLiMScape 3.x: a Cytoscape 3 app for discovery of Short Linear Motifs in protein interaction networks

Short linear motifs (SLiMs) are small protein sequence patterns that mediate a large number of critical protein-protein interactions, involved in processes such as complex formation, signal transduction, localisation and stabilisation. SLiMs show rapid evolutionary dynamics and are frequently the targets of molecular mimicry by pathogens. Identifying enriched sequence patterns due to convergent...


Distance-based identification of structure motifs in proteins using constrained frequent subgraph mining.

Structure motifs are amino acid packing patterns that occur frequently within a set of protein structures. We define a labeled graph representation of protein structure in which vertices correspond to amino acid residues and edges connect pairs of residues and are labeled by (1) the Euclidean distance between the C(alpha) atoms of the two residues and (2) a Boolean indicating whether the two re...



Journal title:
  • Pattern Recognition Letters

Volume 34  Issue 

Pages  -

Publication date 2013