bbcontacts – Supplementary information
Abstract
BetaSheet916 (Cheng and Baldi, 2005) consists of 916 protein chains with an available X-ray structure of resolution below 2.5 Å. These chains contain 31,638 β-residue contacts distributed into 4519 antiparallel β-strand contacts, 2214 parallel β-strand contacts and 1429 isolated β-bridges. BetaSheet1452 (Savojardo et al., 2013) was built from the structures deposited in the Protein Data Bank after 2004, using a procedure similar to the one used for BetaSheet916. BetaSheet1452 contains 56,552 β-residue contacts distributed into 3937 antiparallel β-strand contacts, 7892 parallel β-strand contacts and 2412 isolated β-bridges.

To build our training dataset, we extracted all CATH domains that did not belong to any of the fold groups identified in the test datasets in CATH v3.5. This set of 22,563 domains belonging to 864 fold groups was then filtered to reduce redundancy. For this purpose, we used the pdbfilter.pl script from the HH-suite (Remmert et al., 2011) with parameters -cov 0 -e 0.01 -id 0. With these settings, no sequence identity restriction and no minimum coverage are applied when discarding redundant sequences, but the minimum E-value between any two representative sequences is 0.01. Among the 1482 PDB domains in this redundancy-filtered dataset, the 943 domains containing β-contacts form our training dataset (867 X-ray structures with resolution below 3.5 Å and 76 NMR structures). These 943 domains contain 19,339 β-contacts: 2511 parallel β-contacts, 16,041 antiparallel β-contacts and 787 β-bridges.

Because not all chains in BetaSheet916 and BetaSheet1452 were fully annotated in CATH v3.5, some redundancy may remain between the training dataset and the test datasets. We verified that the results for BetaSheet916 and BetaSheet1452 did not deteriorate when each dataset was restricted to the subset of its chains that are fully annotated in CATH v3.5 (and thus non-redundant with the training dataset) (see section S2.1 below and Figure S1).
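The fold-group exclusion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the domain names and CATH codes are hypothetical toy data, and the sketch assumes that a fold group corresponds to the first three levels (Class.Architecture.Topology) of a CATH code. The final assertions only re-check the dataset counts quoted in the text.

```python
from typing import Dict, Iterable, List, Set


def fold_group(cath_code: str) -> str:
    """Return the Class.Architecture.Topology prefix of a CATH code.

    Assumption: the "fold group" is the topology level of the hierarchy.
    """
    return ".".join(cath_code.split(".")[:3])


def select_training_domains(domains: Dict[str, str],
                            test_codes: Iterable[str]) -> List[str]:
    """Keep the domains whose fold group never occurs among the test-set codes."""
    excluded: Set[str] = {fold_group(code) for code in test_codes}
    return sorted(name for name, code in domains.items()
                  if fold_group(code) not in excluded)


# Hypothetical toy data: domain name -> CATH code.
domains = {
    "dom_a": "3.40.50.300",
    "dom_b": "2.60.40.10",
    "dom_c": "3.30.70.270",
    "dom_d": "2.60.40.720",
}
# Hypothetical CATH codes of the test-set chains.
test_codes = ["2.60.40.1180"]

training = select_training_domains(domains, test_codes)
print(training)  # ['dom_a', 'dom_c']

# Consistency checks on the counts reported in the text.
assert 867 + 76 == 943               # X-ray + NMR structures
assert 2511 + 16041 + 787 == 19339   # parallel + antiparallel + bridges
```

Excluding at the fold-group level rather than by sequence identity removes every domain that shares a fold with a test chain, which is a stricter criterion than sequence-based filtering alone.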
In Figures S21 to S25, we also show results for the training dataset and the test dataset BetaSheet1452.

Because bbcontacts relies on correlated mutations and thus predicts side-chain rather than backbone contacts, the positions involved in β-bulges were adjusted to reflect the expected pattern: for a β-bulge between res1 and res2 (in one strand) and resX (in the other strand), all three side-chains must point in the same direction with respect to the plane formed by the β-sheet.
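The geometric criterion for β-bulges can be made concrete with a small sketch. It only illustrates the "same side of the sheet" condition and is not the adjustment procedure used by bbcontacts. It assumes that the side-chain direction is approximated by the Cα to Cβ vector and that a unit normal to the sheet plane is already known; all coordinates below are hypothetical.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Component-wise difference a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def side_of_sheet(ca: Vec, cb: Vec, normal: Vec) -> int:
    """Return +1 if the CA->CB vector points along the normal, else -1."""
    return 1 if dot(sub(cb, ca), normal) >= 0 else -1


def same_side(residues: Sequence[Tuple[Vec, Vec]], normal: Vec) -> bool:
    """True if all (CA, CB) pairs point to the same side of the sheet plane."""
    return len({side_of_sheet(ca, cb, normal) for ca, cb in residues}) == 1


# Hypothetical coordinates (in Å) with the sheet lying in the xy-plane.
normal = (0.0, 0.0, 1.0)
res1 = ((0.0, 0.0, 0.0), (0.5, 0.0, 1.4))
res2 = ((3.8, 0.0, 0.0), (3.3, 0.0, 1.4))
res_x = ((1.9, 4.8, 0.0), (1.9, 4.3, 1.4))
res_x_flipped = ((1.9, 4.8, 0.0), (1.9, 4.3, -1.4))

print(same_side([res1, res2, res_x], normal))          # True
print(same_side([res1, res2, res_x_flipped], normal))  # False
```

Glycine has no Cβ atom, so a real implementation of this check would need a fallback such as a virtual Cβ position.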