Discriminative modelling of context-specific amino acid substitution probabilities

Authors

  • Christof Angermüller
  • Andreas Biegert
  • Johannes Söding
Abstract

Motivation: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment.

Results: Here, we develop an alternative discriminative approach to predict sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for Basic Local Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) to BLAST and the previous context-specific version (CS-BLASTgen). On a dataset filtered to 20% maximum sequence identity, CS-BLASTdis was 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologues at 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21% and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical.
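The standard similarity score mentioned in the abstract, the sum of substitution matrix scores over all aligned residue pairs, can be illustrated with a minimal sketch. The score table `TOY_SCORES` and the helper names `pair_score` and `alignment_score` are hypothetical and chosen only for illustration; the values are not taken from any published matrix, and gaps are ignored. This shows the context-independent baseline, not the context-specific method of the paper.

```python
# Toy substitution scores, hypothetical values for illustration only
# (not BLOSUM62 or any other published matrix).
TOY_SCORES = {
    ("A", "A"): 4,
    ("A", "G"): 0,
    ("G", "G"): 6,
    ("A", "W"): -3,
    ("G", "W"): -2,
    ("W", "W"): 11,
}


def pair_score(a, b, scores=TOY_SCORES):
    """Look up the score for residues a and b, treating the table as symmetric."""
    if (a, b) in scores:
        return scores[(a, b)]
    return scores[(b, a)]


def alignment_score(seq_x, seq_y, scores=TOY_SCORES):
    """Sum substitution scores over all aligned residue pairs (ungapped)."""
    if len(seq_x) != len(seq_y):
        raise ValueError("aligned sequences must have equal length")
    return sum(pair_score(a, b, scores) for a, b in zip(seq_x, seq_y))


print(alignment_score("AGW", "AGW"))  # 4 + 6 + 11 = 21
print(alignment_score("AGW", "GAW"))  # 0 + 0 + 11 = 11
```

In this baseline, each pair's score depends only on the two aligned residues; a context-specific approach would instead let the score at each position depend on the neighbouring residues as well.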

Similar resources

Discriminative modeling of context-specific amino acid substitution probabilities

2 THE DISCRIMINATIVE MODEL SPACE CONTAINS THE GENERATIVE MODEL SPACE In the following we will show that the generative model with any set of parameters is equivalent to the discriminative model with an appropriately chosen set of parameters. In other words, the discriminative model with these particular parameters predicts the same context-specific substitution probabilities P(a|C_i) as the gen...

Ulla: a program for calculating environment-specific amino acid substitution tables

SUMMARY Amino acid residues are under various kinds of local environmental restraints, which influence substitution patterns. Ulla,(1) a program for calculating environment-specific substitution tables, reads protein sequence alignments and local environment annotations. The program produces a substitution table for every possible combination of environment features. Sparse data is handled usin...

A combined empirical and mechanistic codon model.

The evolutionary selection forces acting on a protein are commonly inferred using evolutionary codon models by contrasting the rate of synonymous to nonsynonymous substitutions. Most widely used models are based on theoretical assumptions and ignore the empirical observation that distinct amino acids differ in their replacement rates. In this paper, we develop a general method that allows assim...

Using substitution probabilities to improve position-specific scoring matrices

Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences ...

Substitution of soybean with canola meal in laying hens diets formulated based on total and digestible amino acids on performance and blood parameters

An experiment was conducted to study the effects of substituting soybean meal (SBM) with canola meal (CM) and of formulating diets based on total and digestible amino acids on performance, egg quality, organ weights and blood parameters of laying hens from 73 to 83 weeks of age. A total of 128 laying hens were distributed in a completely randomized design in a 2×2 factorial arrangement with 2 protein ...

Journal:
  • Bioinformatics

Volume 28, Issue 24

Pages: -

Publication date: 2012