N-Dimensional Mapping of Amino Acid Substitution Matrices
Abstract
A procedure for mapping score matrices into n-dimensional spaces is presented. Score (or substitution) matrices are used as similarity-like measures between amino acids in protein alignment procedures. The first stage of heuristic local alignment procedures such as FASTA and BLAST relies on local matching of very short sequences, also called k-tuples. With the L1 metric, this matching task can be computed very quickly. The procedure can be implemented with SIMD instructions, which are available in most low-cost microprocessors found in personal workstations and servers. Designing the procedure requires a table that maps score matrices such as PAM and BLOSUM. This table defines a representation of each amino acid residue in an n-dimensional space of the lowest possible dimensionality; this is accomplished with multidimensional scaling (MDS) techniques as used in Pattern Recognition and Machine Learning. First, a distance function must be defined from the score matrix. To map the distance function, a variation of the Sammon nonlinear dimensionality reduction procedure is used together with a genetic algorithm that minimizes a goal function. To fit the SIMD constraints, the tuple length k and the space dimensionality n must satisfy k×n = 8×m for some integer m. Result tables for BLOSUM62 in 1, 2 and 4 dimensions and graphical representations of the solution maps are included. The latter show that the biochemical amino acid groups are well mapped as data clusters; in addition, the strongly hydrophobic residues have a notable spatial property: they are linearly separable in the 2-dimensional mapping.
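The k-tuple matching idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `COORDS` table holds made-up 2-dimensional coordinates for five residues (not the BLOSUM62 table reported in the paper), the helper names `embed`, `l1_distance`, `ktuple_distance` and `fits_simd` are hypothetical, and reading the constraint k×n = 8×m as "the flattened k-tuple fills whole 8-element groups" is an assumption.

```python
# Illustrative sketch only. COORDS contains invented 2-D coordinates,
# NOT the BLOSUM62 mapping from the paper.
COORDS = {
    "L": (0, 0),
    "I": (1, 0),
    "V": (1, 1),
    "D": (9, 8),
    "E": (8, 9),
}


def embed(ktuple):
    """Concatenate the coordinates of each residue into one flat vector of length k*n."""
    return [c for residue in ktuple for c in COORDS[residue]]


def l1_distance(u, v):
    """L1 (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def ktuple_distance(a, b):
    """L1 distance between two k-tuples of equal length after embedding."""
    if len(a) != len(b):
        raise ValueError("k-tuples must have the same length")
    return l1_distance(embed(a), embed(b))


def fits_simd(k, n, group=8):
    """Check k*n = 8*m for a positive integer m (assumed meaning of the constraint)."""
    return k * n > 0 and (k * n) % group == 0


if __name__ == "__main__":
    print(ktuple_distance("LIVD", "LIVE"))  # tuples differing in one similar residue
    print(ktuple_distance("LIVL", "DEDE"))  # tuples made of dissimilar residues
    print(fits_simd(4, 2), fits_simd(3, 2))
```

In this toy table, residues placed close together stand in for residues with high substitution scores, so a small L1 distance between two embedded k-tuples plays the role of a high-scoring match.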
Similar resources
Amino acid substitution matrices for protein conformation identification
Methods for aligning protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. Although widely used, the matrices derived from homologous sequence segments, such as Dayhoff’s PAM matrices and Henikoff’s BLOSUM matrices, are not specific for protein conformation identification. Using a different approach...
Substitution of soybean with canola meal in laying hens diets formulated based on total and digestible amino acids on performance and blood parameters
An experiment was conducted to study the effects of substituting soybean meal (SBM) with canola meal (CM) and of formulating diets based on total and digestible amino acids on the performance, egg quality, organ weights and blood parameters of laying hens from 73 to 83 weeks of age. A total of 128 laying hens were distributed in a completely randomized design in a 2×2 factorial arrangement with 2 protein ...
Position Dependent and Independent Evolutionary Models Based on Empirical Amino Acid Substitution Matrices
Evolutionary models measure the probability of amino acid substitutions occurring over different evolutionary distances. We examine various evolutionary models based on empirically derived amino acid substitution matrices. The models are constructed using the PAM and BLOSUM amino acid substitution matrices. We rescale these matrices by raising them to powers to model substitution patterns that ...
Amino Acid Substitution Matrices Estimated by Maximum Likelihood
The present work describes protrates, a program that estimates amino acid substitution matrices and among-site substitution rates based on their likelihood for a given tree topology and a dataset of aligned proteins. The issue of producing maximum likelihood (ML) rate matrices from protein data has been addressed under the framework of general-purpose unbiased substitution matrices [1, 9], sinc...
Amino acid substitution matrices.
The BLOSUM (BLOcks SUbstitution Matrix) matrices were derived by Steven and Jorja Henikoff in 1992 [1]. They were based on a much larger data set than the PAM matrices, and used conserved local alignments or “blocks,” rather than global alignments of very closely related sequences. In order to account for different degrees of sequence divergence, the Henikoffs used clustering rather than an expl...