Characterizing Intermediate Conformations in Protein Conformational Space
Abstract
In this paper we present a novel parallel-coordinate-based clustering method that uses Gaussian mixture models to characterize the conformational space of proteins. We propose an algorithm to detect highly populated regions, which may correspond to interesting intermediate states that are difficult to detect experimentally. The data is represented as N-dimensional feature vectors, which are lower-dimensional projections of the protein conformations. First, we cluster each dimension separately, using model-based clustering with Gaussian mixture models for density estimation. The resulting partitions are then used to model relations between pairs of dimensions. Finally, disjoint multi-dimensional clusters of conformations, as well as groups of conformations that are unlikely to exist as significant intermediates, are identified by progressively analyzing how data flows between those pairs. The idea has its roots in parallel coordinates, a visualization technique that lays out coordinate axes parallel rather than orthogonal to each other, so that patterns between pairs of axes, as well as outliers, can be identified visually in multi-dimensional data. We believe that the size of a resulting cluster may indicate how likely the corresponding conformations are to exist as important intermediates. We tested our method on the conformational space of the enzyme AdK, which undergoes large-scale conformational changes, and used it to detect clusters that may correspond to experimentally known intermediates. Finally, we compare our clusters with those generated by the K-Means clustering algorithm and discuss the advantages of our method for characterizing protein conformational space.
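The three steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no details on model selection or on how the flow between axes is analyzed progressively, so the sketch assumes a fixed number of components `k` per axis, a plain EM fit, and a simple population cutoff `min_share` on the label paths. All function names, parameters, and the synthetic data are hypothetical.

```python
import math
import random
from collections import Counter


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k, iters=50):
    """Plain EM for a k-component 1-D mixture (k is fixed here, an assumption).

    Returns a list of (mean, variance, weight) tuples sorted by mean.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # quantile init
    centre = sum(xs) / n
    spread = sum((x - centre) ** 2 for x in xs) / n
    variances = [spread if spread > 0 else 1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # empty component: leave its parameters unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid variance collapse
    return sorted(zip(means, variances, weights))


def label_axis(xs, components):
    """Assign each value to its most probable component on this axis."""
    def score(x, j):
        mean, var, weight = components[j]
        return weight * gauss_pdf(x, mean, var)
    return [max(range(len(components)), key=lambda j: score(x, j)) for x in xs]


def characterize(points, k=2, min_share=0.05):
    """Cluster each axis, tabulate flow between adjacent axes, split label paths."""
    dims = len(points[0])
    labels = []
    for d in range(dims):
        column = [p[d] for p in points]
        labels.append(label_axis(column, fit_gmm_1d(column, k)))
    # Flow between neighbouring axes, as one would read it off a parallel-coordinates plot.
    flows = {(d, d + 1): Counter(zip(labels[d], labels[d + 1])) for d in range(dims - 1)}
    # A "path" is the tuple of per-axis labels followed by one conformation.
    paths = Counter(zip(*labels))
    cutoff = min_share * len(points)
    clusters = {path: c for path, c in paths.items() if c >= cutoff}
    sparse = {path: c for path, c in paths.items() if c < cutoff}
    return clusters, sparse, flows


def demo_points(seed=0):
    """Synthetic 3-D data: two dense blobs plus a few stragglers (not protein data)."""
    rng = random.Random(seed)

    def blob(centre, n):
        return [tuple(rng.gauss(c, 0.3) for c in centre) for _ in range(n)]

    return blob((0, 0, 0), 100) + blob((5, 5, 5), 100) + blob((0, 5, 0), 4)


clusters, sparse, flows = characterize(demo_points())
print("populated paths:", clusters)
print("sparsely populated paths:", sparse)
```

In this sketch, well-populated label paths play the role of candidate multi-dimensional clusters, and sparsely populated paths stand in for conformations unlikely to be significant intermediates. A model-based approach would typically also choose the number of components per axis with a criterion such as BIC, which is omitted here for brevity.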
Related resources
Flexible backbone sampling methods to model and design protein alternative conformations.
Sampling alternative conformations is key to understanding how proteins work and engineering them for new functions. However, accurately characterizing and modeling protein conformational ensembles remain experimentally and computationally challenging. These challenges must be met before protein conformational heterogeneity can be exploited in protein engineering and design. Here, as a stepping...
Ab Initio Study of Conformational and Configurational Properties of 1,3-Diazacyclohepta-1,2-diene and 1,3-Diazacycloocta-1,2-diene
Ab initio calculations at the HF/6-31G* level of theory for geometry optimization, and MP2/6-31G*//HF/6-31G* for single-point total energy calculations, are reported for the important energy-minimum conformations and transition-state geometries of 1,3-diazacyclohepta-1,2-diene (2) and 1,3-diazacycloocta-1,2-diene (3). The C2 symmetric twist-chair (2-TC) conformation of 2 is calculated to be 7.4 kJ...
Enhancing Sampling of the Conformational Space Near the Protein Native State
A protein molecule assumes specific conformations under native conditions to fit and interact with other molecules. Due to the role that three-dimensional structure plays in protein function, significant efforts are devoted to elucidating native conformations. Many search algorithms are proposed to navigate the high-dimensional protein conformational space and its underlying energy surface in s...
An Ab-initio tree-based exploration to enhance sampling of low-energy protein conformations
This paper proposes a robotics-inspired method to enhance sampling of native-like protein conformations when employing only amino-acid sequence. Computing such conformations, essential to associate structural and functional information with gene sequences, is challenging due to the high dimensionality and the rugged energy surface of the protein conformational space. The contribution of this wo...
Guiding the Search for Native-like Protein Conformations with an Ab-initio Tree-based Exploration
In this paper we propose a robotics-inspired method to enhance sampling of native-like conformations when employing only amino-acid sequence information for a protein at hand. Computing such conformations, essential to associating structural and functional information with gene sequences, is challenging due to the high dimensionality and the rugged energy surface of the protein conformational spa...