Development of Computational Methods for Predicting Structural Characteristics of Helical Membrane Proteins
Abstract
The transmembrane (TM) domains of most membrane proteins consist of helix bundles. The seemingly simple task of TM helix bundle assembly has turned out to be extremely difficult. This is true even for simple TM helix bundle proteins, i.e., those that have the simple form of compact TM helix bundles. Here, we present a computational method that is capable of generating native-like structural models for simple TM helix bundle proteins having modest numbers of TM helices based on sequence conservation patterns. Thus, the only requirement for our method is the presence of more than 30 homologous sequences for an accurate extraction of sequence conservation patterns. The prediction method first computes a number of representative well-packed conformations for each pair of contacting TM helices, and then a library of tertiary folds is generated by overlaying overlapping TM helices of the representative conformations. This library is scored using sequence conservation patterns, and a subsequent clustering analysis yields 5 final models. Assuming that neighboring TM helices in the sequence contact each other (but not that TM helices A and G contact each other), the method produced structural models of CA RMSD of 3 ~ 5 Å from corresponding crystal structures for bacteriorhodopsin, halorhodopsin, sensory rhodopsin II, and rhodopsin. In blind predictions, this type of contact knowledge is not available. Mimicking this, predictions were made for the rotor of the V-type Na-ATPase without such knowledge. The CA RMSD between the best model and its crystal structure is only 3.4 Å, and its contact accuracy reaches 55%. Furthermore, the model correctly identifies the binding pocket for sodium ion. These results demonstrate that the method can be readily applied to ab initio structure prediction of simple TM helix bundle proteins having modest numbers of TM helices.
Introduction
Over the past decade, steady progress has been noted in the structure prediction of soluble proteins. 
Especially in the new fold category, a couple of methods such as Rosetta and TASSER have shown admirable performance. In contrast, little has been achieved in the structure prediction of membrane proteins. As pointed out by White and von Heijne, two fundamental problems need to be addressed for the development of reliable structure prediction methods: the mechanisms of the biological assembly of membrane proteins and the thermodynamic principles of their structural stability in the lipid bilayer. Even though great strides have been made regarding the two problems over the past years, our understanding is not yet sufficient. In fact, as discussed by White, the situation seems to be more complicated than expected, in light of the complex structures recently presented for the ClC chloride channel and the KvAP voltage-gated potassium channel. The end-to-end arrangement of helices within the hydrophobic core of the membrane seen in the aquaporin family also complicates our understanding of membrane protein folding. Nevertheless, for simple membrane proteins, i.e., those that have the simple form of compact transmembrane (TM) helix bundles, structural modeling can, in principle, be divided into two steps according to the well-established two-stage model: determination of the portions of the primary sequence that traverse the membrane and assembly of these TM helices. Since TM boundaries can be accurately predicted in many cases, the structural modeling boils down to assembly of TM helices. This has been considered an easier problem than the structure prediction of soluble proteins. Yet, years of work have attested that this is still too difficult a problem, and successful structural modeling has been mostly confined to homooligomeric complexes, where symmetry constraints can be easily imposed to simplify the conformational search problem. 
The current study tackles structural modeling of simple TM helix bundle proteins (being “simple” as defined above) that consist of modest numbers of TM helices. So far, a number of studies have been reported on structural modeling of polytopic membrane proteins. In 1997, Baldwin and his coworkers presented a method for assembling the TM helices of the rhodopsin family of G-protein-coupled receptors (GPCRs) based on helix parameters extracted from cryo-electron microscopy (cryo-EM) maps and sequence conservation patterns. The study demonstrated that sequence conservation patterns can be a powerful tool for structural modeling. Yet, the presence of EM maps was a prerequisite for the presented methodology, which is a heavy requirement given that EM maps are as difficult to obtain as crystals for the determination of atomic-scale structural models. Subsequently, Goddard and his coworkers developed a computational method of predicting the structures of GPCRs. However, their method also assumed the presence of EM maps, since it is based on helix parameters extracted from EM maps. A computational method of a different nature has also been proposed, where a set of distance constraints is utilized for the generation of a small number of feasible TM helix bundles. This method successfully predicted the structure of bovine rhodopsin using a set of 27 distance constraints. Even though this type of experimental constraint is easier to obtain than EM maps, it is still expected to be quite laborious to obtain that many experimental constraints routinely. The method presented in this study does not make any heavy assumptions of this sort. Since it is based on sequence conservation patterns, the only requirement is the presence of more than ~30 homologous sequences. Given rapid increases in the size of sequence databases, we regard this requirement as quite light. 
Methods
Overview of the prediction protocol
The fundamental assumption of our prediction protocol is that, even though complex tertiary interactions among non-neighboring TM helices are expected to play an important role in the determination of overall structures, we could split, to a large extent, modeling of TM helix bundles into modeling of TM helix pairs and subsequent assembly into TM helix bundles. This assumption might not hold for complex polytopic membrane proteins, yet it is expected to be a reasonably good approximation for simple TM helix bundle proteins, the focus of the current study. Based on the pairwise separation scheme, we first compute representative well-packed conformations for each pair of contacting TM helices. Then, a library of tertiary folds is generated by overlaying overlapping TM helices of the representative conformations. This library is scored using sequence conservation patterns. As is usually done in the protein structure prediction field, a clustering analysis of the top-scoring folds and subsequent rigid-body refinements produce 8 candidate models. This whole process is repeated 50 times. The generated candidate models are then pooled, and the same clustering analysis yields 5 final models. This overall flow of the prediction protocol is depicted in Fig. 1.
Test proteins
As outlined above, we restricted our attention to simple TM helix bundle proteins as represented by bacteriorhodopsin. We were reluctant to test the prediction protocol against structure fragments, for example, the N- or C-terminal domains of lactose permease, because they are not “clean” domains as understood in soluble proteins. Since we score the library of tertiary folds using sequence conservation patterns, membrane proteins with small numbers of homologous sequences in sequence databases were not suitable, either. 
With these criteria in mind, we found 5 suitable targets from the list of membrane proteins with known structure summarized by White (http://blanco.biomol.uci.edu): bacteriorhodopsin (bR), halorhodopsin (hR), sensory rhodopsin II (sR), rhodopsin, and the rotor of the V-type Na-ATPase (NtpK). The sequence identities of the three bacterial rhodopsins are ~30%. Yet it is to be noted that a couple of recent studies considered them to be independent targets. Since the prediction protocol generates a structural model for the TM domains of the test proteins, TM boundaries need to be defined before structural modeling. The current study focuses on the second stage of the two-stage model: assembly of independently stable TM helices into TM helix bundles. Thus we simply took TM boundary information from the PDBTM database. Once defined, individual TM helices were constructed as ideal right-handed helices with backbone dihedral angles of φ = -57° and ψ = -47°. Random perturbations in the TM boundaries taken from the PDBTM database within a variation of ±2 residues did not affect prediction results significantly (data not shown).
Systematic conformational search of the TM helix bundles
As stated above, the way we travel through the conformational space of a given TM helix bundle is by overlaying overlapping helices of the representative conformations. Thus, one first needs to compute representative conformations. This is carried out as follows: For each pair of contacting TM helices, 3888 conformations are explored in a systematic way (see below). These are scored by our newly developed scoring function. Then, the 1000 lowest-energy conformations are clustered into a few groups, and the centroid conformations for the groups become the representative well-packed conformations. It is an open issue how many lowest-energy conformations are to be clustered into how many groups. Empirically, we chose to cluster the 1000 lowest-energy conformations into 13 groups. 
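The clustering step described above can be illustrated with a minimal sketch. It assumes a precomputed pairwise distance matrix between conformations (for example, pairwise RMSD values) and uses average linkage, which the text names later as the clustering algorithm. The function names `average_linkage` and `representative`, the choice of the medoid as the "centroid conformation", and the toy one-dimensional data are assumptions made for illustration; they are not taken from the original implementation.

```python
from itertools import combinations


def average_linkage(dist, n_clusters):
    """Agglomerative clustering with average linkage.

    dist is a symmetric square matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean of all inter-cluster pairwise distances.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # a < b always holds, so deleting b does not shift index a.
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


def representative(cluster, dist):
    """Assumed definition of the centroid conformation: the member with the
    smallest summed distance to the other members of its cluster (a medoid)."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Toy stand-in for conformations: points on a line, distance = |difference|.
points = [0, 1, 2, 50, 51, 90]
dist = [[abs(p - q) for q in points] for p in points]
clusters = average_linkage(dist, 3)
reps = [representative(c, dist) for c in clusters]
print(clusters, reps)
```

This naive version recomputes every inter-cluster average at each merge, so it is only meant for small inputs; clustering 1000 conformations into 13 groups would call for an optimized library routine.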
With regard to the number of representative conformations assigned to each pair of contacting TM helices, we investigated the possibilities from as large as 40 to as small as 10. Going below 11 gave significantly poorer results for some test proteins. Going up beyond 15 slightly improved the results, yielding a model of C alpha atom root-mean-square deviation (CA RMSD) of 2.5 Å for some cases. Yet, the total computational time increases rapidly with the number of representative conformations. For example, when using 13 conformations for each pair of contacting TM helices of the 7 TM helix bundle protein, the computational cost for generating a library of tertiary folds is 13^6 (one factor of 13 for each of the six pairs of sequence-neighboring helices), taking ~10 minutes on a 2.8 GHz processor. However, it takes ~150 hours on the same processor when using 40 conformations for each pair (40^6). It is desirable to use sufficient numbers of representative conformations to guarantee an acceptable quality of results, as long as this is affordable on a typical workstation. Numbers between 11 and 15 seem to be a good choice in this regard for TM helix bundles consisting of 7 TM helices. For TM helix bundles with 4 TM helices, numbers between 31 and 35 seem to be a good choice. For reasons of limited space, we only present results with 15 representative conformations for bR, hR, sR, and rhodopsin (all have 7 TM helices). For NtpK (having 4 TM helices), the results with 35 representative conformations are reported.
For a systematic and unbiased scanning of the conformational space of a pair of contacting TM helices, we first randomly rotated the two TM helices. Then, four of the six variables describing their relative orientations were manipulated as follows (Fig. 2). The helix-helix distance was set to 9.0 Å, which is a typical value observed for contacting TM helices. To allow the two helices to contact each other at different positions, δ was varied in steps of 5.0 Å in the range of -5.0 Å to 5.0 Å for both helices. 
The two rotational angles about the helix axes were varied in steps of 20°. The tilting angle was allowed to take the values -24°, -12°, 12°, or 24°. In total, 3888 (3·18·18·4) conformations were explored. We could have performed a denser scanning of the conformational space, yet we observed that the current degree of scanning is dense enough for simple TM helix bundle proteins. Furthermore, a rather coarse scanning is desirable given that the next step is a clustering calculation. As before, we optimized the side chain conformations of each structure explored using SCWRL and computed interaction energies with a cutoff at 9 Å. Conformations harboring steric clashes (1.5 Å cutoff for the inter-atomic distance between heavy atoms of the SCWRL-optimized structure) were removed during the scanning. Our earlier report defined the interaction centers only for 11 amino acids. The expanded list of the interaction centers for all 20 amino acids is summarized in Table 1. Clustering was performed using the average linkage clustering algorithm. Upon generating representative conformations for contacting TM helix pairs, a library of tertiary folds was built by overlaying overlapping TM helices of the representative conformations. To speed up the computation, TM helix bundles were represented only by their CA atoms from this step on. Folds with bad contacts (distance between CAs of different TM helices of 4.0 Å or less) were removed. TM helix bundle proteins are well known to form compact structures. Thus loosely packed folds were removed as well. For this, we calculated the average distance between all pairs of CA-based helix centers. Simple experiments on a 2D grid show that 16.0 Å is a reasonable upper bound for TM helix bundles consisting of 7 TM helices (Fig. 3). This compactness filter was also useful in keeping the sizes of libraries to a manageable level. 
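The pair-scan grid and the two CA-based library filters described above can be sketched as follows. This is a hedged illustration, not the original code: the grid values are inferred from the stated ranges and step sizes (the starting rotation of 0° is an assumption), helices are represented as lists of (x, y, z) CA coordinates, and all names (`pair_grid`, `has_bad_contact`, `mean_center_distance`, `keep_fold`, `straight_helix`) are hypothetical.

```python
import math
from itertools import combinations, product

SHIFTS = (-5.0, 0.0, 5.0)             # delta, in Å (steps of 5.0 Å)
ROTATIONS = tuple(range(0, 360, 20))  # rotation about a helix axis, in degrees
TILTS = (-24.0, -12.0, 12.0, 24.0)    # tilting angle, in degrees


def pair_grid():
    """Iterate over every (shift, rotation_1, rotation_2, tilt) combination."""
    return product(SHIFTS, ROTATIONS, ROTATIONS, TILTS)


def has_bad_contact(helices, cutoff=4.0):
    """True if any two CAs on different helices are cutoff Å apart or closer."""
    for h1, h2 in combinations(helices, 2):
        for a in h1:
            for b in h2:
                if math.dist(a, b) <= cutoff:
                    return True
    return False


def mean_center_distance(helices):
    """Average distance between all pairs of CA-based helix centers."""
    centers = [tuple(sum(c) / len(h) for c in zip(*h)) for h in helices]
    pairs = list(combinations(centers, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def keep_fold(helices, contact_cutoff=4.0, compact_cutoff=16.0):
    """Keep a fold only if it has no bad contacts and is compact enough."""
    return (not has_bad_contact(helices, contact_cutoff)
            and mean_center_distance(helices) <= compact_cutoff)


def straight_helix(x, n=20, rise=1.5):
    """Toy helix for the example: CAs on a straight line parallel to z."""
    return [(x, 0.0, rise * k) for k in range(n)]


print(len(list(pair_grid())))                                 # grid size
print(keep_fold([straight_helix(0.0), straight_helix(9.0)]))  # well packed
print(keep_fold([straight_helix(0.0), straight_helix(3.0)]))  # bad contact
print(keep_fold([straight_helix(0.0), straight_helix(20.0)])) # too loose
```

The 16.0 Å compactness bound is the value quoted for bundles of 7 TM helices; it is exposed as a parameter because the text does not give a value for other bundle sizes.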
For TM helix bundles with 4 TM helices, it was not necessary to apply this sort of compactness filter since the sizes of the libraries were inherently small.
Scoring of a library of TM helix bundle folds based on sequence conservation patterns
A number of studies have shown that the more conserved a sequence position is, the less likely it is to be exposed to the lipid bilayer. We make use of this observation for scoring the libraries of TM helix bundle folds. Specifically, scores were computed using the following equation.