Aid to discovery of new protein foldings

Abstract

Large-scale sequencing projects produce a rapidly growing number of known protein sequences. The current number is about 36,000 sequences [Bairoch & Boeckmann, 92], but before the end of the century many more than 100,000 will have to be dealt with. This is in contrast to the far slower increase in the number of known protein structures, currently about 2,000 [Bernstein et al., 77]. The rates of increase are roughly 100 sequences per day and 1.5 3D structures per day. Thus, it is increasingly important to develop computational approaches that automatically determine (predict) the structure of proteins whose sequences are known. Because the general problem of protein fold prediction is so difficult, researchers have tried to predict the regular substructures forming the imaginary level of protein structural description known as secondary structure. Knowledge of the secondary structure can contribute significantly towards the goal of tertiary fold prediction. This knowledge can constrain the possible conformations of the protein [Cohen & Kuntz, 89], provide a good starting point and reduce the search space in the simulation of protein folding by molecular dynamics [Levitt, 83] or lattice models [Skolnick & Kolinski, 90], or be used in predicting higher-order structures (e.g. supersecondary structures [Taylor & Thornton, 83], domains [Lathrop, 87]). The established methods for protein secondary structure prediction include hand-crafted expert rules [Lim, 74], biological predictive patterns [Cohen et al., 83, 86] [Presnell et al., 92], the statistical Chou-Fasman theory [Chou & Fasman, 74], and the information theory-based GOR method [Garnier et al., 78]. More recent methods often make use of inductive learning techniques, whereby a system is trained with a set of sample proteins of known conformation and then uses what it has learned to predict the structure of previously unseen proteins.
Both neural networks [Qian & Sejnowski, 88] [Kneller, 90] [Zhang et al., 92] and symbolic induction [King & Sternberg, 90] [Muggleton et al., 92] have been applied to secondary structure prediction. Despite the apparent practical importance of the secondary structure concept, a quarter-century of research effort has shown that there is a limit to the accuracy of secondary structure prediction. Even though this limit has recently been raised to 70% [Cost & Salzberg, 93] [Leng & Buchanan, 93] [Rost & Sander, 93], this level of accuracy is too low to be of practical use in constraining the conformation space for tertiary structure prediction. The main reason for the failure of secondary structure prediction methods is that the formation of structure (including secondary structure) is only to a certain degree due to sequentially local interactions of amino acids. However, most methods known to date rely on local information. It is becoming widely recognized that dealing properly with the protein structure prediction problem requires tackling the representation issues differently. Several attempts have been made to change the representations involved in the secondary structure prediction problem. The protein sequence, usually represented by its amino acids, has instead been examined in terms of the physico-chemical properties of amino acids [Hunter, 91] [Cherkauer & Shavlik, 93]. The secondary structure elements have been replaced by alternative classes of local structure, which account for recognized helical and strand regions, as well as for novel categories such as N- and C-caps of helices and strands [Zhang et al., 93]. However, these first works addressing representation issues do not significantly improve on the state of the art. Obviously, much more representation work is needed.
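To make the notions of "local information" and prediction accuracy concrete, the following is a minimal sketch, not a method taken from any of the cited works. It assumes that a local predictor sees each residue through a fixed-size window of sequence neighbours, and that the quoted accuracy is a per-residue agreement rate between predicted and observed state strings (for example `H`, `E`, `C` for helix, strand and coil). The function names, the window half-width and the padding symbol are all hypothetical choices made for illustration.

```python
from typing import List


def local_windows(sequence: str, half_width: int = 6, pad: str = "-") -> List[str]:
    """Return one fixed-size window per residue, centred on that residue.

    Assumption: a "local" predictor only sees these windows, so residues
    farther than `half_width` positions away cannot influence the prediction.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For example, `local_windows("ACDE", half_width=1)` yields the windows `-AC`, `ACD`, `CDE` and `DE-`. Any interaction between residues that never share a window is invisible to such a predictor, which is the limitation the abstract describes.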
Biological knowledge representation is hard for several reasons: the tricky nature and ill-formalized character of the available biological knowledge; the lack of general theoretical understanding in the field; the fact that this knowledge comes in a raw form.

Publication date: 2007