The Relative Importance of Input Encoding and Learning Methodology on Protein Secondary Structure Prediction
نویسندگان
چکیده
In this thesis the relative importance of input encoding and learning algorithm on protein secondary structure prediction is explored. A novel input encoding, based on multidimensional scaling applied to a recently published amino acid substitution matrix, is developed and shown to be superior to an arbitrary input encoding. Both decimal valued and binary input encodings are compared. Two neural network learning algorithms, Resilient Propagation and Learning Vector Quantization, which have not previously been applied to the problem of protein secondary structure prediction, are examined. Input encoding is shown to have a greater impact on prediction accuracy than learning methodology with a binary input encoding providing the highest training and test set prediction accuracy. INDEX WORDS: Neural Networks, Protein Secondary Structure Prediction, Input Encoding, Resilient Propagation, Learning Vector Quantization THE RELATIVE IMPORTANCE OF INPUT ENCODING AND LEARNING METHODOLOGY ON PROTEIN SECONDARY STRUCTURE PREDICTION
منابع مشابه
THE RELATIVE IMPORTANCE OF INPUT ENCODING AND LEARNING METHODOLOGY ON PROTEIN SECONDARY STRUCTURE PREDICTION by ARNSHEA CLAYTON
In this thesis the relative importance of input encoding and learning algorithm on protein secondary structure prediction is explored. A novel input encoding, based on multidimensional scaling applied to a recently published amino acid substitution matrix, is developed and shown to be superior to an arbitrary input encoding. Both decimal valued and binary input encodings are compared. Two neura...
متن کاملProtein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
متن کاملIdentification of Multiple Input-multiple Output Non-linear System Cement Rotary Kiln using Stochastic Gradient-based Rough-neural Network
Because of the existing interactions among the variables of a multiple input-multiple output (MIMO) nonlinear system, its identification is a difficult task, particularly in the presence of uncertainties. Cement rotary kiln (CRK) is a MIMO nonlinear system in the cement factory with a complicated mechanism and uncertain disturbances. The identification of CRK is very important for different pur...
متن کاملPrediction of Secondary Structure of Citrus Viroids Reported from Southern Iran
Abstract Viroids are smallest, single-stranded, circular, highly structured plant pathogenic RNAs that do not code for any protein. Viroids belong to two families, the Avsunviroidae and the Pospiviroidae. Members of the Pospiviroidae family adopt a rod-like secondary structure. In this study the most stable secondary structures of citrus viroid variants that reported from Fars province wer...
متن کاملDevising and experimenting correlation-based metrics for evaluating the effectiveness of input encoding techniques in prediction tasks
Motivations Defining an optimal encoding for input data is fundamental to achieve high performances in prediction tasks. Its main responsibility is to transform input data to a format suitable for the classification algorithm. The selection of the best encoding is typically done by resorting to the knowledge of a human expert, entrusted with extracting the features that s/he deems useful for th...
متن کامل