Overcome Support Vector Machine Diagnosis Overfitting
Authors
Abstract
Support vector machines (SVMs) are widely employed in the molecular diagnosis of disease for their efficiency and robustness. However, no previous research has analyzed their overfitting in disease diagnosis based on high-dimensional omics data, an analysis that is essential to avoid deceptive diagnostic results and to enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that disease diagnosis with an SVM classifier inevitably encounters overfitting under a Gaussian kernel because of the large data variations generated by high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad hoc parameter-tuning approaches, it not only robustly overcomes the overfitting problem but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm, Gene-Switch-Marker (GSM), to capture meaningful biomarkers by taking advantage of SVM overfitting on single genes.
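The link between large data variation and Gaussian-kernel overfitting can be illustrated with a minimal sketch. It is not the paper's analysis or its sparse-coding kernel; the synthetic data, the value range, and the helper names (`gaussian_kernel`, `kernel_matrix`, `max_off_diagonal`) are assumptions made for illustration. With the kernel K(x, y) = exp(-||x - y||^2 / (2 sigma^2)), widely spread high-dimensional samples are so far apart that every off-diagonal kernel entry collapses to zero. The kernel matrix then approximates the identity, so each training sample is similar only to itself and the kernel carries no information about unseen samples.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_matrix(samples, sigma):
    """Full pairwise kernel matrix as a list of lists."""
    return [[gaussian_kernel(x, y, sigma) for y in samples] for x in samples]


def max_off_diagonal(matrix):
    """Largest similarity between two distinct samples."""
    n = len(matrix)
    return max(matrix[i][j] for i in range(n) for j in range(n) if i != j)


# Hypothetical stand-in for omics profiles: few samples, many features,
# raw values spread over a wide range (assumed, not taken from the paper).
rng = random.Random(0)
n_samples, n_features = 20, 2000
samples = [[rng.uniform(0.0, 1000.0) for _ in range(n_features)]
           for _ in range(n_samples)]

# Bandwidth of 1 on raw data: distinct samples have zero similarity.
K_unit = kernel_matrix(samples, sigma=1.0)

# For contrast only: a bandwidth matched to the typical pairwise distance.
pairs = [(i, j) for i in range(n_samples) for j in range(i + 1, n_samples)]
mean_sq_dist = sum(
    sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
    for i, j in pairs
) / len(pairs)
K_scaled = kernel_matrix(samples, sigma=math.sqrt(mean_sq_dist))

print("max off-diagonal, sigma = 1:      ", max_off_diagonal(K_unit))
print("max off-diagonal, matched sigma:  ", max_off_diagonal(K_scaled))
```

When the kernel matrix is close to the identity, the SVM decision value for a new sample reduces to roughly the bias term, so every unseen sample tends to receive the same label regardless of its features. The matched bandwidth is shown only as a contrast and corresponds to the kind of parameter tuning the abstract describes as ad hoc.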
Similar resources
Fault diagnosis in a distillation column using a support vector machine based classifier
Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems. The support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...
Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods
The traditional biological assay is very time-consuming, and thus the ability to quickly screen large numbers of compounds against a specific biological target is appealing. To speed up the biological evaluation of compounds, high-throughput screening is widely used in the fields of biomedical, biological information, and drug discovery. The research presented in this study focuses on the use o...
A Fuzzy Support Tensor Machines based on Support Vector Data Description
Most traditional machine learning algorithms are based on vectors, but in tensor space, tensor learning is more helpful than vector learning for overcoming the over-fitting problem. Meanwhile, tensor-based algorithms require a smaller set of decision variables than vector-based approaches. Support tensor machine (STM) is a prevalent machine learning approach...
Localized Support Vector Machine and Its Efficient Algorithm
Nonlinear Support Vector Machines employ sophisticated kernel functions to classify data sets with complex decision surfaces. Determining the right parameters of such functions is not only computationally expensive; the resulting models are also susceptible to overfitting due to their large VC dimensions. Instead of fitting a nonlinear model, this paper presents a framework called Localized Sup...
Short Term Load Forecasting Using A Hybrid Model Based On Support Vector Regression
This paper proposes a new hybrid method based on support vector regression (SVR) to predict the load value of power systems accurately. The proposed method will use the SVR to overcome some deficiencies such as overfitting and complicated structure that exist in the neural network. In order to find the optimal values of the parameters, krill herd (KH) algorithm is used as the optimizer. The KH ...