Overcome Support Vector Machine Diagnosis Overfitting

نویسندگان

  • Henry Han
  • Xiaoqian Jiang
چکیده

Support vector machines (SVMs) are widely employed in molecular diagnosis of disease for their efficiency and robustness. However, there is no previous research to analyze their overfitting in high-dimensional omics data based disease diagnosis, which is essential to avoid deceptive diagnostic results and enhance clinical decision making. In this work, we comprehensively investigate this problem from both theoretical and practical standpoints to unveil the special characteristics of SVM overfitting. We found that disease diagnosis under an SVM classifier would inevitably encounter overfitting under a Gaussian kernel because of the large data variations generated from high-throughput profiling technologies. Furthermore, we propose a novel sparse-coding kernel approach to overcome SVM overfitting in disease diagnosis. Unlike traditional ad-hoc parametric tuning approaches, it not only robustly conquers the overfitting problem, but also achieves good diagnostic accuracy. To our knowledge, it is the first rigorous method proposed to overcome SVM overfitting. Finally, we propose a novel biomarker discovery algorithm: Gene-Switch-Marker (GSM) to capture meaningful biomarkers by taking advantage of SVM overfitting on single genes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fault diagnosis in a distillation column using a support vector machine based classifier

Fault diagnosis has always been an essential aspect of control system design. This is necessary due to the growing demand for increased performance and safety of industrial systems is discussed. Support vector machine classifier is a new technique based on statistical learning theory and is designed to reduce structural bias. Support vector machine classification in many applications in v...

متن کامل

Oversampling to Overcome Overfitting: Exploring the Relationship between Data Set Composition, Molecular Descriptors, and Predictive Modeling Methods

The traditional biological assay is very time-consuming, and thus the ability to quickly screen large numbers of compounds against a specific biological target is appealing. To speed up the biological evaluation of compounds, high-throughput screening is widely used in the fields of biomedical, biological information, and drug discovery. The research presented in this study focuses on the use o...

متن کامل

A Fuzzy Support Tensor Machines based on Support Vector Data Description

Most of the traditional machine learning algorithms are based on the vector, but in tensor space, Tensor learning is helpful to overcome the over-fitting problem than vector learning. In the meanwhile, these algorithms based on tensor require a smaller set of decision variables as compared to those approaches based on vector. Support tensor machine (STM) is a prevalent machine learning approach...

متن کامل

Localized Support Vector Machine and Its Efficient Algorithm

Nonlinear Support Vector Machines employ sophisticated kernel functions to classify data sets with complex decision surfaces. Determining the right parameters of such functions is not only computationally expensive, the resulting models are also susceptible to overfitting due to their large VC dimensions. Instead of fitting a nonlinear model, this paper presents a framework called Localized Sup...

متن کامل

Short Term Load Forecasting Using A Hybrid Model Based On Support Vector Regression

This paper proposes a new hybrid method based on support vector regression (SVR) to predict the load value of power systems accurately. The proposed method will use the SVR to overcome some deficiencies such as overfitting and complicated structure that exist in the neural network. In order to find the optimal values of the parameters, krill herd (KH) algorithm is used as the optimizer. The KH ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 13  شماره 

صفحات  -

تاریخ انتشار 2014