Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

Authors

  • M. Mohammadi Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran.
  • M. Sarmad Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran.
Abstract:

The purpose of this paper is to identify the points that most affect the performance of the support vector machine (SVM), one of the important algorithms in data mining. The final classification decision of an SVM is based on a small portion of the data, called support vectors, so atypical observations among these points cause the decision to deviate from the correct one. The idea of Debruyne’s “outlier map” is therefore employed in this paper to identify outlying points in the SVM classification problem. However, for computational reasons such as convenience and speed, a robust Mahalanobis distance based on the minimum covariance determinant (MCD) estimator is used. This method is well suited to data with a low-dimensional structure. In addition to classification accuracy, the margin width is used as a criterion for performance assessment; a larger margin is preferred because it indicates higher generalization ability. Omitting the outliers detected by the suggested outlier map increases both the generalization ability and the accuracy of the SVM, which leads to the conclusion that the proposed method is very efficient at identifying outliers. This new version of the outlier map retains the older version’s capability of recognizing outlying and misclassified observations, as tested on simulated and real-world data.
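The robust Mahalanobis distance mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, standard-library-only approximation of the MCD for two-dimensional data, using random three-point starts followed by concentration steps, with no consistency correction or reweighting. The function names (`mean_cov`, `det2`, `sq_mahalanobis`, `approx_mcd`), the synthetic data, the subset size `h = 34`, and the 97.5% chi-square cutoff are all assumptions made for illustration, and the SVM and outlier-map parts of the method are not shown.

```python
import math
import random

def mean_cov(pts):
    """Sample mean and 2x2 covariance (sxx, sxy, syy) of a list of (x, y) points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det2(cov):
    """Determinant of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]]."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy

def sq_mahalanobis(p, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det2(cov)

def approx_mcd(pts, h, n_starts=50, seed=0):
    """Approximate MCD: among several starts, keep the h-subset fit with the
    smallest covariance determinant (simplified; not the FAST-MCD algorithm)."""
    rng = random.Random(seed)
    best_det, best_fit = math.inf, None
    for _ in range(n_starts):
        center, cov = mean_cov(rng.sample(pts, 3))
        if det2(cov) <= 1e-12:
            continue  # skip (near-)collinear starting triples
        subset = None
        for _ in range(50):
            # Concentration step: refit on the h points closest to the current fit.
            order = sorted(range(len(pts)),
                           key=lambda i: sq_mahalanobis(pts[i], center, cov))
            new_subset = sorted(order[:h])
            if new_subset == subset:
                break  # the subset no longer changes
            subset = new_subset
            center, cov = mean_cov([pts[i] for i in subset])
        if det2(cov) < best_det:
            best_det, best_fit = det2(cov), (center, cov)
    return best_fit

# Hypothetical data: 40 regular points near the origin, 5 shifted outliers.
data_rng = random.Random(1)
inliers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
outliers = [(8 + data_rng.gauss(0, 0.3), 8 + data_rng.gauss(0, 0.3)) for _ in range(5)]
points = inliers + outliers

center, cov = approx_mcd(points, h=34)

# For 2 degrees of freedom the chi-square quantile has the closed form -2*ln(1 - p).
cutoff = -2.0 * math.log(1 - 0.975)
flags = [sq_mahalanobis(p, center, cov) > cutoff for p in points]
print("flagged indices:", [i for i, flagged in enumerate(flags) if flagged])
```

Because the raw estimate is computed from only `h` points and is not rescaled, distances tend to be inflated and a few regular points may also exceed the cutoff. In practice a library implementation, for example scikit-learn's `MinCovDet`, would typically be used instead of a hand-written search.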

Similar resources

Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator

Mahalanobis-type distances in which the shape matrix is derived from a consistent high-breakdown robust multivariate location and scale estimator can be used to find outlying points. Hardin and Rocke (http://www.cipic.ucdavis.edu/∼dmrocke/preprints.html) developed a new method for identifying outliers in a one-cluster setting using an F distribution. We extend the method to the multiple cluster c...

full text

The minimum weighted covariance determinant estimator

In this paper we introduce weighted estimators of the location and dispersion of a multivariate data set with weights based on the ranks of the Mahalanobis distances. We discuss some properties of the estimators like the breakdown point, influence function and asymptotic variance. The outlier detection capacities of different weight functions are compared. A simulation study is given to investi...

full text

A Fast Algorithm for the Minimum Covariance Determinant Estimator

The minimum covariance determinant (MCD) method of Rousseeuw (1984) is a highly robust estimator of multivariate location and scatter. Its objective is to find h observations (out of n) whose covariance matrix has the lowest determinant. Until now applications of the MCD were hampered by the computation time of existing algorithms, which were limited to a few hundred objects in a few dimensions. ...

full text

RelaxMCD: Smooth optimisation for the Minimum Covariance Determinant estimator

The Minimum Covariance Determinant (MCD) estimator is a highly robust procedure for estimating the centre and shape of a high dimensional data set. It consists of determining a subsample of h points out of n which minimises the generalised variance. By definition, the computation of this estimator gives rise to a combinatorial optimisation problem, for which several approximate algorithms have bee...

full text

Using Wavelet Support Vector Machine for Fault Diagnosis of Gearboxes

Identifying fault categories, especially for compound faults, is a challenging task in mechanical fault diagnosis. For this task, this paper proposes a novel intelligent method based on wavelet packet transform (WPT) and multiple classifier fusion. An unexpected damage on the gearbox may break the whole transmission line down. It is therefore crucial for engineers and researchers to monitor the...

full text

Support Vector Clustering for Outlier Detection

In this paper a novel support vector clustering (SVC) method for outlier detection is proposed. Outlier detection algorithms have applications in several tasks such as data mining, data preprocessing, data filter-cleaner, time series analysis and so on. Traditionally, outlier detection methods are mostly based on modeling data based on its statistical properties and these approaches are only prefe...

full text

Journal title

volume 7, issue 2

pages 299-309

publication date 2019-04-01
