Robust outlier detection with L0-SVDD
Abstract
Outlier detection consists in finding data points that are not representative of the population from which they were ostensibly drawn. To solve this problem, Liu et al. [1] recently proposed a two-step hypersphere-based approach that takes into account a confidence score pre-calculated for each input data point. Because these scores are defined in a first step, independently of the second, the approach is not well suited to large streaming data. To address this limitation, we propose a global reformulation of the support vector data description (SVDD) problem based on the L0 norm, which is well suited to outlier detection. We demonstrate that this L0-SVDD problem can be solved with an iterative procedure that provides data-specific weighting terms. We show that our approach outperforms state-of-the-art outlier detection techniques on both synthetic and clinical data.
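The abstract does not spell out the iterative procedure, so the following is only a toy illustration of the general idea of alternating between fitting a hypersphere and updating per-point weights. It is not the authors' L0-SVDD algorithm. The function name `fit_reweighted_sphere`, the quantile-based radius, and the weight rule `eps / (slack + eps)` (borrowed from reweighted-L1 heuristics for approximating an L0 penalty) are all assumptions made for this sketch.

```python
import random


def fit_reweighted_sphere(points, quantile=0.9, n_iter=10, eps=1e-3):
    """Toy iteratively reweighted hypersphere fit (illustrative assumption,
    not the method of the paper)."""
    n, dim = len(points), len(points[0])
    weights = [1.0] * n  # start with every point fully trusted
    for _ in range(n_iter):
        # Weighted mean as the sphere centre: low-weight points barely move it.
        total = sum(weights)
        center = [sum(w * p[k] for w, p in zip(weights, points)) / total
                  for k in range(dim)]
        # Squared distance of every point to the current centre.
        sq_dist = [sum((p[k] - center[k]) ** 2 for k in range(dim))
                   for p in points]
        # Squared radius chosen so that roughly `quantile` of the points fit inside.
        radius_sq = sorted(sq_dist)[int(quantile * (n - 1))]
        # Points with large slack (far outside the sphere) get weights near 0;
        # points inside the sphere keep weight 1.
        weights = [eps / (max(0.0, d - radius_sq) + eps) for d in sq_dist]
    return center, radius_sq, weights


# Hypothetical demo data: a Gaussian cluster plus five distant points.
rng = random.Random(0)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
outliers = [(8.0 + 0.1 * i, 8.0 - 0.1 * i) for i in range(5)]
center, radius_sq, weights = fit_reweighted_sphere(inliers + outliers)
flagged = [i for i, w in enumerate(weights) if w < 0.5]
```

In this sketch the weights play the role of data-specific terms that are learned jointly with the sphere rather than computed in a separate first pass. A real SVDD solver would replace the weighted mean and quantile radius with the solution of a weighted quadratic program.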
Similar resources
A Revisit to Support Vector Data Description
Support vector data description (SVDD) is a useful method for outlier detection and has been applied to a variety of applications. However, in the existing optimization procedure of SVDD, there are some issues which may lead to improper usage of SVDD. Some of the issues might already be known in practice, but the theoretical discussion, justification and correction are still lacking. Given the ...
Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel
Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses ...
A Revisit to Support Vector Data Description (SVDD)
Support vector data description (SVDD), proposed by [1], is a useful method for outlier detection. Its model is obtained by solving the dual optimization problem. In this paper, we point out some issues in their derivations. For example, they formulate SVDD as a non-convex problem and derive the dual problem only under some parameter settings. Given the wide use of SVDD, it is important to addr...
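For orientation, the SVDD problem as it is commonly stated in the literature can be written as follows. The snippet itself gives no formulas, so the notation here is an assumption: centre $a$, radius $R$, slack variables $\xi_i$, feature map $\phi$, kernel $K$ and trade-off parameter $C$.

```latex
% Primal: smallest enclosing ball with slack for points left outside
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\|\phi(x_i) - a\|^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.

% Dual, as usually derived from the primal above
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i K(x_i, x_i)
  - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i = 1.
```

Treating $R$ itself as the optimisation variable makes the primal constraint non-convex in $R$, which is the kind of issue the snippet refers to. As it also notes, the dual derivation only holds under some settings of the parameters.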
Autonomously Determining the Parameters for SVDD with RBF Kernel from a One-Class Training Set
The one-class support vector machine “support vector data description” (SVDD) is an ideal approach for anomaly or outlier detection. However, for the applicability of SVDD in real-world applications, the ease of use is crucial. The results of SVDD are massively determined by the choice of the regularisation parameter C and the kernel parameter σ of the widely used RBF kernel. While for two-clas...
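To illustrate why the kernel parameter σ matters so much, here is a minimal sketch of the RBF kernel. It assumes the common parameterisation $K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$, since the snippet does not say which form is used. The function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel value, assuming the exp(-||x - y||^2 / (2 * sigma^2)) form."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Compare the similarity of two fixed points under different kernel widths.
x, y = (0.0, 0.0), (3.0, 4.0)
similarities = {sigma: rbf_kernel(x, y, sigma) for sigma in (0.1, 1.0, 10.0, 100.0)}
```

With a very small σ, distinct points look almost orthogonal in feature space and every training point tends to become its own support vector. With a very large σ, all points look nearly identical and the description becomes too loose to separate outliers.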
Supplementary Material of Exact Recoverability of Robust PCA via Outlier Pursuit with Tight Recovery Bounds
Theorem 1 (Exact Recovery of Outlier Pursuit). Suppose $m = \Theta(n)$, $\mathrm{Range}(L_0) = \mathrm{Range}(\mathcal{P}_{\mathcal{I}_0^{\perp}} L_0)$, and $[S_0]_{:j} \notin \mathrm{Range}(L_0)$ for all $j \in \mathcal{I}_0$. Then any solution $(L_0 + H, S_0 - H)$ to Outlier Pursuit (1) with $\lambda = 1/\sqrt{\log n}$ exactly recovers the column space of $L_0$ and the column support of $S_0$ with probability at least $1 - cn^{-10}$, if the column support $\mathcal{I}_0$ of $S_0$ is uniformly distributed among all sets of cardinality...