Generalized Outlier Detection with Flexible Kernel Density Estimates
Authors
Abstract
We analyse the interplay of density estimation and outlier detection in density-based outlier detection. Through a clear and principled decoupling of the two steps, we formulate a generalization of density-based outlier detection methods based on kernel density estimation. Embedded in a broader framework for outlier detection, the resulting method can be easily adapted to detect novel types of outliers: while common outlier detection methods are designed to detect objects in sparse areas of the data set, our method can also be modified to detect unusual local concentrations or trends in the data set if desired. It allows for the integration of domain knowledge and specific requirements. We demonstrate the flexible applicability and scalability of the method on large real-world data sets.
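The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`knn_indices`, `kde_density`, `relative_density_score`), the Gaussian kernel, the fixed bandwidth, and the neighbour count `k` are all assumptions chosen for illustration. Step 1 estimates a density at each point from its nearest neighbours; step 2 turns those densities into an outlier score by comparing each point with its neighbourhood.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each row, excluding the row itself."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                              # a point is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def kde_density(X, neighbours, bandwidth):
    """Step 1 (density estimation): Gaussian KDE at each point over its neighbours.

    The Gaussian kernel and the single global bandwidth are assumptions of this sketch.
    """
    d = X.shape[1]
    diffs = X[:, None, :] - X[neighbours]                     # shape (n, k, d)
    sq = (diffs ** 2).sum(axis=-1)                            # shape (n, k)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    return np.exp(-0.5 * sq / bandwidth ** 2).sum(axis=1) / (neighbours.shape[1] * norm)

def relative_density_score(density, neighbours, eps=1e-12):
    """Step 2 (outlier scoring): mean neighbour density divided by own density.

    Values well above 1 indicate a point that is sparser than its neighbourhood.
    """
    return density[neighbours].mean(axis=1) / (density + eps)

# Toy data: a 2-D Gaussian cluster plus one isolated point appended last.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])

nbrs = knn_indices(X, k=10)
dens = kde_density(X, nbrs, bandwidth=1.0)
scores = relative_density_score(dens, nbrs)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

Because the two steps are separate functions, either can be swapped independently. For example, replacing the score with its reciprocal would instead highlight points that are unusually dense relative to their neighbours, which is one plausible reading of the "unusual local concentrations" mentioned in the abstract.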
Similar resources
Outlier Detection with Kernel Density Functions
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimation. Outliers are then detected by comparing the local density ...
Online Bivariate Outlier Detection in Final Test Using Kernel Density Estimation
In parametric IC testing, outlier detection is applied to filter out potential unreliable devices. Most outlier detection methods are used in an offline setting and hence are not applicable to Final Test, where immediate pass/fail decisions are required. Therefore, we developed a new bivariate online outlier detection method that is applicable to Final Test without making assumptions about a sp...
Continuous Adaptive Outlier Detection on Distributed Data Streams
In many applications, stream data are too voluminous to be collected centrally and are often transmitted over a distributed network. In this paper, we focus on outlier detection over distributed data streams in real time. First, we formalize the problem of outlier detection using the kernel density estimation technique. Then, we adopt the fading strategy to keep pace with the transie...
Statistical Outlier Detection in Large Multivariate Datasets
This work focuses on detecting outliers within large and very large datasets using a computationally efficient procedure. The algorithm uses Tukey’s biweight function applied on the dataset to filter out the effects of extreme values for obtaining appropriate location and scale estimates. Robust Mahalanobis distances for all data points are calculated using these location and scale estimates. A...
Adaptive kernel density-based anomaly detection for nonlinear systems
This paper presents an unsupervised, density-based approach to anomaly detection. The purpose is to define a smooth yet effective measure of outlierness that can be used to detect anomalies in nonlinear systems. The approach assigns each sample a local outlier score indicating how much one sample deviates from others in its locality. Specifically, the local outlier score is defined as a relativ...