OPTICS-OF: Identifying Local Outliers
Authors
Abstract
For many KDD applications, such as detecting criminal activities in e-commerce, finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases. Being an outlier, however, is not just a binary property. Instead, it is a property that applies to a certain degree to each object in a data set, depending on how ‘isolated’ this object is with respect to the surrounding clustering structure. In this paper, we formally introduce a new notion of outliers which bases outlier detection on the same theoretical foundation as density-based cluster analysis. Our notion of an outlier is ‘local’ in the sense that the outlier-degree of an object is determined by taking into account the clustering structure in a bounded neighborhood of the object. We demonstrate that this notion of an outlier is more appropriate for detecting different types of outliers than previous approaches, and we also present an algorithm for finding them. Furthermore, we show that by combining the outlier detection with a density-based method to analyze the clustering structure, we can get the outliers almost for free if we already want to perform a cluster analysis on a data set.
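The abstract does not spell out how the outlier degree is computed, so the following is only a minimal Python sketch of one density-based local score in the spirit of the description above. It uses an LOF-style ratio of neighbor densities, not the paper's own OPTICS-OF definition. The function names, the neighborhood size `k`, and the simplified tie handling (exactly `k` neighbors per point, no duplicate points) are assumptions made for illustration.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbors of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]  # assumption: ties beyond the k-th neighbor are dropped


def local_outlier_scores(points, k=3):
    """LOF-style degree per point: close to 1 inside a cluster, larger when isolated."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]

    # k-distance: distance from each point to its k-th nearest neighbor.
    k_dist = [nb[-1][0] for nb in neighbors]

    # Local reachability density: inverse of the mean reachability distance,
    # where reachability to neighbor j is max(k_dist[j], actual distance).
    # Assumption: no duplicate points, so the sum is never zero.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # Degree: average density of the neighbors relative to the point's own density.
    return [
        sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

Under these assumptions, a point whose neighborhood is about as dense as its neighbors' neighborhoods scores near 1, while a point far from a tight cluster scores well above 1. The score is a degree rather than a yes/no label, and it depends only on a bounded neighborhood of each point.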
Similar resources
Local multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran
Anomaly recognition has always been a prominent subject in preliminary geochemical exploration. Regional geochemical data processing draws on a range of statistical and data mining techniques, as well as different mapping methods that serve to present the outputs. Outlier values are of interest in investigations where data are gathered under controlled condition...
A Two-Phase Robust Estimation of Process Dispersion Using M-estimator
Parameter estimation is the first step in constructing any control chart. Most estimators of mean and dispersion are sensitive to the presence of outliers. The data may be contaminated by outliers either locally or globally. The existing robust estimators deal only with global contamination. In this paper a robust estimator for dispersion is proposed to reduce the effect of local contamination ...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
Improving the Robustness to Outliers of Mixtures of Probabilistic PCAs
Principal Component Analysis, when formulated as a probabilistic model, can be made robust to outliers by using a Student-t assumption on the noise distribution instead of a Gaussian one. On the other hand, a mixture of PCAs is a model that aims to discover nonlinear dependencies in data by finding clusters and identifying local linear submanifolds. This paper shows how mixtures of PCA can be made ro...
Who Should be Interviewed? A Response from Cluster Analysis
Objective: This article presents an application of cluster analysis for social science research, especially studies that include interviews as part of their data collection. This application is best suited to sequential mixed-method researchers who use quantitative data to frame subsequent qualitative subsamples for conducting interviews. Methods: In more detail, the algorithm (i....