Outlier Detection and Visualisation in High Dimensional Data
نویسندگان
چکیده
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Such applications have to deal with high dimensional data sets with hundreds of dimensions. However, in high dimensional space, the data are sparse and the notion of proximity fails to retain its meaningfulness. Many recent algorithms use heuristics such as genetic algorithms, the taboo search... in order to palliate these difficulties in high dimensional data. We present in this paper a new hybrid algorithm for outlier detection in high dimensional data. We evaluate the performances of the new algorithm on different high dimensional data sets, and visualise its results for some data sets.
منابع مشابه
Outlier detection for high dimensional data pdf
Is particularly useful for high dimensional data where outliers cannot be found.High dimensional data in Euclidean space pose special challenges to data. In about just the last few years, the task of unsupervised outlier detection has found.Outlier detection is an outstanding data mining task referred to open pdf with mac word class="text" href="https://tokiqivy.files.wordpress.com/2015/06/opel...
متن کاملOutlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator
The purpose of this paper is to identify the effective points on the performance of one of the important algorithm of data mining namely support vector machine. The final classification decision has been made based on the small portion of data called support vectors. So, existence of the atypical observations in the aforementioned points, will result in deviation from the correct decision. Thus...
متن کاملRobust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
متن کاملExample-Based DB-Outlier Detection from High Dimensional Datasets
Outlier detection is an important problem that has applications in many fields. High dimensional datasets are common in such applications. Among the existing outlier detection methods, Distance-Based outlier (DB-Outlier) detection is one of the most generalizable and simplest approaches. It finds outliers by calculating distances between data points. However, in high dimensional space, data dis...
متن کاملA Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications are required to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies on the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspacebased outlier detection methods identify outliers by searching for abnormal sparse density units in s...
متن کامل