Finding Key Knowledge Attribute Subspace of Outliers for High Dimensional Dataset

Authors

Peng Yang

Qingsheng Zhu

Abstract

Detecting outliers is an important task in many applications. Because most applications involve high dimensional data, traditional outlier detection methods become inefficient. To address this problem, we propose the concept of outlying reduction, which extends attribute reduction in rough set theory. We also define the key knowledge attribute subspace (KKAS), a subspace that produces an outlying partition approximating the one obtained in the full dimensional attribute space. We propose an efficient method for finding the KKAS. It first finds all outliers in the full attribute space and then calculates the KAS for the corresponding projection of each outlier. Finally, the KKAS is identified by the value of the outlying partition similarity. Experiments on both UCI datasets and real-life datasets show that our method is effective, efficient, and highly scalable.
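The general idea of comparing outliers found in a subspace with those found in the full attribute space can be illustrated with a small sketch. This is not the rough-set method of the paper: the k-nearest-neighbour outlier score, the Jaccard similarity used as a stand-in for outlying partition similarity, and the names `knn_outliers`, `jaccard`, and `best_subspace` are all assumptions made for illustration only.

```python
from itertools import combinations

import numpy as np


def knn_outliers(X, k=3, n_outliers=2):
    """Indices of the n_outliers points with the largest mean k-NN distance.

    Assumption: a simple distance-based score, not the paper's definition.
    """
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)        # ignore each point's distance to itself
    knn = np.sort(dist, axis=1)[:, :k]    # k smallest distances per point
    scores = knn.mean(axis=1)
    return frozenset(np.argsort(scores)[-n_outliers:].tolist())


def jaccard(a, b):
    """Jaccard similarity of two index sets (hypothetical similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_subspace(X, max_dim=2, k=3, n_outliers=2):
    """Search small attribute subsets for the one whose outliers best match
    the outliers found in the full attribute space."""
    full = knn_outliers(X, k, n_outliers)
    best_attrs, best_sim = None, -1.0
    for d in range(1, max_dim + 1):
        for attrs in combinations(range(X.shape[1]), d):
            sub = knn_outliers(X[:, list(attrs)], k, n_outliers)
            sim = jaccard(full, sub)
            if sim > best_sim:            # keep the first, smallest best match
                best_attrs, best_sim = attrs, sim
    return full, best_attrs, best_sim
```

Calling `best_subspace(X)` on a numeric array `X` with one row per object and one column per attribute returns the full-space outlier set, the selected attribute indices, and the similarity value. The exhaustive search over subsets is exponential in `max_dim`, so this sketch only suits small attribute counts.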

Related resources

Robust Subspace Outlier Detection in High Dimensional Space

Rare data in a large-scale database are called outliers; they reveal significant information about the real world. Subspace-based outlier detection is regarded as a feasible approach in very high dimensional space. However, the outliers found in subspaces are only part of the true outliers in the high dimensional space. The outliers hidden in normal clustered points are sometimes neglected i...

Finding and Visualizing Subspace Clusters of High Dimensional Dataset Using Advanced Star Coordinates

Analysis of high dimensional data has been a research area for many years. Analysts can detect similarity of data points within a cluster. Subspace clustering detects useful dimensions in clustering a high dimensional dataset. Visualization allows better insight into subspace clusters. However, displaying such high dimensional database clusters on a 2-dimensional display is a challenging task. We p...

A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets

Many real applications need to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. In general, no efficient methods are available for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormal sparse density units in s...

An Overview of Robust Subspace Recovery

This paper will serve as an introduction to the body of work on robust subspace recovery. Robust subspace recovery involves finding an underlying low-dimensional subspace in a dataset that is possibly corrupted with outliers. While this problem is easy to state, it has been difficult to develop optimal algorithms due to its underlying nonconvexity. This work will emphasize advantages and disadv...

Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data

Background and purpose: As science, knowledge, and technology evolve, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and their interpretation. For high-dimensional problems, classical methods are not reliable because of a large number of predictor variable...

Journal: JCIT

Volume: 5

Publication date: 2010
