A Unified Subspace Outlier Ensemble Framework for Outlier Detection
نویسندگان
چکیده
$EVWUDFW 7KH WDVN RI RXWOLHU GHWHFWLRQ LV WR ILQG VPDOO JURXSV RI GDWD REMHFWV WKDW DUH H[FHSWLRQDO ZKHQ FRPSDUHG ZLWK UHVW ODUJH DPRXQW RI GDWD 'HWHFWLRQ RI VXFK RXWOLHUV LV LPSRUWDQW IRU PDQ\ DSSOLFDWLRQV VXFK DV IUDXG GHWHFWLRQ DQG FXVWRPHU PLJUDWLRQ 0RVW VXFK DSSOLFDWLRQV DUH KLJK GLPHQVLRQDO GRPDLQV LQ ZKLFK WKH GDWD PD\ FRQWDLQ KXQGUHGV RI GLPHQVLRQV +RZHYHU WKH RXWOLHU GHWHFWLRQ SUREOHP LWVHOI LV QRW ZHOO GHILQHG DQG QRQH RI WKH H[LVWLQJ GHILQLWLRQV DUH ZLGHO\ DFFHSWHG HVSHFLDOO\ LQ KLJK GLPHQVLRQDO VSDFH ,Q WKLV SDSHU RXU ILUVW FRQWULEXWLRQ LV WR SURSRVH D XQLILHG IUDPHZRUN IRU RXWOLHU GHWHFWLRQ LQ KLJK GLPHQVLRQDO VSDFHV IURP DQ HQVHPEOH OHDUQLQJ YLHZSRLQW ,Q RXU QHZ IUDPHZRUN WKH RXWO\LQJ QHVV RI HDFK GDWD REMHFW LV PHDVXUHG E\ IXVLQJ RXWOLHU IDFWRUV LQ GLIIHUHQW VXEVSDFHV XVLQJ D FRPELQDWLRQ IXQFWLRQ $FFRUGLQJO\ ZH VKRZ WKDW DOO H[LVWLQJ UHVHDUFKHV RQ RXWOLHU GHWHFWLRQ FDQ EH UHJDUGHG DV VSHFLDO FDVHV LQ WKH XQLILHG IUDPHZRUN ZLWK UHVSHFW WR WKH VHW RI VXEVSDFHV FRQVLGHUHG DQG WKH W\SH RI FRPELQDWLRQ IXQFWLRQ XVHG ,Q DGGLWLRQ WR GHPRQVWUDWH WKH XVHIXOQHVV RI WKH HQVHPEOH OHDUQLQJ EDVHG RXWOLHU GHWHFWLRQ IUDPHZRUN ZH GHYHORSHG D YHU\ VLPSOH DQG IDVW DOJRULWKP QDPHO\ 62( 6XEVSDFH 2XWOLHU (QVHPEOH XVLQJ GLPHQVLRQDO 6XEVSDFHV LQ ZKLFK RQO\ VXEVSDFHV ZLWK RQH GLPHQVLRQ LV XVHG IRU PLQLQJ RXWOLHUV IURP ODUJH FDWHJRULFDO GDWDVHWV 7KH 62( DOJRULWKP QHHGV RQO\ WZR VFDQV RYHU WKH GDWDVHW DQG KHQFH LV YHU\ DSSHDOLQJ LQ UHDO GDWD PLQLQJ DSSOLFDWLRQV ([SHULPHQWDO UHVXOWV RQ UHDO GDWDVHWV DQG ODUJH V\QWKHWLF GDWDVHWV VKRZ WKDW 62( KDV FRPSDUDEOH SHUIRUPDQFH ZLWK UHVSHFW WR WKRVH VWDWH RI DUW RXWOLHU GHWHFWLRQ DOJRULWKPV RQ LGHQWLI\LQJ WUXH RXWOLHUV DQG 62( FDQ EH DQ RUGHU RI PDJQLWXGH IDVWHU WKDQ RQH RI WKH IDVWHVW RXWOLHU GHWHFWLRQ DOJRULWKPV NQRZQ VR IDU
منابع مشابه
Outlier Detection in Random Subspaces over Data Streams: An Approach for Insider Threat Detection
Insider threat detection is an emergent concern for industries and governments due to the growing number of attacks in recent years. Several Machine Learning (ML) approaches have been developed to detect insider threats, however, they still suffer from a high number of false alarms. None of those approaches addressed the insider threat problem from the perspective of stream mining data where a ...
متن کاملUnsupervised Ensemble Learning for Mining Top-n Outliers
Outlier detection is an important and attractive problem in knowledge discovery in large datasets. Instead of detecting an object as an outlier, we study detecting the n most outstanding outliers, i.e. the top-n outlier detection. Further, we consider the problem of combining the top-n outlier lists from various individual detection methods. A general framework of ensemble learning in the top-n...
متن کاملX Less is More: Building Selective Anomaly Ensembles
Ensemble learning for anomaly detection has been barely studied, due to difficulty in acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have been studied and effectively used for long. Our work taps into this gap and builds a new ensemble approach for anomaly detection, with application to event detection in t...
متن کاملOn Evaluation of Outlier Rankings and Outlier Scores
Outlier detection research is currently focusing on the development of new methods and on improving the computation time for these methods. Evaluation however is rather heuristic, often considering just precision in the top k results or using the area under the ROC curve. These evaluation procedures do not allow for assessment of similarity between methods. Judging the similarity of or correlat...
متن کاملA Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications are required to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies on the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspacebased outlier detection methods identify outliers by searching for abnormal sparse density units in s...
متن کاملOutlier Detection Based On Neighborhood Proximity
Outliers, also called anomalies are data patterns that do not conform to the behavior that is expected or differ too much from the rest. In some cases, outliers could be caused by errors in data generating/collecting methods or by inherent data variability. However, in many situations, outliers are indications of interesting events that have never been known before and hence, an adaptation of t...
متن کامل