Data Discovery and Anomaly Detection Using Atypicality: Theory
نویسندگان
چکیده
A central question in the era of ’big data’ is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that most of the value in the information in some applications is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by ’atypical’ in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets. Index Terms Big Data, atypicality, minimum description length, data discovery, anomaly.
منابع مشابه
Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods
The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we de...
متن کاملAtypicality for Heart Rate Variability Using a Pattern-Tree Weighting Method
Heart rate variability (HRV) is a vital measure of the autonomic nervous system functionality and a key indicator of cardiovascular condition. This paper proposes a novel method, called pattern tree which is an extension of Willem’s context tree to real-valued data, to investigate HRV via an atypicality framework. In a previous paper atypicality was developed as method for mining and discovery ...
متن کاملDetection of Mo geochemical anomaly in depth using a new scenario based on spectrum–area fractal analysis
Detection of deep and hidden mineralization using the surface geochemical data is a challenging subject in the mineral exploration. In this work, a novel scenario based on the spectrum–area fractal analysis (SAFA) and the principal component analysis (PCA) has been applied to distinguish and delineate the blind and deep Mo anomaly in the Dalli Cu–Au porphyry mineralization area. The Dalli miner...
متن کاملA Novel Method for Unsupervised Anomaly Detection Using Unlabelled Data
Most current intrusion detection methods cannot process large amounts of audit data for real-time operation. In this paper, anomaly network intrusion detection method based on Principal Component Analysis (PCA) for data reduction and Fuzzy Adaptive Resonance Theory (Fuzzy ART) for classifier is presented. Moreover, PCA is applied to reduce the high dimensional data vectors and distance between ...
متن کاملDynamic anomaly detection by using incremental approximate PCA in AODV-based MANETs
Mobile Ad-hoc Networks (MANETs) by contrast of other networks have more vulnerability because of having nature properties such as dynamic topology and no infrastructure. Therefore, a considerable challenge for these networks, is a method expansion that to be able to specify anomalies with high accuracy at network dynamic topology alternation. In this paper, two methods proposed for dynamic anom...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1709.03189 شماره
صفحات -
تاریخ انتشار 2017