Data Discovery and Anomaly Detection Using Atypicality: Theory

نویسندگان

  • Anders Høst-Madsen
  • Elyas Sabeti
  • Chad Walton
چکیده

A central question in the era of ’big data’ is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that most of the value in the information in some applications is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by ’atypical’ in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets. Index Terms Big Data, atypicality, minimum description length, data discovery, anomaly.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods

The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we de...

متن کامل

Atypicality for Heart Rate Variability Using a Pattern-Tree Weighting Method

Heart rate variability (HRV) is a vital measure of the autonomic nervous system functionality and a key indicator of cardiovascular condition. This paper proposes a novel method, called pattern tree which is an extension of Willem’s context tree to real-valued data, to investigate HRV via an atypicality framework. In a previous paper atypicality was developed as method for mining and discovery ...

متن کامل

Detection of Mo geochemical anomaly in depth using a new scenario based on spectrum–area fractal analysis

Detection of deep and hidden mineralization using the surface geochemical data is a challenging subject in the mineral exploration. In this work, a novel scenario based on the spectrum–area fractal analysis (SAFA) and the principal component analysis (PCA) has been applied to distinguish and delineate the blind and deep Mo anomaly in the Dalli Cu–Au porphyry mineralization area. The Dalli miner...

متن کامل

A Novel Method for Unsupervised Anomaly Detection Using Unlabelled Data

Most current intrusion detection methods cannot process large amounts of audit data for real-time operation. In this paper, anomaly network intrusion detection method based on Principal Component Analysis (PCA) for data reduction and Fuzzy Adaptive Resonance Theory (Fuzzy ART) for classifier is presented. Moreover, PCA is applied to reduce the high dimensional data vectors and distance between ...

متن کامل

Dynamic anomaly detection by using incremental approximate PCA in AODV-based MANETs

Mobile Ad-hoc Networks (MANETs) by contrast of other networks have more vulnerability because of having nature properties such as dynamic topology and no infrastructure. Therefore, a considerable challenge for these networks, is a method expansion that to be able to specify anomalies with high accuracy at network dynamic topology alternation. In this paper, two methods proposed for dynamic anom...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1709.03189  شماره 

صفحات  -

تاریخ انتشار 2017