Entropy Estimations Using Correlated Symmetric Stable Random Projections

Authors

  • Ping Li
  • Cun-Hui Zhang

Abstract

Methods for efficiently estimating the Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detection (e.g., detecting DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13], based on maximally-skewed stable random projections, can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be negative, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams that allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the moment estimator we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method approximates the Shannon entropy well using small storage.
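The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' estimator: the paper's exact finite-difference form and its recommended moment estimator are not reproduced here, and all function names are hypothetical. With frequency moments $F_\alpha = \sum_i |x_i|^\alpha$ and $p_i = |x_i| / F_1$, the Shannon entropy satisfies $H = \log F_1 - \frac{d}{d\alpha} \log F_\alpha \big|_{\alpha=1}$. The sketch replaces the derivative by a central difference over $\alpha = 1 \pm \Delta$ and checks this against exact moments. It then shows one way (an assumption on our part) to draw correlated symmetric stable variables for several $\alpha$ values: the same uniform and exponential draws are reused in the Chambers-Mallows-Stuck formula.

```python
import math
import random


def frequency_moment(x, alpha):
    """F_alpha = sum_i |x_i|^alpha over the nonzero entries of x (entries may be negative)."""
    return sum(abs(xi) ** alpha for xi in x if xi != 0)


def shannon_entropy(x):
    """Exact Shannon entropy (natural log) of p_i = |x_i| / F_1, used as a reference."""
    f1 = frequency_moment(x, 1.0)
    return -sum((abs(xi) / f1) * math.log(abs(xi) / f1) for xi in x if xi != 0)


def entropy_finite_difference(x, delta):
    """Approximate H = log F_1 - d/d(alpha) log F_alpha at alpha = 1, replacing the
    derivative by the central difference [log F_{1+delta} - log F_{1-delta}] / (2*delta).
    Exact moments are used here; in a streaming setting they would be estimates."""
    f1 = frequency_moment(x, 1.0)
    f_plus = frequency_moment(x, 1.0 + delta)
    f_minus = frequency_moment(x, 1.0 - delta)
    return math.log(f1) - (math.log(f_plus) - math.log(f_minus)) / (2.0 * delta)


def correlated_stable_pair(alphas, rng):
    """Draw one standard symmetric alpha-stable variate (characteristic function
    exp(-|t|^alpha)) per alpha via the Chambers-Mallows-Stuck formula.
    All alphas share the same uniform V and exponential W (assumed coupling), so the
    draws are strongly correlated when the alphas are close, e.g. 1 - delta, 1 + delta."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    return [
        math.sin(a * v) / math.cos(v) ** (1.0 / a)
        * (math.cos(v - a * v) / w) ** ((1.0 - a) / a)
        for a in alphas
    ]


def correlated_projections(x, alphas, k, seed=0):
    """Compute y_j = sum_i x_i * s_ij(alpha) for j = 1..k and every alpha, where the
    s_ij(alpha) for different alphas come from the same (V, W) draw. By stability,
    each y_j is F_alpha^(1/alpha) times a standard symmetric alpha-stable variate."""
    rng = random.Random(seed)
    ys = {a: [0.0] * k for a in alphas}
    for xi in x:
        for j in range(k):
            draws = correlated_stable_pair(alphas, rng)
            for a, s in zip(alphas, draws):
                ys[a][j] += xi * s
    return ys
```

For example, `entropy_finite_difference([3, -1, 4, -1, 5, 9, -2, 6], 1e-3)` agrees closely with `shannon_entropy` on the same vector. In a real stream each $s_{ij}$ would be regenerated on demand (e.g., from a hash of $(i, j)$) rather than drawn sequentially. $F_{1 \pm \Delta}$ would then be estimated from the $k$ projected samples per $\alpha$, which is the step where the choice of moment estimator discussed in the abstract matters.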

Similar resources

A New Algorithm for Compressed Counting with Applications in Shannon Entropy Estimation in Dynamic Data

Efficient estimation of the moments and Shannon entropy of data streams is an important task in modern machine learning and data mining. To estimate the Shannon entropy, it suffices to accurately estimate the α-th moment with ∆ = |1 − α| ≈ 0. To guarantee that the error of estimated Shannon entropy is within a ν-additive factor, the method of symmetric stable random projections requires O ( 1 ν...

Periodically correlated and multivariate symmetric stable processes related to periodic and cyclic flows

In this work we introduce and study discrete-time periodically correlated stable processes and multivariate stationary stable processes related to periodic and cyclic flows. Our study involves producing a spectral representation and a spectral identification for such processes. We show that the third component of a periodically correlated stable process has a component related to a...

Gaussian mixtures: entropy and geometric inequalities

A symmetric random variable is called a Gaussian mixture if it has the same distribution as the product of two independent random variables, one being positive and the other a standard Gaussian random variable. Examples of Gaussian mixtures include random variables with densities proportional to $e^{-|t|^p}$ and symmetric $p$-stable random variables, where $p \in (0, 2]$. We obtain various sharp moment an...

A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting

Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...

Small Deviations of Stable Processes and Entropy of the Associated Random Operators

We investigate the relation between the small deviation problem for a symmetric α-stable random vector in a Banach space and the metric entropy properties of the operator generating it. This generalizes former results due to Li and Linde and to Aurzada. It is shown that this problem is related to the study of the entropy numbers of a certain random operator. In some cases an interesting gap app...

Publication date: 2012