Detecting Low Complexity Clusters by Skewness and Kurtosis in Data Stream Clustering

Authors

  • Mingzhou Song
  • Hongbin Wang
Abstract

Established statistical representations of data clusters employ statistics of up to second order, including the mean, variance, and covariance. Strategies for merging clusters have been largely based on intra- and inter-cluster distance measures. The distance concept allows an intuitive interpretation, but it is not designed to merge clusters from the viewpoint of probability distributions. We suggest an alternative strategy that compares clusters using higher order statistics to capture the underlying probability distributions. Higher order statistics, such as multivariate skewness and kurtosis, enable a more accurate description of the shape of a cluster. Although the original definitions of skewness and kurtosis require all data points simultaneously, we show that their estimation can be decomposed into combinations of the cross moments of subsets of the data. This decomposability makes it possible to apply skewness and kurtosis to data stream clustering, where historical data are not accessible. We use tests for normality based on skewness and kurtosis to discover cluster pairs that can be merged into a single, less complex normal cluster even if they have different means or covariance structures.
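The decomposition idea can be illustrated with a minimal one-dimensional sketch. It is not the authors' multivariate method: it assumes scalar data, keeps only the additive power sums of each cluster, and uses the Jarque-Bera statistic as a stand-in normality test, since the abstract does not say which test is used. The names `MomentSummary` and `jarque_bera_p_value` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class MomentSummary:
    """Raw power sums of a 1-D cluster; every field is additive across subsets."""
    n: int = 0
    s1: float = 0.0  # sum of x
    s2: float = 0.0  # sum of x^2
    s3: float = 0.0  # sum of x^3
    s4: float = 0.0  # sum of x^4

    def add(self, x: float) -> None:
        """Absorb one stream point; the point itself is not stored."""
        self.n += 1
        self.s1 += x
        self.s2 += x ** 2
        self.s3 += x ** 3
        self.s4 += x ** 4

    def merge(self, other: "MomentSummary") -> "MomentSummary":
        """Summary of the union of two clusters, built from summaries alone."""
        return MomentSummary(
            self.n + other.n,
            self.s1 + other.s1,
            self.s2 + other.s2,
            self.s3 + other.s3,
            self.s4 + other.s4,
        )

    def skewness_kurtosis(self) -> Tuple[float, float]:
        """Moment-based sample skewness and (non-excess) kurtosis."""
        if self.n == 0:
            raise ValueError("empty summary")
        m1 = self.s1 / self.n
        r2, r3, r4 = self.s2 / self.n, self.s3 / self.n, self.s4 / self.n
        # Central moments expressed through raw moments.
        c2 = r2 - m1 ** 2
        c3 = r3 - 3 * m1 * r2 + 2 * m1 ** 3
        c4 = r4 - 4 * m1 * r3 + 6 * m1 ** 2 * r2 - 3 * m1 ** 4
        if c2 <= 0:
            raise ValueError("zero variance")
        return c3 / c2 ** 1.5, c4 / c2 ** 2


def jarque_bera_p_value(summary: MomentSummary) -> float:
    """Asymptotic p-value of the Jarque-Bera normality test (assumed stand-in)."""
    skew, kurt = summary.skewness_kurtosis()
    jb = summary.n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has survival exp(-x / 2).
    return math.exp(-jb / 2.0)


if __name__ == "__main__":
    left, right = MomentSummary(), MomentSummary()
    for x in [1.0, 2.0]:
        left.add(x)
    for x in [3.0, 4.0, 5.0]:
        right.add(x)
    candidate = left.merge(right)
    # Merge the pair only if normality of the union is not rejected at 5%.
    print(jarque_bera_p_value(candidate) > 0.05)
```

Because each field of the summary is a plain sum, merging two clusters never revisits past data, which is the property the abstract relies on for streams. Raw power sums can lose precision when the mean is large relative to the spread, and the Jarque-Bera p-value is only asymptotic, so a practical version would need more care on both points.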

Similar resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it gives us no way to select the number of clusters and the number of members ...

ON FUZZY NEIGHBORHOOD BASED CLUSTERING ALGORITHM WITH LOW COMPLEXITY

The main purpose of this paper is to achieve improvement in the speed of the Fuzzy Joint Points (FJP) algorithm. Since the FJP approach is a basis for fuzzy neighborhood based clustering algorithms such as Noise-Robust FJP (NRFJP) and Fuzzy Neighborhood DBSCAN (FN-DBSCAN), improving the FJP algorithm would be an important achievement in terms of these FJP-based methods. Although FJP has many advantages such as r...

An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

The detection and prevention of crime, in the past few decades, required several years of research and analysis. However, today, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...

An Incremental Data Stream Clustering Algorithm Based on Dense Units Detection

The data stream model of computation is often used for analyzing huge volumes of continuously arriving data. In this paper, we present a novel algorithm called DUCstream for clustering data streams. Our work is motivated by the need to develop a single-pass algorithm that is capable of detecting evolving clusters, and yet requires little memory and computation time. To that end, we propose an ...

Implied volatility skews and stock return skewness and kurtosis implied by stock option prices

The Black–Scholes option pricing model is commonly applied to value a wide range of option contracts. However, the model often inconsistently prices deep in-the-money and deep out-of-the-money options. Options professionals refer to this well-known phenomenon as a volatility ‘skew’ or ‘smile’. In this paper, we examine an extension of the Black–Scholes model developed by Corrado and Su that s...

Publication date: 2006