An adaptive algorithm for clustering cumulative probability distribution functions using the Kolmogorov-Smirnov two-sample test
Abstract
This paper proposes an adaptive algorithm for clustering the cumulative probability distribution functions (c.p.d.f.'s) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters. It makes no parametric assumptions about the c.p.d.f.'s. The proposed distance function for clustering c.p.d.f.'s is based on the Kolmogorov-Smirnov two-sample statistic. This test can detect differences in the location, dispersion or shape of the c.p.d.f.'s. In our context, the statistic lets us cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set. It also lets us decide whether more clusters need to be added. In this sense, the algorithm is adaptive: it increases the number of clusters only when necessary, so the number of clusters does not have to be fixed in advance. For each cluster, the algorithm outputs two things: the common c.p.d.f. of all observed data in the cluster (the centroid), and the Kolmogorov-Smirnov statistic between the centroid and the most distant c.p.d.f. The algorithm has been applied to a large data set of solar global irradiation spectra distributions. The results reduce all the information in more than 270,000 c.p.d.f.'s to only 6 clusters, corresponding to 6 different c.p.d.f.'s.
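The abstract does not give the algorithm's exact steps. The following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It rests on three assumptions:

- The distance between two data sets is the two-sample Kolmogorov-Smirnov statistic D = sup_x |F_1(x) - F_2(x)|.
- A data set joins the closest existing cluster when D against that cluster's pooled centroid does not exceed the asymptotic critical value c(alpha) * sqrt((n + m) / (n * m)), with c(alpha) = sqrt(-0.5 * ln(alpha / 2)).
- Otherwise, a new cluster is opened.

The names `ks_statistic`, `ks_critical_value` and `adaptive_ks_clustering`, and the significance parameter `alpha`, are hypothetical.

```python
from bisect import bisect_right
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # Both empirical c.d.f.'s are right-continuous step functions that jump only
    # at observed points, so the supremum is attained at one of those points.
    for x in a + b:
        fa = bisect_right(a, x) / n
        fb = bisect_right(b, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic two-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def adaptive_ks_clustering(samples, alpha=0.05):
    """Hypothetical sketch: open a new cluster only when no existing centroid
    passes the KS homogeneity test against the incoming data set."""
    clusters = []  # each cluster: {"members": [indices], "pooled": [values]}
    for i, s in enumerate(samples):
        best, best_d = None, None
        for c in clusters:
            d = ks_statistic(s, c["pooled"])
            # Assumption: accept when the KS test does not reject at level alpha,
            # and prefer the accepting cluster with the smallest distance.
            if d <= ks_critical_value(len(s), len(c["pooled"]), alpha):
                if best is None or d < best_d:
                    best, best_d = c, d
        if best is None:
            clusters.append({"members": [i], "pooled": list(s)})
        else:
            best["members"].append(i)
            best["pooled"].extend(s)
    # Per cluster: the centroid (pooled sorted data, whose empirical c.d.f. is the
    # common c.p.d.f.) and the KS distance to the most distant member.
    return [
        {
            "members": c["members"],
            "centroid": sorted(c["pooled"]),
            "max_ks": max(ks_statistic(samples[j], c["pooled"]) for j in c["members"]),
        }
        for c in clusters
    ]


if __name__ == "__main__":
    a1 = [i / 50 for i in range(50)]
    a2 = [i / 50 + 0.005 for i in range(50)]
    b1 = [i / 50 + 5 for i in range(50)]
    b2 = [i / 50 + 5.01 for i in range(50)]
    result = adaptive_ks_clustering([a1, b1, a2, b2])
    print([c["members"] for c in result])  # prints [[0, 2], [1, 3]]
```

Each data set is visited once, in input order, so the final partition can depend on that order. The pooled centroid mirrors the abstract's description of the centroid as the common c.p.d.f. of all observed data in the cluster.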
Journal title:
- Expert Syst. Appl.
Volume 42
Publication date: 2015