Reliably Capture Local Clusters in Noisy Domains From Parallel Universes
Abstract
When searching for small local patterns, it is difficult to distinguish between incidental agglomerations of noisy points and true local patterns. We propose a new algorithm [2] that addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm detects local patterns in noisy data sets more reliably than is possible when the temporal information is ignored. This is achieved by making use of the fact that noise does not reproduce its incidental structure over time, whereas even small patterns do. In particular, we developed a method to track clusters over time based on an optimal match of data partitions between time periods. Using the terminology of parallel universes in [1], our approach is characterised as follows:
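The idea of tracking clusters through an optimal match of partitions between time periods can be illustrated with a small sketch. This is not the authors' published procedure: the Jaccard overlap score, the brute-force search over permutations, the `min_overlap` threshold, and all function names are assumptions chosen for illustration, and the sketch assumes that the same point identifiers (for example, customer IDs) are observed in both periods.

```python
from itertools import permutations


def jaccard(a, b):
    """Jaccard overlap of two sets of point identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(prev, curr):
    """Return (total_overlap, pairs) for the best one-to-one cluster matching.

    prev and curr are lists of sets, one set of point identifiers per cluster.
    Brute force over permutations, so only suitable for a handful of clusters.
    """
    # Pad the shorter partition with empty clusters so both have equal length.
    n = max(len(prev), len(curr))
    p = prev + [set()] * (n - len(prev))
    c = curr + [set()] * (n - len(curr))
    best_score, best_perm = -1.0, None
    for perm in permutations(range(n)):
        score = sum(jaccard(p[i], c[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_perm = score, perm
    pairs = [(i, j, jaccard(p[i], c[j])) for i, j in enumerate(best_perm)]
    return best_score, pairs


def persistent_clusters(prev, curr, min_overlap=0.5):
    """Index pairs (prev, curr) of matched clusters with overlap >= min_overlap."""
    _, pairs = best_match(prev, curr)
    return [(i, j) for i, j, s in pairs
            if i < len(prev) and j < len(curr) and s >= min_overlap]


# Hypothetical partitions of the same customers in two time periods.
period_1 = [{1, 2, 3, 4}, {5, 6}, {7, 8, 9}]
period_2 = [{7, 8, 9, 10}, {1, 2, 3}, {11, 12}]

total, _ = best_match(period_1, period_2)
matches = persistent_clusters(period_1, period_2)
```

In this toy data, clusters 0 and 2 of the first period reappear in the second period and are reported as persistent, while the small cluster `{5, 6}` has no counterpart and would be treated as a candidate for incidental noise. For more than a few clusters, an assignment solver would replace the permutation search.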
Similar resources
Matching Partitions over Time to Reliably Capture Local Clusters in Noisy Domains
When searching for small clusters, it is difficult to distinguish between incidental agglomerations of noisy points and true local patterns. We present the PAMALOC algorithm, which addresses this problem by exploiting the temporal information contained in most business data sets. The algorithm detects local patterns in noisy data sets more reliably compared to the case when...
Fuzzy Clustering in Parallel Universes with Noise Detection
We present an extension of the fuzzy c-Means algorithm that operates on different feature spaces, so-called parallel universes, simultaneously and also incorporates noise detection. The method assigns membership values of patterns to different universes, which are then adopted throughout the training. This leads to better clustering results since patterns not contributing to clustering in a uni...
Lernen in parallelen Universen
Classical data mining techniques are almost always based on a unique object representation, which is often realized as a high-dimensional attribute vector per object. In many application domains, however, the objects to be analyzed (molecules, 3D models, processes) can be described easily in various ways, which leads to a variety of object representations: so-called Parallel Universes. This the...
Subspace outlier mining in large multimedia databases
Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters and separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...
Infall Regions of Galaxy Clusters
In hierarchical clustering, galaxy clusters accrete mass through the aggregation of smaller systems. Thus, the velocity field of the infall regions of clusters contains significant random motion superimposed on radial infall. Because the purely spherical infall model does not predict the amplitude of the velocity field correctly, methods estimating the cosmological density parameter Ω0 based on...