Cluster Characterization through a Representativity Measure
Abstract
Clustering is an unsupervised learning task that decomposes a dataset into subgroups which summarize the initial data and reveal its structure. We propose to enrich this result with a numerical coefficient that describes the representativity of each cluster, indicating the extent to which it is characteristic of the whole dataset. The coefficient is defined for a specific clustering algorithm, the Outlier Preserving Clustering Algorithm (opca), which detects clusters associated with major trends as well as with marginal behaviors, in order to offer a complete description of the initial dataset. The proposed representativity measure exploits the iterative process of opca to compute the typicality of each identified cluster.
Similar resources
Field validation of secondary data sources: a novel measure of representativity applied to a Canadian food outlet database
BACKGROUND: Validation studies of secondary datasets used to characterize neighborhood food businesses generally evaluate how accurately the database represents the true situation on the ground. Depending on the research objectives, the characterization of the business environment may tolerate some inaccuracies (e.g. minor imprecisions in location or errors in business names). Furthermore, if th...
A Multi-Word Term Extraction Program for Arabic Language
Terminology extraction commonly includes two steps: identification of term-like units in the texts, mostly multi-word phrases, and ranking of the extracted term-like units according to their domain representativity. In this paper, we design a multi-word term extraction program for the Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes into account several ty...
Objective criteria to assess representativity of soil fungal community profiles.
Soil fungal community structures are often highly heterogeneous, even among samples taken from small field plots. Sample pooling is widely used to overcome this heterogeneity; however, no objective criteria have yet been defined for determining the number of samples to be pooled to profile a field plot representatively. In the present study PCR/RFLP and T-RFLP analysis of fungal ...
Optimizing Spatial Declustering Weights – Comparison of Methods
Analysis of a spatial phenomenon is affected to a great extent by the frequently irregular structure and/or preferential clustering of sampling schemes. To obtain representative statistics for an area of interest, the influence of clustered measurements needs to be reduced by assigning them lower weights. In this case study, two standard methods, the polygonal and the cell-declustering...
New Developments in Representativity Approach to study Advanced Assembly Concepts in the EOLE Critical Facility
A new representativity approach, based on sensitivity analysis of integral parameters to nuclear data, is developed in the field of Advanced Assemblies Concepts (AAC) design. The adopted scheme proposes an original approach to the problem, going from the initial « microscopic » pin-cells integral parameters to the whole « macroscopic » assembly integral parameters. The originality of the prese...