Statistical approach to numerical databases: clustering using normalised Minkowski metrics
Abstract
Pre-processing, or normalisation, of data sets is widely used in many fields of machine intelligence. In contrast to the overwhelming majority of normalisation procedures, which scale the data to a unit range, the paper argues that after normalisation the average contributions of all features to the measure used to assess the similarity of the data should be equal to one another. Using the Minkowski distance as an example of a similarity metric, new normalised metrics are introduced such that the means of all attributes are the same and, hence, the contributions of the features to the similarity measure are approximately equalised. This normalisation is achieved by scaling the numerical attributes, i.e. by dividing the database values by the means of the corresponding components of the metric.
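The scaling described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so the following is one possible reading and not the paper's definition: the "component" of feature k is taken to be |x_ik - x_jk|^p, its mean is computed over all pairs of records, and each feature is divided by the p-th root of that mean so that every feature has a mean component of one. The function names and the sample data are hypothetical.

```python
import numpy as np


def component_means(X, p=2.0):
    """Mean of |x_ik - x_jk|**p over all record pairs i < j, one value per feature k."""
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(X.shape[0], k=1)  # indices of all unordered pairs
    return (np.abs(X[i] - X[j]) ** p).mean(axis=0)


def normalise(X, p=2.0):
    """Scale each feature so that its mean Minkowski component equals one.

    Assumption: dividing by the p-th root of the mean component is our
    interpretation of "dividing by the means of the components".
    """
    X = np.asarray(X, dtype=float)
    means = component_means(X, p)
    if np.any(means == 0):
        raise ValueError("a constant feature has a zero mean component")
    return X / means ** (1.0 / p)


def minkowski(a, b, p=2.0):
    """Plain Minkowski distance of order p between two records."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float((diff ** p).sum() ** (1.0 / p))


# Hypothetical data: the second feature is about 1000 times larger than the first.
X = np.array([[1.0, 1000.0], [2.0, 3000.0], [4.0, 2000.0], [3.0, 5000.0]])
X_norm = normalise(X, p=2.0)

# After scaling, each entry equals 1 up to floating-point rounding.
print(component_means(X_norm, p=2.0))
print(minkowski(X_norm[0], X_norm[1], p=2.0))
```

On the raw data, the second feature dominates every distance. After scaling, both features contribute equally on average, which is the property the abstract argues for. Unlike min-max scaling, this approach depends on the order p of the metric.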
Related resources
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristic algorithms. In this study, we propose a simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...
Conceptual Clustering of Heterogeneous Distributed Databases
With increasingly more databases becoming available on the Internet, there is a growing opportunity to globalise knowledge discovery and learn general patterns, rather than restricting learning to specific databases from which the rules may not be generalisable. Clustering of distributed databases facilitates learning of new concepts that characterise common features of, and differences between...
The New Software Package for Dynamic Hierarchical Clustering for Circles Types of Shapes
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...
Numerical and Categorical Attributes Data Clustering Using K-Modes and Fuzzy K-Modes
Most of the existing clustering approaches are applicable to purely numerical or purely categorical data only, but not both. In general, it is a nontrivial task to perform clustering on mixed data composed of numerical and categorical attributes because there exists an awkward gap between the similarity metrics for categorical and numerical data. This paper therefore presents a general clustering ...