Generalising Ward’s Method for Use with Manhattan Distances
Abstract
This paper investigates the claim that Ward's linkage algorithm for hierarchical clustering is limited to Euclidean distances. Ward's clustering algorithm is generalised for use with the l1 norm, i.e. Manhattan distances. We argue that this generalisation is theoretically sound and give an example in which it outperforms the method using Euclidean distances. As an application, we analyse languages statistically using methods normally applied in biology and genetic classification. We aim to quantify differences in character traits between languages, and we use a statistical language signature based on relative bi-gram (two-letter sequence) frequencies to calculate a distance matrix between 32 Indo-European languages. We then classify the languages with Ward's method of hierarchical clustering, using both the Euclidean and the Manhattan distance. The results obtained with the two metrics are compared to show that the defining property of Ward's algorithm, minimising intra-cluster variation while maximising inter-cluster variation, is preserved when the Manhattan metric is used.
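The pipeline described above (bi-gram signature, pairwise distance matrix, Ward clustering) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: bi-grams are counted within words only, "letters" are whatever the regex `[^\W\d_]` matches, and Ward's criterion is applied to an arbitrary dissimilarity matrix through the Lance–Williams update. That update is exact for squared Euclidean distances; feeding it Manhattan distances instead is one plausible reading of the generalisation, assumed here rather than taken from the paper. All helper names (`bigram_signature`, `manhattan`, `sq_euclidean`, `distance_matrix`, `ward_linkage`) are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def bigram_signature(text):
    """Relative frequencies of letter bi-grams, counted within words only (assumption)."""
    counts = Counter()
    # [^\W\d_] matches Unicode letters, so accented alphabets are kept.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        counts.update(a + b for a, b in zip(word, word[1:]))
    total = sum(counts.values())
    return {bg: n / total for bg, n in counts.items()} if total else {}


def manhattan(p, q):
    """l1 distance between two sparse frequency vectors."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def sq_euclidean(p, q):
    """Squared l2 distance; classic Ward's method is exact on these values."""
    return sum((p.get(k, 0.0) - q.get(k, 0.0)) ** 2 for k in p.keys() | q.keys())


def distance_matrix(sigs, metric):
    """Full square matrix of pairwise dissimilarities."""
    return [[metric(a, b) for b in sigs] for a in sigs]


def _key(a, b):
    return (a, b) if a < b else (b, a)


def ward_linkage(labels, D):
    """Agglomerative clustering using the Lance-Williams update with Ward's coefficients.

    Returns a list of (merged_members, merge_height) in merge order.
    """
    members = {i: (lab,) for i, lab in enumerate(labels)}
    size = {i: 1 for i in members}
    d = {(a, b): D[a][b] for a, b in combinations(range(len(labels)), 2)}
    merges = []
    next_id = len(labels)
    while len(members) > 1:
        (i, j), h = min(d.items(), key=lambda kv: kv[1])  # closest pair of clusters
        new = next_id
        next_id += 1
        for k in members:
            if k in (i, j):
                continue
            ni, nj, nk = size[i], size[j], size[k]
            # d(k, i+j) = [(ni+nk) d(k,i) + (nj+nk) d(k,j) - nk d(i,j)] / (ni+nj+nk)
            d[_key(k, new)] = ((ni + nk) * d[_key(k, i)]
                               + (nj + nk) * d[_key(k, j)]
                               - nk * h) / (ni + nj + nk)
        # Drop every entry that refers to the two clusters just merged.
        d = {p: v for p, v in d.items() if i not in p and j not in p}
        members[new] = members.pop(i) + members.pop(j)
        size[new] = size.pop(i) + size.pop(j)
        merges.append((members[new], h))
    return merges


if __name__ == "__main__":
    # Toy strings for illustration only; not the paper's 32-language corpus.
    names = ["toy_a", "toy_b", "toy_c"]
    texts = ["the cat sat on the mat", "the cat sat on the hat", "zzz qqq xxx"]
    sigs = [bigram_signature(t) for t in texts]
    for metric in (manhattan, sq_euclidean):
        print(metric.__name__, ward_linkage(names, distance_matrix(sigs, metric)))
```

Running `ward_linkage` once with `manhattan` and once with `sq_euclidean` on the same signatures yields the two merge sequences (dendrograms) whose cluster structure can then be compared, as in the study. The merge heights follow from the update rule above and are not claimed to match any values reported in the paper.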
Similar resources
Disguised Face Recognition by Using Local Phase Quantization and Singular Value Decomposition
Disguised face recognition is a major challenge in the field of face recognition that has received relatively little attention. This paper therefore presents a disguised face recognition algorithm based on the Local Phase Quantization (LPQ) method and Singular Value Decomposition (SVD), which addresses two main challenges. The first challenge is when an individual intentionally alters the appearance ...
Closed Loop Layout
In the layout problem for manufacturing cells, rectangular cells are to be positioned without overlapping. The objective is to minimize the total transportation cost. Layouts are categorized according to the shape of the transportation system's track. In a closed loop layout, the track has a rectangular shape. A common difficulty of all layout problems is the manner in which dis...
Implementation of Face Recognition Algorithm on Fields Programmable Gate Array Card
The evolution of today's application technologies requires a certain level of robustness, reliability and ease of integration. We chose a Field Programmable Gate Array (FPGA), programmed with a hardware description language, to implement the facial recognition algorithm based on "Eigenfaces" using Principal Component Analysis (PCA). In this paper, we first present an overview of the PCA used for facial recognition,...
Performance Evaluation of Different Distance Measures Used in Color Iris Authentication
This paper presents a performance evaluation of different distance measures used in color iris authentication. Color iris segmentation is carried out using a histogram and the circular Hough transform. Color iris features are extracted using a histogram method. Different distance measures are then used for iris authentication. The experimental evaluation shows that Euclidean and Manhattan distance are...
An approach to rank efficient DMUs in DEA based on combining Manhattan and infinity norms
In many applications, discriminating among decision making units (DMUs) is a problematic technical task for decision makers in data envelopment analysis (DEA). DEA models are unable to discriminate between extremely efficient DMUs. Hence, there is growing interest in improving the discriminatory power of DEA. The aim of this paper is ranking extreme efficient DMUs in DEA based on exp...