A Flexible Clustering Method Based on UPGMA and ISS

Author

  • M. J. Dallwitz
Abstract

A formula is given for a flexible combinatorial clustering strategy, having UPGMA and ISS as special cases.

When numerical clustering methods are used to investigate the structure of taxonomic groups, it is common practice to try several methods with different clustering intensities. For example, one might use single linkage (weak), UPGMA (medium), and complete linkage (strong) (Sneath and Sokal 1973). Lance and Williams (1967) proposed a ‘flexible’ clustering strategy in which the intensity of clustering can be varied continuously by means of a parameter, β. This strategy has the WPGMA method as a special case (β = 0), which could be considered a disadvantage, as UPGMA is generally preferred over WPGMA (Sneath and Sokal 1973). Furthermore, unpublished experiments by Watson and Dallwitz with data on grass genera (Watson, Clifford and Dallwitz 1985) have tended to show that the Lance and Williams flexible method, with parameter values giving intense clustering, is inferior to other intensely clustering methods such as ISS (Burr 1970) and information analysis (Williams, Lambert and Lance 1966). Abel and Williams (1985) also argue that UPGMA and ISS are preferable to the Lance and Williams flexible method.

The flexible clustering strategy suggested here has UPGMA and ISS as special cases. In the notation of Sneath and Sokal (1973), the formula is

$$U_{(J,K),L} = \frac{(t_J + \sigma t_L)\,U_{J,L} + (t_K + \sigma t_L)\,U_{K,L} - \sigma t_L\,U_{J,K}}{t_J + t_K + \sigma t_L}$$

where σ is the intensity parameter. σ = 0 gives UPGMA, for which the update reduces to $(t_J U_{J,L} + t_K U_{K,L})/(t_J + t_K)$, and σ = 1 gives ISS. The formula is easy to incorporate into clustering programs which use combinatorial fusion strategies.
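The update rule can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function names and argument order are hypothetical, and the denominator is assumed to be t_J + t_K + σ·t_L, which is what the stated special cases require (UPGMA at σ = 0, ISS at σ = 1). The two reference functions use the standard combinatorial updates for UPGMA and for ISS (Ward's method), so the special cases can be checked directly.

```python
def flexible_update(u_jl, u_kl, u_jk, t_j, t_k, t_l, sigma):
    """Dissimilarity between the fused cluster (J,K) and another cluster L.

    u_jl, u_kl, u_jk: dissimilarities between the original clusters.
    t_j, t_k, t_l:    numbers of members in clusters J, K and L.
    sigma:            intensity parameter (0 -> UPGMA, 1 -> ISS).
    Assumption: the denominator is t_j + t_k + sigma * t_l.
    """
    numerator = (
        (t_j + sigma * t_l) * u_jl
        + (t_k + sigma * t_l) * u_kl
        - sigma * t_l * u_jk
    )
    return numerator / (t_j + t_k + sigma * t_l)


def upgma_update(u_jl, u_kl, t_j, t_k):
    """Standard UPGMA (group-average) update, for comparison."""
    return (t_j * u_jl + t_k * u_kl) / (t_j + t_k)


def iss_update(u_jl, u_kl, u_jk, t_j, t_k, t_l):
    """Standard ISS (Ward) update, for comparison."""
    total = t_j + t_k + t_l
    return ((t_j + t_l) * u_jl + (t_k + t_l) * u_kl - t_l * u_jk) / total


if __name__ == "__main__":
    # Hypothetical dissimilarities and cluster sizes, for illustration only.
    args = dict(u_jl=4.0, u_kl=6.0, u_jk=2.0, t_j=2, t_k=3, t_l=1)
    for sigma in (0.0, 0.5, 1.0):
        print(sigma, flexible_update(sigma=sigma, **args))
```

In a program that already applies a combinatorial fusion strategy, a function like `flexible_update` would replace the existing update step after each fusion, with intermediate values of σ giving clustering intensities between those of UPGMA and ISS.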




Publication date: 1988