Improvement of a density-based clustering algorithm by modifying the density definitions and the input parameter
Abstract:
Clustering is one of the main tasks in data mining; it means grouping similar samples together. There is a wide variety of clustering algorithms, and one of their categories is density-based clustering. Various algorithms have been proposed in this category, and one of the most widely used is DBSCAN. DBSCAN can identify clusters of arbitrary shape in a dataset and determines the number of clusters automatically. The algorithm has both advantages and disadvantages: its input parameters are difficult for the user to determine, and it cannot detect clusters of different densities within a dataset. The ISB-DBSCAN algorithm is another density-based algorithm that eliminates these disadvantages of DBSCAN. ISB-DBSCAN reduces DBSCAN's input parameters to a single parameter k, the number of nearest neighbors. This method can also identify clusters of different densities, but because of the way it defines a core point, it fails to identify some clusters in certain datasets. This paper presents a method for improving the ISB-DBSCAN algorithm. Like ISB-DBSCAN, the proposed approach uses a single input parameter k as the number of nearest neighbors, and it provides a new definition of a core point. The method performs clustering in three steps; unlike ISB-DBSCAN, however, it can create new clusters in the final step. The proposed method also uses a new criterion, which involves the number of dimensions of the dataset, to detect noise. Since determining the k parameter may be difficult for the user, a method based on a genetic algorithm is also proposed for estimating k automatically. To evaluate the proposed methods, experiments were carried out on 11 standard datasets and the clustering accuracy of each method was measured.
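The abstract contrasts DBSCAN's hard-to-tune parameters with a single nearest-neighbor count k. The sketch below is a minimal illustration of that single-parameter idea under stated assumptions, not the paper's algorithm: the core-point test (a point is core when at least k other points list it among their k nearest neighbors), the linking of core points, and the border/noise rule are all choices made here for clarity, and the function names `knn_lists` and `knn_density_cluster` are hypothetical. It does not implement the paper's three-step procedure or its dimension-based noise criterion.

```python
import math
from collections import deque


def knn_lists(points, k):
    """Return, for each point index, the indices of its k nearest neighbours (self excluded)."""
    n = len(points)
    result = []
    for i in range(n):
        # Sort the other points by Euclidean distance; ties are broken by index.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (math.dist(points[i], points[j]), j))
        result.append(others[:k])
    return result


def knn_density_cluster(points, k):
    """Hypothetical single-parameter density clustering; returns labels (-1 = noise)."""
    n = len(points)
    knn = knn_lists(points, k)

    # Reverse k-NN count: how many points list i among their k nearest neighbours.
    rknn = [0] * n
    for i in range(n):
        for j in knn[i]:
            rknn[j] += 1

    # Assumed core test (not the paper's definition): at least k points choose i.
    core = [rknn[i] >= k for i in range(n)]

    # Link two core points if either is among the other's k nearest neighbours.
    adj = [set() for _ in range(n)]
    for i in range(n):
        if core[i]:
            for j in knn[i]:
                if core[j]:
                    adj[i].add(j)
                    adj[j].add(i)

    # Clusters are the connected components of the core-point graph.
    labels = [-1] * n
    cluster = 0
    for s in range(n):
        if core[s] and labels[s] == -1:
            labels[s] = cluster
            queue = deque([s])
            while queue:
                p = queue.popleft()
                for q in adj[p]:
                    if labels[q] == -1:
                        labels[q] = cluster
                        queue.append(q)
            cluster += 1

    # Border points: non-core points that appear in some core point's k-NN list
    # join the cluster of the closest such core point; all others remain noise.
    for i in range(n):
        if not core[i]:
            owners = [c for c in range(n) if core[c] and i in knn[c]]
            if owners:
                best = min(owners, key=lambda c: math.dist(points[i], points[c]))
                labels[i] = labels[best]
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(knn_density_cluster(pts, k=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search keeps the sketch dependency-free but costs O(n² log n); a spatial index would be the usual choice for larger datasets.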
The results show that the proposed method achieves better results on different datasets than other available methods. The automatic determination of the k parameter also yielded acceptable results.
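The abstract states that k is estimated automatically with a genetic algorithm but does not give the encoding, operators, or fitness function. The following is a minimal sketch under assumed choices: integer genomes for k, tournament selection, a crossover that draws a child between its two parents, small random mutations, and elitism. `evolve_k` and its `fitness` callback are hypothetical; in practice `fitness(k)` would score the clustering obtained with that k (for example with a cluster validity index), whereas the toy fitness in the usage line simply peaks at k = 7.

```python
import random


def evolve_k(fitness, k_min, k_max, pop_size=20, generations=30,
             mutation_rate=0.3, seed=0):
    """Search for an integer k in [k_min, k_max] that maximises fitness(k).

    `fitness` is a caller-supplied (hypothetical) scoring function; pop_size must be >= 3.
    """
    rng = random.Random(seed)
    cache = {}

    def score(k):
        # Each evaluation would mean running a clustering, so evaluate each k only once.
        if k not in cache:
            cache[k] = fitness(k)
        return cache[k]

    # Spread the initial population evenly over the allowed range.
    span = k_max - k_min
    population = [k_min + (i * span) // max(pop_size - 1, 1) for i in range(pop_size)]
    for k in population:
        score(k)

    def tournament():
        # Pick the fittest of three randomly chosen individuals.
        return max(rng.sample(population, 3), key=score)

    for _ in range(generations):
        elite = max(population, key=score)  # elitism: the best individual survives unchanged
        children = [elite]
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            child = rng.randint(min(a, b), max(a, b))  # crossover: a value between the parents
            if rng.random() < mutation_rate:
                child += rng.choice([-2, -1, 1, 2])  # small random mutation
            children.append(min(max(child, k_min), k_max))  # clamp to the allowed range
        population = children
        for k in population:
            score(k)

    # Best k among all evaluated candidates (ties go to the smaller k).
    return max(cache, key=lambda k: (cache[k], -k))


if __name__ == "__main__":
    print(evolve_k(lambda k: -(k - 7) ** 2, 2, 15))  # 7
```

Spreading the initial population evenly over the range and caching fitness values are choices that suit expensive evaluations; when the population is at least as large as the range, the first generation already evaluates every candidate k.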
Volume 16, Issue 2, pages 105-120, published 2019-09