Evolution-Based Clustering Technique for Data Streams with Uncertainty
Abstract
The evolution-based stream clustering method supports the monitoring and change detection of clustering structures. This paper presents HUE-Stream, which extends E-Stream and E-Stream++ by introducing a distance function, a cluster representation, and histogram management for the different types of clustering structure evolution. Compared with UMicro and LuMicro, HUE-Stream produces higher clustering quality and is more robust over highly uncertain data streams; however, it requires longer processing time because it checks for changes in the clustering structure in every round, which is more often than necessary. To reduce the processing time, proper periods for detecting changes in clustering structure evolution were determined. With these periods, the processing time improved greatly while the clustering quality was retained. Compared with the actual classes in the KDDCup 1999 network intrusion detection dataset, a comparable number of clusters was obtained in all stream progressions.
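The idea of checking for structural change periodically rather than in every round can be illustrated with a minimal sketch. This is not the HUE-Stream implementation: the names `process_stream`, `update`, `check_evolution`, `period`, and `count_checks` are hypothetical, and the evolution check is a placeholder. It only shows how a detection period reduces the number of expensive checks.

```python
from typing import Callable, Iterable, List


def process_stream(
    points: Iterable[float],
    update: Callable[[float], None],
    check_evolution: Callable[[], None],
    period: int = 1,
) -> int:
    """Feed points to `update` and call `check_evolution` every `period` points.

    Returns the number of evolution checks performed.
    All names here are illustrative assumptions, not part of HUE-Stream.
    """
    if period < 1:
        raise ValueError("period must be >= 1")
    checks = 0
    for i, x in enumerate(points, start=1):
        update(x)                # cheap per-point bookkeeping
        if i % period == 0:      # costly structural check, run only periodically
            check_evolution()
            checks += 1
    return checks


def count_checks(n_points: int, period: int) -> int:
    """Count how many checks run on a dummy stream of `n_points` values."""
    seen: List[float] = []
    return process_stream(
        (float(i) for i in range(n_points)),
        update=seen.append,
        check_evolution=lambda: None,  # placeholder for the real detection step
        period=period,
    )
```

With `period=1` the check runs for every point, which corresponds to the every-round behaviour described above; a larger period trades detection latency for fewer checks. Which period preserves clustering quality is an empirical question that the paper addresses and this sketch does not.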
Similar resources
E-Stream: Evolution-Based Technique for Stream Clustering
Data streams have recently attracted attention for their applicability to numerous domains including credit fraud detection, network intrusion detection, and click streams. Stream clustering is a technique that performs cluster analysis of data streams and can monitor the results in real time. A data stream is a continuously generated sequence of data for which the characteristics of the...
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of electrocardiogram signal classification and segmentation methodologies indicates that ECG analysis should use algorithms that can effectively handle the nonstationarity and uncertainty of the signals. Ex...
A framework for clustering massive graph streams
In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the ...
Event Streams Clustering Using Machine Learning Techniques
Data streams are usually of unbounded length, which pushes users to consider only recent observations by focusing on a time window and to ignore past data. However, in many real-world applications, past data must be taken into consideration to guarantee the efficiency and performance of decision making and to handle data stream evolution over time. In order to build a selective history to track t...
A Sketch-based Clustering Algorithm for Uncertain Data Streams
Because of inaccuracy and noise, uncertainty is inherent in time series streams and increases the complexity of stream clustering. Because data arrive continuously and in massive volume, efficient data storage is a crucial task for clustering uncertain data streams. With a hash-compressed structure, an extended uncertain sketch and update strategy are proposed to store uncertain data streams. And ba...