A Novel High Dimensional and High Speed Data Streams Algorithm: HSDStream
نویسندگان
چکیده
This paper presents a novel high speed clustering scheme for high-dimensional data stream. Data stream clustering has gained importance in different applications, for example, network monitoring, intrusion detection, and real-time sensing. High dimensional stream data is inherently more complex when used for clustering because the evolving nature of the stream data and high dimensionality make it non-trivial. In order to tackle this problem, projected subspace within the high dimensions and limited window sized data per unit of time are used for clustering purpose. We propose a High Speed and Dimensions data stream clustering scheme (HSDStream) which employs exponential moving averages to reduce the size of the memory and speed up the processing of projected subspace data stream. It works in three steps: i) initialization, ii) real-time maintenance of core and outlier micro-clusters, and iii) on-demand offline generation of the final clusters. The proposed algorithm is tested against high dimensional density-based projected clustering (HDDStream) for cluster purity, memory usage, and the cluster sensitivity. Experimental results are obtained for corrected KDD intrusion detection dataset. These results show that HSDStream outperforms the HDDStream in all performance metrics, especially, the memory usage and the processing speed. Keywords—Evolving data stream; high dimensionality; projected clustering; density-based clustering; micro-clustering
منابع مشابه
Scaling up for high dimensional and high speed data streams: HSDStream
This paper presents a novel high speed clustering scheme for high dimensional data streams. Data stream clustering has gained importance in different applications, for example, in network monitoring, intrusion detection, and real-time sensing are few of those. High dimensional stream data is inherently more complex when used for clustering because the evolving nature of the stream data and high...
متن کاملHigh-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملFeature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
متن کاملHybrid Fuzzy Algorithm for the Novel Yokeless Axial Flux-Switching Permanent-Magnet Motor
Flux switching permanent magnet synchronous motor (FMSM) has the characteristics such as large output torque, fast speed response and high reliability, so it can be widely used in the field of high-performance and high precision control.In the permanent magnet synchronous motor control system, the speed loop usually adopts the PI control algorithm. Although the PI control algorithm is relativel...
متن کاملFast Approximate Wavelet Tracking on Streams
Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). The focus of existing work has typically been on minimizing space requirements of the maintained synopsis — however, to effectiv...
متن کامل