Parallel Maritime Traffic Clustering Based on Apache Spark
Abstract
Maritime traffic pattern extraction is an essential part of maritime security and surveillance. DBSCANSD is a density-based clustering algorithm that extracts the arbitrary shapes of normal traffic lanes from AIS data. This paper presents a parallel DBSCANSD algorithm built on top of Apache Spark. The project is experimental research, and the results shown in this paper are preliminary. The experiment conducted in the paper shows that the proposed method works well with maritime traffic data, although its performance is not satisfactory. A discussion of the method's limitations and potential issues is given at the end of the paper.
Keywords: Maritime Surveillance; Clustering; Apache Spark
Similar resources
Survey and Performance Evaluation of DBSCAN Spatial Clustering Implementations for Big Data and High-Performance Computing Paradigms
Big data is often mined using clustering algorithms. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a popular spatial clustering algorithm. However, it is computationally expensive and thus for clustering big data, parallel processing is required. The two prevalent paradigms for parallel processing are High-Performance Computing (HPC) based on Message Passing Interface ...
A Tabu search based clustering algorithm and its parallel implementation on Spark
The well-known K-means clustering algorithm has been employed widely in different application domains ranging from data analytics to logistics applications. However, the K-means algorithm can be affected by factors such as the initial choice of centroids and can readily become trapped in a local optimum. In this paper, we propose an improved K-means clustering algorithm that is augmented by a T...
Massively Parallel Algorithms and Hardness for Single-Linkage Clustering Under $\ell_p$-Distances
We present massively parallel (MPC) algorithms and hardness of approximation results for computing Single-Linkage Clustering of $n$ input $d$-dimensional vectors under Hamming, $\ell_1$, $\ell_2$ and $\ell_\infty$ distances. All our algorithms run in $O(\log n)$ rounds of MPC for any fixed $d$ and achieve $(1 + \epsilon)$-approximation for all distances (except Hamming for which we show an exact algorithm). We also show constant-factor ...
Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink
With the ever-increasing need to analyze large amounts of data to get useful insights, it is essential to develop complex parallel machine learning algorithms that can scale with data and number of parallel processes. These algorithms need to run on large data sets as well as they need to be executed with minimal time in order to extract useful information in a time constrained environment. MPI...
Research of Performance of Distributed Platforms Based on Clustering Algorithm
With the continued development and application of Internet technology, more and more data needs to be processed. Spark is a versatile, high-performance parallel computing framework that can be applied to data mining. This paper is based on the parallelization of platforms’ K-means algorithm, by building a YARN cluster environment and making experiments to...