An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering
Authors
Abstract:
Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative studies using quantitative and large scale evaluations. In order to provide a comprehensive validation, an extensive evaluation of similarity measures for MTS clustering were conducted. The 14 well-known similarity measures with their variants and testing their effectiveness on 23 MTS datasets coming from a wide variety of application domains were re-implemented. In this paper, an overview of these different techniques is given and the empirical comparison regarding their effectiveness based on agglomerative clustering task is presented. Furthermore, the statistical significance tests were used to derive meaningful conclusions. It has been found that all of similarity measures are equivalent, in terms of clustering F-measure, and there is no significant difference between similarity measures based on our datasets. The results provide a comparative background between similarity measures to find the most proper method in terms of performance and computation time in this field.
similar resources
Distance Measures for Effective Clustering of ARIMA Time-Series
Many environmental and socioeconomic time–series data can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time–series ARIMA time–series. We consider the problem of clustering ARIMA time–series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of time–series for clustering ARIMA time–series, by using the Euclidean distance betwe...
full textAn Empirical Evaluation of Similarity Measures for Time Series Classification
Time series are ubiquitous, and a measure to assess their similarity is a core part of many computational systems. In particular, the similarity measure is the most essential ingredient of time series clustering and classification systems. Because of this importance, countless approaches to estimate time series similarity have been proposed. However, there is a lack of comparative studies using...
full textClustering and Visualization of Multivariate Time Series
The analysis of MTS is an established research area, and methods to carry it out have stemmed both from traditional statistics and from the Machine Learning and Computational Intelligence fields. In this chapter, we are mostly interested in the latter, but considering a mixed approach that can be ascribed to Statistical Machine Learning. MTS are often analyzed for prediction and forecasting and...
full textClustering of Multivariate Time-Series Data
A new methodology for clustering multivariate time-series data is proposed. The methodology is based on calculation of the degree of similarity between multivariate time-series datasets using two similarity factors. One similarity factor is based on principal component analysis and the angles between the principal component subspaces while the other is based on the Mahalanobis distance between ...
full textAn Experiment with Distance Measures for Clustering
Distance measure plays an important role in clustering data points. Choosing the right distance measure for a given dataset is a non-trivial problem. In this paper, we study various distance measures and their effect on different clustering techniques. In addition to the standard Euclidean distance, we use Bit-Vector based, Comparative Clustering based, Huffman code based and Dominance based di...
full textClustering time series under the Fréchet distance
The Fréchet distance is a popular distance measure for curves. We study the problem of clustering time series under the Fréchet distance. In particular, we give (1 + ε)-approximation algorithms for variations of the following problem with parameters k and `. Given n univariate time series P , each of complexity at most m, we find k time series, not necessarily from P , which we call cluster cen...
full textMy Resources
Journal title
volume 31 issue 2
pages 250- 262
publication date 2018-02-01
By following a journal you will be notified via email when a new issue of this journal is published.
Hosted on Doprax cloud platform doprax.com
copyright © 2015-2023