- Qualitative Similarity Index
Abstract
There are different approaches to the temporal study of time-evolving systems. In this paper, this study is carried out by comparing time series. We propose to improve the comparison of time series by including qualitative knowledge. Taking into account the evolution of the values of the series, our approach uses a similarity index defined by qualitative labels. Every label represents a range of values that may be considered similar from a qualitative perspective. The proposed index is defined by matching qualitative labels. Given a time series, a label is obtained for every transition between two adjacent values. This label depends on the magnitude and the sign of the transition. If every label is represented by a single character, the evolution of the time series is translated into a string. Finally, a similarity index for time series is defined according to the similarity of the obtained strings. The proposed index has been applied to the Australian Sign Language dataset of UCI KDD, with a correct identification rate above 95 per cent. In this paper, it is applied to study the different behaviours of a semiqualitative model of logistic growth with a delay.

Introduction

The study of the temporal evolution of systems is an incipient research area. New methodologies are needed to analyze and process the time series obtained from the evolution of those systems. These time series are usually stored in databases, and new algorithms are needed to study them. A time series is a sequence of real values, each of which represents the value of a magnitude at a point in time.

A possible field of application is the comparison of time series in numeric databases. We are interested in databases obtained from the evolution of dynamic systems. It is proposed in (Ortega et al.
99) a methodology to simulate semiqualitative dynamic systems. These simulations are stored in a database. This database may also be obtained from data acquired by sensors installed in the real system.

Copyright © 2002, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

There is a variety of applications that produce and store time series. When working with time-series databases, one of the biggest problems is calculating the similarity between two given time series. A similarity measure is useful for several purposes. In this paper, the interest is focused on finding the different behaviour patterns of the system stored in a database, looking for a particular pattern, reducing the number of relevant series before applying analysis algorithms, etc.

Assuming that the similarity is a distance function on the time series, we classify the basic queries for managing a time series database into three groups:

Range query: given a series, find those series that are within a given distance of it.
Nearest neighbor: given a series, find the series in the database that is its nearest neighbor according to a defined distance.
All-pairs query: find all the pairs of series in the database that are within a given distance of each other.

Many approaches have been proposed to solve the problem of efficient comparison. In this paper, we propose to carry out this comparison from a qualitative perspective, taking into account the variations of the time series values. The idea of our proposal is to abstract away the numerical values of the time series and to concentrate the comparison on the shape of the time series. In this paper, we do not take noisy time series into account; this is postponed for future work.

The rest of this paper is structured as follows: first, we analyze some related works that we have used to define our index.
Next, the Shape Definition Language is introduced, which is appropriate for translating the original values, and we also explain the problem of the Longest Common Subsequence (LCS). The next section introduces our approach, the Qualitative Similarity Index. Finally, this index is applied to a semiqualitative logistic growth model with a delay.

Related Work

In the literature, different approaches have been developed to study time series. (Agrawal et al. 95b) present the Shape Definition Language (SDL), which is suitable for retrieving objects based on shapes contained in the histories associated with these objects. An important feature of the language is its ability to perform blurry matching, where the user cares only about the overall shape. This work is the key to translating the original data into a qualitative description of its evolution that allows a subsequent comparison.

On the other hand, works that study the problem of the Longest Common Subsequence (LCS) are also related to this paper, because we use LCS algorithms as the baseline to define our index. (Paterson&Dančík94) collect a complete review of the best-known solutions to this problem.

There have been many works on the comparison of time series (Faloutsos et al. 94). Most of them propose the definition of indexes that are applied to a subset of values extracted from the original data. These indexes provide an efficient comparison of time series, but because they are defined taking into account only some of the original values, the gain in speed comes with a decrease in the accuracy of the comparison. These indexes are obtained by applying a transformation from the time series values to a lower-dimensionality space. The approaches differ in the way this mapping is carried out or in the selected target space. One option is to select only a few coefficients of a transformation process to represent all the information of the original series.
In this approach, we find the change from the time domain to the frequency domain. In (Agrawal et al. 95a), the Discrete Fourier Transform (DFT) is used to reduce the series to the first Fourier coefficients. In (Chan&Waichee99), a solution based on the Discrete Wavelet Transform (DWT) is proposed in a similar way.

Other approaches reduce the original data in the time series by selecting a subset of the original values. (Keogh&Pazzani98) uses a piecewise linear segmentation of the original curve. In (Keogh&Pazzani99), the Dynamic Time Warping (DTW) algorithm is applied over the segmented data. Finally, (Keogh&Pazzani00) performs a direct dimensionality reduction with Piecewise Constant Approximation, selecting a fixed number of values of the original data. It is known as PCA-indexing.

The last option is to generate a 4-tuple feature vector extracted from every sequence. In (Kim et al. 01), this vector is proposed and a new distance function is defined as the similarity index. In (Cheung&Stephanopoulos90), the study of series with different time scales from a qualitative perspective is proposed.

Shape Definition Language (SDL)

This language, proposed in (Agrawal et al. 95b), is very suitable for creating queries about the evolution of values or magnitudes over time. For any set of values stored for a time period, the fundamental idea in SDL is to divide the range of the possible variations between adjacent values into a collection of disjoint ranges and to assign a label to each of them. Figure 1 represents a sample division of the positive axis into three regions. This division depends on the possible variations and the assigned labels. The behaviour of a series
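
The labelling idea described above, together with the string comparison outlined in the abstract, can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the five-label alphabet, the thresholds `SMALL` and `LARGE`, the function names, and the normalization of the LCS length by the longer string are all hypothetical choices made for the example, since the text above does not specify them.

```python
from typing import Sequence

# Assumed thresholds on the step between adjacent values (illustrative only).
SMALL = 0.1   # |step| <= SMALL is treated as "no change"
LARGE = 1.0   # |step| >  LARGE is treated as a "large" change


def label_step(delta: float) -> str:
    """Map one step between adjacent values to a single-character label.

    The label depends on the sign and the magnitude of the step.
    """
    if abs(delta) <= SMALL:
        return "0"                              # roughly constant
    if delta > 0:
        return "U" if delta > LARGE else "u"    # large / small increase
    return "D" if delta < -LARGE else "d"       # large / small decrease


def to_label_string(series: Sequence[float]) -> str:
    """Translate a numeric series into a string of qualitative labels."""
    return "".join(label_step(b - a) for a, b in zip(series, series[1:]))


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def qualitative_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed index: LCS length of the label strings over the longer length."""
    sx, sy = to_label_string(x), to_label_string(y)
    longest = max(len(sx), len(sy))
    if longest == 0:
        return 1.0  # two series without any step are trivially alike
    return lcs_length(sx, sy) / longest


if __name__ == "__main__":
    base = [0.0, 0.05, 0.5, 2.0, 1.5, 0.0]
    shifted = [v + 10.0 for v in base]          # same shape, different level
    print(to_label_string(base))
    print(qualitative_similarity(base, shifted))
```

With these assumed thresholds, a series and a copy of it shifted by a constant yield the same label string, so the sketch rates them as fully similar. This reflects the stated intent of comparing the shape of the series rather than its numerical values.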