Summarizing Sets of Categorical Sequences - Selecting and Visualizing Representative Sequences
Abstract
This paper is concerned with the summarization of a set of categorical sequence data. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that together have a given percentage of sequences in their neighborhood. The goal is to yield a representative set that exhibits the key features of the whole sequence data set and permits easy and sound interpretation. We propose a heuristic for determining the representative set that first builds a list of candidates using a representativeness score and then eliminates redundancy. We also propose a visualization tool for rendering the results and quality measures for evaluating them. The proposed tools have been implemented in TraMineR, our R package for mining and visualizing sequence data, and we demonstrate their efficiency on a real-world example from the social sciences. The methods are nonetheless in no way limited to social science data and should prove useful in many other domains.
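The two-step heuristic described in the abstract (rank candidates by a representativeness score, then drop redundant ones until the requested coverage is reached) can be illustrated with a minimal Python sketch. This is not TraMineR's implementation; it rests on several assumptions of ours: Hamming distance between equal-length sequences, neighbourhood density (how many sequences lie within a radius) as the representativeness score, and a candidate counted as redundant when it falls within that same radius of an already selected representative. The function names, the `radius` and `target_coverage` parameters, and the mean-distance quality measure are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of positions where two equal-length categorical sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def select_representatives(
    seqs: List[Sequence[str]],
    radius: float,
    target_coverage: float,
    dist: Callable[[Sequence[str], Sequence[str]], float] = hamming,
) -> Tuple[List[int], float]:
    """Hypothetical greedy sketch: return representative indices and coverage reached."""
    n = len(seqs)
    d = [[dist(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

    # Step 1 (assumed score): neighbourhood density, i.e. how many sequences
    # lie within `radius` of sequence i, the sequence itself included.
    density = [sum(1 for j in range(n) if d[i][j] <= radius) for i in range(n)]

    # Candidate list: one index per distinct sequence, densest first
    # (ties broken by original position).
    candidates, seen = [], set()
    for i in sorted(range(n), key=lambda k: (-density[k], k)):
        key = tuple(seqs[i])
        if key not in seen:
            seen.add(key)
            candidates.append(i)

    # Step 2: walk the candidates, dropping redundant ones, until the requested
    # share of sequences has a representative within `radius`.
    reps: List[int] = []
    covered = [False] * n
    for i in candidates:
        if sum(covered) / n >= target_coverage:
            break
        # Assumed redundancy rule: skip a candidate that already lies
        # within `radius` of a selected representative.
        if any(d[i][r] <= radius for r in reps):
            continue
        reps.append(i)
        for j in range(n):
            if d[i][j] <= radius:
                covered[j] = True
    return reps, sum(covered) / n


def mean_distance_to_nearest(seqs, reps, dist=hamming) -> float:
    """Illustrative quality measure: average distance to the closest representative."""
    return sum(min(dist(s, seqs[r]) for r in reps) for s in seqs) / len(seqs)


seqs = ["AAAB", "AAAB", "AABB", "BBBB", "BBBA", "CCCC"]
reps, cov = select_representatives(seqs, radius=1, target_coverage=0.75)
print([seqs[r] for r in reps], round(cov, 3))  # ['AAAB', 'BBBB'] 0.833
print(mean_distance_to_nearest(seqs, reps))    # 1.0
```

Ranking by density before the redundancy check means the first representatives cover the densest regions, so the target coverage tends to be reached with few representatives. The greedy pass does not guarantee the true minimum, which is consistent with the abstract describing the method as a heuristic.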
Similar resources
THE ENTROPIES OF THE SEQUENCES OF FUZZY SETS AND THE APPLICATIONS OF ENTROPY TO CARDIOGRAPHY
In this paper, we first introduce the entropy of sequences of fuzzy sets and give some theorems about it. Second, the P and T waves that appear in electrocardiograms were converted into fuzzy sets and, using the definition of entropy for sequences of fuzzy sets, some numerical values were obtained for sequences of P and T waves. Thus anyone can make medical predictions for some cardiac...
Wijsman Statistical Convergence of Double Sequences of Sets
In this paper, we study the concepts of Wijsman statistical convergence, Hausdorff statistical convergence and Wijsman statistical Cauchy double sequences of sets and investigate the relationship between them.
On Lacunary Statistical Limit and Cluster Points of Sequences of Fuzzy Numbers
For any lacunary sequence $\theta = (k_r)$, we define the concepts of $S_\theta$-limit point and $S_\theta$-cluster point of a sequence of fuzzy numbers $X = (X_k)$. We introduce the new sets $\Lambda^F_{S_\theta}(X)$, $\Gamma^F_{S_\theta}(X)$ and prove some inclusion relations between these and the sets $\Lambda^F_S(X)$, $\Gamma^F_S(X)$ introduced in \cite{Ayt:Slpsfn} by Aytar [...
Extracting and Rendering Representative Sequences
Abstract. This paper is concerned with the summarization of a set of categorical sequences. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that have together a given percentage of sequences in their neighbourhood. The proposed heuristic for extracting the representative s...
A computational method to analyze the similarity of biological sequences under uncertainty
In this paper, we propose a new method, based on fuzzy set theory, to analyze the difference and similarity of biological sequences. Considering the sequence order and some chemical and structural properties, we present a computational method to cluster biological sequences. Through some examples, we show that the new method is relatively easy and we are able to compare the sequences of arbi...