Discovering and Matching Elastic Rules from Sequence Databases
نویسندگان
چکیده
This paper presents techniques for discovering and matching rules with elastic patterns. Elastic patterns are ordered lists of elements that can be stretched along the time axis. Elastic patterns are useful for discovering rules from data sequences with different sampling rates. For fast discovery of rules whose heads (left-hand sides) and bodies (right-hand sides) are elastic patterns, we construct a trimmed suffix tree from succinct forms of data sequences and keep the tree as a compact representation of rules. The trimmed suffix tree is also used as an index structure for finding rules matched to a target head sequence. When matched rules cannot be found, the concept of rule relaxation is introduced. Using a cluster hierarchy and relaxation error as a new distance function, we find the least relaxed rules that provide the most specific information on a target head sequence. Experiments on synthetic data sequences reveal the effectiveness of our proposed approach.
منابع مشابه
Discovering and Matching Elastic
This paper presents techniques for discovering and matching rules with elastic patterns. Elastic patterns are ordered lists of elements that can be stretched along the time axis. For example, hA; A; B; B; Bi is an instance of an elastic pattern AB while hA; C; Bi is not. Elastic patterns are useful for discovering rules from data sequences with diierent sampling rates. For fast discovery of rul...
متن کاملTwo Approaches to Handling Noisy Variation in Text Mining
Variation and noise in textual database entries can prevent text mining algorithms from discovering important regularities. We present two novel methods to cope with this problem: (1) an adaptive approach to “hardening” noisy databases by identifying duplicate records, and (2) mining “soft” association rules. For identifying approximately duplicate records, we present a domain-independent two-l...
متن کاملText Mining with Information Extraction
The popularity of the Web and the large number of documents available in electronic form has motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this paper, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases ...
متن کاملMining association rules from biological databases
area such as bioinformatics. This methodology allows the identification of relationships between low-magnitude similarity (LMS) sequence patterns and other well-contrasted protein characteristics, such as those described by database annotations. We start with the identification of these signals inside protein sequences by exhaustive database searching and automatic pattern recognition strategie...
متن کاملDiscovering sequence motifs of different patterns parallel using DNA operations
Discovery of motifs in biological sequences and various types of subsequences in commercial databases have varied applications and interpretations. This paper proposes a new approach to solve the Combinatorial Pattern Matching (CPM), search for continuous and gapped rigid subsequences and discover Longest Common Rigid Subsequences (LCRS) from the given sequences using DNA operations and modifie...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Fundam. Inform.
دوره 47 شماره
صفحات -
تاریخ انتشار 2000