Optimizing Sequential Pattern Mining Within Multiple Streams
Abstract
Analyzing information has become more important than ever, as data is now produced massively in every area. In recent years, data streams have grown in importance, and so have algorithms that can mine hidden patterns from these non-static databases. Such algorithms can also be used to simulate processes and to extract important information step by step. The translation of an English text into German is one such process. Linguists try to find characteristic patterns in this process to understand it better. For this purpose, keystrokes and eye movements are tracked during translation. The StrPMiner was designed to mine sequential patterns from this translation data. One dominant algorithm for finding sequential patterns is PrefixSpan. Although it was created for static databases, many data stream algorithms collect batches and apply PrefixSpan to each batch to find sequential patterns. This batch approach is a simple solution, but it makes it impossible to find patterns that span two consecutive batches. The PBuilder is introduced to find sequential patterns with higher accuracy, and the StrPMiner uses it to find patterns.
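The batch limitation described in the abstract can be illustrated with a small sketch. It is not the PBuilder, the StrPMiner, or PrefixSpan: it only counts contiguous occurrences of one fixed pattern, first over the whole stream and then per batch. The event labels, the batch size, and the helper names `count_contiguous` and `split_batches` are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def count_contiguous(seq: Sequence[str], pattern: Sequence[str]) -> int:
    """Count positions where `pattern` occurs as a contiguous run in `seq`."""
    n, m = len(seq), len(pattern)
    return sum(
        1 for i in range(n - m + 1) if list(seq[i:i + m]) == list(pattern)
    )


def split_batches(stream: Sequence[str], size: int) -> List[Sequence[str]]:
    """Cut the stream into consecutive, non-overlapping batches."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


# Hypothetical event stream (assumption: labels stand for tracked events).
stream = ["a", "b", "c", "a", "b", "c", "a", "b"]
pattern = ["a", "b"]

# Counting over the whole stream sees every occurrence.
whole = count_contiguous(stream, pattern)

# Counting per batch misses the occurrence that straddles the boundary
# between the first batch (indices 0-3) and the second (indices 4-7).
batches = split_batches(stream, 4)
batched = sum(count_contiguous(b, pattern) for b in batches)

print(whole, batched)
```

With these assumed inputs, the occurrence starting at index 3 begins in the first batch and ends in the second, so the per-batch count is lower than the whole-stream count. This is the kind of loss the abstract attributes to the batch approach.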
Similar resources
Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams
Sequential pattern mining is the mining of data sequences for frequent sequential patterns with time sequence, which has a wide application. Data streams are streams of data that arrive at high speed. Due to the limitation of memory capacity and the need of real-time mining, the results of mining need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...
Predicting Sequential Pattern Changes in Data Streams
Data streams are utilized in an increasing number of real-time information technology applications. Unlike traditional datasets, data streams are temporally ordered, fast changing and massive. Due to their tremendous volume, performing multiple scans of the entire data stream is impractical. Thus, traditional sequential pattern mining algorithms cannot be applied. Accordingly, the present study...
Sequential Pattern Mining of Multimodal Streams in the Humanities
Research in the humanities is increasingly attracted by data mining and data management techniques in order to efficiently deal with complex scientific corpora. Particularly, the exploration of hidden patterns within different types of data streams arising from psycholinguistic experiments is of growing interest in the area of translation process research. In order to support psycholinguistic e...
Collective Sequential Pattern Mining in Distributed Evolving Data Streams
The advances in processing and communication techniques resulted in a multitude of emerging applications that interact with streams of data. Traditional data mining systems store arriving data, collect them for later mining, and make multiple passes over the collected data. Unfortunately, these systems are prohibitively slow when they deal with data streams with massive amounts of data arriving...
Mining Sequential Patterns Across Data Streams
There are extensive endeavors toward mining frequent items or itemsets in a single data stream, but rare efforts have been made to explore sequential patterns among literals in different data streams. In this paper, we define a challenging problem of mining frequent sequential patterns across multiple data streams. We propose an efficient algorithm MILE to manage the mining process. The propose...