ApproxMAP: Approximate Mining of Consensus Sequential Patterns

Authors

  • Hye-Chung Kum
  • Jian Pei
  • Wei Wang
  • Dean Duncan
Abstract

Sequential pattern mining is an important data mining task with broad applications. However, conventional methods may meet inherent difficulties in mining databases with long sequences and noise. They may generate a huge number of short and trivial patterns but fail to find interesting patterns approximately shared by many sequences. To attack these problems, in this paper, we propose the theme of approximate sequential pattern mining roughly defined as identifying patterns approximately shared by many sequences. We present an efficient and effective algorithm, ApproxMAP (for APPROXimate Multiple Alignment Pattern mining), to mine consensus patterns from large sequence databases. The method works in two steps. First, sequences are clustered by similarity. Then, consensus patterns are mined directly from each cluster through multiple alignment. A novel structure called weighted sequence is used to compress the alignment result. For each cluster, the longest consensus pattern best representing the cluster is generated from its weighted sequence. Our extensive experimental results on both synthetic and real data sets show that ApproxMAP is robust to noise and both effective and efficient in mining approximate sequential patterns from noisy sequence databases with lengthy sequences. In particular, we report a successful case of mining a real data set which triggered important investigations in welfare services.
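
As a rough illustration of the last two ideas in the abstract (compressing an alignment into a weighted sequence and reading a consensus pattern from it), the following Python sketch may help. It is not the authors' implementation: it assumes the cluster has already been formed and aligned, it represents gaps as None, and the function names and the min_strength cutoff are hypothetical choices made for this example only.

```python
from collections import Counter


def weighted_sequence(aligned):
    """Compress aligned sequences into one Counter of item counts per position.

    `aligned` is a list of equal-length sequences. Each position holds a set
    of items, or None where the alignment placed a gap (an assumed encoding).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for pos in range(length):
        counts = Counter()
        for seq in aligned:
            if seq[pos] is not None:
                counts.update(seq[pos])
        columns.append(counts)
    return columns


def consensus_pattern(columns, n_sequences, min_strength=0.5):
    """Keep items that occur in at least `min_strength` of the sequences.

    Positions where no item passes the cutoff are dropped from the pattern.
    """
    pattern = []
    for counts in columns:
        itemset = sorted(
            item for item, c in counts.items() if c / n_sequences >= min_strength
        )
        if itemset:
            pattern.append(itemset)
    return pattern


# A toy, hand-aligned cluster of three sequences of itemsets.
aligned = [
    [{"a"}, {"b", "c"}, None, {"d"}],
    [{"a"}, {"b"}, {"x"}, {"d"}],
    [{"a", "e"}, {"b", "c"}, None, {"d"}],
]
columns = weighted_sequence(aligned)
print(consensus_pattern(columns, len(aligned)))
# prints [['a'], ['b', 'c'], ['d']]
```

In this toy cluster the noisy items "e" and "x" each appear in only one of the three sequences, so they fall below the cutoff and drop out of the consensus pattern.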

Similar resources

A Novel Approach for Predicting Movement of Mobile Users Based on Data Mining Techniques

Predicting locations of users with transportable devices like informatics phones, smartphones, iPads and iPods in public wireless local area networks (WLANs) plays an important role in location management and network resource allocation. Several techniques in machine learning and data processing, like sequential pattern mining and clustering, are widely used. However, these approaches have 2 deficie...

Comparative Study of Sequential Pattern Mining Models

The process of finding interesting, novel, and useful patterns from data is now commonly known as Knowledge Discovery and Data mining (KDD). In this paper, we examine closely the problem of mining sequential patterns and propose a general evaluation method to assess the quality of the mined results. We propose four evaluation criteria, namely (1) recoverability, (2) the number of spurious patte...

Comparative Study of Sequential Pattern Mining Frameworks Support Framework vs. Multiple Alignment Framework

Knowledge discovery and data mining (KDD) is commonly defined as the nontrivial process of finding interesting, novel and useful patterns from data. In this paper, we examine closely the problem of mining sequential patterns and propose a comprehensive evaluation method to assess the quality of the mined results. We propose four evaluation criteria, namely (1) recoverability, (2) the number of s...

REAFUM: Representative Approximate Frequent Subgraph Mining

Noisy graph data and pattern variations are two thorny problems faced by mining frequent subgraphs. Traditional exact-matching based methods, however, only generate patterns that have enough perfect matches in the graph database. As a result, a pattern may either remain undetected or be reported as multiple (almost identical) patterns if it manifests slightly different instances in different gr...

Publication date: 2003