Mining Disjunctive Sequential Patterns from News Stream

Authors

  • Kazuhiro Shimizu
  • Isamu Shioya
  • Takao Miura
Abstract

Frequent disjunctive patterns are known to be a sophisticated method of text mining within a single document. Their frequency measure satisfies anti-monotonicity, which allows efficient algorithms based on APRIORI. In this work, we propose a new online, single-pass algorithm that extracts currently frequent disjunctive patterns from a news stream by weighting past events. We also discuss some experimental results.

1 Motivation

Recently, much attention has been focused on the quantitative analysis of various features of text, and a new approach called text mining has been proposed to extend traditional methodologies [13]. The text mining approach relies on the frequency and co-occurrence of words in text: the former means that important words appear many times, while the latter means that related words occur together [10].

Much attention has also recently been focused on mining data streams, i.e., huge amounts of dynamic data on networks [6]. A data stream generally has the following characteristics: (1) huge volume, (2) high speed, (3) a changing data distribution, and (4) continuity. Therefore, we want algorithms that can output approximate solutions at any time, for any stream, throughout the whole process, and that can analyze data rapidly with limited computational resources. This is why traditional methods cannot be applied to this problem [6].

In particular, a data stream such as news articles or broadcast transcripts is called a news stream. News articles are generally issued at irregular intervals, so we assume that the timestamp of a news article corresponds to the time of its content. In a news stream environment, articles arrive continuously, and the collection grows incrementally as new ones are added. We are usually interested in recent articles, i.e., new articles are more important than old ones.

There have been important research results based on anti-monotonicity, such as APRIORI [2, 4].
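
The preference for recent articles can be made concrete with time-decayed counts. The sketch below is only an illustration. It assumes an exponential decay with a hypothetical half-life parameter `half_life`, which is a common weighting for streams but not necessarily the weighting the authors use. The names `DecayedCounter` and `frequent_words` are hypothetical. Each article is read once in timestamp order, and a stored count is decayed lazily only when it is next touched, so each update costs constant time.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DecayedCounter:
    """Single-pass, time-decayed counts (hypothetical exponential weighting).

    An occurrence observed at time s contributes 2 ** (-(t - s) / half_life)
    to the count queried at time t >= s, so older articles weigh less.
    """

    half_life: float
    _count: dict = field(default_factory=dict, init=False)  # key -> count as of _stamp[key]
    _stamp: dict = field(default_factory=dict, init=False)  # key -> time of last update

    def _decay(self, key, now):
        # Bring a stored count forward from its last update time to `now`.
        c = self._count.get(key, 0.0)
        if c:
            c *= math.pow(2.0, -(now - self._stamp[key]) / self.half_life)
        return c

    def add(self, key, now, amount=1.0):
        # Assumes timestamps arrive in non-decreasing order (stream order).
        self._count[key] = self._decay(key, now) + amount
        self._stamp[key] = now

    def get(self, key, now):
        return self._decay(key, now)

    def keys(self):
        return list(self._count)


def frequent_words(stream, half_life, min_weight):
    """One pass over (timestamp, words) pairs; returns the words whose decayed
    count at the last timestamp is at least min_weight."""
    counter = DecayedCounter(half_life)
    now = 0.0
    for now, words in stream:
        for w in set(words):  # count each word at most once per article
            counter.add(w, now)
    return {w for w in counter.keys() if counter.get(w, now) >= min_weight}
```

For example, with `half_life=10`, a word seen at t=0 and t=10 has weight 1.5 × 0.5 = 0.75 at t=20, while a word seen only once at t=0 has weight 0.25 at t=20.
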
Anti-monotonicity generally lets us avoid searching a vast range of candidates, but this is not enough for sequence data: in text mining the property no longer holds, so the APRIORI technique cannot be applied. An interesting approach based on disjunctive patterns has been proposed, together with a new method of counting frequency, to obtain an efficient pattern mining algorithm for single sequence data. Because this counting method satisfies anti-monotonicity, an efficient mining algorithm based on APRIORI can be obtained [11].
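
Reference [11] defines its own anti-monotone counting method, which is not reproduced in this section. The sketch below substitutes a simpler, assumed measure: the number of sliding windows of a fixed width in which a disjunctive pattern occurs, in the spirit of window-based episode mining. A disjunctive pattern is modeled as a tuple of word sets, where each position may match any of its alternatives. Deleting a position or adding an alternative can only make a pattern easier to match, so this count is anti-monotone and APRIORI-style pruning applies. The function names, `width`, `min_count`, `max_len`, and the restriction of each element to one or two words are hypothetical.

```python
from itertools import combinations


def occurs(pattern, seq):
    """True if `pattern` (a tuple of frozensets of alternative words) occurs in
    `seq` as a subsequence, where element j matches any word in pattern[j]."""
    j = 0
    for word in seq:
        if j < len(pattern) and word in pattern[j]:
            j += 1  # greedy earliest match is sufficient for subsequence tests
    return j == len(pattern)


def window_count(pattern, seq, width):
    """Hypothetical frequency: number of windows seq[i:i+width] containing pattern.
    Deleting an element or adding an alternative can only keep or raise it."""
    starts = range(max(len(seq) - width + 1, 1))
    return sum(occurs(pattern, seq[i:i + width]) for i in starts)


def mine(seq, width, min_count, max_len=3):
    """APRIORI-style level-wise search over disjunctive patterns whose elements
    are single words or unordered word pairs (an illustrative restriction)."""
    words = sorted(set(seq))
    elements = [frozenset([w]) for w in words]
    elements += [frozenset(p) for p in combinations(words, 2)]
    level = [(e,) for e in elements if window_count((e,), seq, width) >= min_count]
    frequent_elements = [p[0] for p in level]
    result = list(level)
    while level and len(level[0]) < max_len:
        previous = set(level)
        candidates = []
        for p in level:
            for e in frequent_elements:
                cand = p + (e,)
                # Prune: every pattern obtained by deleting one element must be frequent.
                if all(cand[:i] + cand[i + 1:] in previous for i in range(len(cand))):
                    candidates.append(cand)
        level = [c for c in candidates if window_count(c, seq, width) >= min_count]
        result.extend(level)
    return result
```

Only candidates whose shorter subpatterns are all frequent reach the costly `window_count` call, which is the saving that anti-monotonicity buys.
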

Similar sources

Ranking Sequential Patterns with Respect to Significance

We present a reliable universal method for ranking sequential patterns (itemset-sequences) with respect to significance in the problem of frequent sequential pattern mining. We approach the problem by first building a probabilistic reference model for the collection of itemsetsequences and then deriving an analytical formula for the frequency for sequential patterns in the reference model. We r...

Disjunctive Sequential Patterns on Single Data Sequence and Its Anti-monotonicity

In this work, we propose a novel method for mining frequent disjunctive patterns on a single data sequence. For this purpose, we introduce a sophisticated measure that satisfies anti-monotonicity, which allows us to discuss an efficient mining algorithm based on APRIORI. We discuss some experimental results.

Memory-Bounded High Utility Sequential Pattern Mining over Data Streams

Mining high utility sequential patterns (HUSPs) has emerged as an important topic in data mining. However, the existing studies on this topic focus on static data and do not consider streaming data. Streaming data are fast changing, continuously generated and unbounded in amount. Such data can easily exhaust computer resources (e.g., memory) unless proper resource-aware mining is performed. In ...

Incremental Mining of Closed Sequential Patterns in Multiple Data Streams

Sequential pattern mining searches for the relative sequence of events, allowing users to make predictions on discovered sequential patterns. Due to drastically advanced information technology over recent years, data have rapidly changed, growth in data amount has exploded and real-time demand is increasing, leading to the data stream environment. Data in this environment cannot be fully stored...

An Advanced Model for Mining Time Interval Sequential Patterns in Stream data

Most existing sequential pattern mining algorithms have difficulty finding long, significant time-interval sequential patterns in an information stream. In this paper, we propose a new bitmap-based algorithm for mining d time-interval sequential patterns in an information stream, called DSBMMS, which is based on binary bit counting and multiple time-interval sequential positions. We transform the whole seq...

Publication date: 2007