Sequential Pattern Mining Using Formal language Tools

نویسندگان

  • Sunil Joshi
  • R. C. Jain
چکیده

In present scenario almost every system and working is computerized and hence all information and data are being stored in Computers. Huge collections of data are emerging. Retrieval of untouched, hidden and important information from this huge data is quite tedious work. Data Mining is a great technological solution which extracts untouched, hidden and important information from vast databases to investigate noteworthy knowledge in the data warehouse. An important problem in data mining is to discover patterns in various fields like medical science, world wide web, telecommunication etc. In the field of Data Mining, Sequential pattern mining is one of the method in which we retrieve hidden pattern linked with instant or other sequences. In sequential pattern mining we extract those sequential patterns whose support count are greater than or equal to given minimum support threshold value. In current scenario users are interested in only specific and interesting pattern instead of entire probable sequential pattern. To control the exploration space users can use many heuristics which can be represented as constraints. Many algorithms have been developed in the fields of constraint mining which generate patterns as per user expectation. In the present work we will be exploring and enhancing the regular expression constraints .Regular expression is one of the constraint and number of algorithm developed for sequential pattern mining which uses regular expression as a constraint. Some constraints are neither regular nor context free like cross-serial pattern abcd used in Swiss German Data. We cannot construct equivalent deterministic finite automata (DFA) or Push down automata (PDA) for such type of patterns. We have proposed a new algorithm PMFLT (Pattern Mining using Formal Language Tools) for sequential pattern mining using formal language tools as constraints. The proposed algorithm finds only user specific frequent sequence in efficient optimized way as compared to other existing algorithm. Our experimental results clearly show that proposed algorithm is quite enhanced and improved and generates optimum frequent sequences as per user expectation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A FCA-based Analysis of Sequential Care Trajectories

This paper presents a research work in the domains of sequential pattern mining and formal concept analysis. Using a combined method, we show how concept lattices and interestingness measures such as stability can improve the task of discovering knowledge in symbolic sequential data. We give example of a real medical application to illustrate how this approach can be useful to discover patterns...

متن کامل

Finding Sequential Patterns from Large Sequence Data

Data mining is the task of discovering interesting patterns from large amounts of data. There are many data mining tasks, such as classification, clustering, association rule mining, and sequential pattern mining. Sequential pattern mining finds sets of data items that occur together frequently in some sequences. Sequential pattern mining, which extracts frequent subsequences from a sequence da...

متن کامل

Supporting Interactive Sequential Pattern Discovery in Databases

One of the most important data mining problems is discovery of sequential patterns. Sequential pattern mining consists in discovering all frequently occurring subsequences in a collection of data sequences. This paper discusses several issues concerning possible extensions to traditional database management systems required to support sequential pattern discovery: a sequential pattern query lan...

متن کامل

Using Answer Set Programming for pattern mining

Serial pattern mining consists in extracting the frequent sequential patterns from a unique sequence of itemsets. This paper explores the ability of a declarative language, such as Answer Set Programming (ASP), to solve this issue efficiently. We propose several ASP implementations of the frequent sequential pattern mining task: a non-incremental and an incremental resolution. The results show ...

متن کامل

Sequential pattern mining and classification of patient path

The purpose of the French DRG’s information system is to describe hospital activity by focusing on hospital stays. However, certain types of healthcare should be analysed as a temporal process covering multiple hospital stays. We propose two data-mining methods for patient path analysis across multiple healthcare institution : sequential pattern mining and formal concept analysis. Our objective...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012