Summarization Techniques for Pattern Collections in Data Mining
Author
Abstract
Discovering patterns in data is an important task in data mining. Techniques exist for finding large collections of many kinds of patterns in data very efficiently. A collection of patterns can be regarded as a summary of the data. A major difficulty is that pattern collections which summarize the data well are often very large. In this dissertation we describe methods for summarizing pattern collections in order to make them more understandable as well. More specifically, we focus on the following themes:

Quality value simplifications. We study simplifications of pattern collections based on simplifying the quality values of the patterns. In particular, we study simplification by discretization.

Pattern orderings. It is difficult to find a suitable trade-off between the accuracy of a representation and its size. As a solution to this problem, we suggest ordering the patterns so that each prefix of the pattern ordering gives a good summary of the whole collection.

Pattern chains and antichains. Virtually all pattern collections have natural underlying partial orders. We exploit these partial orders by clustering the patterns into chains and antichains.

Change profiles. We describe how patterns can be related to each other by comparing how their quality values change with re-
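The abstract does not spell out a particular discretization scheme. As a minimal sketch, assume the quality values are real numbers (for example itemset frequencies) and the goal is to use as few distinct values as possible while keeping every value within an absolute error `eps` of the original. Under that assumption, a greedy sweep over the sorted values suffices. The function name `discretize_quality_values` and the toy frequencies are hypothetical illustrations, not taken from the dissertation.

```python
from typing import Dict, Hashable


def discretize_quality_values(
    values: Dict[Hashable, float], eps: float
) -> Dict[Hashable, float]:
    """Replace each quality value by a representative at most eps away.

    Hypothetical helper (not from the dissertation): sweep the sorted
    distinct values, grow a group while it fits in a window of width
    2 * eps, and map every member to the group's midpoint. For points
    on a line this greedy grouping uses the fewest representatives.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    distinct = sorted(set(values.values()))
    representative: Dict[float, float] = {}
    i = 0
    while i < len(distinct):
        low = distinct[i]
        j = i
        # Extend the group while the next value stays within [low, low + 2*eps].
        while j + 1 < len(distinct) and distinct[j + 1] <= low + 2 * eps:
            j += 1
        mid = (low + distinct[j]) / 2  # within eps of every member
        for v in distinct[i : j + 1]:
            representative[v] = mid
        i = j + 1
    return {pattern: representative[v] for pattern, v in values.items()}


# Hypothetical toy frequencies for five itemsets.
freqs = {"a": 0.50, "b": 0.48, "ab": 0.30, "c": 0.31, "ac": 0.10}
simplified = discretize_quality_values(freqs, eps=0.02)
print(simplified)
```

With `eps = 0.02`, the five toy frequencies collapse to three distinct representatives. Using the midpoint of each group's actual minimum and maximum, rather than a fixed grid, keeps representatives inside the observed range, so frequencies never drift outside [0, 1].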
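The abstract mentions clustering patterns into chains under their natural partial order, but does not say how this is computed. One standard way to split a partially ordered collection into as few chains as possible is Dilworth's theorem through its reduction to bipartite matching: link each pattern to a proper superset, and the minimum number of chains equals the number of patterns minus the size of a maximum matching. By Dilworth's theorem, that number also equals the size of the largest antichain. The sketch below assumes the patterns are itemsets ordered by proper subset inclusion. The function name `min_chain_partition` is hypothetical, and this is not necessarily the dissertation's method.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def min_chain_partition(patterns: List[Itemset]) -> List[List[Itemset]]:
    """Partition itemsets into the fewest chains under proper-subset order.

    Hypothetical helper: Kuhn's augmenting-path matching on the bipartite
    graph with an edge i -> j whenever patterns[i] < patterns[j].
    """
    n = len(patterns)
    # succ[i]: indices j such that patterns[i] is a proper subset of patterns[j].
    succ = [[j for j in range(n) if patterns[i] < patterns[j]] for i in range(n)]
    match_right: List[int] = [-1] * n  # match_right[j] = i means i precedes j

    def augment(i: int, seen: List[bool]) -> bool:
        # Try to give left vertex i a successor, re-routing earlier matches.
        for j in succ[i]:
            if not seen[j]:
                seen[j] = True
                if match_right[j] == -1 or augment(match_right[j], seen):
                    match_right[j] = i
                    return True
        return False

    for i in range(n):
        augment(i, [False] * n)

    # Turn the matching into successor links and walk each chain from its head.
    next_of = [-1] * n
    for j, i in enumerate(match_right):
        if i != -1:
            next_of[i] = j
    chains: List[List[Itemset]] = []
    for start in range(n):
        if match_right[start] == -1:  # no predecessor: a chain starts here
            chain, k = [], start
            while k != -1:
                chain.append(patterns[k])
                k = next_of[k]
            chains.append(chain)
    return chains


# Hypothetical toy collection: {}, {a}, {b}, {a, b}, {c}.
patterns = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab"), frozenset("c")]
chains = min_chain_partition(patterns)
for chain in chains:
    print([sorted(p) for p in chain])
```

In this toy collection, {a}, {b} and {c} are pairwise incomparable, so no partition can use fewer than three chains, and the matching finds exactly three. The successor graph is built by pairwise comparison, which is quadratic in the number of patterns and meant only for illustration.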
Journal: CoRR
Volume: abs/cs/0505071
Publication date: 2005