Characterising the Difference and the Norm between Sequence Databases

Authors

  • Frauke Hinrichs
  • Jilles Vreeken
Abstract

In pattern set mining we are after a small set of patterns that together are characteristic of the data at hand. In this paper we consider the problem of characterizing not one, but a set of sequence databases, such as a collection of articles or the chapters of a book. Our main objective is to find a set of patterns that captures the individual features of each database, while also finding shared characteristics of any subset of the data we are interested in. We formulate this problem in terms of MDL, and propose SqsNorm, an efficient algorithm to extract high quality models directly from data. We devise a heuristic to quickly find and evaluate candidates, as well as a model that takes shared and singular patterns into account. Experiments on synthetic data confirm that SqsNorm ably reconstructs the ground truth model. Experiments on text data, including a set of speeches, several book chapters, and a collection of songs, show that SqsNorm discovers informative, non-redundant and easily interpretable pattern sets that give clear insight into the data.
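The idea of scoring patterns with MDL, and of telling shared patterns from singular ones, can be illustrated with a toy two-part code. The sketch below is not the SqsNorm encoding, which the abstract does not specify. It assumes gap-free patterns, a greedy cover, Shannon-optimal code lengths, and a deliberately crude model cost. All function names (`data_bits`, `cover`, `gain`, `supporting_databases`) and the example databases are hypothetical.

```python
import math
from collections import Counter


def data_bits(tokens):
    """Bits needed to encode `tokens` with Shannon-optimal codes for their own frequencies."""
    counts = Counter(tokens)
    total = len(tokens)
    return sum(-c * math.log2(c / total) for c in counts.values())


def cover(seq, pattern):
    """Greedily replace gap-free, non-overlapping occurrences of `pattern` by one symbol."""
    out, i, k = [], 0, len(pattern)
    while i < len(seq):
        if tuple(seq[i:i + k]) == pattern:
            out.append(pattern)  # the tuple itself serves as the new symbol
            i += k
        else:
            out.append(seq[i])
            i += 1
    return out


def gain(seq, pattern):
    """Bits saved by adding `pattern` to the model of `seq` (negative if it does not pay off)."""
    alphabet = max(len(set(seq)), 2)
    # Toy model cost (an assumption): spell the pattern out with fixed-length codes.
    model_bits = len(pattern) * math.log2(alphabet)
    return data_bits(seq) - (model_bits + data_bits(cover(seq, pattern)))


def supporting_databases(databases, pattern):
    """Names of the databases whose total encoded size shrinks when `pattern` is added."""
    return {name for name, seq in databases.items() if gain(seq, pattern) > 0}


# Three tiny, made-up sequence databases.
databases = {
    "A": ["a", "b"] * 8,
    "B": ["a", "b"] * 8 + ["c", "d"] * 8,
    "C": ["c", "d"] * 8,
}

for pattern in [("a", "b"), ("c", "d")]:
    print(pattern, sorted(supporting_databases(databases, pattern)))
```

In this toy setting a pattern counts as shared when it compresses more than one database and as singular when it compresses exactly one. Here `("a", "b")` pays off in databases A and B but not in C, where it never occurs and only adds model cost.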

Similar resources

Orr Sommerfeld Solver Using Mapped Finite Difference Scheme for Plane Wake Flow

Linear stability analysis of the three-dimensional plane wake flow is performed using a mapped finite difference scheme in a domain which is doubly infinite in the cross-stream direction of the wake flow. The physical domain in the cross-stream direction is mapped to the computational domain using a cotangent mapping of the form y = ?cot(??). The Squire transformation [2], proposed by Squire, is also us...

Finite Volume Methods for Convection Diffusion Problems

Introduction. In this paper we consider cell-centered finite difference approximations for second-order convection-diffusion equations of divergence type. Our goal is to construct finite difference methods of second order of approximation that satisfy the discrete maximum principle. The error estimates are in the discrete Sobolev spaces associated with the considered boundary value problem. Approximation ...

A new sequence space and norm of certain matrix operators on this space

In the present paper, we introduce the sequence space \[ l_p(E,\Delta) = \left\{ x = (x_n)_{n=1}^\infty : \sum_{n=1}^\infty \left| \sum_{j \in E_n} x_j - \sum_{j \in E_{n+1}} x_j \right|^p < \infty \right\}, \] where $E=(E_n)$ is a partition of finite subsets of the positive integers and $p \ge 1$. We investigate its topological properties and inclusion relations. Moreover, we consider the problem of fin...

Some inequalities involving lower bounds of operators on weighted sequence spaces by a matrix norm

Let $A = (a_{n,k})_{n,k \ge 1}$ and $B = (b_{n,k})_{n,k \ge 1}$ be two non-negative matrices. Denote by $L_{v,p,q,B}(A)$ the supremum of those $L$ satisfying the following inequality: $\| Ax \|_{v,B(q)} \ge L \| x \|_{v,B(p)}$, where $x \ge 0$ and $x \in l_p(v,B)$, and $v = (v_n)_{n=1}^\infty$ is an increasing, non-negative sequence of real numbers. In this paper, we obtain a Hardy-type formula for $L_{v,p,q,B}(H)$, where $H$ is the Hausdorff matrix and $0 < q \le p \le 1$. Also...

Ciao: a graphical navigator for software and document repositories

Programmers frequently have to retrieve and link information from various software documents to accomplish a maintenance task. Ciao is a graph-based navigator that helps programmers query and browse structural connections embedded in different software and document repositories. A repository consists of a collection of source documents with an associated database that describes their structure. ...

Publication date: 2017