Statistically Significant Pattern Mining With Ordinal Utility

Abstract

Statistically significant pattern mining (SSPM), which evaluates each pattern via a hypothesis test, is an essential and challenging data mining task for knowledge discovery. We introduce a preference relation between patterns and aim to discover the most preferred patterns under the constraint of statistical significance, which has never been considered in existing SSPM problems. We propose an iterative multiple testing procedure that can alternately reject a hypothesis and safely ignore hypotheses that are less useful than the rejected one. By filtering out patterns with low utility, we avoid significance budget consumption by rejecting useless (uninteresting) patterns and focus the budget on more useful patterns, leading to more useful discoveries. We show that the proposed method can control the familywise error rate (FWER) under certain assumptions, which can be satisfied by a realistic problem class in SSPM. We also show that the proposed method always discovers patterns that are equally or more useful than those found by the Tarone-Bonferroni method and Subfamily-wise Multiple Testing (SMT). Finally, we conducted several experiments on both synthetic and real-world data to evaluate the performance of our method. The proposed method discovered many more useful patterns on real-world datasets in all five tasks.
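For orientation, the Tarone-Bonferroni baseline named in the abstract can be sketched as follows. This is a minimal illustration of Tarone's classical correction, not the procedure proposed in the paper. The function name `tarone_bonferroni` and its arguments are hypothetical, and the sketch assumes that each pattern's p-value and minimum attainable p-value have already been computed.

```python
def tarone_bonferroni(p_values, min_p_values, alpha=0.05):
    """Return indices of hypotheses rejected by Tarone's correction.

    p_values[i]     : observed p-value of hypothesis i (assumed given)
    min_p_values[i] : smallest p-value hypothesis i could ever attain
    alpha           : target familywise error rate (FWER)
    """
    m = len(p_values)
    for k in range(1, m + 1):
        threshold = alpha / k
        # A hypothesis is "testable" only if it could reach the threshold at all.
        testable = [i for i, psi in enumerate(min_p_values) if psi <= threshold]
        # Smallest k with at most k testable hypotheses keeps FWER <= alpha.
        if len(testable) <= k:
            return [i for i in testable if p_values[i] <= threshold]
    return []


# Toy input: only the first hypothesis can ever become significant,
# so it is tested at the full level alpha instead of alpha / 4.
rejected = tarone_bonferroni(
    p_values=[0.02, 0.5, 0.6, 0.7],
    min_p_values=[0.01, 0.3, 0.4, 0.5],
    alpha=0.05,
)
print(rejected)
```

On this toy input, plain Bonferroni would test all four hypotheses at 0.05 / 4 = 0.0125 and reject nothing, which shows how discarding untestable hypotheses saves significance budget.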

Similar resources

On Mining Statistically Significant Attribute Association Information

Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes. Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper, we focus on mining tw...

High-Utility Sequential Pattern Mining with Multiple Minimum Utility Thresholds

High-utility sequential pattern mining is an emerging topic in recent decades and most algorithms were designed to identify the complete set of high-utility sequential patterns under the single minimum utility threshold. In this paper, we first propose a novel framework called high-utility sequential pattern mining with multiple minimum utility thresholds to mine high utility sequential pattern...

Algorithms for Efficient Mining of Statistically Significant Attribute Association Information

Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes and class (if present). Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper...

Mining Statistically Significant Substrings using the Chi-Square Statistic

The problem of identification of statistically significant patterns in a sequence of data has been applied to many domains such as intrusion detection systems, financial models, web-click records, automated monitoring systems, computational biology, cryptology, and text analysis. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to ra...

Mining Statistically Significant Patterns using the Chi-Square Statistic

Statistical significance is used to ascertain whether the outcome of a given experiment can be ascribed to some extraneous factors or is solely due to chance. An observed pattern of events is deemed to be statistically significant if it is unlikely to have occurred due to randomness or chance alone. In the thesis, we study the problem of identifying the statistically relevant patterns in string...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2022

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2022.3208626