Learning Top-k Transformation Rules

Authors

Sunanda Patro

Wei Wang

Abstract

Record linkage identifies multiple records that refer to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence cannot identify complex coreferent records. This seriously limits their effectiveness. In this work, we propose an automatic method to extract the top-k high-quality transformation rules from a given set of possibly coreferent record pairs. We propose an effective algorithm that performs careful local analyses of each record pair and generates candidate rules; the algorithm then chooses the top-k rules based on a scoring function. We have conducted extensive experiments on real datasets, and our proposed algorithm has a substantial advantage over the previous algorithm in both effectiveness and efficiency.
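The two-stage idea in the abstract (generate candidate rules from each record pair, then keep the k best under a scoring function) can be illustrated with a minimal sketch. This is not the authors' algorithm: the whitespace tokenization, the pairing of unmatched tokens, the support-count score, and the names `candidate_rules` and `top_k_rules` are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def candidate_rules(left, right):
    """Return hypothetical (lhs, rhs) token rules for one record pair.

    Assumption: tokens shared by both records are already matched, so only
    the leftover tokens on each side can be explained by a transformation.
    """
    left_tokens = left.lower().split()
    right_tokens = right.lower().split()
    shared = set(left_tokens) & set(right_tokens)
    left_only = [t for t in left_tokens if t not in shared]
    right_only = [t for t in right_tokens if t not in shared]
    # Pair every unmatched left token with every unmatched right token.
    return set(product(left_only, right_only))


def top_k_rules(pairs, k):
    """Score candidates by how many record pairs produce them; keep k."""
    support = Counter()
    for left, right in pairs:
        support.update(candidate_rules(left, right))
    # Highest support first; ties broken alphabetically for determinism.
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy input: possibly coreferent record pairs (made-up examples).
pairs = [
    ("intl conf on data engineering", "international conf on data engineering"),
    ("intl journal of robotics", "international journal of robotics"),
    ("proc vldb", "proceedings vldb"),
]

for rule, count in top_k_rules(pairs, 2):
    print(rule, count)
```

On this toy input the rule mapping "intl" to "international" ranks first because two pairs support it. A raw support count is only the simplest possible score; the abstract does not specify the scoring function actually used.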

Similar resources

DeepEye: Towards Automatic Data Visualization

Data visualization is invaluable for explaining the significance of data to people who are visually oriented. The central task of automatic data visualization is, given a dataset, to visualize its compelling stories by transforming the data (e.g., selecting attributes, grouping and binning values) and deciding the right type of visualization (e.g., bar or line charts). We present DEEPEYE, a nov...

A Quick Method for Querying Top-k Rules from Class Association Rule Set

Finding class association rules (CARs) is one of the most important research topics in data mining and knowledge discovery, with numerous applications in many fields. However, existing techniques usually generate an extremely large number of results, which makes analysis difficult. In many applications, experts are interested in only the most relevant results. Therefore, we propose a method for...

Transformation and Aggregation Preprocessing for Top-k Recommendation GAP Rules Induction

In this paper we describe the KTIML team approach to RuleML 2015 Rule-based Recommender Systems for the Web of Data Challenge Track. The task is to estimate the top 5 movies for each user separately in a semantically enriched MovieLens 1M dataset. We have three results. Best is a domain specific method like "recommend for all users the same set of movies from Spielberg". Our contributions are d...

Ensemble-based Top-k Recommender System Considering Incomplete Data

Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (the prediction problem) or to identify a set of k items that will be of interest to the user (the top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...

Mining Top-K Non-redundant Association Rules

Association rule mining is a fundamental data mining task. However, depending on the choice of the thresholds, current algorithms can become very slow and generate an extremely large amount of results or generate too few results, omitting valuable information. Furthermore, it is well-known that a large proportion of association rules generated are redundant. In previous works, these two problem...

Publication date: 2011
