Domain-Aware Multi-Truth Discovery from Conflicting Sources

نویسندگان

  • Xueling Lin
  • Lei Chen
چکیده

In the Big Data era, truth discovery has served as a promising technique to solve conflicts in the facts provided by numerous data sources. The most significant challenge for this task is to estimate source reliability and select the answers supported by high quality sources. However, existing works assume that one data source has the same reliability on any kinds of entity, ignoring the possibility that a source may vary in reliability on different domains. To capture the influence of various levels of expertise in different domains, we integrate domain expertise knowledge to achieve a more precise estimation of source reliability. We propose to infer the domain expertise of a data source based on its data richness in different domains. We also study the mutual influence between domains, which will affect the inference of domain expertise. Through leveraging the unique features of the multi-truth problem that sources may provide partially correct values of a data item, we assign more reasonable confidence scores to value sets. We propose an integrated Bayesian approach to incorporate the domain expertise of data sources and confidence scores of value sets, aiming to find multiple possible truths without any supervision. Experimental results on two real-world datasets demonstrate the feasibility, efficiency and effectiveness of our approach. PVLDB Reference Format: Xueling LIN, Lei CHEN. Domain-Aware Multi-Truth Discovery from Conflicting Sources. PVLDB, 11(5): 635 647, 2018. DOI: https://doi.org/10.1145/3177732.3177739

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Confidence-Aware Approach for Truth Discovery on Long-Tail Data

In many real world applications, the same item may be described by multiple sources. As a consequence, conflicts among these sources are inevitable, which leads to an important task: how to identify which piece of information is trustworthy, i.e., the truth discovery task. Intuitively, if the piece of information is from a reliable source, then it is more trustworthy, and the source that provid...

متن کامل

Exploring Relevance as Truth Criterion on the Web and Classifying Claims in Belief Levels

The Web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this paper...

متن کامل

Integrating Conflicting Data: The Role of Source Dependence

Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conf...

متن کامل

Truth Discovery from Conflicting Multi-Valued Objects

Truth discovery is a fundamental research topic, which aims at identifying the true value(s) of objects of interest given the conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, we can still point out two significant issues unsolved: i) single-valued assumption, i.e., current methods assume only one true value for each object, while in reali...

متن کامل

Truth Discovery to Resolve Object Conflicts in Linked Data

In the community of Linked Data, anyone can publish their data as Linked Data on the web because of the openness of the Semantic Web. As such, RDF (Resource Description Framework) triples described the same real-world entity can be obtained from multiple sources; it inevitably results in conflicting objects for a certain predicate of a real-world entity. The objective of this study is to identi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • PVLDB

دوره 11  شماره 

صفحات  -

تاریخ انتشار 2018