Explaining Data Incompleteness in Knowledge Aggregation
Authors
Abstract
Knowledge aggregation is the problem of taking information from multiple heterogeneous sources and aggregating it into a unified knowledge base. One of the main challenges in this work has been dealing with data incompleteness, because data sources seldom contain complete answers to a user’s query. Current approaches leverage users’ preferences over data sources when trying to aggregate incomplete data. Nevertheless, these approaches do not give users the trust in aggregated data that they need before they can use it with confidence in the presence of incomplete information. We believe such trust may be earned by providing users with explanations for incomplete data. In this paper, we build a decision tree-based classification system to acquire context knowledge about the sources and present techniques for applying that knowledge to explain incomplete data. Our experiments suggest that the decision trees we built are 87% accurate in predicting unseen data. Further, context knowledge provides good characterizations of sources, which we show to be valuable and often critical to users.
Similar resources
Utilizing Goal-Directed Data Mining For Incompleteness Repair In Knowledge Bases
In this paper we present a methodology for goal-directed data mining of association rules and incorporation of these rules into a probabilistic knowledge base. The purpose of the data mining and rule extraction process is to repair knowledge base incompleteness uncovered during validation. We discuss how this incompleteness is uncovered and show the fundamental forms this incompleteness can tak...
Linguistic Aggregation Functions using the MapReduce Paradigm
We explore the possible benefits that a linguistic approach provides to Big Data. The proposal illustrates how to implement Linguistic Aggregation Functions using the MapReduce paradigm, the best-known paradigm applied to Big Data. The proposal offers several benefits to Big Data, e.g., it allows to interpret data in a more intuitive way, reduce data size into different levels of granularity, and ma...
Explaining Heterogeneity in Risk Preferences Using a Finite Mixture Model
This paper studies the effect of the space (distance) between lotteries' outcomes on risk-taking behavior and the shape of estimated utility and probability weighting functions. Previously investigated experimental data shows a significant space effect in the gain domain. As compared to low spaced lotteries, high spaced lotteries are associated with higher risk aversion for high probabilities o...
A Novel Type-2 Adaptive Neuro Fuzzy Inference System Classifier for Modelling Uncertainty in Prediction of Air Pollution Disaster (RESEARCH NOTE)
Type-2 fuzzy set theory is one of the most powerful tools for dealing with the uncertainty and imperfection in dynamic and complex environments. The applications of type-2 fuzzy sets and soft computing methods are rapidly emerging in the ecological fields such as air pollution and weather prediction. The air pollution problem is a major public health problem in many cities of the world. Predict...
Explaining the relationship between components of knowledge management based on M. Mc olrvy model and social capital Ministry of Energy
In this study, the relationship between components of knowledge management on the basis of M. Mc olrvy model and social capital has been studied. The statistical population consisted of employees of the Ministry of Energy; 420 subjects were selected from among employees of Hamedan regional power. The research tool is a questionnaire, and SPSS statistical software is used for data analysis and struct...