Learning from Incomplete Data
Author
Abstract
Survey non-response is an important problem in statistics, economics and the social sciences. The paper reviews the missing data framework of Little & Rubin [Little and Rubin, 1986] and surveys techniques for dealing with non-response in surveys using a likelihood-based approach. It focuses on the case where the probability that a data value is missing depends on the value itself. The paper uses the two-step model introduced by Heckman to illustrate and analyze three popular techniques: maximum likelihood, expectation maximization and Heckman correction. We formulate solutions to Heckman’s model and discuss their advantages and disadvantages. The paper also compares and contrasts Heckman’s procedure with the EM algorithm, pointing out why Heckman correction is not an EM algorithm.
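As a rough illustration of why value-dependent non-response matters, the sketch below simulates a simple selection setup and shows how the mean of the observed responses drifts away from the true mean. It is not the paper's formulation and not Heckman's full two-step procedure: the names `simulate` and `inverse_mills` and the parameters `mu`, `sigma`, `rho` and `c` are assumptions made for this example, and the selection parameters are treated as known, whereas in practice they would have to be estimated (for instance by a probit fit followed by a regression).

```python
import math
import random
from statistics import NormalDist, fmean


def inverse_mills(a: float) -> float:
    """Inverse Mills ratio phi(a) / Phi(a) of the standard normal."""
    std = NormalDist()
    return std.pdf(a) / std.cdf(a)


def simulate(n=200_000, mu=1.0, sigma=2.0, rho=0.6, c=0.3, seed=0):
    """Simulate y = mu + eps, observed only when c + u > 0.

    (u, eps) are jointly normal with var(u) = 1, var(eps) = sigma**2
    and correlation rho, so response depends on the value of y.
    All parameter values here are hypothetical.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        # Build eps so that corr(u, eps) = rho and sd(eps) = sigma.
        eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * e)
        if c + u > 0:  # selection rule: the unit responds
            observed.append(mu + eps)
    naive = fmean(observed)  # biased estimate of mu
    # E[y | observed] = mu + rho * sigma * lambda(c); subtract the bias term.
    corrected = naive - rho * sigma * inverse_mills(c)
    return naive, corrected, len(observed) / n


if __name__ == "__main__":
    naive, corrected, share = simulate()
    print(f"share observed: {share:.3f}")
    print(f"naive mean:     {naive:.3f}")
    print(f"corrected mean: {corrected:.3f}")
```

With a positive `rho`, units with larger values are more likely to respond, so the naive mean overstates `mu`; subtracting `rho * sigma * inverse_mills(c)` removes that bias only because the true parameters are plugged in here.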
Similar resources
Ensemble-based Top-k Recommender System Considering Incomplete Data
Recommender systems have been widely used in e-commerce applications. They are a subclass of information filtering systems, used either to predict whether a user will prefer an item (prediction problem) or to identify a set of k items that will be of interest to the user (Top-k recommendation problem). Demanding sufficient ratings to make robust predictions and suggesting qualified recommendations are two si...
Mining from incomplete quantitative data by fuzzy rough sets
Machine learning can extract desired knowledge from existing training examples and ease the development bottleneck in building expert systems. Most learning approaches derive rules from complete data sets. If some attribute values are unknown in a data set, it is called incomplete. Learning from incomplete data sets is usually more difficult than learning from complete data sets. In the past, t...
Learning to Classify Incomplete Examples
Most research on supervised learning assumes the attributes of training and test examples are completely specified. Real-world data, however, is often incomplete. This paper studies the task of learning to classify incomplete test examples, given incomplete (resp., complete) training data. We first show that the performance task of classifying incomplete examples requires the use of default classi...
Structure Learning of Probabilistic Relational Models from Incomplete Relational Data
Existing relational learning approaches usually work on complete relational data, but real-world data are often incomplete. This paper proposes the MGDA approach to learn structures of probabilistic relational model (PRM) from incomplete relational data. The missing values are filled in randomly at first, and a maximum likelihood tree (MLT) is generated from the complete data sample. Then, Gibb...