Data Analysis Project: Σ-Optimality for Active Learning on Gaussian Random Fields
Authors
Abstract
A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, which is equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property, which shows that greedy reduction produces a (1 − 1/e)-approximation of the globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σ-optimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification.
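The difference between the two criteria can be illustrated with a minimal sketch. It assumes the GRF prior covariance is the inverse of a regularized graph Laplacian and that querying a node conditions the Gaussian on that node's value, so the covariance receives a rank-one update. Under that assumption, the greedy V-optimality step picks the node giving the largest drop in the trace of the predictive covariance, and the greedy Σ-optimality step picks the node giving the largest drop in the sum of all its entries. The function names and the `delta` and `noise` parameters are hypothetical choices made for this illustration and are not taken from the paper.

```python
import numpy as np


def grf_prior_covariance(adjacency, delta=0.01):
    """Prior covariance of a GRF as the inverse of a regularized Laplacian.

    `delta` is an assumed regularizer that makes the Laplacian invertible.
    """
    W = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(laplacian + delta * np.eye(W.shape[0]))


def greedy_select(cov, budget, criterion="sigma", noise=0.0):
    """Greedily choose `budget` nodes (assumed <= number of nodes).

    criterion="sigma": maximize the drop in the sum of all covariance entries.
    criterion="v":     maximize the drop in the trace of the covariance.
    `noise` is an assumed observation-noise variance (0 means exact labels).
    Returns the chosen node indices and the posterior covariance.
    """
    C = np.array(cov, dtype=float)
    chosen = []
    for _ in range(budget):
        candidates = [v for v in range(C.shape[0]) if v not in chosen]
        cols = C[:, candidates]
        denom = C[candidates, candidates] + noise  # candidate variances
        if criterion == "sigma":
            gain = cols.sum(axis=0) ** 2 / denom
        elif criterion == "v":
            gain = (cols ** 2).sum(axis=0) / denom
        else:
            raise ValueError("criterion must be 'sigma' or 'v'")
        v = candidates[int(np.argmax(gain))]
        chosen.append(v)
        # Rank-one Gaussian conditioning update after observing node v.
        C = C - np.outer(C[:, v], C[v, :]) / (C[v, v] + noise)
    return chosen, C


# Small illustration on a 6-node path graph (a made-up example graph).
W = np.zeros((6, 6))
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
prior = grf_prior_covariance(W, delta=0.1)
print("Sigma-optimality picks:", greedy_select(prior, 2, "sigma")[0])
print("V-optimality picks:", greedy_select(prior, 2, "v")[0])
```

Both criteria share the same denominator (the variance of the candidate node) and differ only in the numerator: the squared column sum for Σ-optimality versus the sum of squared column entries for V-optimality.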
Similar resources
Σ-Optimality for Active Learning on Gaussian Random Fields
A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e)...
Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields
Many real-world datasets can be represented in the form of a graph whose edge weights designate similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the prediction covariance Σ (V-optimality) on GRFs has proven successful in batch active learn...
Active Search and Bandits on Graphs using Sigma-Optimality
Many modern information access problems involve highly complex patterns that cannot be handled by traditional keyword based search. Active Search is an emerging paradigm that helps users quickly find relevant information by efficiently collecting and learning from user feedback. We consider active search on graphs, where the nodes represent the set of instances users want to search over and the...
On Temporal Evolution in Data Streams
The future of CiteSeer: CiteSeerX; Learning to have fun; Winning the DARPA grand challenge; Challenges of urban sensing; Learning in one-shot strategic form games; A selective sampling strategy for label ranking; Combinatorial Markov random fields; Learning stochastic tree edit distance; Pertinent background knowledge for learning protein gramma...
Factors Influencing Robustness and Effectiveness of Conditional Random Fields in Active Learning Frameworks
Active learning approaches reduce the annotation cost required by traditional supervised approaches to reach the same effectiveness by actively selecting informative instances during the learning phase. However, effectiveness and robustness of the learnt models are influenced by a number of factors. In this paper we investigate the factors that affect the effectiveness, more specifically in ter...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014