Semi-supervised learning using greedy max-cut
نویسندگان
چکیده
Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes.
منابع مشابه
A Graph-based Semi-Supervised Learning Approach and its Feature Selection
With the lack of labeled data, the learning accuracy of a supervised learning algorithm deteriorates. Meanwhile, it is more easy to collect plenty of unlabeled data. Furthermore, a graph can be used to express the underlying distribution of data in the dataset. Thus, a classification problem is converted to a graph partition problem. One typical graph-based semi-supervised learning algorithm is...
متن کاملSemi-Supervised Learning with Max-Margin Graph Cuts
This paper proposes a novel algorithm for semisupervised learning. This algorithm learns graph cuts that maximize the margin with respect to the labels induced by the harmonic function solution. We motivate the approach, compare it to existing work, and prove a bound on its generalization error. The quality of our solutions is evaluated on a synthetic problem and three UCI ML repository dataset...
متن کاملConvergence rate of the semi-supervised greedy algorithm
This paper proposes a new greedy algorithm combining the semi-supervised learning and the sparse representation with the data-dependent hypothesis spaces. The proposed greedy algorithm is able to use a small portion of the labeled and unlabeled data to represent the target function, and to efficiently reduce the computational burden of the semi-supervised learning. We establish the estimation o...
متن کاملSubmodular Optimization for Efficient Semi-supervised Support Vector Machines
In this work we present a quadratic programming approximation of the Semi-Supervised Support Vector Machine (S3VM) problem, namely approximate QP-S3VM, that can be efficiently solved using off the shelf optimization packages. We prove that this approximate formulation establishes a relation between the low density separation and the graph-based models of semi-supervised learning (SSL) which is ...
متن کاملFast SDP Relaxations of Graph Cut Clustering, Transduction, and Other Combinatorial Problem
The rise of convex programming has changed the face of many research fields in recent years, machine learning being one of the ones that benefitted the most. A very recent developement, the relaxation of combinatorial problems to semi-definite programs (SDP), has gained considerable attention over the last decade (Helmberg, 2000; De Bie and Cristianini, 2004a). Although SDP problems can be solv...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 14 شماره
صفحات -
تاریخ انتشار 2013