Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Authors

  • Yang Song
  • Jun Zhu
Abstract

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation, with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. To circumvent the difficulties of dealing with the orthonormality constraints on singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.

Introduction

Matrix completion has found applications in many situations, such as collaborative filtering. Let Z ∈ R^{m×n} denote the data matrix with m rows and n columns, of which only a small number of entries are observed, indexed by Ω ⊂ [m] × [n]. We denote the possibly noise-corrupted observations of Z on Ω as P_Ω(X), where P_Ω is a projection operator that retains entries with indices from Ω and replaces all others with 0. The matrix completion task aims at completing the missing entries of Z based on P_Ω(X), under the low-rank assumption rank(Z) ≪ min(m, n). When a squared-error loss is adopted, the task can be written as solving

    min_Z  1/(2σ²) ‖P_Ω(X − Z)‖_F² + λ · rank(Z),        (P0)

where ‖P_Ω(A)‖_F² = Σ_{(i,j)∈Ω} a_ij², λ is a positive regularization parameter, and σ² is the noise variance. Unfortunately, the term rank(Z) makes P0 NP-hard. Therefore, the nuclear norm ‖Z‖_* (the sum of the singular values of Z) has been widely adopted as a convex surrogate (Fazel 2002) for the rank function, turning P0 into the convex problem

    min_Z  1/(2σ²) ‖P_Ω(X − Z)‖_F² + λ ‖Z‖_*.        (P1)

Though P1 is convex, the definition of the nuclear norm still makes the problem hard to solve. Based on a variational formulation of the nuclear norm, it has been popular to solve an equivalent and easier low-rank matrix factorization (MF) form of P1:

    min_{A,B}  1/(2σ²) ‖P_Ω(X − AB)‖_F² + λ/2 (‖A‖_F² + ‖B‖_F²).        (1)

Though not jointly convex, this MF formulation can be solved for local optima by alternately optimizing over A and B. As the regularization terms of MF are friendlier than the nuclear norm, many matrix factorization methods have been proposed to complete matrices, including maximum-margin matrix factorization (M3F) (Srebro, Rennie, and Jaakkola 2004; Rennie and Srebro 2005) and Bayesian probabilistic matrix factorization (BPMF) (Lim and Teh 2007; Salakhutdinov and Mnih 2008). Furthermore, the simplicity of the MF formulation makes it easy to adapt and generalize; e.g., (Xu, Zhu, and Zhang 2012; Xu, Zhu, and Zhang 2013) incorporate maximum entropy discrimination (MED) and nonparametric Bayesian methods to solve a modified MF problem.
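To make the alternating scheme concrete, here is a minimal NumPy sketch of ridge-regression updates for formulation (1). This is our illustration rather than code from the paper; `als_complete` and its arguments are hypothetical names.

```python
import numpy as np

def als_complete(X, mask, k=10, lam=0.1, sigma2=1.0, n_iters=50, seed=0):
    """Alternating minimization of the MF objective (1):
    1/(2*sigma2) * ||P_Omega(X - A@B)||_F^2 + lam/2 * (||A||_F^2 + ||B||_F^2).
    X: (m, n) array with zeros outside the mask; mask: boolean (m, n)."""
    m, n = X.shape
    rng = np.random.default_rng(seed)
    A = 0.1 * rng.standard_normal((m, k))   # m x k factor
    B = 0.1 * rng.standard_normal((k, n))   # k x n factor
    for _ in range(n_iters):
        for i in range(m):                  # each row of A is a ridge regression
            obs = mask[i]                   # observed columns in row i
            Bo = B[:, obs]                  # k x |obs|
            G = Bo @ Bo.T / sigma2 + lam * np.eye(k)
            A[i] = np.linalg.solve(G, Bo @ X[i, obs] / sigma2)
        for j in range(n):                  # each column of B, symmetrically
            obs = mask[:, j]
            Ao = A[obs]                     # |obs| x k
            G = Ao.T @ Ao / sigma2 + lam * np.eye(k)
            B[:, j] = np.linalg.solve(G, Ao.T @ X[obs, j] / sigma2)
    return A @ B                            # completed matrix estimate
```

Each subproblem is an l2-regularized least squares over the observed entries, which is exactly why the MF form is so much friendlier than working with the spectrum of singular values directly.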
In contrast, there are relatively few algorithms that solve P1 directly, without the aid of matrix factorization. Such methods need to handle the spectrum of singular values. These spectral regularization algorithms require optimization on a Stiefel manifold (Stiefel 1935; James 1976), which is defined as the set of k-tuples (u_1, u_2, ..., u_k) of orthonormal vectors in R^n. This is the main difficulty that has prevented attempts, if any, to develop Bayesian methods based on the spectral regularization formulation.

Though matrix completion via spectral regularization is not easy, it has potential advantages over the matrix factorization approach. One benefit is direct control over the singular values: by imposing various priors on them, we can incorporate abundant information to help matrix completion. For example, Todeschini et al. (Todeschini, Caron, and Chavent 2013) put sparsity-inducing priors on the singular values, naturally leading to hierarchical adaptive nuclear norm (HANN) regularization, and reported promising results.

In this paper, we aim to develop a new formulation of the nuclear norm that has the same simplicity as MF while retaining the good properties of spectral regularization. The idea is to prove the orthonormality insignificance property of P1. Based on the new formulation, we develop a novel Bayesian model via a sparsity-inducing prior on singular values, allowing different dimensions to have different regularization parameters and inferring them automatically. This involves some natural modifications to our new formulation that make it more flexible and adaptive, as is typical in Bayesian matrix factorization. Empirical Bayes methods are then employed to avoid parameter tuning. Experiments on rank recovery with synthetic matrices and on collaborative filtering with popular benchmark datasets demonstrate competitive results of our method in comparison with various state-of-the-art competitors. Notably, the experiments on synthetic data show that our method performs considerably better when the matrices are very sparse, suggesting the robustness offered by sparsity-inducing priors.

Relaxed Spectral Regularization

Bayesian matrix completion based on matrix factorization is relatively easy, with many examples (Lim and Teh 2007; Salakhutdinov and Mnih 2008). In fact, we can view (1) as a maximum a posteriori (MAP) estimate of a simple Bayesian model whose likelihood is Gaussian, i.e., X_ij ∼ N((AB)_ij, σ²) for (i, j) ∈ Ω, and whose priors on A and B are also Gaussian, i.e., p(A) ∝ exp(−λ‖A‖_F²/2) and p(B) ∝ exp(−λ‖B‖_F²/2). Posterior inference is then easy, since the prior and the likelihood are conjugate.

However, the same procedure faces great difficulty when we attempt to develop Bayesian matrix completion based on the more direct spectral regularization formulation P1, because the prior p(Z) ∝ exp(−λ‖Z‖_*) is not conjugate to the Gaussian likelihood (or to any other common likelihood). To analyze p(Z) more closely, we can apply the singular value decomposition (SVD) to get Z = Σ_{k=1}^r d_k u_k v_k^T, where d := {d_k : k ∈ [r]} are the singular values, and U := {u_k : k ∈ [r]} and V := {v_k : k ∈ [r]} are orthonormal singular vectors lying on Stiefel manifolds. Though we can define a factorized prior p(Z) = p(d)p(U)p(V), any prior on U or V (e.g., the uniform Haar prior (Todeschini, Caron, and Chavent 2013)) needs to deal with a Stiefel manifold, which is highly nontrivial. In fact, handling distributions on Stiefel manifolds remains a largely open problem, though some results (Byrne and Girolami 2013; Hoff 2009; Dobigeon and Tourneret 2010) exist in the directional statistics literature.
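For a concrete view of the objects involved, the following short NumPy sketch (our illustration, not code from the paper) computes the SVD factorization and checks the orthonormality constraints that a prior on U or V would have to respect:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))
U, d, Vt = np.linalg.svd(Z, full_matrices=False)   # Z = U @ np.diag(d) @ Vt

# The nuclear norm is the sum of singular values, so the spectral prior
# is p(Z) ∝ exp(-lam * d.sum()).
nuclear_norm = d.sum()

# The singular vectors live on Stiefel manifolds: the columns of U (and the
# rows of Vt) are mutually orthonormal -- the constraint that makes priors hard.
r = len(d)
assert np.allclose(U.T @ U, np.eye(r))    # u_k^T u_l = delta_{kl}
assert np.allclose(Vt @ Vt.T, np.eye(r))  # v_k^T v_l = delta_{kl}
```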
Fortunately, as we prove in Theorem 1, the orthonormality constraints on U and V are not necessary for spectral regularization. Rather, the unit sphere constraints ‖u_k‖ ≤ 1 and ‖v_k‖ ≤ 1 for all k ∈ [r] are sufficient to obtain the same optimal solutions to P1. We call this phenomenon orthonormality insignificance, and we call spectral regularization with the orthonormality constraints relaxed to unit sphere constraints relaxed spectral regularization.

Orthonormality insignificance for spectral regularization

We now present an equivalent formulation of the spectral regularization in P1 by proving its orthonormality insignificance property. With the SVD of Z, we first rewrite P1 equivalently as P1′ to make all constraints explicit:

    min_{d,U,V}  1/(2σ²) ‖P_Ω(X − Σ_{k=1}^r d_k u_k v_k^T)‖_F² + λ Σ_{k=1}^r d_k        (P1′)
    s.t.  d_k ≥ 0,  u_k^T u_l = δ_{kl},  v_k^T v_l = δ_{kl},  for all k, l ∈ [r].
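For intuition, here is a small NumPy sketch (ours, with hypothetical names) of evaluating the P1′ objective once the orthonormality constraints are relaxed to the unit sphere constraints above, together with the projection step a projected-gradient-style solver might use:

```python
import numpy as np

def relaxed_objective(X, mask, d, U, V, lam, sigma2=1.0):
    """P1' objective under the relaxed constraints:
    1/(2*sigma2) * ||P_Omega(X - sum_k d_k u_k v_k^T)||_F^2 + lam * sum(d),
    requiring only d_k >= 0, ||u_k|| <= 1, and ||v_k|| <= 1."""
    Z = (U * d) @ V.T                     # sum_k d_k u_k v_k^T
    resid = np.where(mask, X - Z, 0.0)    # P_Omega(X - Z)
    return (resid ** 2).sum() / (2 * sigma2) + lam * d.sum()

def project_columns_to_unit_ball(M):
    """Enforce the relaxed constraint by rescaling any column with norm > 1."""
    norms = np.linalg.norm(M, axis=0)
    return M / np.maximum(norms, 1.0)
```

Unlike the orthonormality constraints, the unit-ball constraints decouple across columns, so they are easy to enforce and straightforward to pair with priors in Bayesian inference.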


Similar Articles

Primal-Dual methods for sparse constrained matrix completion

We develop scalable algorithms for regular and non-negative matrix completion. In particular, we base the methods on trace-norm regularization that induces a low-rank predicted matrix. The regularization problem is solved via a constraint generation method that explicitly maintains a sparse dual and the corresponding low-rank primal solution. We provide a new dual block coordinate descent algor...


Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

We propose a novel class of algorithms for low-rank matrix completion. Our approach builds on novel penalty functions on the singular values of the low-rank matrix. By exploiting a mixture model representation of this penalty, we show that a suitably chosen set of latent variables enables us to derive an Expectation-Maximization algorithm to obtain a Maximum A Posteriori estimate of the completed l...


Big Learning with Bayesian Methods

The explosive growth in data volume and the availability of cheap computing resources have sparked increasing interest in Big learning, an emerging subfield that studies scalable machine learning algorithms, systems and applications with Big Data. Bayesian methods represent one important class of statistical methods for machine learning, with substantial recent developments on adaptive, flexibl...


Pruning from Adaptive Regularization

Inspired by the recent upsurge of interest in Bayesian methods, we consider adaptive regularization. A generalization-based scheme for adaptation of regularization parameters is introduced and compared to Bayesian regularization. We show that pruning arises naturally within both adaptive regularization schemes. As a model example we have chosen the simplest possible: estimating the mean of a rando...


Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution

Recent years have seen a flurry of activity in designing provably efficient nonconvex procedures for solving statistical estimation problems. Due to the highly nonconvex nature of the empirical loss, state-of-the-art procedures often require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. For vanilla procedures such as gradient descen...



Journal title:

Volume  Issue

Pages  -

Publication date: 2016