Lazy Sparse Stochastic Gradient Descent for Regularized Multinomial Logistic Regression
Author
Abstract
Stochastic gradient descent efficiently estimates maximum likelihood logistic regression coefficients from sparse input data. Regularization with respect to a prior coefficient distribution destroys the sparsity of the gradient evaluated at a single example. Sparsity is restored by lazily shrinking a coefficient along the cumulative gradient of the prior just before the coefficient is needed.

1 Multinomial Logistic Model

A multinomial logistic model classifies d-dimensional real-valued input vectors x ∈ R^d into one of k outcomes c ∈ {0, . . . , k − 1} using k − 1 parameter vectors β_0, . . . , β_{k−2} ∈ R^d:

$$p(c \mid x, \beta) = \begin{cases} \exp(\beta_c \cdot x) / Z_x & \text{if } c < k - 1 \\ 1 / Z_x & \text{if } c = k - 1 \end{cases} \tag{1}$$

where the linear predictor is the inner product $\beta_c \cdot x = \sum_{i < d} \beta_{c,i}\, x_i$ and $Z_x = 1 + \sum_{c' < k - 1} \exp(\beta_{c'} \cdot x)$ normalizes the distribution to sum to one.
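To make the lazy trick concrete, here is a minimal sketch of the model in equation (1) trained by stochastic gradient descent under a Gaussian prior, whose per-step shrinkage by (1 − ηλ) is applied to a coefficient in closed form only when its feature next appears. This is our illustration, not the paper's implementation; the function names (predict, lazy_sgd), the constant learning rate eta, and the prior variance parameterization lam are assumptions.

```python
import math
from collections import defaultdict

def predict(betas, x):
    """Class probabilities under equation (1): k-1 weight vectors,
    with the last outcome's unnormalized score pinned to exp(0) = 1."""
    scores = [math.exp(sum(b[i] * v for i, v in x.items())) for b in betas]
    z = 1.0 + sum(scores)
    return [s / z for s in scores] + [1.0 / z]

def lazy_sgd(data, k, n_epochs=5, eta=0.1, lam=0.01):
    """SGD for the multinomial model with a Gaussian prior whose
    shrinkage is applied lazily: a coefficient is brought current
    only when its feature reappears in an example."""
    betas = [defaultdict(float) for _ in range(k - 1)]  # sparse coefficients
    last = defaultdict(int)   # step at which feature i was last regularized
    t = 0                     # global update counter
    for _ in range(n_epochs):
        for x, y in data:     # x: dict {feature: value}, y in {0, ..., k-1}
            t += 1
            # Catch up the prior's shrinkage for the active features only;
            # (1 - eta*lam) per skipped step collapses to a single power.
            for i in x:
                decay = (1.0 - eta * lam) ** (t - last[i])
                for b in betas:
                    b[i] *= decay
                last[i] = t
            # Sparse gradient step on the log-loss: only features in x move.
            p = predict(betas, x)
            for c, b in enumerate(betas):
                err = (1.0 if y == c else 0.0) - p[c]
                for i, v in x.items():
                    b[i] += eta * err * v
    # Bring every touched coefficient current before returning.
    for i in last:
        decay = (1.0 - eta * lam) ** (t - last[i])
        for b in betas:
            b[i] *= decay
    return betas
```

Because each example touches only its own nonzero features, the cost of an update stays proportional to the example's sparsity rather than the full dimension d.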
Similar Resources
Asynchronous Stochastic Proximal Optimization Algorithms with Variance Reduction
Regularized empirical risk minimization (R-ERM) is an important branch of machine learning, since it constrains the capacity of the hypothesis space and guarantees the generalization ability of the learning algorithm. Two classic proximal optimization algorithms, i.e., proximal stochastic gradient descent (ProxSGD) and proximal stochastic coordinate descent (ProxSCD) have been widely used to so...
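As a hedged illustration of the step that ProxSGD iterates (our sketch, not this paper's algorithm), here is the ℓ1 case, whose proximal operator is coordinate-wise soft-thresholding; the names soft_threshold and prox_sgd_step are ours:

```python
import numpy as np

def soft_threshold(z, tau):
    # Proximal operator of tau * ||.||_1: shrink each coordinate toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def prox_sgd_step(w, stochastic_grad, eta, lam):
    # One ProxSGD iteration: a gradient step on the smooth loss,
    # followed by the proximal step on the nonsmooth regularizer.
    return soft_threshold(w - eta * stochastic_grad, eta * lam)
```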
Fast Implementation of ℓ1 Regularized Learning Algorithms Using Gradient Descent Methods
With the advent of high-throughput technologies, ℓ1 regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that ℓ1 regularized learning problems can be easily solved by using gradient-descent techniques. The basic idea is to transform a ...
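The teaser is cut off before describing the transformation, but one standard way to make an ℓ1 problem amenable to plain gradient descent (shown here as an illustrative sketch, not necessarily this paper's method) is to split w = u − v with u, v ≥ 0, which turns the ℓ1 norm into the differentiable linear term Σ(u + v) at the price of a nonnegativity projection:

```python
import numpy as np

def l1_via_gradient_descent(X, y, lam=0.1, eta=0.01, steps=1000):
    """Projected gradient descent for (1/2n)||Xw - y||^2 + lam*||w||_1
    under the reparameterization w = u - v with u, v >= 0."""
    n, d = X.shape
    u = np.zeros(d)
    v = np.zeros(d)
    for _ in range(steps):
        r = X @ (u - v) - y          # least-squares residual
        g = X.T @ r / n              # gradient of the smooth loss w.r.t. w
        # +lam / -lam come from the now-linear penalty lam * sum(u + v);
        # np.maximum projects back onto the nonnegative orthant.
        u = np.maximum(u - eta * (g + lam), 0.0)
        v = np.maximum(v - eta * (-g + lam), 0.0)
    return u - v
```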
Efficient Elastic Net Regularization for Sparse Linear Models
We extend previous work on efficiently training linear models by applying stochastic updates to non-zero features only, lazily bringing weights current as needed. To date, only the closed form updates for the ℓ1, ℓ∞, and the rarely used ℓ2 norm have been described. We extend this work by showing the proper closed form updates for the popular ℓ2² and elastic net regularized models. We show a dyn...
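For reference, the two simplest closed-form lazy catch-ups mentioned above look like the following (a sketch under our naming; the elastic net combines both penalties, and the paper's dynamic-programming treatment is not reproduced here). The ℓ2² case matches the decay used in the earlier lazy SGD sketch.

```python
def lazy_l1_catchup(beta_i, skipped, eta, lam):
    # `skipped` steps of l1 shrinkage at once: move toward zero by
    # eta*lam per step, clipping at zero so the sign never flips.
    shrink = skipped * eta * lam
    return max(0.0, beta_i - shrink) if beta_i > 0 else min(0.0, beta_i + shrink)

def lazy_l2sq_catchup(beta_i, skipped, eta, lam):
    # `skipped` steps of squared-l2 shrinkage: each step multiplies
    # the coefficient by (1 - eta*lam), so the catch-up is a power.
    return beta_i * (1.0 - eta * lam) ** skipped
```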
A coordinate gradient descent method for ℓ1-regularized convex minimization
In applications such as signal processing and statistics, many problems involve finding sparse solutions to under-determined linear systems of equations. These problems can be formulated as structured nonsmooth optimization problems, i.e., the problem of minimizing ℓ1-regularized linear least squares. In this paper, we propose a block coordinate gradient descent method (abbreviated a...
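As a simpler relative of the block method described above (our sketch, not the paper's algorithm), plain cyclic coordinate descent solves the ℓ1-regularized least squares problem by minimizing one coordinate at a time via soft-thresholding; all names here are ours:

```python
import numpy as np

def coordinate_descent_lasso(X, y, lam=0.1, sweeps=50):
    """Cyclic coordinate descent for (1/2n)||Xw - y||^2 + lam*||w||_1.
    Each coordinate subproblem is solved exactly by soft-thresholding."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n    # per-coordinate curvature X_j'X_j/n
    r = y - X @ w                        # running residual
    for _ in range(sweeps):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation with the partial residual that excludes w_j.
            rho = X[:, j] @ r / n + col_sq[j] * w[j]
            w_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (w[j] - w_new)   # keep the residual current
            w[j] = w_new
    return w
```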