Sparse Online Learning via Truncated Gradient: Appendix
Abstract
In the setting of standard online learning, we are interested in sequential prediction problems where, for $i = 1, 2, \ldots$:

1. An unlabeled example $x_i = [x_i^1, \ldots, x_i^d] \in \mathbb{R}^d$ arrives.
2. We make a prediction $\hat{y}_i$ based on the current weights $w_i = [w_i^1, \ldots, w_i^d] \in \mathbb{R}^d$.
3. We observe $y_i$, let $z_i = (x_i, y_i)$, and incur some known loss $L(w_i, z_i)$ that is convex in the parameter $w_i$.
4. We update the weights according to some rule: $w_{i+1} \leftarrow f(w_i)$.

We want an update rule $f$ that allows us to bound the sum of losses, $\sum_{i=1}^{t} L(w_i, z_i)$, while also achieving sparsity. For this purpose, we start with the standard stochastic gradient descent (SGD) rule, which is of the form
$$f(w_i) = w_i - \eta \nabla_1 L(w_i, z_i),$$
where $\eta$ is the learning rate and $\nabla_1 L$ denotes the gradient of $L$ with respect to its first argument.
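As a concrete illustration, the sketch below runs this protocol with the plain SGD update. The squared loss and the fixed learning rate eta are illustrative assumptions, since the setup above only requires a loss that is convex in the weights.

```python
import numpy as np

def sgd_online(stream, d, eta=0.1):
    """Online SGD under the protocol above, using squared loss for illustration."""
    w = np.zeros(d)          # current weights w_i
    total_loss = 0.0         # running sum of losses L(w_i, z_i)
    for x, y in stream:      # 1. example x_i arrives; its label y_i is revealed in step 3
        y_hat = w @ x                          # 2. predict from the current weights
        total_loss += 0.5 * (y_hat - y) ** 2   # 3. incur the (convex) loss
        grad = (y_hat - y) * x                 # gradient of the squared loss in w
        w = w - eta * grad                     # 4. SGD update: w_{i+1} = w_i - eta * grad
    return w, total_loss
```

Plain SGD in this form gives the desired regret-style loss bounds but typically leaves every coordinate of $w$ nonzero; the truncated-gradient modification discussed in the similar articles below adds the sparsification step.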
Similar Articles
Stabilized Sparse Online Learning for Sparse Data
Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional sparse data, however, this method suffers from slow convergence and high variance due to heterogeneity in feature sparsity. To mitigate this issue, we introd...
Sparse Online Learning via Truncated Gradient
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can ...
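The sparsification described here can be pictured as a simple truncation step applied to the weights between gradient updates. The sketch below follows that description; the parameter names gravity and theta and the exact clipping rule are illustrative assumptions rather than a verbatim reproduction of the paper's operator.

```python
import numpy as np

def truncate(w, gravity, theta):
    """Shrink coefficients with |w_j| <= theta toward zero by `gravity`, clipping at zero."""
    w = w.copy()
    small = np.abs(w) <= theta   # only coefficients already close to zero are touched
    w[small] = np.sign(w[small]) * np.maximum(0.0, np.abs(w[small]) - gravity)
    return w
```

With gravity set to 0 this is a no-op (no sparsification); as gravity grows, more coordinates are driven to exactly zero, giving the continuous control over the degree of sparsity that the abstract describes.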
Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian-Vector Multiply
The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,00...
Unbiased Online Recurrent Optimization
The novel Unbiased Online Recurrent Optimization (UORO) algorithm allows for online learning of general recurrent computational graphs such as recurrent network models. It works in a streaming fashion and avoids backtracking through past activations and inputs. UORO is computationally as costly as Truncated Backpropagation Through Time (truncated BPTT), a widespread algorithm for online learnin...