Learning representations through stochastic gradient descent in cross-validation error
Authors
Abstract
Representations are fundamental to artificial intelligence. The performance of a learning system depends on how the data is represented. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn them through stochastic gradient descent in multi-layer neural networks, which is called backprop. Learning the representations directly from the incoming data stream reduces the human labour involved in designing a learning system. More importantly, it allows a learning system to scale to difficult tasks. In this paper, we introduce a new incremental learning algorithm called crossprop, which learns the incoming weights of hidden units using the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.
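The abstract leaves the update equations to the body of the paper. As an illustration only, here is a minimal JAX sketch of one way a per-weight memory can generalize backprop for a single-hidden-layer squared-error regressor. Everything below is an assumption: the function and variable names (crossprop_step, the trace H, the step-sizes alpha and alpha_u) are invented for this sketch, and the trace update is derived by differentiating the LMS update of the output weights, so it may differ from the paper's exact equations.

```python
import jax.numpy as jnp

def crossprop_step(params, x, y, alpha=0.01, alpha_u=0.001):
    """One online update. params = (U, w, H); x: (n_in,), y: scalar target."""
    U, w, H = params
    phi = jnp.tanh(x @ U)            # hidden features, shape (n_hid,)
    dphi = 1.0 - phi ** 2            # tanh derivative
    delta = y - phi @ w              # scalar prediction error

    # Input-weight update: the first term is the ordinary backprop gradient;
    # the second routes the error through H, a per-weight memory estimating
    # how past changes to U[i, j] have moved the outgoing weight w[j].
    U_new = U + alpha_u * delta * (jnp.outer(x, dphi * w) + phi * H)

    # Recursive trace update, obtained by differentiating the LMS update of
    # w below with respect to U. This form is a reconstruction, not the
    # paper's verified equation.
    H_new = (1.0 - alpha * phi ** 2) * H \
            + alpha * jnp.outer(x, (delta - phi * w) * dphi)

    # Ordinary LMS update of the output weights.
    w_new = w + alpha * delta * phi
    return (U_new, w_new, H_new)
```

With H held at zero, the input-weight update reduces to the plain backprop gradient, which matches the abstract's claim that crossprop generalizes the backprop update; under these assumptions, initializing H to zeros and letting the trace accumulate online is the natural starting point.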
Similar resources
Gradient-based Hyperparameter Optimization through Reversible Learning
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization...
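As a concrete illustration of chaining derivatives backwards through training, here is a minimal JAX sketch; the least-squares toy problem, the function names, and the choice of a single log step-size hyperparameter are assumptions made for this example. Plain reverse-mode unrolling, as below, stores the whole optimization trajectory, whereas the paper's contribution is a memory-efficient exact reversal of SGD with momentum.

```python
import jax
import jax.numpy as jnp

def train_then_validate(log_lr, w0, X_tr, y_tr, X_val, y_val, steps=100):
    """Run unrolled SGD on a least-squares problem, return validation loss."""
    lr = jnp.exp(log_lr)  # log-space parameterization keeps the step-size positive

    def sgd_step(w, _):
        grad = X_tr.T @ (X_tr @ w - y_tr) / y_tr.shape[0]  # training-loss gradient
        return w - lr * grad, None

    w_final, _ = jax.lax.scan(sgd_step, w0, None, length=steps)
    return jnp.mean((X_val @ w_final - y_val) ** 2)

# Reverse-mode differentiation chains derivatives backwards through every
# SGD step, giving the exact gradient of validation loss w.r.t. the step-size.
hypergrad_fn = jax.grad(train_then_validate)
```

Since jax.grad differentiates with respect to the first argument, hypergrad_fn(log_lr, w0, X_tr, y_tr, X_val, y_val) returns the exact derivative of the validation loss with respect to the log step-size, which a hyperparameter optimizer can then follow.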
Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning
Stochastic gradient descent algorithms for training linear and kernel predictors are gaining more and more importance, thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In fact, in theoretical works most of the time assumptions are made, for example, on the prior knowledge of the norm of ...
Predictive State Smoothing (PRESS): Scalable non-parametric regression for high-dimensional data with variable selection
We introduce predictive state smoothing (PRESS), a novel semi-parametric regression technique for high-dimensional data using predictive state representations. PRESS is a fully probabilistic model for the optimal kernel smoothing matrix. We present efficient algorithms for the joint estimation of the state space as well as the non-linear mapping of observations to predictive states and as an al...
Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
Representations are fundamental to artificial intelligence. The performance of a learning system depends on the type of representation used for representing the data. Typically, these representations are hand-engineered using domain knowledge. More recently, the trend is to learn these representations through stochastic gradient descent in multi-layer neural networks, which is called backprop. ...
Learning Similarity with Operator-valued Large-margin Classifiers
A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected...
Journal: CoRR
Volume: abs/1612.02879
Issue: -
Pages: -
Publication date: 2016