Normalized Direction-preserving Adam
Authors
Abstract
Optimization algorithms for training deep models not only affect the convergence rate and stability of the training process, but are also closely tied to the generalization performance of the resulting models. While adaptive algorithms such as Adam and RMSprop have shown better optimization performance than stochastic gradient descent (SGD) in many scenarios, they often lead to worse generalization than SGD when used to train deep neural networks (DNNs). In this work, we identify two problems of Adam that may degrade generalization performance. As a solution, we propose the normalized direction-preserving Adam (ND-Adam) algorithm, which combines the best of both worlds: the good optimization performance of Adam and the good generalization performance of SGD. In addition, we further improve generalization performance on classification tasks by using batch-normalized softmax. This study suggests the need for more precise control over the training process of DNNs.
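The abstract does not spell out the update rule, but the core idea of ND-Adam as described in the paper can be sketched as follows: each hidden unit's incoming weight vector is kept at unit norm, the gradient is projected onto the sphere's tangent space, and the Adam second moment is a *scalar* per weight vector rather than elementwise, so dividing by it rescales but never rotates the update. This is a minimal illustrative sketch, not the authors' reference implementation; the function name, hyperparameter defaults, and single-vector framing are assumptions for exposition.

```python
import numpy as np

def nd_adam_step(w, g, m, v, t, alpha=0.05, b1=0.9, b2=0.999, eps=1e-8):
    """One hypothetical ND-Adam step for a single hidden unit's
    weight vector w, maintained at unit norm (||w|| = 1)."""
    # Project out the radial component, keeping only the tangential gradient.
    g_bar = g - np.dot(g, w) * w
    # First moment is a vector, as in Adam.
    m = b1 * m + (1 - b1) * g_bar
    # Second moment is a SCALAR per weight vector: dividing the update by a
    # scalar preserves its direction, unlike elementwise Adam.
    v = b2 * v + (1 - b2) * np.dot(g_bar, g_bar)
    # Standard Adam bias corrections.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Update, then re-project onto the unit sphere.
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w / np.linalg.norm(w), m, v
```

Because the denominator is a scalar, the step stays collinear with the (bias-corrected) momentum of the projected gradient, which is what "direction-preserving" refers to; the renormalization at the end keeps the weight magnitudes fixed so that only direction carries information.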
Similar Resources
Polynomial Preserving Recovery for Meshes from Delaunay Triangulation or with High Aspect Ratio
A newly developed polynomial preserving gradient recovery technique is further studied. The results are twofold. First, error bounds for the recovered gradient are established on the Delaunay type mesh when the major part of the triangulation is made of near parallelogram triangle pairs with -perturbation. It is found that the recovered gradient improves the leading term of the error by a facto...
On strongly Jordan zero-product preserving maps
In this paper, we give a characterization of strongly Jordan zero-product preserving maps on normed algebras as a generalization of Jordan zero-product preserving maps. In this direction, we give some illustrative examples to show that the notions of strongly zero-product preserving maps and strongly Jordan zero-product preserving maps are completely different. Also, we prove that the direct p...
A Step Size Preserving Directed Mutation Operator
The main idea of the directed mutation is to focus on mutating into the most beneficial direction by using a customizable asymmetrical distribution. In this way the optimization strategy can adopt the most promising mutation direction over the generations. It thus becomes nearly as flexible as with Schwefel’s correlated mutation [2] but causes only linear growth of the strategy parameters inste...
Notes on Property-Preserving Encryption
The first type of specialized encryption scheme that can be used in secure outsourced storage we will look at is property-preserving encryption. This is encryption where some desired property of the plaintexts is intentionally leaked by the ciphertexts. The two main examples we will study are deterministic encryption, which preserves the equality property, and order preserving encryption, which...
Lattice Embedding of Direction-Preserving Correspondence over Integrally Convex Set
We consider the relationship of two fixed point theorems for direction-preserving discrete correspondences. We show that, for space of no more than three dimensions, the fixed point theorem [5] of Iimura, Murota and Tamura, on integrally convex sets can be derived from Chen and Deng’s fixed point theorem [1] on lattices by expanding every direction-preserving discrete correspondence over an int...
Journal: CoRR
Volume: abs/1709.04546
Pages: —
Publication year: 2017