sgd

Deep Neural Networks with Multistate Activation Functions

2015

Chenghao Cai Yanyan Xu Dengfeng Ke Kaile Su

We propose multistate activation functions (MSAFs) for deep neural networks (DNNs). These MSAFs are new kinds of activation functions which are capable of representing more than two states, including the N-order MSAFs and the symmetrical MSAF. DNNs with these MSAFs can be trained via conventional Stochastic Gradient Descent (SGD) as well as mean-normalised SGD. We also discuss how these MSAFs p...

متن کامل

Submarine fresh groundwater discharge into Laizhou Bay comparable to the Yellow River flux.

Journal: :Scientific reports 2015

Xuejing Wang Hailong Li Jiu Jimmy Jiao D A Barry Ling Li Xin Luo Chaoyue Wang Li Wan Xusheng Wang Xiaowei Jiang Qian Ma Wenjing Qu

Near- and off-shore fresh groundwater resources become increasingly important with the social and economic development in coastal areas. Although large scale (hundreds of km) submarine groundwater discharge (SGD) to the ocean has been shown to be of the same magnitude order as river discharge, submarine fresh groundwater discharge (SFGD) with magnitude comparable to large river discharge is nev...

متن کامل

Triply Stochastic Gradients on Multiple Kernel Learning

2017

Xiang Li Bin Gu Shuang Ao Huaimin Wang Charles X. Ling

Multiple Kernel Learning (MKL) is highly useful for learning complex data with multiple cues or representations. However, MKL is known to have poor scalability because of the expensive kernel computation. Dai et al (2014) proposed to use a doubly Stochastic Gradient Descent algorithm (doubly SGD) to greatly improve the scalability of kernel methods. However, the algorithm is not suitable for MK...

متن کامل

CyEBP« mediates myeloid differentiation and is regulated by the CCAAT displacement protein (CDPycut)

2001

Arati Khanna-Gupta Theresa Zibello Hong Sun Julie Lekstrom-Himes Nancy Berliner

Neutrophils from CCAAT enhancer binding protein epsilon (Cy EBP«) knockout mice have morphological and biochemical features similar to those observed in patients with an extremely rare congenital disorder called neutrophil-specific secondary granule deficiency (SGD). SGD is characterized by frequent bacterial infections attributed, in part, to the lack of neutrophil secondary granule proteins (...

متن کامل

On Scalable Inference with Stochastic Gradient Descent

Journal: :CoRR 2017

Yixin Fang Jinfeng Xu Lei Yang

In many applications involving large dataset or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators have been established decades ago, statistical inference such as interval estimation remains m...

متن کامل

Distributed stochastic optimization for deep learning (thesis)

Journal: :CoRR 2016

Sixin Zhang

We study the problem of how to distribute the training of large-scale deep learning models in the parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with the existing ADMM method in the round-robin sche...

متن کامل

The Hidden Vulnerability of Distributed Learning in Byzantium

Journal: :CoRR 2018

El Mahdi El Mhamdi Rachid Guerraoui Sébastien Rouault

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantin...

متن کامل

Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling

Journal: :CoRR 2017

Qi Meng Wei Chen Yue Wang Zhiming Ma Tie-Yan Liu

When using stochastic gradient descent (SGD) to solve large-scale machine learning problems, a common practice of data processing is to shuffle the training data, partition the data across multiple threads/machines if needed, and then perform several epochs of training on the re-shuffled (either locally or globally) data. The above procedure makes the instances used to compute the gradients no ...

متن کامل

Theory of Deep Learning III: Generalization Properties of SGD

2017

Chiyuan Zhang Qianli Liao Alexander Rakhlin Brando Miranda Noah Golowich Tomaso Poggio

In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predicitve performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...

متن کامل

Musings on Deep Learning: Properties of SGD

2017

Chiyuan Zhang Qianli Liao Alexander Rakhlin Brando Miranda Noah Golowich Tomaso Poggio

We ruminate with a mix of theory and experiments on the optimization and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We dream an explanation of these...

متن کامل