Search results for: sgd

Number of results: 1169

2015
Chenghao Cai, Yanyan Xu, Dengfeng Ke, Kaile Su

We propose multistate activation functions (MSAFs) for deep neural networks (DNNs). These MSAFs are new kinds of activation functions which are capable of representing more than two states, including the N-order MSAFs and the symmetrical MSAF. DNNs with these MSAFs can be trained via conventional Stochastic Gradient Descent (SGD) as well as mean-normalised SGD. We also discuss how these MSAFs p...
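
For orientation, a minimal sketch of the idea, assuming an N-order MSAF can be built as a sum of N shifted logistic sigmoids; the offsets and spacing below are illustrative assumptions, not the paper's parameterisation:

```python
import numpy as np

def msaf(x, n=3, spacing=4.0):
    # N-order multistate activation sketched as a sum of n shifted logistic
    # sigmoids, giving n + 1 roughly flat output plateaus ("states").
    # The offsets and spacing are illustrative assumptions.
    offsets = spacing * (np.arange(n) - (n - 1) / 2.0)
    return sum(1.0 / (1.0 + np.exp(-(x - b))) for b in offsets)

print(msaf(np.linspace(-10.0, 10.0, 5)))   # outputs move between 0 and n
```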

Journal: Scientific Reports, 2015
Xuejing Wang, Hailong Li, Jiu Jimmy Jiao, D. A. Barry, Ling Li, Xin Luo, Chaoyue Wang, Li Wan, Xusheng Wang, Xiaowei Jiang, Qian Ma, Wenjing Qu

Near- and off-shore fresh groundwater resources are becoming increasingly important with the social and economic development of coastal areas. Although large-scale (hundreds of km) submarine groundwater discharge (SGD) to the ocean has been shown to be of the same order of magnitude as river discharge, submarine fresh groundwater discharge (SFGD) with magnitude comparable to large river discharge is nev...

2017
Xiang Li, Bin Gu, Shuang Ao, Huaimin Wang, Charles X. Ling

Multiple Kernel Learning (MKL) is highly useful for learning complex data with multiple cues or representations. However, MKL is known to have poor scalability because of the expensive kernel computation. Dai et al. (2014) proposed to use a doubly Stochastic Gradient Descent algorithm (doubly SGD) to greatly improve the scalability of kernel methods. However, the algorithm is not suitable for MK...
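
For context, the doubly SGD of Dai et al. samples one training point and one fresh random feature per iteration; a rough single-kernel sketch with random Fourier features, where all names and step sizes are illustrative and the MKL extension discussed in this paper is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def doubly_sgd_fit(X, y, n_iter=500, sigma=1.0, lam=1e-4, eta=0.5):
    # Each step samples one training point AND one new random Fourier feature
    # of the RBF kernel (hence "doubly" stochastic); the model is kept as a
    # growing list of (feature, coefficient) pairs. Squared loss, single kernel.
    d = X.shape[1]
    omegas, biases, alphas = [], [], []
    for t in range(1, n_iter + 1):
        i = rng.integers(len(X))
        x_t, y_t = X[i], y[i]
        pred = sum(a * np.sqrt(2.0) * np.cos(w @ x_t + b)
                   for a, w, b in zip(alphas, omegas, biases))
        w_new = rng.normal(scale=1.0 / sigma, size=d)      # RFF frequency
        b_new = rng.uniform(0.0, 2.0 * np.pi)              # RFF phase
        step = eta / np.sqrt(t)
        alphas = [(1.0 - step * lam) * a for a in alphas]  # ridge shrinkage
        alphas.append(-step * (pred - y_t) * np.sqrt(2.0) * np.cos(w_new @ x_t + b_new))
        omegas.append(w_new)
        biases.append(b_new)
    return omegas, biases, alphas

X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
model = doubly_sgd_fit(X, y)
```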

2001
Arati Khanna-Gupta, Theresa Zibello, Hong Sun, Julie Lekstrom-Himes, Nancy Berliner

Neutrophils from CCAAT enhancer binding protein epsilon (C/EBPε) knockout mice have morphological and biochemical features similar to those observed in patients with an extremely rare congenital disorder called neutrophil-specific secondary granule deficiency (SGD). SGD is characterized by frequent bacterial infections attributed, in part, to the lack of neutrophil secondary granule proteins (...

Journal: CoRR, 2017
Yixin Fang, Jinfeng Xu, Lei Yang

In many applications involving large dataset or online updating, stochastic gradient descent (SGD) provides a scalable way to compute parameter estimates and has gained increasing popularity due to its numerical convenience and memory efficiency. While the asymptotic properties of SGD-based estimators have been established decades ago, statistical inference such as interval estimation remains m...
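
The asymptotic theory referred to here is usually stated for averaged iterates; a minimal sketch of SGD with Polyak-Ruppert averaging on a toy linear regression, which does not reproduce the paper's actual interval-estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic linear-regression stream: y = x @ theta_true + noise
theta_true = np.array([2.0, -1.0, 0.5])
def sample():
    x = rng.normal(size=3)
    return x, x @ theta_true + 0.1 * rng.normal()

theta = np.zeros(3)
theta_bar = np.zeros(3)
for t in range(1, 20001):
    x, y = sample()
    grad = (theta @ x - y) * x            # gradient of 0.5 * (x @ theta - y)^2
    theta -= (0.5 / t**0.6) * grad        # Robbins-Monro step size
    theta_bar += (theta - theta_bar) / t  # running Polyak-Ruppert average

print(theta_bar)   # close to theta_true; the averaged iterate is asymptotically normal
```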

Journal: CoRR, 2016
Sixin Zhang

We study the problem of how to distribute the training of large-scale deep learning models in the parallel computing environment. We propose a new distributed stochastic optimization method called Elastic Averaging SGD (EASGD). We analyze the convergence rate of the EASGD method in the synchronous scenario and compare its stability condition with the existing ADMM method in the round-robin sche...
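
The synchronous EASGD update couples each worker to a shared center variable through an elastic term; a serial toy simulation of that update, with step size, elastic coefficient, and the quadratic objectives chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy objective per worker: f_i(x) = 0.5 * ||x - c_i||^2, with noisy gradients
optima = [rng.normal(size=5) for _ in range(4)]
def noisy_grad(x, c):
    return (x - c) + 0.1 * rng.normal(size=x.shape)

eta, rho = 0.05, 0.5
workers = [np.zeros(5) for _ in optima]
center = np.zeros(5)                      # the shared "center" variable

for _ in range(500):
    diffs = [w - center for w in workers]
    # each worker: gradient step plus an elastic pull toward the center
    workers = [w - eta * (noisy_grad(w, c) + rho * d)
               for w, c, d in zip(workers, optima, diffs)]
    # the center moves toward the (previous) workers
    center = center + eta * rho * sum(diffs)

print(center)   # settles near the average of the workers' optima
```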

Journal: CoRR, 2018
El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault

While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending poisoned gradients during the training phase. Some of these approaches have been proven Byzantin...
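
As a generic illustration of the aggregation problem: replacing the mean of worker gradients with a coordinate-wise median is one standard Byzantine-resilient rule; it is not the specific defence analysed in this paper.

```python
import numpy as np

def aggregate(gradients, rule="median"):
    # The plain mean is broken by a single Byzantine worker sending an
    # arbitrarily large gradient; a coordinate-wise median tolerates a
    # minority of such workers.
    G = np.stack(gradients)
    return G.mean(axis=0) if rule == "mean" else np.median(G, axis=0)

honest = [np.ones(4) + 0.01 * i for i in range(6)]
poisoned = honest + [np.full(4, 1e6)]          # one Byzantine worker
print(aggregate(poisoned, "mean"))             # dragged far off
print(aggregate(poisoned, "median"))           # stays near the honest gradients
```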

Journal: CoRR, 2017
Qi Meng, Wei Chen, Yue Wang, Zhiming Ma, Tie-Yan Liu

When using stochastic gradient descent (SGD) to solve large-scale machine learning problems, a common practice of data processing is to shuffle the training data, partition the data across multiple threads/machines if needed, and then perform several epochs of training on the re-shuffled (either locally or globally) data. The above procedure makes the instances used to compute the gradients no ...
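
A minimal sketch of the shuffle-then-partition pipeline described above, contrasting global re-shuffling with local (per-shard) re-shuffling; the worker count and shuffling policy are illustrative:

```python
import random

def epochs(data, n_workers=4, n_epochs=2, global_shuffle=True, seed=0):
    # Yields (epoch, worker_id, shard). With local shuffling, each worker keeps
    # its shard across epochs and only reorders it, so the instances feeding the
    # gradients are sampled without replacement from a fixed subset, not i.i.d.
    rng = random.Random(seed)
    data = list(data)
    for epoch in range(n_epochs):
        if global_shuffle or epoch == 0:
            rng.shuffle(data)                      # global re-shuffle
        shards = [data[i::n_workers] for i in range(n_workers)]
        if not global_shuffle and epoch > 0:
            for s in shards:
                rng.shuffle(s)                     # local re-shuffle only
        for wid, shard in enumerate(shards):
            yield epoch, wid, shard

for epoch, wid, shard in epochs(range(8), n_workers=2, n_epochs=2, global_shuffle=False):
    print(epoch, wid, shard)
```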

2017
Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

In Theory III we characterize with a mix of theory and experiments the consistency and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We describe an exp...

2017
Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

We ruminate with a mix of theory and experiments on the optimization and generalization properties of deep convolutional networks trained with Stochastic Gradient Descent in classification tasks. A present perceived puzzle is that deep networks show good predictive performance when overparametrization relative to the number of training data suggests overfitting. We dream an explanation of these...
