Distributed Systems for ML
Lecturer: Eric P. Xing
Scribes:
Abstract
Coordinate descent is a general strategy for convex optimization problems. The basic idea is to solve the problem iteratively by optimizing the objective with respect to a single variable at a time while keeping all other coordinates fixed. While the order in which the coordinates are optimized can be chosen arbitrarily, it is crucial for the convergence guarantees that the updates are applied sequentially.
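As a concrete illustration of updating one coordinate at a time, here is a minimal sketch of cyclic coordinate descent applied to ridge regression. The choice of ridge regression, the variable names, and the fixed iteration count are illustrative assumptions, not taken from the lecture; each inner step minimizes the objective exactly over a single coordinate while all others stay fixed, and coordinates are visited sequentially.

```python
import numpy as np

def ridge_coordinate_descent(X, y, lam=1.0, n_iters=100):
    """Cyclic coordinate descent for (1/2)||Xw - y||^2 + (lam/2)||w||^2.

    Illustrative example problem (not from the lecture). Each inner step
    minimizes the objective in closed form over a single coordinate w[j]
    while the other coordinates are held fixed; coordinates are updated
    sequentially in cyclic order.
    """
    n, d = X.shape
    w = np.zeros(d)
    residual = y - X @ w                  # r = y - Xw, maintained incrementally
    col_sq_norms = (X ** 2).sum(axis=0)   # x_j^T x_j, precomputed once
    for _ in range(n_iters):
        for j in range(d):                # sequential (cyclic) order
            # Residual with coordinate j's contribution removed.
            r_j = residual + X[:, j] * w[j]
            # Closed-form minimizer of the objective in w[j] alone:
            # w[j] = x_j^T r_j / (x_j^T x_j + lam).
            w_new = X[:, j] @ r_j / (col_sq_norms[j] + lam)
            residual = r_j - X[:, j] * w_new
            w[j] = w_new
    return w

# Example usage on synthetic data (hypothetical sizes).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.1 * rng.normal(size=200)
    w_hat = ridge_coordinate_descent(X, y, lam=0.1)
    print(np.round(w_hat - w_true, 2))
```

Maintaining the residual incrementally keeps each coordinate update at O(n) cost instead of recomputing Xw from scratch at every step.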
Similar resources
Lectures 7, 8 and 9: October 11, 13 and 18, 1999. Lecturer: Mona Singh. Scribes: Ching Law and Casim A. Sarkar
13: Theory of Variational Inference
We then went on to introduce mean field variational inference, in which the variational distribution over latent variables is assumed to factor as a product of functions over each latent variable. In this sense, the joint approximation is made using the simplifying assumption that all latent variables are independent. Alternatively, we can use approximations in which only groups of variables ar...
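For reference, the factorization described in this snippet is the standard mean-field assumption; the notation below (variational distribution q, latent variables z_1, ..., z_m) is generic rather than taken from the snippet.

```latex
% Mean-field variational family: the approximation to the posterior over
% the latent variables factorizes across the individual variables.
q(z_1, \ldots, z_m) \;=\; \prod_{i=1}^{m} q_i(z_i)
```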
Consistent Bounded-Asynchronous Parameter Servers for Distributed ML
In distributed ML applications, shared parameters are usually replicated among computing nodes to minimize network overhead. A proper consistency model must therefore be chosen carefully to ensure the algorithm's correctness while still providing high throughput. Existing consistency models used in general-purpose databases and modern distributed ML systems are either too loose to guarantee correctness of the ...
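The snippet is truncated, but a common instance of a bounded-asynchronous consistency model is the stale synchronous parallel (SSP) condition, under which a worker may run ahead of the slowest worker by at most a fixed staleness bound before it must wait. The sketch below is an illustrative simulation of that admission check under assumed class and method names; it is not code from any of the systems mentioned.

```python
class BoundedStalenessClock:
    """Illustrative stale-synchronous-parallel (SSP) admission check.

    A worker that has reached iteration (clock) c is allowed to proceed
    only if the slowest worker has reached at least c - staleness, so the
    parameter state it reads is never more than `staleness` clocks behind
    its own view.
    """

    def __init__(self, num_workers, staleness):
        self.staleness = staleness
        self.clocks = [0] * num_workers   # per-worker iteration counters

    def tick(self, worker_id):
        """Worker finishes one iteration and commits its updates."""
        self.clocks[worker_id] += 1

    def can_proceed(self, worker_id):
        """True if worker_id may start its next iteration without waiting."""
        return self.clocks[worker_id] - min(self.clocks) <= self.staleness


# Example: with staleness 2, worker 0 may run ahead of the slowest
# worker by at most 2 clocks before it must block.
ssp = BoundedStalenessClock(num_workers=3, staleness=2)
for _ in range(3):
    ssp.tick(0)
print(ssp.can_proceed(0))   # False: worker 0 is 3 clocks ahead of workers 1 and 2
ssp.tick(1); ssp.tick(2)
print(ssp.can_proceed(0))   # True: the gap is now 2, within the staleness bound
```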
Spectral Algorithms for Graphical Models. Lecturer: Eric
Modern machine learning tasks often deal with high-dimensional data. One typically makes some assumption on structure, like sparsity, to make learning tractable over high-dimensional instances. Another common assumption on structure is that of latent variables in the generative model. In latent variable models, one attempts to perform inference not only on observed variables, but also on unobse...
17: Markov Chain Monte Carlo
which decreases as J gets larger. So the approximation will be more accurate as we obtain more samples. Here is an example of using Monte Carlo methods to integrate away weights in Bayesian neural networks. Let y(x) = f(x,w) for response y and input x, and let p(w) be the prior over the weights w. The posterior distribution of w given the data D is p(w|D) ∝ p(D|w)p(w) where p(D|w) is the likeli...
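To make the Monte Carlo approximation in this snippet concrete, the predictive mean E[y(x) | D] = ∫ f(x, w) p(w | D) dw can be estimated by averaging f(x, w_j) over J samples w_1, ..., w_J drawn from the posterior. The sketch below is a generic illustration under assumed names (`f`, `posterior_samples`); it is not tied to any particular network or sampler from the lecture.

```python
import numpy as np

def monte_carlo_predictive(f, posterior_samples, x):
    """Estimate E[y(x) | D] = integral of f(x, w) p(w | D) dw by the sample
    average (1/J) * sum_j f(x, w_j), with w_j drawn from p(w | D).

    The estimator's variance shrinks like 1/J, which is why the
    approximation becomes more accurate as more samples are obtained.
    """
    values = np.array([f(x, w) for w in posterior_samples])
    return values.mean(axis=0)

# Toy usage with a hypothetical one-layer "network" and fake posterior
# samples; in practice the w_j would come from a sampler targeting
# p(w | D), which is proportional to p(D | w) p(w).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = lambda x, w: np.tanh(x @ w)           # stand-in for f(x, w)
    samples = rng.normal(size=(1000, 5))      # J = 1000 draws of w (fake)
    x = np.ones(5)
    print(monte_carlo_predictive(f, samples, x))
```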