Bayesian Optimization with Gradients
Authors
Abstract
Bayesian optimization has been successful at global optimization of expensive-to-evaluate multimodal objective functions. However, unlike most optimization methods, Bayesian optimization typically does not use derivative information. In this paper we show how Bayesian optimization can exploit derivative information to find good solutions with fewer objective function evaluations. In particular, we develop a novel Bayesian optimization algorithm, the derivative-enabled knowledge-gradient (d-KG), which is one-step Bayes-optimal, asymptotically consistent, and provides greater one-step value of information than in the derivative-free setting. d-KG accommodates noisy and incomplete derivative information, comes in both sequential and batch forms, and can optionally reduce the computational cost of inference through automatically selected retention of a single directional derivative. We also compute the d-KG acquisition function and its gradient using a novel fast discretization-free technique. We show d-KG provides state-of-the-art performance compared to a wide range of optimization procedures with and without gradients, on benchmarks including logistic regression, deep learning, kernel learning, and k-nearest neighbors.
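To make the "gradients as extra observations" idea concrete, the following is a minimal 1-D sketch, not the paper's d-KG implementation: a Gaussian process conditioned jointly on noisy function values and derivative values under an RBF kernel, with the next evaluation chosen by minimizing the posterior mean on a grid. The kernel, its hyperparameters, the toy objective, and the acquisition rule are simplified assumptions for illustration only.

```python
# Minimal sketch: GP regression with derivative observations (1-D, RBF kernel).
# Assumed simplifications: fixed hyperparameters, posterior-mean acquisition.
import numpy as np

def rbf_blocks(X, Z, ell=0.3, sf2=1.0):
    """Return k(X,Z), dk/dz, dk/dx, d2k/dxdz for 1-D inputs X and Z."""
    D = X[:, None] - Z[None, :]                  # pairwise differences x - z
    K = sf2 * np.exp(-0.5 * (D / ell) ** 2)      # k(x, z)
    dKdz = K * D / ell**2                        # cov(f(x), f'(z))
    dKdx = -K * D / ell**2                       # cov(f'(x), f(z))
    d2K = K * (1.0 / ell**2 - D**2 / ell**4)     # cov(f'(x), f'(z))
    return K, dKdz, dKdx, d2K

def joint_posterior_mean(Xobs, y, g, Xstar, noise=1e-6):
    """Posterior mean of f at Xstar, given values y and derivatives g at Xobs."""
    K, dKdz, dKdx, d2K = rbf_blocks(Xobs, Xobs)
    # Joint prior covariance over the stacked vector [f(Xobs), f'(Xobs)].
    Kjoint = np.block([[K, dKdz], [dKdx, d2K]]) + noise * np.eye(2 * len(Xobs))
    Ks, dKsdz, _, _ = rbf_blocks(Xstar, Xobs)
    Kcross = np.hstack([Ks, dKsdz])              # cov(f(Xstar), [f(Xobs), f'(Xobs)])
    return Kcross @ np.linalg.solve(Kjoint, np.concatenate([y, g]))

# Toy use: observe f(x) = sin(3x) and its derivative at three points,
# then pick the next evaluation point by minimizing the posterior mean.
Xobs = np.array([0.1, 0.5, 0.9])
y, g = np.sin(3 * Xobs), 3 * np.cos(3 * Xobs)
grid = np.linspace(0.0, 1.0, 200)
x_next = grid[np.argmin(joint_posterior_mean(Xobs, y, g, grid))]
```

The design point the sketch illustrates is that derivative observations enter the GP only through extra rows and columns of the covariance matrix, so the same posterior machinery supports values, gradients, or any subset of partial derivatives.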
Similar resources
Optimization of thermal curing cycle for a large epoxy model
Heat generation from the exothermic reaction during the curing process and the low thermal conductivity of the epoxy resin produce high peak temperatures and temperature gradients, which result in internal and residual stresses, especially in large epoxy samples. In this paper, an optimization algorithm was developed and applied to predict the thermal cure cycle that minimizes the temperature peak and the...
Do we need “Harmless” Bayesian Optimization and “First-Order” Bayesian Optimization?
A recent empirical study highlighted the shocking result that, for many hyperparameter tuning problems, Bayesian optimization methods can be outperformed by random guessing run for twice as many iterations [1]. This is supported by theoretical results showing the optimality of random search under certain assumptions, but disagrees with other theoretical and empirical results showing that Bayesi...
Structure Learning in Bayesian Networks Using Asexual Reproduction Optimization
A new structure learning approach for Bayesian networks (BNs) based on asexual reproduction optimization (ARO) is proposed in this letter. ARO can essentially be considered an evolution-based algorithm that mathematically models the budding mechanism of asexual reproduction. In ARO, a parent produces a bud through a reproduction operator; thereafter, the parent and its bud compete to survi...
Smoothed Gradients for Stochastic Variational Inference
Stochastic variational inference (SVI) lets us scale up Bayesian computation to massive data. It uses stochastic optimization to fit a variational distribution, following easy-to-compute noisy natural gradients. As with most traditional stochastic optimization methods, SVI takes precautions to use unbiased stochastic gradients whose expectations are equal to the true gradients. In this paper, w...
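As a concrete illustration of the unbiasedness property this snippet refers to (not the smoothing method proposed in that paper), the sketch below checks that a rescaled minibatch gradient of a simple sum-of-squares objective matches the full-data gradient in expectation; the objective, data, and batch size are toy assumptions.

```python
# Sketch: a rescaled minibatch gradient is an unbiased estimate of the full gradient.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)          # toy data
theta = 0.5                        # current parameter value

def full_grad(theta):
    # d/dtheta of 0.5 * sum_i (x_i - theta)^2
    return np.sum(theta - x)

def minibatch_grad(theta, batch_size=10):
    idx = rng.choice(len(x), size=batch_size, replace=False)
    # Rescale by N / batch_size so the expectation equals the full gradient.
    return len(x) / batch_size * np.sum(theta - x[idx])

estimates = [minibatch_grad(theta) for _ in range(20000)]
print(full_grad(theta), np.mean(estimates))   # the two values should be close
```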
Comparative Analysis of Machine Learning Algorithms with Optimization Purposes
The fields of optimization and machine learning are increasingly intertwined, and optimization in different problems leads to the use of machine learning approaches. Machine learning algorithms work in reasonable computational time for specific classes of problems and play an important role in extracting knowledge from large amounts of data. In this paper, a methodology has been employed to opt...