Search results for: policy iterations

Number of results: 276392

Journal: Journal of Parallel and Distributed Computing, 2022

Many problems can be solved by iteration by multiple participants (processors, servers, routers etc.). Previous mathematical models for such asynchronous iterations assume a single function being iterated by a fixed set of participants. We will call such iterations static since the system's configuration does not change. However, in several real-world examples, such as inter-domain routing, both the function being iterated and the set of participants change frequently while ...

Background: There is limited understanding about the development of the online one-stop shops for evidence in a limited-resource setting, such as Uganda. This study aimed to provide a comprehensive account of the development process of the online resource for local policy and systems-relevant information in this setting. Methods: We utilized a case study design to address our objective where ...

Journal: CoRR, 2016
Ke Li, Jitendra Malik

Algorithm design is a laborious process and often requires many iterations of ideation and validation. In this paper, we explore automating algorithm design and present a method to learn an optimization algorithm. We approach this problem from a reinforcement learning perspective and represent any particular optimization algorithm as a policy. We learn an optimization algorithm using guided pol...

Journal: Automatica, 2021

Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the foundation for developing RL methods. In this paper, we propose two PI methods, called differential PI (DPI) and integral PI (IPI), and their variants, for a general RL framework in continuous time and space (CTS)...
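The evaluation/improvement loop described in this abstract can be illustrated in its simplest tabular, discrete-time form. The sketch below is not the DPI or IPI method of the paper, which targets continuous time and space; it is a minimal textbook-style illustration. The two-state MDP, the names `P`, `R`, `evaluate`, `improve`, and the discount `gamma=0.9` are all assumptions made for this example.

```python
def evaluate(policy, P, R, gamma, tol=1e-10):
    """Policy evaluation: repeat the Bellman expectation backup until values settle."""
    n = len(policy)
    v = [0.0] * n
    while True:
        new_v = [
            R[s][policy[s]] + gamma * sum(P[policy[s]][s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v


def improve(v, P, R, gamma):
    """Policy improvement: pick the greedy action in every state."""
    n_states, n_actions = len(R), len(P)
    policy = []
    for s in range(n_states):
        q = [
            R[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
            for a in range(n_actions)
        ]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy


def policy_iteration(P, R, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(R)
    while True:
        v = evaluate(policy, P, R, gamma)
        new_policy = improve(v, P, R, gamma)
        if new_policy == policy:
            return policy, v
        policy = new_policy


# Hypothetical toy MDP: action 0 stays put, action 1 moves to the other state.
# P[a][s][t] is the probability of reaching state t from state s under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
]
# R[s][a]: only staying in state 1 is rewarded.
R = [[0.0, 0.0], [1.0, 0.0]]

policy, v = policy_iteration(P, R)
print(policy, v)
```

In this toy problem the loop settles on switching out of state 0 and staying in state 1, since only staying in state 1 earns reward.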

Journal: Nature Reviews Cancer, 2012

Journal: Journal of Humanistic Mathematics, 2021

Journal: Pacific Journal of Mathematics, 1977

2012
Mohammad Ghavamzadeh, Alessandro Lazaric

The existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods that directly assign the output of the classifier (the approximate greedy policy w.r.t. the current policy) to the next policy, and conservative policy iteration (CPI) methods in which the new policy is a mixture distribution of the current policy and th...
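The difference between the two update rules can be sketched for the action distribution at a single state. This is an illustration only, not the authors' algorithm: the function names, the mixing weight `alpha`, and the assumption that the second mixture component is the classifier's output are all hypothetical, since the abstract is truncated at that point.

```python
def dpi_update(current, classifier_output):
    """Direct update: the next policy is the classifier's output."""
    return list(classifier_output)


def cpi_update(current, classifier_output, alpha):
    """Conservative update: mix the current policy with the classifier's output.

    alpha in [0, 1] is an assumed mixing weight; alpha = 1 recovers the direct update.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * c + alpha * g for c, g in zip(current, classifier_output)]


# Action probabilities at one state (two actions), chosen only for illustration.
current = [0.5, 0.5]
classifier_output = [1.0, 0.0]  # a deterministic approximate greedy choice

direct = dpi_update(current, classifier_output)
conservative = cpi_update(current, classifier_output, alpha=0.2)
print(direct, conservative)
```

With a small `alpha` the conservative rule moves only part of the way toward the classifier's choice, and the result remains a valid probability distribution because it is a convex combination of two distributions.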

2017
Xavier Alameda-Pineda Andrea Pilzer Dan Xu Nicu Sebe Elisa Ricci Bruno Kessler

We implemented our LENA pooling layer within the Caffe framework and ran all our experiments using a Tesla K40 GPU. All the networks were fine-tuned from the convolutional filters obtained when training these networks for the 1,000 image classification task on the ImageNet dataset. We iterated the stochastic gradient descent algorithm for 10,000 iterations with a momentum of μ = 0.9 and a weigh...

Journal: Journal of Computational and Applied Mathematics, 2000
