Search results for: policy iterations
Number of results: 276392
Many problems can be solved by iteration by multiple participants (processors, servers, routers, etc.). Previous mathematical models for such asynchronous iterations assume a single function being iterated by a fixed set of participants. We will call such iterations static since the system's configuration does not change. However, in several real-world examples, such as inter-domain routing, both the function being iterated and the set of participants change frequently while ...
Background: There is limited understanding about the development of the online one-stop shops for evidence in a limited-resource setting, such as Uganda. This study aimed to provide a comprehensive account of the development process of the online resource for local policy and systems-relevant information in this setting. Methods: We utilized a case study design to address our objective where ...
Algorithm design is a laborious process and often requires many iterations of ideation and validation. In this paper, we explore automating algorithm design and present a method to learn an optimization algorithm. We approach this problem from a reinforcement learning perspective and represent any particular optimization algorithm as a policy. We learn an optimization algorithm using guided pol...
Policy iteration (PI) is a recursive process of policy evaluation and improvement for solving an optimal decision-making/control problem, or in other words, a reinforcement learning (RL) problem. PI has also served as the foundation for developing RL methods. In this paper, we propose two PI methods, called differential PI (DPI) and integral PI (IPI), and their variants, for a general RL framework in continuous time and space (CTS)...
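The evaluation/improvement loop described in the first sentence of this abstract can be illustrated with a minimal tabular sketch. This is the textbook discrete-time, finite-state form, not the continuous-time DPI/IPI methods the paper proposes. The function name `policy_iteration`, the `P`/`R` data layout, and the two-state example MDP are all assumptions made for illustration.

```python
def policy_iteration(P, R, gamma=0.9, tol=1e-10):
    """Tabular policy iteration (illustrative sketch).

    P[s][a] is a list of (probability, next_state) pairs;
    R[s][a] is the expected immediate reward (hypothetical layout).
    """
    n_states = len(P)
    policy = [0] * n_states  # start with action 0 everywhere
    while True:
        # Policy evaluation: repeat the Bellman expectation backup
        # for the current policy until the values stop changing.
        V = [0.0] * n_states
        while True:
            delta = 0.0
            for s in range(n_states):
                a = policy[s]
                v = R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch to a strictly better action if one exists.
        stable = True
        for s in range(n_states):
            q = [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))]
            best = max(range(len(q)), key=q.__getitem__)
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays reward 1; every other transition pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0
    [[(1.0, 1)], [(1.0, 0)]],  # state 1
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
policy, V = policy_iteration(P, R)
```

On this example the loop settles on moving from state 0 to state 1 and then staying there, with values close to 9 and 10 for a discount of 0.9.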
The existing classification-based policy iteration (CBPI) algorithms can be divided into two categories: direct policy iteration (DPI) methods that directly assign the output of the classifier (the approximate greedy policy w.r.t. the current policy) to the next policy, and conservative policy iteration (CPI) methods in which the new policy is a mixture distribution of the current policy and th...
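The difference between the two update rules named in this abstract can be sketched in a few lines. The helper `mixture_update`, the per-state list-of-probabilities layout, and the mixing weight `alpha` are assumptions for illustration; the abstract does not say how the mixture weight is chosen.

```python
def mixture_update(current, greedy, alpha):
    """Return (1 - alpha) * current + alpha * greedy, state by state.

    current[s] and greedy[s] are action-probability lists for state s
    (hypothetical layout). alpha = 1 gives the direct (DPI-style) update;
    0 < alpha < 1 gives a conservative (CPI-style) mixture.
    """
    return [
        [(1 - alpha) * pc + alpha * pg for pc, pg in zip(cur_s, gre_s)]
        for cur_s, gre_s in zip(current, greedy)
    ]


# One state, two actions: the current policy is uniform and the
# classifier's approximate greedy policy always picks action 1.
current = [[0.5, 0.5]]
greedy = [[0.0, 1.0]]

direct = mixture_update(current, greedy, alpha=1.0)        # adopt the greedy policy outright
conservative = mixture_update(current, greedy, alpha=0.2)  # move 20% of the way toward it
```

With `alpha=1.0` the new policy is exactly the greedy one; with `alpha=0.2` the probability of action 1 only rises from 0.5 to roughly 0.6.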
We implemented our LENA pooling layer within the Caffe framework and ran all our experiments using a Tesla K40 GPU. All the networks were fine-tuned from the convolutional filters obtained when training these networks for the 1,000 image classification task on the ImageNet dataset. We iterated the stochastic gradient descent algorithm for 10,000 iterations with a momentum of μ = 0.9 and a weigh...