Visualizing the Loss Landscape of Actor Critic Methods with Applications in Inventory Optimization

Authors

Abstract

Continuous control is a widely applicable area of reinforcement learning. The main players in this field are actor-critic methods, which commonly combine policy gradients with neural network approximators. We focus our study on characterizing the actor loss function, an essential part of the optimization. We exploit low-dimensional visualizations and compare the loss landscapes of various algorithms. Furthermore, we apply the approach to multi-store dynamic inventory control, a notoriously difficult problem in supply chain operations, and explore the shape of the landscape associated with the optimal policy. The problem is modelled and solved using reinforcement learning while having a loss landscape that favors optimality.
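The abstract leaves the visualization procedure implicit; a minimal sketch of the common technique it builds on (evaluating the actor loss along a normalized random direction in parameter space, in the spirit of Li et al., 2018) is given below. The names actor, actor_loss, and batch are hypothetical placeholders for a trained policy network, a function computing the algorithm's actor objective on a fixed batch, and that batch of transitions.

import torch

def loss_along_direction(actor, actor_loss, batch, num_points=51, radius=1.0):
    # `actor` is a torch.nn.Module policy; `actor_loss(actor, batch)` returns
    # a scalar loss on a fixed batch. Both are placeholders, not the paper's API.
    base = [p.detach().clone() for p in actor.parameters()]
    # Random direction, rescaled tensor-by-tensor so its norm matches the
    # trained weights (a simplified variant of Li et al.'s filter normalization).
    direction = [torch.randn_like(p) for p in base]
    direction = [d * (p.norm() / (d.norm() + 1e-10)) for d, p in zip(direction, base)]
    alphas = torch.linspace(-radius, radius, num_points)
    losses = []
    with torch.no_grad():
        for alpha in alphas:
            # Perturb the policy parameters along the direction and record the loss.
            for p, b, d in zip(actor.parameters(), base, direction):
                p.copy_(b + alpha * d)
            losses.append(float(actor_loss(actor, batch)))
        # Restore the original trained weights.
        for p, b in zip(actor.parameters(), base):
            p.copy_(b)
    return alphas.tolist(), losses

Plotting the returned losses against alphas gives a one-dimensional slice of the actor loss; a second, independent direction and a grid of (alpha, beta) offsets yield the two-dimensional contour plots typically used to compare algorithms.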


Similar articles

Least Squares Temporal Difference Actor-Critic Methods with Applications

We consider the problem of finding a control policy for a Markov Decision Process (MDP) to maximize the probability of reaching some states while avoiding some other states. This problem is motivated by applications in robotics, where such problems naturally arise when probabilistic models of robot motion are required to satisfy temporal logic task specifications. We transform this problem into...
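The transformation is truncated above; one standard reduction (an assumption here, not necessarily the paper's exact construction) makes the goal set \(G\) and the unsafe set \(B\) absorbing and pays a unit reward exactly on the transition that first enters \(G\), so that the maximal reachability probability becomes an ordinary expected total reward:
\[
\max_\pi \Pr_\pi\!\big(\text{reach } G \text{ before } B\big)
= \max_\pi \mathbb{E}_\pi\!\Big[\sum_{t \ge 0} \mathbf{1}\{s_{t+1} \in G,\; s_0, \dots, s_t \notin G \cup B\}\Big],
\]
which a temporal-difference critic can then estimate with standard machinery.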


Basis Expansion in Natural Actor Critic Methods

In reinforcement learning, the aim of the agent is to find a policy that maximizes its expected return. Policy gradient methods try to accomplish this goal by directly approximating the policy using a parametric function approximator; the expected return of the current policy is estimated and its parameters are updated by steepest ascent in the direction of the gradient of the expected return w...
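Concretely, the steepest-ascent update the abstract describes is the standard policy-gradient rule
\[
\theta_{k+1} = \theta_k + \alpha_k \nabla_\theta J(\theta_k),
\qquad
\nabla_\theta J(\theta) = \mathbb{E}_{s, a \sim \pi_\theta}\!\big[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s, a)\big],
\]
while natural actor-critic methods precondition the gradient with the inverse Fisher information matrix, \(\tilde\nabla_\theta J = F(\theta)^{-1} \nabla_\theta J(\theta)\), making the update invariant to the policy parameterization.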


Boosting the Actor with Dual Critic

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic, or Dual-AC. It is derived in a principled way from the Lagrangian dual form of the Bellman optimality equation, which can be viewed as a two-player game between the actor and a critic-like function named the dual critic. Compared to its actor-critic relatives, Dual-AC has the desired property that the actor...
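The derivation starts from the linear-programming form of the Bellman optimality equation; a sketch of that standard form (the paper's exact saddle-point objective may differ) is
\[
\min_V \sum_s \mu(s)\, V(s)
\quad \text{s.t.} \quad
V(s) \ge r(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \;\; \forall (s, a),
\]
whose Lagrangian \(L(V, \rho) = \sum_s \mu(s) V(s) + \sum_{s,a} \rho(s, a)\big[r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V(s') - V(s)\big]\) with multipliers \(\rho \ge 0\) yields a min-max game: the normalized multipliers act as the actor (a visitation-weighted policy) and \(V\) acts as the dual critic.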


Visualizing the Loss Landscape of Neural Nets

Neural network training relies on our ability to find “good” minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, a...
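The filter normalization introduced in that paper rescales each filter of a random direction to match the norm of the corresponding trained filter, and the loss surface is then plotted over two such directions:
\[
d_{i,j} \leftarrow \frac{d_{i,j}}{\lVert d_{i,j} \rVert}\, \lVert \theta_{i,j} \rVert,
\qquad
f(\alpha, \beta) = L\big(\theta^* + \alpha\, \delta + \beta\, \eta\big),
\]
where \(\theta_{i,j}\) is the \(j\)-th filter of layer \(i\) of the trained weights \(\theta^*\) and \(\delta, \eta\) are two independently sampled, filter-normalized random directions.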



Journal

Journal title: Social Science Research Network

Year: 2022

ISSN: 1556-5068

DOI: https://doi.org/10.2139/ssrn.4059578