A prescriptive Dirichlet power allocation policy with deep reinforcement learning
Abstract
Prescribing optimal operation based on the condition of the system, and thereby potentially prolonging its remaining useful lifetime, has tremendous potential in terms of actively managing the availability, maintenance, and costs of complex systems. Reinforcement learning (RL) algorithms are particularly suitable for this type of problem given their learning capabilities. A special case of a prescriptive operation is the power allocation task, which can be considered as a sequential allocation problem whereby the action space is bounded by a simplex constraint. A general continuous action-space solution of such sequential allocation problems has still remained an open research question for RL algorithms. In continuous action space, the standard Gaussian policy applied in reinforcement learning does not support simplex constraints, while the Gaussian-softmax policy introduces a bias during training. In this work, we propose the Dirichlet policy for continuous allocation tasks and analyze the bias and variance of its policy gradients. We demonstrate that the Dirichlet policy is bias-free and provides significantly faster convergence, better performance, and better robustness to hyperparameter changes compared to the Gaussian-softmax policy. Moreover, we demonstrate the applicability of the proposed algorithm on a prescriptive operation case, where we propose the Dirichlet power allocation policy and evaluate its performance on a case study of a set of multiple lithium-ion (Li-I) battery systems. The experimental results show the potential to prescribe optimal operation, improving the efficiency and sustainability of multi-power source systems.
• A prescriptive operation framework with the Dirichlet policy is proposed.
• The method outperforms current SOTA methods and shows robustness.
• End-to-end learning.
• Good scalability and transferability.
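The simplex constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the softplus-plus-one mapping from network outputs to concentration parameters, the three-source setup, and the demand value are all assumptions made for illustration. The point it shows is that a sample from a Dirichlet distribution is already a valid allocation (non-negative shares summing to one), so no softmax squashing of a Gaussian sample is needed.

```python
import math
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); output is strictly positive.
    return np.logaddexp(0.0, x)

def dirichlet_log_prob(x, alpha):
    # Log-density of Dirichlet(alpha) at a point x on the simplex.
    log_norm = math.lgamma(float(alpha.sum())) - sum(math.lgamma(float(a)) for a in alpha)
    return log_norm + float(np.sum((alpha - 1.0) * np.log(x)))

def sample_allocation(logits, rng):
    # Assumption: concentrations are softplus(logits) + 1, which keeps
    # every alpha above 1. Other positive mappings are possible.
    alpha = softplus(logits) + 1.0
    action = rng.dirichlet(alpha)  # lies on the simplex by construction
    return action, alpha

rng = np.random.default_rng(0)
logits = np.array([0.5, -1.0, 2.0])  # hypothetical policy-network output, 3 sources
action, alpha = sample_allocation(logits, rng)
log_prob = dirichlet_log_prob(action, alpha)  # term used in a policy-gradient loss

demand = 10.0            # hypothetical total power demand
power = demand * action  # per-source power; shares always add up to the demand
```

In a policy-gradient method, `log_prob` would be weighted by an advantage estimate; that part is omitted here.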
Similar resources
On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning
Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
Deep Reinforcement Learning for Resource Allocation in V2V Communications
In this article, we develop a decentralized resource allocation mechanism for vehicle-to-vehicle (V2V) communication systems based on deep reinforcement learning. Each V2V link is considered as an agent, making its own decisions to find optimal sub-band and power level for transmission. Since the proposed method is decentralized, the global information is not required for each agent to make its...
Deep Reinforcement Learning with POMDPs
Recent work has shown that Deep Q-Networks (DQNs) are capable of learning human-level control policies on a variety of different Atari 2600 games [1]. Other work has looked at treating the Atari problem as a partially observable Markov decision process (POMDP) by adding imperfect state information through image flickering [2]. However, these approaches leverage a convolutional network structure...
Reinforcement Learning with Deep Architectures
There is both theoretical and empirical evidence that deep architectures may be more appropriate than shallow architectures for learning functions which exhibit hierarchical structure, and which can represent high level abstractions. An important development in machine learning research in the past few years has been a collection of algorithms that can train various deep architectures effective...
Reinforcement Learning with Policy Constraints
This paper addresses the problem of knowledge transfer in lifelong reinforcement learning. It proposes an algorithm which learns policy constraints, i.e., rules that characterize action selection in entire families of reinforcement learning tasks. Once learned, policy constraints are used to bias learning in future, similar reinforcement learning tasks. The appropriateness of the algorithm is d...
Journal
Journal title: Reliability Engineering & System Safety
Year: 2022
ISSN: ['1879-0836', '0951-8320']
DOI: https://doi.org/10.1016/j.ress.2022.108529