Finite-horizon dynamic optimisation when the terminal reward is a concave functional of the distribution of the final state
Author
Abstract
We consider a problem similar in many respects to a finite-horizon Markov decision process, except that the reward to the individual is a strictly concave functional of the distribution of the individual's state at the final time T. Reward structures such as these are of interest to biologists studying the fitness of different strategies in a fluctuating environment. The problem does not satisfy the usual optimality equation and cannot be solved directly by dynamic programming. We establish equations characterising the optimal final distribution and an optimal policy π∗. We show that in general π∗ will be a Markov randomised policy (or, equivalently, a mixture of Markov deterministic policies), and we develop an iterative algorithm based on policy improvement that converges to π∗. We also consider an infinite-population version of the problem and show that the population cannot do better with a coordinated policy than by having each individual independently follow the individual optimal policy π∗.
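Because the terminal reward F(μ_T) is nonlinear in the final distribution μ_T, ordinary backward induction does not apply directly. Once F is linearised at a candidate final distribution, however, its gradient behaves like an ordinary terminal reward, and a standard finite-horizon MDP can be solved for it. The sketch below uses that observation inside a conditional-gradient (Frank-Wolfe) scheme. This is a swapped-in technique chosen for illustration and is not necessarily the policy-improvement algorithm developed in the paper. It also illustrates the abstract's point that a mixture of Markov deterministic policies can strictly beat every deterministic policy. The toy MDP, the quadratic functional F(μ) = -‖μ - τ‖² with target τ, and all function names are hypothetical assumptions.

```python
import itertools
import numpy as np

# Hypothetical toy setting (not from the paper): A actions, S states, horizon T,
# time-homogeneous transitions P[a, s, s'] and initial distribution mu0.

def final_distribution(P, mu0, policy, T):
    """Distribution of the state at time T under a deterministic Markov policy,
    where policy[t][s] is the action chosen in state s at time t."""
    mu = np.asarray(mu0, dtype=float)
    for t in range(T):
        nxt = np.zeros_like(mu)
        for s, mass in enumerate(mu):
            nxt += mass * P[policy[t][s], s]
        mu = nxt
    return mu

def best_response(P, terminal_reward, T):
    """Ordinary backward induction for a LINEAR terminal reward r(s'):
    maximises sum_s' r(s') * mu_T(s') over deterministic Markov policies."""
    V = np.asarray(terminal_reward, dtype=float)
    policy = [None] * T
    for t in reversed(range(T)):
        Q = P @ V                      # Q[a, s] = sum_s' P[a, s, s'] * V[s']
        policy[t] = Q.argmax(axis=0)   # greedy action in each state
        V = Q.max(axis=0)
    return policy

def frank_wolfe(P, mu0, T, grad_F, iters=200):
    """Frank-Wolfe sketch (assumed technique, not the paper's algorithm):
    linearise F at the current final distribution, solve the resulting
    standard MDP, then mix the new deterministic policy into the mixture."""
    S = P.shape[1]
    policies = [[np.zeros(S, dtype=int) for _ in range(T)]]  # start: action 0 everywhere
    weights = [1.0]
    mu = final_distribution(P, mu0, policies[0], T)
    for k in range(iters):
        pol = best_response(P, grad_F(mu), T)
        nu = final_distribution(P, mu0, pol, T)
        gamma = 2.0 / (k + 2.0)        # standard Frank-Wolfe step size
        mu = (1.0 - gamma) * mu + gamma * nu
        weights = [(1.0 - gamma) * w for w in weights] + [gamma]
        policies.append(pol)
    return mu, policies, weights

def best_deterministic_value(P, mu0, T, F):
    """Brute force over all deterministic Markov policies (tiny problems only)."""
    A, S, _ = P.shape
    best = -np.inf
    for flat in itertools.product(range(A), repeat=S * T):
        policy = [np.array(flat[t * S:(t + 1) * S]) for t in range(T)]
        best = max(best, F(final_distribution(P, mu0, policy, T)))
    return best

# Hypothetical toy example: action a moves the individual to state a.
A = S = 3
P = np.zeros((A, S, S))
for a in range(A):
    P[a, :, a] = 1.0
mu0 = np.array([1.0, 0.0, 0.0])
tau = np.full(S, 1.0 / S)              # hypothetical target distribution

def F(mu):
    """Hypothetical strictly concave terminal functional: -||mu - tau||^2."""
    return -np.sum((mu - tau) ** 2)

def grad_F(mu):
    return -2.0 * (mu - tau)

mu, policies, weights = frank_wolfe(P, mu0, 1, grad_F)
det = best_deterministic_value(P, mu0, 1, F)
print(f"best deterministic F = {det:.4f}, mixture F = {F(mu):.4f}")
```

In this sketch the mixture is realised by drawing deterministic policy i with probability weights[i] at time 0 and following it until T, which gives the final distribution Σ_i weights[i]·ν_i. Every deterministic policy in the toy problem ends at a point mass, so no deterministic policy can reach the uniform target, while the mixture approaches it.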
Similar resources
Using Markov decision processes to optimise a non-linear functional of the final distribution, with manufacturing applications
We consider manufacturing problems which can be modelled as finite horizon Markov decision processes for which the effective reward function is either a strictly concave or strictly convex functional of the distribution of the final state. Reward structures such as these often arise when penalty factors are incorporated into the usual expected reward objective function. For convex problems ther...
Optimal Finite-time Control of Positive Linear Discrete-time Systems
This paper considers an optimization problem for linear discrete-time systems in which the closed-loop discrete-time system must be positive (i.e., all of its state variables take non-negative values) and also finite-time stable. To this end, a quadratic cost function is considered and an optimal controller is designed such that, in addition to minimizing the cost function, the positivity proper...
The Finite Horizon Economic Lot Scheduling in Flexible Flow Lines
This paper addresses the common-cycle multi-product lot-scheduling problem in flexible flow lines (FFL), where product demands are deterministic and constant over a finite planning horizon. The objective is to minimize the sum of setup costs and work-in-process and final-product inventory holding costs per time unit while satisfying demand without backlogging. This problem consists of a combi...
The Finite Element Transient Structure Analysis of the Startup of the Sugarcane Harvester Transfer Case
Broken bearings and excessive noise and vibration often occur in the transfer case of small sugarcane harvesters at startup. To analyze the startup behaviour of the transfer case conveniently and quickly, a finite element transient structural analysis is carried out with virtual prototype technology to simulate the transfer case's startup dynamic process and measure the instantaneo...
Optimizing the Static and Dynamic Scheduling problem of Automated Guided Vehicles in Container Terminals
The Minimum Cost Flow (MCF) problem is a well-known problem in network optimisation. The Network Simplex Algorithm (NSA) is the fastest method for solving it. NSA has three extensions, namely the Network Simplex plus Algorithm (NSA+), the Dynamic Network Simplex Algorithm (DNSA) and the Dynamic Network Simplex plus Algorithm (DNSA+). The objectives of the research reported in this...