Adaptive Control of Constrained Markov Chains: Criteria and Policies
Abstract
We consider the constrained optimization of a finite-state, finite-action Markov chain. In the adaptive problem, the transition probabilities are assumed to be unknown, and no prior distribution on their values is given. We consider constrained optimization problems in terms of several cost criteria which are asymptotic in nature. For these criteria we show that it is possible to achieve the same optimal cost as in the non-adaptive case. We first formulate a constrained optimization problem under each of the cost criteria and establish the existence of optimal stationary policies. Since the adaptive problem is inherently non-stationary, we suggest a class of "Asymptotically Stationary" (AS) policies, and show that, under each of the cost criteria, the costs of an AS policy depend only on its limiting behavior. This property implies that there exist optimal AS policies. A method for generating adaptive policies is then suggested, which leads to strongly consistent estimators for the unknown transition probabilities. A way to guarantee that these policies are also optimal is to couple them with the adaptive algorithms of [3]. This leads to optimal policies for each of the adaptive constrained optimization problems under discussion.
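The abstract does not say which estimator is used, so the following is only an illustrative sketch of one standard choice: the empirical-frequency estimate of the transition probabilities of a finite-state, finite-action chain. The function names (`estimate_transitions`, `simulate`), the example matrix `TRUE_P`, and the uniformly random action selection are all assumptions made for this example and are not taken from the paper. Under a policy that tries every action in every state infinitely often, counts of this kind converge to the true probabilities.

```python
import random


def estimate_transitions(trajectory, n_states, n_actions):
    """Empirical-frequency estimate of P(y | x, a) from (x, a, y) triples."""
    counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for x, a, y in trajectory:
        counts[x][a][y] += 1

    estimate = []
    for x in range(n_states):
        rows = []
        for a in range(n_actions):
            visits = sum(counts[x][a])
            if visits == 0:
                # Assumption: fall back to a uniform guess for unvisited pairs.
                rows.append([1.0 / n_states] * n_states)
            else:
                rows.append([c / visits for c in counts[x][a]])
        estimate.append(rows)
    return estimate


def simulate(true_p, steps, seed=0):
    """Run the chain under uniformly random actions (an assumed exploration rule)."""
    rng = random.Random(seed)
    n_states = len(true_p)
    n_actions = len(true_p[0])
    x = 0
    trajectory = []
    for _ in range(steps):
        a = rng.randrange(n_actions)
        y = rng.choices(range(n_states), weights=true_p[x][a])[0]
        trajectory.append((x, a, y))
        x = y
    return trajectory


# Hypothetical 2-state, 2-action chain: TRUE_P[x][a][y] = P(y | x, a).
TRUE_P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]

trajectory = simulate(TRUE_P, steps=20000)
P_HAT = estimate_transitions(trajectory, n_states=2, n_actions=2)
```

Uniform exploration is used here only to keep the sketch short; it ignores costs and constraints entirely, whereas the adaptive policies described in the abstract are designed to remain optimal while still producing consistent estimates.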
Similar resources
Stochastic Dynamic Programming with Markov Chains for Optimal Sustainable Control of the Forest Sector with Continuous Cover Forestry
We present a stochastic dynamic programming approach with Markov chains for optimal control of the forest sector. The forest is managed via continuous cover forestry and the complete system is sustainable. Forest industry production, logistic solutions and harvest levels are optimized based on the sequentially revealed states of the markets. Adaptive full system optimization is necessary for co...
Time-Sharing Policies for Controlled Markov Chains
We propose a class of non-stationary policies called "policy time sharing" (p.t.s.), which possess several desirable properties for problems where the criteria are of the average-cost type: an optimal policy exists within this class, the computation of optimal policies is straightforward, and the implementation of this policy is easy. While in the finite state case stationary policies are also kn...
Time and Ratio Expected Average Cost Optimality for Semi-Markov Control Processes on Borel Spaces
We deal with semi-Markov control models with Borel state and control spaces, and unbounded cost functions under the ratio and the time expected average cost criteria. Under suitable growth conditions on the costs and the mean holding times together with stability conditions on the embedded Markov chains, we show the following facts: (i) the ratio and the time average costs coincide in the class...
Sensitivity of Constrained Markov Decision Processes
We consider the optimization of finite-state, finite-action Markov decision processes, under constraints. Costs and constraints are of the discounted or average type, and possibly finite-horizon. We investigate the sensitivity of the optimal cost and optimal policy to changes in various parameters. We relate several optimization problems to a generic Linear Program, through which we investigate sensi...
Empirical Bayes Estimation in Nonstationary Markov chains
Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...