Towards Exploiting Duality in Approximate Linear Programming for MDPs

Authors

  • Dmitri A. Dolgov
  • Edmund H. Durfee
Abstract

A weakness of classical methods for solving Markov decision processes is that they scale very poorly because of the flat state space, which subjects them to the curse of dimensionality. Fortunately, many MDPs are well-structured, which makes it possible to avoid enumerating the state space. To this end, factored MDP representations have been proposed (Boutilier, Dearden, & Goldszmidt 1995; Koller & Parr 1999) that model the state space as a cross product of state features, represent the transition function as a Bayesian network, and assume the rewards can be expressed as sums of compact functions of the state features. A challenge in creating algorithms for the factored representations is that well-structured problems do not always lead to compact and well-structured solutions (Koller & Parr 1999); that is, an optimal policy does not, in general, retain the structure of the problem. Because of this, it becomes necessary to resort to approximation techniques. Approximate linear programming (ALP) has recently emerged as a very promising MDP-approximation technique (Schweitzer & Seidmann 1985; de Farias & Roy 2003). As such, ALP has received a significant amount of attention, which has led to a theoretical foundation (de Farias & Roy 2003) and efficient solution techniques (e.g., (de Farias & Roy 2004; Guestrin et al. 2003; Patrascu et al. 2002)). However, this work has focused only on approximating the primal LP, and no effort has been invested in approximating the dual LP, which is the basis for solving a wide range of constrained MDPs (e.g., (Altman 1999; Dolgov & Durfee 2004)). Unfortunately, as we demonstrate, linear approximations do not interact with the dual LP as well as they do with the primal LP, because the constraint coefficients cannot be computed efficiently (the operation does not maintain the compactness of the representation). 
To address this, we propose an LP formulation, which we call a composite ALP, that approximates both the primal and the dual optimization coordinates (the value function and the occupation measure), which is equivalent to approximating both the objective functions and the feasible regions of the LPs. This method provides a basis for efficient approximations of constrained MDPs and also serves as a new approach to a widely-discussed problem of dealing with exponentially many constraints in ALPs, which plagues both the primal
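
For orientation, the primal and dual LPs that the abstract refers to are commonly written as follows for a discounted MDP. This is a sketch of the standard textbook formulation, not the paper's own notation: the symbols $\alpha$ (positive state weights), $\gamma$ (discount factor), $p$ (transition probabilities), $r$ (rewards), $v$ (value function), $x$ (occupation measure), $H$ (basis matrix), and $w$ (basis weights) are all assumptions introduced here.

```latex
% Assumed notation (not taken from the paper): states s, actions a,
% discount \gamma \in [0,1), transitions p(s'|s,a), rewards r(s,a),
% positive state weights \alpha(s).
\begin{align*}
\text{Primal:}\quad
  & \min_{v} \sum_{s} \alpha(s)\, v(s)
  && \text{s.t.}\quad v(s) \ge r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, v(s')
     \quad \forall s, a, \\
\text{Dual:}\quad
  & \max_{x \ge 0} \sum_{s,a} r(s,a)\, x(s,a)
  && \text{s.t.}\quad \sum_{a} x(s',a) - \gamma \sum_{s,a} p(s' \mid s,a)\, x(s,a) = \alpha(s')
     \quad \forall s', \\
% Primal ALP: restrict v to the span of a small basis, v = Hw.
\text{Primal ALP:}\quad
  & \min_{w} \sum_{s} \alpha(s)\, (Hw)(s)
  && \text{s.t.}\quad (Hw)(s) \ge r(s,a) + \gamma \sum_{s'} p(s' \mid s,a)\, (Hw)(s')
     \quad \forall s, a.
\end{align*}
```

In this form the primal has one variable per state and one constraint per state-action pair, while the dual has one variable per state-action pair and one constraint per state. Substituting $v = Hw$ shrinks the number of primal variables to the number of basis functions but leaves the number of constraints unchanged, which is the constraint-count problem the abstract mentions.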

Similar resources

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs (Extended Version)

Many exact and approximate solution methods for Markov Decision Processes (MDPs) attempt to exploit structure in the problem and are based on factorization of the value function. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, meaning that approximation architectures are restricted in the problem s...

Efficient Solution Algorithms for Factored MDPs

This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MD...

Symmetric Primal-Dual Approximate Linear Programming for Factored MDPs

A weakness of classical Markov decision processes is that they scale very poorly due to the flat state-space representation. Factored MDPs address this representational problem by exploiting problem structure to specify the transition and reward functions of an MDP in a compact manner. However, in general, solutions to factored MDPs do not retain the structure and compactness of the problem rep...

Exploiting Anonymity in Approximate Linear Programming: Scaling to Large Multiagent MDPs

Many solution methods for Markov Decision Processes (MDPs) exploit structure in the problem and are based on value function factorization. Especially multiagent settings, however, are known to suffer from an exponential increase in value component sizes as interactions become denser, restricting problem sizes and types that can be handled. We present an approach to mitigate this limitation for ...

Exponential membership function and duality gaps for I-fuzzy linear programming problems

Fuzziness is ever present in real-life decision-making problems. In this paper, we adapt the pessimistic approach to study a pair of linear primal-dual problems under an intuitionistic fuzzy (I-fuzzy) environment and prove certain duality results. We generate the duality results using exponential membership and non-membership functions to represent the decision maker’s satisfaction and dissatisfacti...

Publication date: 2005