Search results for: mdp
Number of results: 3240
Any learning algorithm over Markov decision processes (MDPs) will have worst-case regret Ω(√SAT), where T is the elapsed time and S and A are the cardinalities of the state and action spaces. In many settings of interest, S and A may be so huge that it is impossible to guarantee good performance for an arbitrary MDP on any practical timeframe T. We show that, if we know the true system can be...
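For context, the Ω(√SAT) statement above follows the standard undiscounted-regret formulation; the display below is a conventional restatement, not a quotation from this abstract (the optimal gain ρ* and the per-step reward r_t are assumed notation):

\[
\mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t \;=\; \Omega\!\left(\sqrt{SAT}\right) \quad \text{in the worst case,}
\]

where ρ* is the optimal long-run average reward of the MDP and r_t is the reward collected at step t.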
Fever can be elicited in the rabbit by the intravenous administration of relatively large doses of a synthetic immunoadjuvant, N-acetylmuramyl-L-alanyl-D-isoglutamine, or muramyl dipeptide (MDP). This response could be mediated by endogenous pyrogens, because MDP has been shown to induce their production both in vivo and in vitro. The results reported here show that intracisternal injection of mi...
Sufficient conditions for a rank-dependent moderate deviations principle (MDP) for degenerate U-processes are presented. The MDP for VC classes of functions is obtained under exponential moments of the envelope. Among other techniques, randomization, decoupling inequalities, and integrability of Gaussian and Rademacher chaos are used to present new Bernstein-type inequalities for U-processes whi...
Exploration in multi-task reinforcement learning is critical in training agents to deduce the underlying MDP. Many of the existing exploration frameworks, such as E3, Rmax, and Thompson sampling, assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep gen...
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√AT)...
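For readers unfamiliar with the span quantity bounding H above, the following is the standard definition under the usual average-reward conventions (a restatement, not quoted from the abstract): for the optimal bias vector h*,

\[
\mathrm{sp}(h^{*}) \;=\; \max_{s} h^{*}(s) \;-\; \min_{s} h^{*}(s) \;\le\; H,
\qquad
\mathrm{Regret}(T) \;=\; \tilde{O}\!\left(H S \sqrt{AT}\right).
\]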
The present case demonstrates diffuse, intense Tc-99m MDP uptake in the liver and, to a lesser degree, the spleen on routine bone scintigraphy, resembling liver-spleen imaging. A 49-year-old female with a history of anaplastic plasma cell tumor and suffering from bone pain was referred for bone scintigraphy to evaluate possible bone metastases. The bone scintigraphy showed diffuse hepatic and spleen ...
MDPs. Yet, the application of options/macros has only been discussed by intuition. One of the models of usage proposed in [21] is the following: Definition 3.2.3 Let Π = {S_1, . . . , S_n} be a decomposition of MDP M = ⟨A, S, Tr, R⟩, and let A = {A_i : i ≤ n} be a collection of macro-action sets, where A_i = {π_i^1, . . . , π_i^{n_i}} is a set of macros for region S_i. The abstract MDP M′ = ⟨A′, S′, Tr′, R′...
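To make the shape of Definition 3.2.3 concrete, here is a minimal Python sketch of the containers involved. All class and function names are illustrative, not from the paper, and since the snippet truncates before Tr′ and R′ are defined, the sketch only assembles A′ and S′ and leaves the abstract dynamics unspecified:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State = int
Action = str

@dataclass
class Macro:
    """One local policy pi_i^j defined on a single region S_i."""
    region_index: int
    policy: Dict[State, Action]  # state in S_i -> primitive action

def build_abstract_pieces(
    regions: List[FrozenSet[State]],   # decomposition Pi = {S_1, ..., S_n}
    macro_sets: List[List[Macro]],     # macro_sets[i] is the macro set A_i
) -> Tuple[List[Macro], List[int]]:
    """Assemble A' (the pooled macros) and S' (one abstract state per region).

    Tr' and R' are deliberately left out: the snippet above truncates
    before they are defined, so their construction is not guessed here.
    """
    assert len(regions) == len(macro_sets), "one macro set per region"
    abstract_actions = [m for A_i in macro_sets for m in A_i]  # A'
    abstract_states = list(range(len(regions)))                # S' (region ids)
    return abstract_actions, abstract_states

# Toy usage: two regions, one macro each.
regions = [frozenset({0, 1}), frozenset({2, 3})]
macro_sets = [[Macro(0, {0: "a", 1: "a"})], [Macro(1, {2: "b", 3: "b"})]]
A_prime, S_prime = build_abstract_pieces(regions, macro_sets)
```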
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single-agent systems as well as multiagent and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function and Q-function are similarly symmetric and there exists a symmetric opt...
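One simple way such a symmetry can be exploited, sketched below in Python, is to mirror every tabular Q-learning update onto the symmetric state-action pair. The maps sigma_s and sigma_a are hypothetical illustrations of a symmetry, not the paper's construction; the sketch relies only on the property stated above, that Q*(s, a) = Q*(sigma_s(s), sigma_a(s, a)) when the MDP is symmetric:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update on a defaultdict Q-table."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def symmetric_q_update(Q, s, a, r, s2, actions, sigma_s, sigma_a,
                       alpha=0.1, gamma=0.99):
    """Apply one sample to (s, a) and to its symmetric image.

    sigma_s : state -> state                 (symmetry on states)
    sigma_a : (state, action) -> action      (matching symmetry on actions)
    Under a true MDP symmetry the mirrored transition
    (sigma_s(s), sigma_a(s, a), r, sigma_s(s2)) has the same law,
    so both Q-entries learn from every sample.
    """
    q_update(Q, s, a, r, s2, actions, alpha, gamma)
    q_update(Q, sigma_s(s), sigma_a(s, a), r, sigma_s(s2), actions,
             alpha, gamma)

# Toy usage: a 5-state corridor (states 0..4) mirrored about state 2.
Q = defaultdict(float)
actions = ["left", "right"]
sigma_s = lambda s: 4 - s
sigma_a = lambda s, a: "left" if a == "right" else "right"
symmetric_q_update(Q, s=0, a="right", r=1.0, s2=1,
                   actions=actions, sigma_s=sigma_s, sigma_a=sigma_a)
```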
We give polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. A real-valued reward is assigned to each state, and the value of an infinite path in the MDP is the limsup (resp. liminf) of all rewards along the path. The value of an MDP is the maximal expected value of an infinite path that can be achieved by resolving the decis...
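Concretely, the value notion in this snippet can be written as follows (a standard formulation of the limsup objective, with σ ranging over the strategies that resolve the MDP's choices; the notation is assumed, not quoted from the paper):

\[
\mathrm{val}(M) \;=\; \sup_{\sigma}\; \mathbb{E}^{\sigma}\!\left[\,\limsup_{n \to \infty} r(s_n)\right],
\]

with liminf in place of limsup for the liminf objective.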
The synthesis and biological activity of new conjugates of muramyl dipeptide (MDP) and nor-muramyl dipeptide (nor-MDP) with tuftsin and retro-tuftsin derivatives containing an isopeptide bond between the ε-amino group of lysine and the carboxyl group of simple amino acids such as Ala, Gly, and Val are presented. We presumed, based on the cytokine profile, that the examined conjugates of tuftsin and MDP wer...