Search results for: mdp
Number of results: 3240
Any learning algorithm over Markov decision processes (MDPs) will have worst-case regret Ω(√SAT), where T is the elapsed time and S and A are the cardinalities of the state and action spaces. In many settings of interest, S and A may be so huge that it is impossible to guarantee good performance for an arbitrary MDP on any practical timeframe T. We show that, if we know the true system can be...
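For context, the Ω(√SAT) statement above follows the standard undiscounted-regret formulation; the display below is a conventional restatement, not a quotation from this abstract (the optimal gain ρ* and the per-step reward r_t are assumed notation):

\[
\mathrm{Regret}(T) \;=\; T\,\rho^{*} \;-\; \sum_{t=1}^{T} r_t \;=\; \Omega\!\left(\sqrt{SAT}\right) \quad \text{in the worst case,}
\]

where ρ* is the optimal long-run average reward of the MDP and r_t is the reward collected at step t.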
Fever can be elicited in the rabbit by the intravenous administration of relatively large doses of a synthetic immunoadjuvant, N-acetylmuramyl-L-alanyl-D-isoglutamine, or muramyl dipeptide (MDP). This response could be mediated by endogenous pyrogens, because MDP has been shown to induce their production both in vivo and in vitro. The results reported here show that intracisternal injection of mi...
Sufficient conditions for a rank-dependent moderate deviations principle (MDP) for degenerate U-processes are presented. The MDP for VC classes of functions is obtained under exponential moments of the envelope. Among other techniques, randomization, decoupling inequalities, and integrability of Gaussian and Rademacher chaos are used to present new Bernstein-type inequalities for U-processes whi...
Exploration in multi-task reinforcement learning is critical in training agents to deduce the underlying MDP. Many of the existing exploration frameworks, such as E3, Rmax, and Thompson sampling, assume a single stationary MDP and are not suitable for system identification in the multi-task setting. We present a novel method to facilitate exploration in multi-task reinforcement learning using deep gen...
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Markov Decision Process (MDP). The algorithm proceeds in episodes where, in each episode, it picks a policy using regularization based on the span of the optimal bias vector. For an MDP with S states and A actions whose optimal bias vector has span bounded by H, we show a regret bound of Õ(HS√AT)...
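For readers unfamiliar with the span quantity bounding H above, the following is the standard definition under the usual average-reward conventions (a restatement, not quoted from the abstract): for the optimal bias vector h*,

\[
\mathrm{sp}(h^{*}) \;=\; \max_{s} h^{*}(s) \;-\; \min_{s} h^{*}(s) \;\le\; H,
\qquad
\mathrm{Regret}(T) \;=\; \tilde{O}\!\left(H S \sqrt{AT}\right).
\]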
The present case demonstrates diffuse, intense Tc-99m MDP uptake in the liver and, to a lesser degree, the spleen on routine bone scintigraphy, resembling liver-spleen imaging. A 49-year-old female with a history of anaplastic plasma cell tumor and suffering from bone pain was referred for bone scintigraphy to evaluate possible bone metastases. The bone scintigraphy showed diffuse hepatic and spleen ...
MDPs. Yet, the application of options/macros has only been discussed by intuition. One of the models of usage proposed in [21] is the following: Definition 3.2.3 Let Π = {S_1, . . . , S_n} be a decomposition of MDP M = ⟨A, S, Tr, R⟩, and let A = {A_i : i ≤ n} be a collection of macro-action sets, where A_i = {π_i^1, . . . , π_i^{n_i}} is a set of macros for region S_i. The abstract MDP M′ = ⟨A′, S′, Tr′, R′...
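To make the shape of Definition 3.2.3 concrete, here is a minimal Python sketch of the containers involved. All class and function names are illustrative, not from the paper, and since the snippet truncates before Tr′ and R′ are defined, the sketch only assembles A′ and S′ and leaves the abstract dynamics unspecified:

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

State = int
Action = str

@dataclass
class Macro:
    """One local policy pi_i^j defined on a single region S_i."""
    region_index: int
    policy: Dict[State, Action]  # state in S_i -> primitive action

def build_abstract_pieces(
    regions: List[FrozenSet[State]],   # decomposition Pi = {S_1, ..., S_n}
    macro_sets: List[List[Macro]],     # macro_sets[i] is the macro set A_i
) -> Tuple[List[Macro], List[int]]:
    """Assemble A' (the pooled macros) and S' (one abstract state per region).

    Tr' and R' are deliberately left out: the snippet above truncates
    before they are defined, so their construction is not guessed here.
    """
    assert len(regions) == len(macro_sets), "one macro set per region"
    abstract_actions = [m for A_i in macro_sets for m in A_i]  # A'
    abstract_states = list(range(len(regions)))                # S' (region ids)
    return abstract_actions, abstract_states

# Toy usage: two regions, one macro each.
regions = [frozenset({0, 1}), frozenset({2, 3})]
macro_sets = [[Macro(0, {0: "a", 1: "a"})], [Macro(1, {2: "b", 3: "b"})]]
A_prime, S_prime = build_abstract_pieces(regions, macro_sets)
```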
This paper examines the notion of symmetry in Markov decision processes (MDPs). We define symmetry for an MDP and show how it can be exploited for more effective learning in single-agent systems as well as multiagent and multirobot systems. We prove that if an MDP possesses a symmetry, then the optimal value function and Q-function are similarly symmetric and there exists a symmetric opt...
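One simple way such a symmetry can be exploited, sketched below in Python, is to mirror every tabular Q-learning update onto the symmetric state-action pair. The maps sigma_s and sigma_a are hypothetical illustrations of a symmetry, not the paper's construction; the sketch relies only on the property stated above, that Q*(s, a) = Q*(sigma_s(s), sigma_a(s, a)) when the MDP is symmetric:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update on a defaultdict Q-table."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def symmetric_q_update(Q, s, a, r, s2, actions, sigma_s, sigma_a,
                       alpha=0.1, gamma=0.99):
    """Apply one sample to (s, a) and to its symmetric image.

    sigma_s : state -> state                 (symmetry on states)
    sigma_a : (state, action) -> action      (matching symmetry on actions)
    Under a true MDP symmetry the mirrored transition
    (sigma_s(s), sigma_a(s, a), r, sigma_s(s2)) has the same law,
    so both Q-entries learn from every sample.
    """
    q_update(Q, s, a, r, s2, actions, alpha, gamma)
    q_update(Q, sigma_s(s), sigma_a(s, a), r, sigma_s(s2), actions,
             alpha, gamma)

# Toy usage: a 5-state corridor (states 0..4) mirrored about state 2.
Q = defaultdict(float)
actions = ["left", "right"]
sigma_s = lambda s: 4 - s
sigma_a = lambda s, a: "left" if a == "right" else "right"
symmetric_q_update(Q, s=0, a="right", r=1.0, s2=1,
                   actions=actions, sigma_s=sigma_s, sigma_a=sigma_a)
```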
We give polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. A real-valued reward is assigned to each state, and the value of an infinite path in the MDP is the limsup (resp. liminf) of all rewards along the path. The value of an MDP is the maximal expected value of an infinite path that can be achieved by resolving the decis...
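Concretely, the value notion in this snippet can be written as follows (a standard formulation of the limsup objective, with σ ranging over the strategies that resolve the MDP's choices; the notation is assumed, not quoted from the paper):

\[
\mathrm{val}(M) \;=\; \sup_{\sigma}\; \mathbb{E}^{\sigma}\!\left[\,\limsup_{n \to \infty} r(s_n)\right],
\]

with liminf in place of limsup for the liminf objective.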
The synthesis and biological activity of new conjugates of muramyl dipeptide (MDP) and nor-muramyl dipeptide (nor-MDP) with tuftsin and retro-tuftsin derivatives containing an isopeptide bond between the ε-amino group of lysine and the carboxyl group of simple amino acids such as Ala, Gly, and Val are presented. We presumed, based on the cytokine profile, that the examined conjugates of tuftsin and MDP wer...