Search results for: reward packages
Number of results: 46029
We present and solve a real-world problem of learning to drive a bicycle. We solve the problem by online reinforcement learning using the Sarsa(λ) algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is independent of the task the agent tries to learn to solve.
Many factory optimization problems, from inventory control to scheduling and reliability, can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to find a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward algorithm called SMART for finding gain-optimal policies in continuous time semi-Markov decision...
Introduction: Stochastic Petri net based Markov modeling is a potentially very powerful and generic approach for evaluating the performance and dependability of many different systems, such as computer systems, communication networks, manufacturing systems, etc. As a consequence of their general applicability, SPN-based Markov models form the basic solution approach for several software packages tha...
When it is not possible to distribute resources equitably to everyone, people look for an equitable or just procedure. In the current study, we investigated young children's sense of procedural justice. We tested 32 triads of 5-year-olds in a new resource allocation game. Triads were confronted with three unequal reward packages and then agreed on a procedure to allocate them among themselves. ...
ABSTRACT BACKGROUND AND OBJECTIVE: Evolution and innovation packages in medical science education are the main program of medical education and it is necessary to pay attention to the provision of infrastructure of their implementation. This study was conducted to identify effective strategies for optimal implementation of evolution and innovation packages in medical education. METHODS: The met...
Previous animal experiments have shown that serotonin is involved in the control of impulsive choice, as characterized by high preference for small immediate rewards over larger delayed rewards. Previous human studies under serotonin manipulation, however, have been either inconclusive on the effect on impulsivity or have shown an effect in the speed of action-reward learning or the optimality ...
This paper reviews and compares two R packages, "FPV" and "Fuzzy.p.value". These packages are designed for testing hypotheses in a fuzzy environment using a fuzzy $p$-value based approach. In fact, the packages "FPV" and "Fuzzy.p.value" propose some useful functions for testing hypotheses when the data / hypotheses are fuzzy rather than crisp. The proposed methods and function...
Introduction: The education field is one of the infrastructural fields of the health system, and in order to develop this field, the training of human resources should evolve as well. The evolution and innovation document is a special opportunity for education practitioners and university authorities to take a step towards the promotion of medical education in the country. Proper and timely patho...