Search results for: t policy
Number of results: 957380
This paper studies the following online replacement problem. There is a real function f(t), called the flow rate, defined over a finite time horizon [0, T]. It is known that m ≤ f(t) ≤ M for some reals 0 ≤ m < M. At time 0 an online player starts to pay money at the rate f(0). At each time 0 < t ≤ T the player may changeover and continue paying money at the rate f(t). The complication is that each such chang...
National science and technology (S&T) systems are often mentioned as a condition for competitiveness of high technology sectors. Therefore, public S&T policies should actively support the development of national S&T systems. In particular in Eastern Europe an active S&T policy is often demanded to support the development of the supposed domestic "high technology potential". This paper shows tha...
The Chronicle learned today that Monday's recommendation to the President from the Student-Faculty-Administration Committee suggested an administrative reinterpretation of present policy on University group usage of segregated facilities. "The present policy was recommended by the University Policy and Planning Advisory Committee in September and immediately accepted by President Dougla...
This paper analyzes a controllable discrete-time machine repair problem with l operating machines and two repairmen. The number of working servers can be adjusted depending on the number of failed machines in the system, one at a time, at a machine's failure or at service completion epochs. Analytical closed-form solutions of the stationary probabilities of the number of failed machines in the sys...
Sexual dimorphism of testosterone (T) in elite athletes was at the center of a recent case at the “Supreme Court of Sport,” the Court of Arbitration for Sport in Switzerland, after teenage Indian sprinter Dutee Chand challenged a sports policy regulating competition eligibility of women with naturally high T. The idea of a “sex gap” in T is a cornerstone of this policy (1). Policymakers infer...
A computing policy is a sequence of rules, where each rule consists of a predicate and an action, and where each action is either “accept” or “reject”. A policy P is said to accept (or reject, respectively) a request iff the action of the first rule in P that is matched by the request is “accept” (or “reject”, respectively). A pair of policies (P, Q) is called an accept-implication pair iff ...
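The first-match semantics described in this abstract can be illustrated with a minimal Python sketch. The names `decide` and `EXAMPLE_POLICY`, the dictionary-shaped request, and the choice to return `None` when no rule matches are all assumptions made for illustration; the abstract does not specify them, and the truncated definition of an accept-implication pair is deliberately not modeled.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# Hypothetical types: a request is a plain dict, a rule is (predicate, action).
Request = Dict[str, object]
Rule = Tuple[Callable[[Request], bool], str]


def decide(policy: Sequence[Rule], request: Request) -> Optional[str]:
    """Return the action of the first rule whose predicate matches the request.

    Returns None when no rule matches (an assumption; the abstract does not
    say what happens in that case).
    """
    for predicate, action in policy:
        if predicate(request):
            return action
    return None


# Illustrative policy: rule order matters because only the first match counts.
EXAMPLE_POLICY: Sequence[Rule] = [
    (lambda r: r["port"] == 22, "reject"),
    (lambda r: str(r["src"]).startswith("10."), "accept"),
    (lambda r: True, "reject"),  # catch-all rule
]

if __name__ == "__main__":
    print(decide(EXAMPLE_POLICY, {"src": "10.0.0.5", "port": 22}))
    print(decide(EXAMPLE_POLICY, {"src": "10.0.0.5", "port": 80}))
    print(decide(EXAMPLE_POLICY, {"src": "8.8.8.8", "port": 80}))
```

In this sketch the first request is rejected even though it also satisfies the second rule, because the port-22 rule comes earlier in the sequence.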
Since the Chinese government’s rapid increase in expenditure on science and technology (S&T) during the 2000s, numerous related policies have been implemented by national-, provincial-, city-, and prefecture-level governments in China. Each level of government aims to promote innovation activities; however, few empirical evaluations have been conducted on each policy level and category. This pa...
We consider a firm that sells a large number of products to its customers in an online fashion. Each product is described by a high dimensional feature vector, and the market value of a product is assumed to be linear in the values of its features. Parameters of the valuation model are unknown and can change over time. The firm sequentially observes a product’s features and can use the historic...
An Online Policy Gradient Algorithm for Markov Decision Processes with Continuous States and Actions
We consider the learning problem under an online Markov decision process (MDP) aimed at learning the time-dependent decision-making policy of an agent that minimizes the regret, that is, the difference from the best fixed policy. The difficulty of online MDP learning is that the reward function changes over time. In this letter, we show that a simple online policy gradient algorithm achieves regret O(√T)...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an r-dimensional random vector Z ∈ R^r, where r ≥ 2. The objective is to minimize the cumulative regret and Bayes risk. When the set of arms corresponds to the unit sphere, we prove that the regret and Bayes risk are of order Θ(r√T), by establis...