Search results for: q value
Number of results: 842664
State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment....
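As a rough illustration of the Gaussian smoothing idea in this abstract, the sketch below estimates a smoothed value by Monte Carlo. It assumes the smoothed Q-value at a mean action is the expectation of Q(s, a) over actions drawn from a Gaussian centered at that mean; the abstract does not give the exact definition, so this form, along with the names `smoothed_q`, `q_fn`, `mean_action`, and `cov`, is an assumption made for illustration.

```python
import numpy as np

def smoothed_q(q_fn, state, mean_action, cov, n_samples=10_000, rng=None):
    """Monte Carlo estimate of E_{a ~ N(mean_action, cov)}[q_fn(state, a)].

    q_fn is a hypothetical callable (state, action_vector) -> float.
    cov is a (d, d) covariance matrix for a d-dimensional action.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean_action = np.atleast_1d(np.asarray(mean_action, dtype=float))
    # Draw n_samples actions around the mean action; shape (n_samples, d).
    actions = rng.multivariate_normal(mean_action, cov, size=n_samples)
    # Average the ordinary Q-value over the sampled actions.
    values = np.array([q_fn(state, a) for a in actions])
    return values.mean()
```

For a toy Q-function such as Q(s, a) = −‖a‖², the estimate should approach −(‖μ‖² + tr Σ), which gives a simple sanity check on the sampler.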
Let N be a prime and let A be a quotient of J0(N) over Q associated to a newform f such that the special L-value of A (at s = 1) is non-zero. Suppose that the algebraic part of the special L-value is divisible by an odd prime q such that q does not divide the numerator of (N−1)/12. Then the Birch and Swinnerton-Dyer conjecture predicts that q divides the algebraic part of the special L-value of A, as we...
In this article, we calculate the scalar form factors $f_{\pi\pi}(Q^2)$ and $f_{KK}(Q^2)$ in the framework of the light-cone QCD sum rules approach. The numerical value of $f_{\pi\pi}(Q^2)$ changes quickly with variation of $Q^2$ near zero momentum transfer, while $f_{KK}(Q^2)$ has rather good behavior at small momentum transfer. The value $f_{KK}(0) = 2.21^{+0.35}_{-0.19}$ GeV is compatible with the result from the leading ord...
In this paper, we deal with the distribution of zeros of q-shift difference polynomials of transcendental entire functions of zero order. At the same time we also investigate the uniqueness problems when two difference products of entire functions share one value with finite weight. The results of the paper improve and generalize some recent results due to Xu, Liu and Cao [Math. Commun. 20 (201...
The Dec-POMDP is a model for multi-agent planning under uncertainty that has received increasingly more attention over the recent years. In this work we propose a new heuristic QBG that can be used in various algorithms for Dec-POMDPs and describe differences and similarities with QMDP and QPOMDP. An experimental evaluation shows that, at the price of some computation, QBG gives a consistently ...
BACKGROUND Storage issues and bandwidth over networks have led to a need to optimally compress medical imaging files while leaving clinical image quality uncompromised. METHODS To determine the range of clinically acceptable medical image compression across multiple modalities (CT, MR, and XR), we performed psychometric analysis of image distortion thresholds using physician readers and also ...
Value-based reinforcement learning typically involves the repeated application of an update rule, such as the Bellman operator $T_B$, to an action-value function. Recent work has explored the use of alternative operators, which remain optimality-preserving and may result in improved performance. In this report, I study in particular the advantage learning operator, $T_{AL} Q = T_B Q - \alpha (V - Q)$. A theoret...
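The operator in this abstract can be sketched for a small tabular MDP. The code below assumes the Bellman optimality operator with V(s) = max_a Q(s, a), a reward table `R` of shape (S, A), and a transition tensor `P` of shape (S, A, S); these conventions and all function names are assumptions for illustration, since the truncated abstract does not specify them.

```python
import numpy as np

def bellman_operator(Q, R, P, gamma):
    """Tabular Bellman optimality operator (assumed form of T_B).

    Q, R: arrays of shape (S, A); P: array of shape (S, A, S).
    """
    V = Q.max(axis=1)          # greedy state values, shape (S,)
    return R + gamma * (P @ V)  # expected next-state value, shape (S, A)

def advantage_learning_operator(Q, R, P, gamma, alpha):
    """T_AL Q = T_B Q - alpha * (V - Q), with V(s) = max_a Q(s, a)."""
    V = Q.max(axis=1, keepdims=True)  # shape (S, 1), broadcasts over actions
    return bellman_operator(Q, R, P, gamma) - alpha * (V - Q)
```

Under these assumptions the correction term vanishes for greedy actions, where V − Q = 0, and lowers the backed-up value of every other action in proportion to its gap from the greedy value. Setting `alpha` to 0 recovers the plain Bellman update.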
A quantitative model for the damping of oscillations of the semiquinone absorption after successive light flashes is presented. It is based on the equilibrium between the states $Q_A^- Q_B$ and $Q_A Q_B^-$. A fit of the model to the experimental results obtained for reaction centers from Rhodopseudomonas sphaeroides gave a value of $\alpha = [Q_A^- Q_B] / ([Q_A^- Q_B] + [Q_A Q_B^-]) = 0.065 \pm 0.005$ (T = 2...
In this paper, we prove the existence of a solution for a boundary value problem (BVP) of fractional differential equations of order q ∈ (2, 3]. Krasnoselskii's fixed point theorem is applied to establish the results. In addition, we give a detailed example to demonstrate the main result.
We study in this paper the first-order behavior of value functions in parametric dynamic programming with linear constraints and nonconvex cost functions. By establishing an abstract result on the Fréchet subdifferential of value functions of parametric mathematical programming problems, some new formulas on the Fréchet subdifferential of value functions in parametric dynamic programming are ob...