Faster Rates for Policy Learning
Abstract
This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Four examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.
Similar resources
A Hybrid Machine Learning Method for Intrusion Detection
Data security is an important area of concern for every computer system owner. An intrusion detection system is a device or software application that monitors a network or systems for malicious activity or policy violations. Already various techniques of artificial intelligence have been used for intrusion detection. The main challenge in this area is the running speed of the available implemen...
Protecting infant industries: Canadian manufacturing and the national policy, 1870–1913
Infant industry protection has been the cornerstone of a debate on tariff policy that extends at least from the eighteenth century to the current day. In contrast to traditional neo-classical models of international trade that imply net negative effects, industrial organization and learning-by-doing trade models describe how protective tariffs can encourage output expansion, productivity improv...
Learning and International Policy Diffusion – The Case of Corporate Tax Policy
A recent empirical literature has arisen documenting the response of one nation’s policy choices, including tax, environmental, and labour policies, to those of others. This has been largely interpreted as evidence of competition, be it for mobile resources (like FDI, taxable book income, etc.) or yardstick. We present a third explanation based on learning. When countries’ tax choices reflect p...
Learning Curve and Industry Structure: Evidences from Iranian Manufacturing Industries
Empirical studies have shown that cost advantages can arise from economies of scale and economies of learning. However, few studies have attempted to distinguish between these two effects on cost reduction. This paper is the first attempt to identify the impact of learning on cost reduction in Iran, separately from the effect of economies of scale. Therefore, this study aims to ...
Competitive-Cooperative-Concurrent Reinforcement Learning with Importance Sampling
The speed and performance of learning depend on the complexity of the learner. A simple learner with few parameters and no internal states can quickly obtain a reactive policy, but its performance is limited. A learner with many parameters and internal states may finally achieve high performance, but it may take enormous time for learning. Therefore, it is difficult to decide in advance which a...