Instance-by-instance optimal identity testing
Abstract
We consider the problem of verifying the identity of a distribution: given the description of a distribution over a discrete support, $p = (p_1, p_2, \ldots, p_n)$, how many samples (independent draws) must one obtain from an unknown distribution $q$ to distinguish, with high probability, the case that $p = q$ from the case that the total variation distance ($L_1$ distance) satisfies $\|p - q\|_1 \ge \varepsilon$? We resolve this question, up to constant factors, on an instance-by-instance basis: there exist universal constants $c, c'$ and a function $f(p, \varepsilon)$ on distributions and error parameters such that our tester distinguishes $p = q$ from $\|p - q\|_1 \ge \varepsilon$ using $f(p, \varepsilon)$ samples with success probability $> 2/3$, but no tester can distinguish $p = q$ from $\|p - q\|_1 \ge c \cdot \varepsilon$ when given $c' \cdot f(p, \varepsilon)$ samples. The function $f(p, \varepsilon)$ is upper-bounded by a multiple of $\|p\|_{2/3} / \varepsilon^2$, but is more complicated, and is significantly smaller in cases when $p$ has many small domain elements, or a single large one. This result significantly generalizes and tightens previous results: since distributions of support at most $n$ have $L_{2/3}$ norm bounded by $\sqrt{n}$, this result immediately shows that for such distributions $O(\sqrt{n}/\varepsilon^2)$ samples suffice, tightening the previous bound of $O(\sqrt{n}\,\mathrm{polylog}\,n / \varepsilon^4)$ for this class of distributions, and matching the (tight) known results for the case that $p$ is the uniform distribution over support $n$.

The analysis of our very simple testing algorithm involves several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities, generalizing Cauchy-Schwarz, Hölder's inequality, and the monotonicity of $L_p$ norms. Specifically, we characterize the set of sequences $a = a_1, \ldots, a_m$, $b = b_1, \ldots, b_m$, $c = c_1, \ldots, c_m$ for which it holds that for all finite sequences of positive numbers $x = x_1, \ldots$ and $y = y_1, \ldots$, ∏ ...
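The upper bound quoted in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's tester: it only evaluates the quantity $\|p\|_{2/3}/\varepsilon^2$, using the standard definition $\|p\|_{2/3} = (\sum_i p_i^{2/3})^{3/2}$, and checks the $\sqrt{n}$ bound on a few inputs. The function names `l23_norm` and `upper_bound_scale` are hypothetical, and the universal constant multiplying the bound, as well as the exact form of $f(p, \varepsilon)$, are not given in the abstract and are not modeled here.

```python
import math


def l23_norm(p):
    """Return (sum_i p_i^(2/3))^(3/2) for a probability vector p.

    Hypothetical helper: the abstract names the L_{2/3} norm but gives no code.
    """
    if any(x < 0 for x in p):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(p), 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return sum(x ** (2.0 / 3.0) for x in p) ** 1.5


def upper_bound_scale(p, eps):
    """Return ||p||_{2/3} / eps^2, the scale of the upper bound on f(p, eps).

    The unknown universal constant factor is deliberately omitted.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return l23_norm(p) / eps ** 2


if __name__ == "__main__":
    n = 100
    uniform = [1.0 / n] * n
    # One heavy element plus many light ones (an arbitrary illustrative choice).
    skewed = [0.9] + [0.1 / (n - 1)] * (n - 1)
    for name, p in [("uniform", uniform), ("skewed", skewed)]:
        print(name, l23_norm(p), "sqrt(n) =", math.sqrt(n))
```

For the uniform distribution the norm meets the $\sqrt{n}$ bound with equality, while a distribution dominated by a single large element has a smaller norm, in line with the abstract's remark that fewer samples can suffice in such cases.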
Similar resources
Wasserstein Identity Testing
Uniformity testing and the more general identity testing are well studied problems in distributional property testing. Most previous work focuses on testing under L1-distance. However, when the support is very large or even continuous, testing under L1-distance may require a huge (even infinite) number of samples. Motivated by such issues, we consider the identity testing in Wasserstein distanc...
Testing Poisson Binomial Distributions
A Poisson Binomial distribution over n variables is the distribution of the sum of n independent Bernoullis. We provide a sample near-optimal algorithm for testing whether a distribution P supported on {0, . . . , n} to which we have sample access is a Poisson Binomial distribution, or far from all Poisson Binomial distributions. The sample complexity of our algorithm is O(n) to which we provid...
IRDDS: Instance reduction based on Distance-based decision surface
In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...
IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF
The increasing use of the Internet and phenomena such as sensor networks has led to an unnecessary increase in the volume of information. Though this has many benefits, it causes problems such as the need for more storage space and better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...
A note on the socle of certain types of f-rings
For any reduced commutative $f$-ring with identity and bounded inversion, we show that a condition which is obviously necessary for the socle of the ring to coincide with the socle of its bounded part, is actually also sufficient. The condition is that every minimal ideal of the ring consist entirely of bounded elements. It is not too stringent, and is satisfied, for instance, by rings of ...
Journal:
- Electronic Colloquium on Computational Complexity (ECCC)
Volume 20
Publication date: 2013