COMS 6998 - 4 Fall 2017 Presenter : Geelon So
Abstract
Recall that in the active learning setting, the learner is provided with unlabeled samples and can query the teacher for labels. The goal is to learn a concept close enough to the target concept while minimizing the number of labels queried. Ideally, the number of labels needed is much smaller than Ω(1/ε), the number of labeled examples required in the passive learning setting. To characterize the sample complexity, in this lecture we discuss another quantity that measures the effectiveness of active learning on particular concept classes and distributions: the splitting index. We provide motivating examples, the definition of the splitting index, and (coarse) lower and upper bounds on label complexity based on it.
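A toy illustration of the query savings the abstract describes: for 1-D threshold classifiers, an active learner can binary-search a sorted pool of n unlabeled points and recover the decision boundary with O(log n) label queries, whereas a passive learner would label the whole pool. This is a hypothetical sketch; all names are illustrative.

```python
def active_learn_threshold(pool, query_label):
    """pool: sorted unlabeled points; query_label(x) -> 0 or 1 (the teacher)."""
    lo, hi = 0, len(pool)          # boundary lies among pool[lo:hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if query_label(pool[mid]) == 1:   # label 1: boundary is at or left of mid
            hi = mid
        else:                             # label 0: boundary is right of mid
            lo = mid + 1
    # pool[lo] is the smallest point labeled 1; everything before it is labeled 0
    boundary = pool[lo - 1] if lo > 0 else float("-inf")
    return boundary, queries

# 1000 unlabeled points, but only ~log2(1000) labels are ever requested
pool = sorted(i / 1000 for i in range(1000))
threshold = 0.637
boundary, n_queries = active_learn_threshold(pool, lambda x: int(x > threshold))
```

The same teacher in the passive setting would be asked for all 1000 labels; here the query count grows only logarithmically in the pool size.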
Similar resources
COMS 6998 - 4 Fall 2017 Presenter : Geelon So
In the setting of active learning, the data comes unlabeled and querying the label of a data point is expensive. The goal of an active learner is to reduce the number of labels needed and output a hypothesis with error rate ≤ ε. Recall that the usual sample complexity of supervised learning is Ω(1/ε). The motivation for defining the splitting index is to characterize the sample complexity of active ...
COMS 6998 - 4 Fall 2017 Presenter : Daniel Hsu
+ err_{P_n}(h_n) − err_{P_n}(h^*) + err_{P_n}(h^*) − err_P(h^*). The second part is at most 0, so we can disregard it when deriving an upper bound on the regret. Since the target function h^* is independent of the sample pairs, the third part can be bounded easily by analyzing the binomial distribution with success probability err_P(h^*) and n trials. To analyze the remaining first part, we b...
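The truncated excerpt above appears to be the standard three-term excess-risk split; reconstructing it in full with the excerpt's notation (assuming h_n is the empirical risk minimizer on the sample distribution P_n — an assumption, since the excerpt cuts off before stating it):

```latex
\mathrm{err}_P(h_n) - \mathrm{err}_P(h^*)
  = \bigl[\mathrm{err}_P(h_n) - \mathrm{err}_{P_n}(h_n)\bigr]      % first part: uniform deviation
  + \bigl[\mathrm{err}_{P_n}(h_n) - \mathrm{err}_{P_n}(h^*)\bigr]  % second part: \le 0 for an ERM h_n
  + \bigl[\mathrm{err}_{P_n}(h^*) - \mathrm{err}_P(h^*)\bigr]      % third part: Binomial(n, err_P(h^*)) concentration
```

The middle terms telescope, so the three bracketed parts sum exactly to the excess risk on the left.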
COMS 6998 - 4 Fall 2017 Presenter : Yanlin
In previous lectures, we have learned a lot about online learning. The basic idea is to keep a subset of the hypothesis space as the version space and to shrink the version space using new data or queries. We consider data in arbitrary form, meaning we make no specific assumptions about the data's schema itself. Although this generalizes easily, we still want to make a practical effort ...
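A minimal sketch of the version-space idea described above, under the simplifying assumption of threshold classifiers on the real line (so the version space is just an interval); the class and method names are illustrative, not from the lecture:

```python
class IntervalVersionSpace:
    """Version space for threshold classifiers h_t(x) = 1 iff x > t.

    Thresholds consistent with all labels seen so far form an interval,
    tracked here as roughly [lo, hi).
    """
    def __init__(self, lo=float("-inf"), hi=float("inf")):
        self.lo, self.hi = lo, hi

    def update(self, x, y):
        """Discard thresholds inconsistent with the labeled example (x, y)."""
        if y == 1:
            self.hi = min(self.hi, x)   # x > t, so t must lie below x
        else:
            self.lo = max(self.lo, x)   # x <= t, so t must lie at or above x

vs = IntervalVersionSpace()
for x, y in [(0.9, 1), (0.2, 0), (0.5, 0), (0.7, 1)]:
    vs.update(x, y)
# after these four examples, the consistent thresholds lie between 0.5 and 0.7
```

Each new labeled example either shrinks the interval or leaves it unchanged, which is exactly the "reduce the version space by new data or queries" dynamic the excerpt refers to.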
COMS 6998 - 4 Fall 2017 Presenter : Wenxi Chen
This lecture is delivered in a more philosophical sense rather than a technical one. In the past, we have learned a series of algorithms that can dig deep into a training dataset and generate a model from it. However, without knowing what knowledge the model contains, it is generally hard for human beings to trust the trained model and apply it in reality. Sometimes, biased...