Self-Training for Enhancement and Domain Adaptation of Statistical Parsers Trained on Small Datasets
Abstract
Creating large amounts of annotated data to train statistical PCFG parsers is expensive, and the performance of such parsers declines when training and test data are taken from different domains. In this paper we use self-training to improve the quality of a parser and to adapt it to a different domain, using only small amounts of manually annotated seed data. We report significant improvement both when the seed and test data are in the same domain and in the out-of-domain adaptation scenario. In particular, we achieve a 50% reduction in annotation cost for the in-domain case, yielding an improvement of 66% over previous work, and a 20-33% reduction for the domain adaptation case. This is the first time that self-training with small labeled datasets has been applied successfully to these tasks. We also formulate a characterization of when self-training is valuable.
Similar resources
Sample-oriented Domain Adaptation for Image Classification
Image processing is a method of performing operations on an image in order to obtain an enhanced image or to extract useful information from it. Conventional image processing algorithms cannot perform well in scenarios where the training images (source domain) used to learn the model have a different distribution from the test images (target domain). Also, many real world applicat...
Self-Training without Reranking for Parser Domain Adaptation and Its Impact on Semantic Role Labeling
We compare self-training with and without reranking for parser domain adaptation, and examine the impact of syntactic parser adaptation on a semantic role labeling system. Although self-training without reranking has been found not to improve in-domain accuracy for parsers trained on the WSJ Penn Treebank, we show that it is surprisingly effective for parser domain adaptation. We also show that...
Self-Training Tree Substitution Grammars for Domain Adaptation
Parsing is the process of inferring the syntactic structure of a sentence, based on a model of syntax that specifies which sentences are possible or likely. The field of statistical parsing concerns itself with learning probabilistic syntactic models from corpora. Ideally, it should be possible to parse any grammatical sentence of any natural language. Because different languages have wildly di...
Bootstrapping statistical parsers from small datasets
We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training materia...
A Pointwise Approach to Training Dependency Parsers from Partially Annotated Corpora
We introduce a word-based dependency parser for Japanese that can be trained from partially annotated corpora, allowing for effective use of available linguistic resources and reduction of the costs of preparing new training data. This is especially important for domain adaptation in a real-world situation. We use a pointwise approach where each edge in the dependency tree for a sentence is est...