Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Abstract
Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks such as uncertainty, margin, and query-by-committee sampling, we propose multiple query strategies for the alignment link selection task. Our experiments show that by actively selecting uncertain and informative links, we reduce the overall manual effort involved in eliciting alignment link data for training a semi-supervised word aligner.
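The three sampling frameworks named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formulation. It assumes each source word comes with a posterior distribution over candidate target positions, and that a committee is simply a list of aligners' predicted target positions. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy_score(probs):
    """Uncertainty sampling: Shannon entropy of the link posterior.

    Higher entropy means the aligner is less sure which link is correct.
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin_score(probs):
    """Margin sampling: gap between the two most probable candidate links.

    A smaller gap means the top two candidates are harder to tell apart.
    """
    top = sorted(probs, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def committee_disagreement(votes):
    """Query-by-committee: share of members that disagree with the majority.

    `votes` holds the target position chosen by each committee member.
    """
    counts = Counter(votes)
    majority_size = counts.most_common(1)[0][1]
    return 1.0 - majority_size / len(votes)


def select_queries(posteriors, k):
    """Return the k source-word ids whose link posteriors have highest entropy."""
    ranked = sorted(
        posteriors,
        key=lambda word: entropy_score(posteriors[word]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical posteriors over three candidate target positions per source word.
posteriors = {
    "s1": [0.90, 0.05, 0.05],  # confident link
    "s2": [0.40, 0.35, 0.25],  # spread over all candidates
    "s3": [0.50, 0.50, 0.00],  # tie between two candidates
}

if __name__ == "__main__":
    print(select_queries(posteriors, 2))
    print({w: round(margin_score(p), 2) for w, p in posteriors.items()})
    print(committee_disagreement([2, 2, 3]))
```

In this toy data, entropy ranks `s2` first because its mass is spread over all three candidates, while margin sampling would prefer `s3`, whose top two candidates are tied. The links selected for manual annotation therefore depend on the strategy used.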
Similar resources
Active Semi-Supervised Learning for Improving Word Alignment
Word alignment models form an important part of building statistical machine translation systems. Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial alignments acquired from humans. Such dedicated elicitation effort is often expensive and depends on availability of bilingual speakers for the language-pair. In this paper we st...
Semi-Supervised Block ITG Models for Word Alignment
Labeled training data for the word alignment task, in the form of word-aligned sentence pairs, is hard to come by for many language pairs. Hence, it is natural to draw upon semi-supervised learning methods (Fraser and Marcu, 2006). We introduce a semi-supervised learning method for word alignment using conditional entropy regularization (Grandvalet and Bengio, 2005) on top of a BITG-based discri...
Boosting Statistical Word Alignment Using Labeled and Unlabeled Data
This paper proposes a semi-supervised boosting approach to improve statistical word alignment with limited labeled data and large amounts of unlabeled data. The proposed approach turns the supervised boosting algorithm into a semi-supervised learning algorithm by incorporating the unlabeled data. In this algorithm, we build a word aligner by using both the labeled data and the unlabeled data. T...
Joint Prediction of Word Alignment with Alignment Types
Current word alignment models do not distinguish between different types of alignment links. In this paper, we provide a new probabilistic model for word alignment where word alignments are associated with linguistically motivated alignment types. We propose a novel task of joint prediction of word alignment and alignment types and propose novel semi-supervised learning algorithms for this task...
Semi-supervised Word Alignment with Mechanical Turk
Word alignment is an important preprocessing step for machine translation. The project aims to incorporate manual alignments from Amazon Mechanical Turk (MTurk) to help improve word alignment quality. As a global crowdsourcing service, MTurk can provide a flexible and abundant labor force and therefore reduce the cost of obtaining labels. An easy-to-use interface is developed to simplify the lab...