Data point selection for self-training
Abstract
Parsing morphologically rich languages is difficult in part because less rigid word order constraints lead to higher structural variability, and because the number of distinct lexical forms is higher. Both properties can cause sparse data problems for statistical parsing. We present a simple approach for addressing these issues. Our approach uses self-training on instances selected according to their similarity to the annotated data. Our similarity measure is based on the perplexity of part-of-speech trigrams of new instances, measured against the annotated training data. Preliminary results show that our method outperforms a self-training setting in which instances are simply selected in order of occurrence in the corpus, and suggest that self-training is a cheap and effective method for improving parsing accuracy for morphologically rich languages.
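The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the trigram model is smoothed or how many instances are selected, so the add-one smoothing, the sentence padding symbols, and the names `train_trigram_model`, `perplexity`, and `select_for_self_training` are all assumptions made for illustration.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical padding symbols


def trigrams(tags):
    """Return the POS trigrams of one sentence, padded at both ends."""
    padded = [BOS, BOS] + list(tags) + [EOS]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def train_trigram_model(tagged_sentences):
    """Count POS trigrams and their two-tag histories in the annotated data."""
    tri_counts, hist_counts, vocab = Counter(), Counter(), {EOS}
    for tags in tagged_sentences:
        vocab.update(tags)
        for tri in trigrams(tags):
            tri_counts[tri] += 1
            hist_counts[tri[:2]] += 1
    return tri_counts, hist_counts, len(vocab)


def perplexity(tags, model):
    """Perplexity of a POS sequence under the trigram model.

    Add-one smoothing is an assumption; the abstract names no scheme.
    """
    tri_counts, hist_counts, vocab_size = model
    grams = trigrams(tags)
    log_prob = 0.0
    for tri in grams:
        p = (tri_counts[tri] + 1) / (hist_counts[tri[:2]] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(grams))


def select_for_self_training(candidates, model, k):
    """Pick the k candidates most similar to the annotated data,
    i.e. those with the lowest POS-trigram perplexity."""
    return sorted(candidates, key=lambda tags: perplexity(tags, model))[:k]


annotated = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"]]
candidates = [["VERB", "VERB", "VERB"], ["DET", "NOUN", "VERB"]]
model = train_trigram_model(annotated)
selected = select_for_self_training(candidates, model, k=1)
```

In this toy run the candidate whose tag sequence resembles the annotated sentences receives the lower perplexity and is selected first. In a self-training loop, the selected sentences would then be parsed automatically and added to the training data.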
Similar resources
Negative Selection Based Data Classification with Flexible Boundaries
One of the most important artificial immune algorithms is negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most of the negative selection methods consider deterministic boundaries to distinguish between self and non-self-spaces. In this paper, two...
A Boundary-aware Negative Selection Algorithm
Negative selection algorithms generate their detector sets based on the points of self data. In the approach described in this paper, the continuous self region is defined by the collection of self data. This has important differences from the negative selection algorithms that simply take each self point and its vicinity as the self region: when the training self points are used together as a ...
Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models
Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m, using an initial baseline prior distribution. These priors can have subjective and also default Bayesian implem...
Correlation between self-concept and academic achievement of students
Introduction: Nowadays the most important problem of our educational system is the phenomenon of educational subsidence. Therefore, knowing the factors that improve education and prevent this phenomenon is of special importance. Self-concept is among the factors that have been studied extensively. Many studies indicate a direct relationship between self-concept and educational subsidence but some experts doubt the direct relationship of...
Comparison of the Effects of Self-Determination Skills Training and Parent Management Training on Externalizing Behavior Problems of Students
This study was carried out to compare the effects of self-determination skills training and parent management training on the externalizing behavior problems of students. This quasi-experimental research had a pretest-posttest, control group design. To achieve research goals, 45 students with externalizing behavior problems who were identified through Child Behavior Checklist (CBCL) and via ran...