Semi-Supervised Feature Transformation for Dependency Parsing
نویسندگان
چکیده
In current dependency parsing models, conventional features (i.e. base features) defined over surface words and part-of-speech tags in a relatively high-dimensional feature space may suffer from the data sparseness problem and thus exhibit less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to addressing the problem by transforming the base features into high-level features (i.e. meta features) with the help of a large amount of automatically parsed data. The meta features are used together with base features in our final parser. Our studies indicate that our proposed approach is very effective in processing unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best-reported accuracy on the Chinese data and comparable accuracy with the best known parsers on the English data.
منابع مشابه
Character-Level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
Recent work on joint word segmentation, POS (Part Of Speech) tagging, and dependency parsing in Chinese has two key problems: the first is that word segmentation based on character and dependency parsing based on word were not combined well in the transition-based framework, and the second is that the joint model suffers from the insufficiency of annotated corpus. In order to resolve the first ...
متن کاملLearning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning
This paper proposes a novel approach for effectively utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative ‘condensed feature representations’ from the original feature set used in supervised NLP systems. The main contribution of our method is that it can offer dense and low-dimensional feature spaces for NLP tasks w...
متن کاملAn Empirical Study of Semi-supervised Structured Conditional Models for Dependency Parsing
This paper describes an empirical study of high-performance dependency parsers based on a semi-supervised learning approach. We describe an extension of semisupervised structured conditional models (SS-SCMs) to the dependency parsing problem, whose framework is originally proposed in (Suzuki and Isozaki, 2008). Moreover, we introduce two extensions related to dependency parsing: The first exten...
متن کاملSyntax-based Semi-Supervised Named Entity Tagging
We report an empirical study on the role of syntactic features in building a semisupervised named entity (NE) tagger. Our study addresses two questions: What types of syntactic features are suitable for extracting potential NEs to train a classifier in a semi-supervised setting? How good is the resulting NE classifier on testing instances dissimilar from its training data? Our study shows that ...
متن کاملTitle of Thesis: Learning Structured Classifiers for Statistical Dependency Parsing Learning Structured Classifiers for Statistical Dependency Parsing
In this thesis, I present three supervised and one semi-supervised machine learning approach for improving statistical natural language dependency parsing. I first introduce a generative approach that uses a strictly lexicalised parsing model where all the parameters are based on words, without using any part-of-speech (POS) tags or grammatical categories. Then I present an improved large margi...
متن کامل