Is Bad Structure Better Than No Structure?: Unsupervised Parsing for Realisation Ranking
Authors
Abstract
In natural language generation using symbolic grammars, state-of-the-art realisation rankers use statistical models incorporating both language model and structural features. The rankers depend on multiple structures produced by the particular large-scale symbolic grammars to rank the output; for languages with smaller resources and in-development grammars, we look at the feasibility of an alternative source of structural features, unsupervised parsers. We show that, in spite of their lower quality of structure, raw sets of unsupervised parse features can be helpful with smaller language models; and that the parses do contain particular elements that can be highly useful, improving performance on our classification task by up to 10% on 60% of the test set, leading to an overall improvement under a back-off model.
Title and Abstract in French
Une mauvaise structure est-elle mieux que pas de structure du tout? L’analyse non supervisée pour la sélection des réalisations
Dans plusieurs systèmes récents de génération de texte basés sur des grammaires symboliques, les résultats sont ordonnés selon leur acceptabilité par des modèles statistiques qui incorporent des modèles de Markov et des traits structurels. Ces modules d’ordonnancement dépendent de diverses structures produites par la grammaire, ce qui présuppose une grammaire suffisamment développée. Pour les langues à faibles ressources ou pour les grammaires en cours de développement, nous étudions ici la viabilité d’une source alternative de traits structurels: les analyseurs non supervisés. Nous démontrons que, en dépit de la faible qualité des structures produites, elles contiennent des éléments qui peuvent être très utiles pour les langues peu dotées, permettant d’améliorer de 10% la performance de notre classificateur pour 60% des phrases de notre corpus de test.
Similar resources
Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies
Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...
Unsupervised Parse Selection for HPSG
Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as t...
Ranking of units by anti-ideal DMU with common weights
Data envelopment analysis (DEA) is a powerful technique for evaluating the performance of decision making units (DMUs). One of the main objectives of performance evaluation is to discriminate among efficient DMUs so as to provide a complete ranking of DMUs. DEA successfully divides them into two categories: efficient DMUs and inefficient DMUs. The DMUs in the efficient category have ident...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...
Designing Features for Parse Disambiguation and Realisation Ranking
We present log-linear models for use in the tasks of parse disambiguation and realisation ranking in German. Forst (2007a) shows that by extending the set of features used in parse disambiguation to include more linguistically motivated information, disambiguation results can be significantly improved for German data. The question we address in this paper is to what extent this improved set of ...