Enhancing tagging performance by combining knowledge sources
Author
Abstract
The topic of this paper is an ongoing effort to combine existing natural language processing (NLP) resources in order to reach part-of-speech (POS) tagging performance beyond what any single resource can provide. The context of the effort is the ETAP project, a parallel translation corpus project funded by the Bank of Sweden Tercentenary Foundation. The aim of the project is to create an annotated and aligned multilingual translation corpus, which will serve as the basis for developing methods and tools for the automatic extraction of translation equivalents, for applications such as machine translation systems. To this end, we are investigating to what extent it is possible to reuse existing NLP resources, meaning those either developed in our department in some other context or freely available on the WWW, for the task of tagging the languages of the project. As a general rule, the number of such resources is currently growing quite fast. On the other hand, their availability is highly dependent on the language, from almost unlimited numbers for English,
Similar resources
Combining Knowledge Sources For Automatic Semantic Tagging
In this working session, we will discuss methods which could plausibly be used for combining evidence for assigning semantic tags to words in a text. We will discuss methods that apply at knowledge acquisition time to produce a single static knowledge source to be used by a single, complete, semantic tagger, as well as methods for dynamically combining outputs of a set of independent, possibly ...
An evaluation of enhancing social tagging with a knowledge organization system
Traditional subject indexing and classification are considered infeasible in many digital collections. Automated means and social tagging are often suggested as the two possible solutions. Both, however, have disadvantages and, depending on the purpose of use or context, require additional manual input. This study investigates ways of enhancing social tagging via knowledge organization systems,...
Word Sense Disambiguation using Optimised Combinations of Knowledge Sources
Word sense disambiguation algorithms, with few exceptions, have made use of only one lexical knowledge source. We describe a system which performs unrestricted word sense disambiguation (on all content words in free text) by combining different knowledge sources: semantic preferences, dictionary definitions and subject/domain codes along with part-of-speech tags. The usefulness of these sources...
Investigating the Use of Paratactic and Hypotactic Conjunctions among Iranian Pre-university Students
In an attempt to dispel the persisting fallacy that an individual’s grammar knowledge is indicative of the way they put this knowledge into practice, this study seeks to highlight the inconsistency which resides between one’s competence and performance in the domain of conjunctions. It aims to shed light on the discrepancy which lies between the knowledge and production of conjunctions. The res...
Old Swedish Part-of-Speech Tagging between Variation and External Knowledge
We present results on part-of-speech and morphological tagging for Old Swedish (1225–1526). In a set of experiments we look at the difference between within-corpus and across-corpus accuracy, and explore ways of mitigating the effects of variation and data sparseness by adding different types of dictionary information. Combining several methods, together with a simple approach to handle spelling...