Automatic Lexicon Generation through WordNet

نویسندگان

  • Nitin Verma
  • Pushpak Bhattacharyya
چکیده

A lexicon is the heart of any language processing system. Accurate words with grammatical and semantic attributes are essential or highly desirable for any application – be it machine translation, information extraction, various forms of tagging or text mining. However, good quality lexicons are difficult to construct requiring enormous amount of time and manpower. In this paper, we present a method for automatically generating the dictionary from an input document – making use of the WordNet. The dictionary entries are in the form of Universal Words (UWs) which are language words (primarily English) concatenated with disambiguation information. The entries are associated with syntactic and semantic properties – most of which too are generated automatically. In addition to the WordNet, the system uses a word sense disambiguator, an inferencer and the knowledge base (KB) of the Universal Networking Language which is a recently proposed interlingua. The lexicon so constructed is sufficiently accurate and reduces the manual labour substantially.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

WordNet-based lexical simplification of a document

We explore algorithms for the automatic generation of a limited-size lexicon from a document, such that the lexicon covers as much as possible of the semantic space of the original document, as specifically as possible. We evaluate six related algorithms that automatically derive limited-size vocabularies from Wikipedia articles, focusing on nouns and verbs. The proposed algorithms combine Pers...

متن کامل

Usage of WordNet in Natural Language Generation

WordNet has rarely been applied to natural language generation, despite of its wide application in other fields. In this paper, we address three issues in the usage of WordNet in generation: adapting a general lexicon like WordNet to a specific application domain, how the information in WordNet can be used in generation, and augmenting WordNet with other types of knowledge that are helpful for ...

متن کامل

Automatic Generation Of Multiple Choice Questions From Domain Ontologies

The aim of this paper is to present an innovative approach for generating multiple choice questions in automatic way. Although other approaches have been already reported in the literature, the approach presented in this paper is based on domain specific ontologies and it is independent of lexicons such as WordNet or other linguistic resources. The paper also reports on a first prototype implem...

متن کامل

Tree-Cut and a Lexicon Based on Systematic Polysemy

This paper describes a lexicon organized around systematic polysemy: a set of word senses that are related in systematic and predictable ways. The lexicon is derived by a fully automatic extraction method which utilizes a clustering technique called tree-cut. We compare our lexicon to WordNet cousins, and the inter-annotator disagreement observed between WordNet Semcor and DSO corpora.

متن کامل

Automatic Construction of Persian ICT WordNet using Princeton WordNet

WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004