The Acquisition of Lexical Semantic Knowledge from Large Corpora

نویسنده

  • James Pustejovsky
چکیده

Machine-readable dictionaries provide the raw material from which to construct computationaily useful representations of the generic vocabulary contained within it. Many sublanguages, however, are poorly represented in on-line dictionaries, ff represented at all. Vocabularies geared to specialized domains are necessary for many applications, such as t ex t categorization and information retrieval. In this paper I describe research devoted to developing techniques for building sublanguage lexicons via syntactic and statistical corpus analysis coupled with analytic techniques based on the tenets of a generative lexicon. 1. I n t r o d u c t i o n Machine-readable dictionaries provide the raw material from which to construct computationally useful representations of the generic vocabulary contained within it. Many sublanguages, however, are poorly represented in on-line dictionaries, if represented at all (cf. Grishman et al (1986)). Yet vocabularies geared to specialized domains are necessary for many applications, such as text categorization and information retrieval. In this paper I describe research devoted to developing techniques for building sublanguage lexicons via syntactic and statistical corpus analysis coupled with analytic techniques based on the tenets of a generative theory of the lexicon

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Application of Lexical Semantics to Knowledge Acquisition from Corpora

In this paper, we describe a program of research designed to explore.' how a lexical semantic theory may be exploited for extracting information from corpora suitable for use in Information Retrieval applications. Unlike with purely statistical collocational analyses, the framework of a semantic theory allows the ~ultomatic construction of predictions about semantic relationships among words ap...

متن کامل

Combining NLP and statistical techniques for lexical acquisition

The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However the methods by which lexical knowledge should be extracted from plain texts are still matter of debate and experimentation. In this paper it is presented an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines ...

متن کامل

Lexical Database for Multiple Languages: Multilingual Word Semantic Network

Data mining and knowledge engineering have become a tough task due to the availability of large amount of data in the web nowadays. Validity and reliability of data also become a main debate in knowledge acquisition. Besides, acquiring knowledge from different languages has become another concern. There are many language translators and corpora developed but the function of these translators an...

متن کامل

Corpus-Based Induction of Lexical Representation and Meaning

The acquisition of linguistic knowledge, i.e., the identication, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selec...

متن کامل

Lexical acquisition from corpora: the case of subcategorization frames in French

We present in this paper a method to automatically acquire a syntactic lexicon of subcategorization frames for French verbs directly from large corpora. The method is evaluated against existing lexical resources: we show that our system is capable of producing new frames that were not previously registered. Lastly, we show that it is possible to induce lexico-semantic classes « à la Levin » (19...

متن کامل

Using Web Corpora for the Automatic Acquisition of Lexical-Semantic Knowledge

This article presents two case studies to explore whether and how web corpora can be used to automatically acquire lexical-semantic knowledge from distributional information. For this purpose, we compare three German web corpora and a traditional newspaper corpus on modelling two types of semantic relatedness: (1) Assuming that free word associations are semantically related to their stimuli, w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1992