Discovery of linguistic relations using lexical attraction

Author

  • Deniz Yuret

Abstract

This work has been motivated by two long-term goals: to understand how humans learn language and to build programs that can understand language. Using a representation that makes the relevant features explicit is a prerequisite for successful learning and understanding. Therefore, I chose to represent relations between individual words explicitly in my model. Lexical attraction is defined as the likelihood of such relations. I introduce a new class of probabilistic language models, named lexical attraction models, which can represent long-distance relations between words, and I formalize this new class of models using information theory. Within the framework of lexical attraction, I developed an unsupervised language acquisition program that learns to identify linguistic relations in a given sentence. The only explicitly represented linguistic knowledge in the program is lexical attraction. There is no initial grammar or lexicon built in, and the only input is raw text. Learning and processing are interdigitated: the processor uses the regularities detected by the learner to impose structure on the input, and this structure enables the learner to detect higher-level regularities. Using this bootstrapping procedure, the program was trained on 100 million words of Associated Press material and achieved 60% precision and 50% recall in finding relations between content words. Using knowledge of lexical attraction, the program can identify the correct relations in syntactically ambiguous sentences such as “I saw the Statue of Liberty flying over New York.”

Thesis Supervisor: Patrick H. Winston
Title: Ford Professor of Artificial Intelligence and Computer Science
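The abstract says lexical attraction is formalized with information theory and that the processor uses what the learner has counted to impose structure on each sentence. The sketch below illustrates that loop under stated assumptions. It scores an ordered word pair by pointwise mutual information estimated from co-occurrences within a small window. It then links a sentence by greedily accepting the most attractive pairs that create neither a cycle nor crossing links. The window size of 3, the greedy linker, and the names `train_counts`, `attraction`, `crosses`, and `link_sentence` are hypothetical illustrations, not the thesis's own program.

```python
# Hypothetical sketch: PMI-based attraction scores plus a greedy, non-crossing,
# acyclic linker. This is an illustration, not the thesis's exact algorithm.
import math
from collections import Counter
from itertools import combinations


def train_counts(sentences, window=3):
    """Count words and ordered word pairs (x before y) at most `window` positions apart."""
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        word_counts.update(sent)
        for i, j in combinations(range(len(sent)), 2):
            if j - i <= window:
                pair_counts[(sent[i], sent[j])] += 1
    return word_counts, pair_counts


def attraction(x, y, word_counts, pair_counts):
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y))); -inf for unseen pairs."""
    joint = pair_counts[(x, y)]
    if joint == 0:
        return float("-inf")
    p_xy = joint / sum(pair_counts.values())
    n_words = sum(word_counts.values())
    p_x = word_counts[x] / n_words
    p_y = word_counts[y] / n_words
    return math.log2(p_xy / (p_x * p_y))


def crosses(a, b):
    """True if links a=(i, j) and b=(k, l), with i < j and k < l, cross when drawn above the words."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def link_sentence(sent, word_counts, pair_counts):
    """Greedily add the most attractive links that keep the structure acyclic and non-crossing."""
    parent = list(range(len(sent)))  # union-find forest over word positions

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    candidates = []
    for i, j in combinations(range(len(sent)), 2):
        score = attraction(sent[i], sent[j], word_counts, pair_counts)
        if score > 0:  # keep only positively attracted pairs
            candidates.append((score, i, j))
    candidates.sort(reverse=True)  # strongest attraction first

    links = []
    for _, i, j in candidates:
        if find(i) == find(j):
            continue  # i and j are already connected, so this link would form a cycle
        if any(crosses((i, j), link) for link in links):
            continue  # this link would cross an accepted one
        parent[find(i)] = find(j)
        links.append((i, j))
    return sorted(links)


corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "meows"],
    ["a", "dog", "barks"],
    ["a", "cat", "meows"],
]
wc, pc = train_counts(corpus)
print(link_sentence(["the", "dog", "barks"], wc, pc))  # → [(0, 2), (1, 2)]
```

On this four-sentence corpus, ("dog", "barks") scores log2 6 and is linked first. The pairs ("the", "dog") and ("the", "barks") receive identical scores, so the tie is broken by position and "the" ends up attached to "barks". Meaningful estimates require far more text than a toy corpus provides.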

Similar resources

Lexical Attraction Models of Language

This paper presents lexical attraction models of language, in which the only explicitly represented linguistic knowledge is the likelihood of pairwise relations between words. This is in contrast with models that represent linguistic knowledge in terms of a lexicon, which assigns categories to each word, and a grammar, which expresses possible combinations in terms of these categories. The word...

Linguistic Means of Description of Family Relations in the Novel “In Chancery” By J. Galsworthy

The article is devoted to the study of the evaluative component of the meaning of lexical means used to describe relations between family members in the novel “In Chancery” by J. Galsworthy. The relevance of the study can be attributed to the lack of works devoted to this problem. As the results of our study demonstrate, the words of the lexical-semantic group “family” were mainly used to ver...

Learning Lexical Semantic Relations using Lexical Analogies — Extended Abstract

Linguistic ontologies, most notably WordNet [1], have been shown to be a valuable resource for a variety of natural language processing applications. Presently, linguistic ontologies are largely constructed by hand, which is both difficult and expensive. A central problem that demands an automated solution is the discovery and incorporation of lexical semantic relations, or semantic relations b...

Lexical Discovery with an Enriched Semantic Network

The study of lexical semantics has produced a systematic analysis of relationships between content words that has greatly benefited both lexical search tools and natural language processing systems. We describe research toward a common algorithmic core for these two applications. We first introduce a database system called FreeNet that facilitates the description and exploration of finite binary relati...

Constrained Lexical Attraction Models

Lexical Attraction Models (LAMs) were first introduced by Deniz Yuret in (Yuret 1998) to exemplify how an algorithm can learn word dependencies from raw text. His general thesis is that lexical attraction is the likelihood of a syntactic relation. However, the lexical attraction acquisition algorithm from (Yuret 1998) does not take into account the morpho-syntactical information provided by a p...

Journal:
  • CoRR

Volume: cmp-lg/9805009

Publication date: 1998