Refining the Automatic Identification of Conceptual Relations in Large-scale Corpora

Authors

  • Alex Collier
  • Mike Pacey
  • Antoinette Renouf
Abstract

In the ACRONYM Project, we have taken the Firthian view (e.g. Firth 1957) that context is part of the meaning of the word, and measured similarity of meaning between words through second-order collocation. Using large-scale, free-text corpora of UK journalism, we have generated collocational data for all words except high-frequency grammatical words, and have found that semantically related word pairings can be identified, whilst syntactic relations are disfavoured. We have then moved on to refine this system, to deal with multi-word terms and to identify changing conceptual relationships across time. The system, conceived in the late 1980s and developed in 1994-97, differs from others of the 1990s in purpose, scope, methodology and results, and comparisons will be drawn in the course of the paper.
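
As a rough illustration of second-order collocation, the sketch below treats two words as similar when they share many collocates, even if they never co-occur themselves. It is not the ACRONYM system: the stop list, the window size, the Jaccard overlap measure and the toy sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical stop list standing in for "high-frequency grammatical words".
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "on"}


def collocates(sentences, window=2):
    """Map each content word to a Counter of its first-order collocates.

    Stop words are dropped before windowing, so `window` counts
    content words only (an assumption of this sketch).
    """
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.lower().split() if t not in STOPWORDS]
        for i, node in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    table[node][tokens[j]] += 1
    return table


def overlap(table, w1, w2):
    """Second-order similarity: Jaccard overlap of two collocate sets."""
    a = set(table.get(w1, ())) - {w2}
    b = set(table.get(w2, ())) - {w1}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy corpus (invented for illustration only).
sentences = [
    "the doctor treated the patient in hospital",
    "the physician treated the patient in hospital",
    "the striker scored a goal",
]
table = collocates(sentences)
print(overlap(table, "doctor", "physician"))
print(overlap(table, "doctor", "striker"))
```

In this toy corpus "doctor" and "physician" never appear together, yet they score highly because they share the collocates "treated" and "patient". A real system would weight collocates by statistical significance rather than using raw set overlap.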

Related articles

Large-Scale Acquisition of Feature-Based Conceptual Representations from Textual Corpora

Methods for estimating people’s conceptual knowledge have the potential to be very useful to theoretical research on conceptual semantics. Traditionally, feature-based conceptual representations have been estimated using property norm data; however, computational techniques have the potential to build such representations automatically. The automatic acquisition of feature-based conceptual repr...

Towards Unrestricted, Large-Scale Acquisition of Feature-Based Conceptual Representations from Corpus Data

In recent years a number of methods have been proposed for the automatic acquisition of feature-based conceptual representations from text corpora. Such methods could offer valuable support for theoretical research on conceptual representation. However, existing methods do not target the full range of concept-relation-feature triples occurring in human-generated norms (e.g. flute produce sound) ...

Automatic Identification of AltLexes using Monolingual Parallel Corpora

The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signaled by markers outside these inventories (i.e. AltLexes) are...

Acquiring Human-like Feature-Based Conceptual Representations from Corpora

The automatic acquisition of feature-based conceptual representations from text corpora can be challenging, given the unconstrained nature of human-generated features. We examine large-scale extraction of concept-relation-feature triples and the utility of syntactic, semantic, and encyclopedic information in guiding this complex task. Methods traditionally employed do not investigate the full ra...

Automatic extraction of property norm-like data from large text corpora

Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of uncons...

Publication date: 1998