Lexicons for Human Language Technology

نویسنده

  • Mark Liberman
چکیده

Information about words--their pronunciation, syntax and meaning--is a crucial and costly part of human language technology. Many questions remain about the best way to express and use such lexical information. Nevertheless, much of this information is common to all current approaches, and therefore the effort to collect it can usefully be shared. The Linguistic Data Consortium (LDC) has undertaken to provide such common lexical information for the community of HLT researchers. The purpose of this paper is to sketch the various LDC lexical projects now underway or planned, and to solicit feedback from the community of HLT researchers.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SIMPLE: A General Framework for the Development of Multilingual Lexicons

The project LE-SIMPLE is an innovative attempt of building harmonized syntactic-semantic lexicons for 12 European languages, aimed at use in different Human Language Technology applications. SIMPLE provides a general design model for the encoding of a large amount of semantic information, spanning from ontological typing, to argument structure and terminology. SIMPLE thus provides a general fra...

متن کامل

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of main the tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...

متن کامل

The EAGLES/ISLE initiative for setting standards: the Computational Lexicon Working Group for Multilingual Lexicons

ISLE (International Standards for Language Engineering), a transatlantic standards oriented initiative under the Human Language Technology (HLT) programme, is a continuation of the long standing EAGLES (Expert Advisory Group for Language Engineering Standards) initiative, and is carried out by European and American groups within the EU-US International Research Co-operation, supported by EC and...

متن کامل

Efficient Development of Lexical Language Resources and their Representation

Statistical approaches in speech technology, whether used for statistical language models, trees, hidden Markov models or neural networks, represent the driving forces for the creation of language resources (LR), e.g., text corpora, pronunciation and morphology lexicons, and speech databases. This paper presents a system architecture for the rapid construction of morphologic and phonetic lexico...

متن کامل

Session 1: Lexicons, Corpora, and Evaluation

Our technologies for collecting, storing, and disseminating vast amounts of information have gotten ahead of our technologies for collating and analyzing it, and that situation has posed a serious challenge for human language technology. As a consequence, natural language processing has been moving rapidly towards large-scale systems addressed to real tasks. Demos that won't scale up are no lon...

متن کامل

lex4all: A language-independent tool for building and evaluating pronunciation lexicons for small-vocabulary speech recognition

This paper describes lex4all, an opensource PC application for the generation and evaluation of pronunciation lexicons in any language. With just a few minutes of recorded audio and no expert knowledge of linguistics or speech technology, individuals or organizations seeking to create speech-driven applications in lowresource languages can build lexicons enabling the recognition of small vocabu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994