Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language

Authors

  • David M. Zajic
  • Michael Maxwell
  • David S. Doermann
  • Paul Rodrigues
  • Michael Bloodgood
Abstract

We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data. Modifications to the structure and underlying text of the lexicographic data are expressed in a simple, interpreted programming language. Dictionary Manipulation Language (DML) commands identify nodes by unique identifiers, and manipulations are performed using simple commands such as create, move, set text, etc. Corrected lexicons are produced by applying sequences of DML commands to the source version of the lexicon. DML commands can be written manually to repair one-off errors or generated automatically to correct recurring problems. We discuss advantages of the paradigm for the task of editing digital bilingual dictionaries.
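
The abstract does not give DML's concrete syntax, so the following is only a minimal sketch of the idea it describes: nodes addressed by unique identifiers, and a correction script applied command by command to a source lexicon. The command spellings (`create`, `move`, `set_text`), the argument order, and the `Node`, `Lexicon`, and `apply_script` names are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of the lexicon tree, addressed by a unique identifier."""
    node_id: str
    tag: str
    text: str = ""
    parent: Optional[str] = None
    children: List[str] = field(default_factory=list)


class Lexicon:
    """Hypothetical in-memory lexicon: a tree of nodes keyed by id."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {"root": Node("root", "lexicon")}

    def create(self, node_id: str, tag: str, parent_id: str) -> None:
        if node_id in self.nodes:
            raise ValueError(f"duplicate node id: {node_id}")
        self.nodes[parent_id].children.append(node_id)
        self.nodes[node_id] = Node(node_id, tag, parent=parent_id)

    def move(self, node_id: str, new_parent_id: str) -> None:
        node = self.nodes[node_id]
        if node.parent is None:
            raise ValueError("the root node cannot be moved")
        # Refuse to move a node underneath itself or one of its descendants.
        ancestor: Optional[str] = new_parent_id
        while ancestor is not None:
            if ancestor == node_id:
                raise ValueError(f"cannot move {node_id} into its own subtree")
            ancestor = self.nodes[ancestor].parent
        self.nodes[node.parent].children.remove(node_id)
        self.nodes[new_parent_id].children.append(node_id)
        node.parent = new_parent_id

    def set_text(self, node_id: str, text: str) -> None:
        self.nodes[node_id].text = text


def apply_script(lexicon: Lexicon, script: str) -> None:
    """Apply one command per line; blank lines and '#' comments are skipped."""
    for raw in script.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        command, _, rest = line.partition(" ")
        if command == "create":
            node_id, tag, parent_id = rest.split()
            lexicon.create(node_id, tag, parent_id)
        elif command == "move":
            node_id, new_parent_id = rest.split()
            lexicon.move(node_id, new_parent_id)
        elif command == "set_text":
            node_id, _, text = rest.partition(" ")
            lexicon.set_text(node_id, text)
        else:
            raise ValueError(f"unknown command: {command}")


# Source version of the lexicon (assumed data): a sense attached to the
# wrong entry and a headword containing an OCR-style error.
lexicon = Lexicon()
apply_script(lexicon, """
create e1 entry root
create e2 entry root
create h1 headword e1
create s1 sense e2
set_text h1 bo0k
""")

# Correction script: the source stays untouched on disk; the corrected
# lexicon is whatever results from replaying these commands.
apply_script(lexicon, """
# one-off manual repairs
move s1 e1
set_text h1 book
""")
```

Keeping corrections as a replayable script, rather than editing the lexicon in place, means a hand-written one-off fix and an automatically generated batch of fixes can share the same format and be re-applied whenever the source changes.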

Related resources

A Supervised Method for Constructing Sentiment Lexicon in Persian Language

Due to the increasing growth of digital content on the internet and social media, sentiment analysis has become an emerging field. The problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) that affect its handling on electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also, developing Persian tools will provide Persian progr...

Word Formation and the Validation of Lexical Resources

In the framework of Word Manager (WM), morphological dictionaries are produced by the classification of lexemes in terms of a rule database. The intricate structure of the resulting lexical resources, conceived primarily for flexible use, also offers novel opportunities for the validation of the lexical specification. Many of the inconsistencies and errors encountered in lexical specification i...

Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages

We introduce Waldayu and Waldayu Mobile, web and mobile front-ends for endangered language dictionaries. The Waldayu products are designed with the needs of novice users in mind – both novices in the language and technological novices – and work in tandem with existing lexicographic databases. We discuss some of the unique problems that endangered-language dictionary software products face, and ...

An Analysis towards the Development of Electronic Bilingual Dictionary (Manipuri-English) -A Report

Meaningful sentences are composed of meaningful words; any system that hopes to process natural languages as people do must have information about words, their meanings, and related words in another language. This information is traditionally provided through electronic dictionaries. But these dictionary entries evolved for the convenience of human readers, not for machines. So, machine readable...

Journal:
  • CoRR

Volume abs/1410.7787

Publication date 2011