Correcting Errors in Digital Lexicographic Resources Using a Dictionary Manipulation Language
Abstract
We describe a paradigm for combining manual and automatic error correction of noisy structured lexicographic data. Modifications to the structure and underlying text of the lexicographic data are expressed in a simple, interpreted programming language. Dictionary Manipulation Language (DML) commands identify nodes by unique identifiers, and manipulations are performed using simple commands such as create, move, set text, etc. Corrected lexicons are produced by applying sequences of DML commands to the source version of the lexicon. DML commands can be written manually to repair one-off errors or generated automatically to correct recurring problems. We discuss advantages of the paradigm for the task of editing digital bilingual dictionaries.
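The abstract describes the mechanism but does not show DML syntax, so the following is only a minimal sketch of the idea. The command spellings (`create`, `move`, `settext`), the argument order, and the names `Node`, `Lexicon`, and `apply_dml` are all assumptions made for illustration, not the authors' actual language. It shows nodes addressed by unique identifiers and a correction script applied as a sequence of commands to a noisy source lexicon.

```python
import shlex
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity; avoids recursing through parent links
class Node:
    """One node of the lexicon tree, addressed by a unique identifier."""
    node_id: str
    tag: str
    text: str = ""
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


class Lexicon:
    """Hypothetical in-memory lexicon: a tree plus an id -> node index."""

    def __init__(self, root_id: str = "root") -> None:
        self.nodes = {root_id: Node(root_id, "LEXICON")}

    def create(self, node_id: str, tag: str, parent_id: str) -> None:
        if node_id in self.nodes:
            raise ValueError(f"duplicate id: {node_id}")
        parent = self.nodes[parent_id]
        node = Node(node_id, tag, parent=parent)
        parent.children.append(node)
        self.nodes[node_id] = node

    def move(self, node_id: str, new_parent_id: str) -> None:
        node, target = self.nodes[node_id], self.nodes[new_parent_id]
        if node.parent is None:
            raise ValueError("cannot move the root")
        # Refuse to move a node under itself or one of its descendants.
        ancestor = target
        while ancestor is not None:
            if ancestor is node:
                raise ValueError("cannot move a node under itself")
            ancestor = ancestor.parent
        node.parent.children.remove(node)
        target.children.append(node)
        node.parent = target

    def set_text(self, node_id: str, text: str) -> None:
        self.nodes[node_id].text = text


def apply_dml(lexicon: Lexicon, script: str) -> None:
    """Interpret one command per line; '#' starts a comment (assumed syntax)."""
    handlers = {
        "create": lexicon.create,
        "move": lexicon.move,
        "settext": lexicon.set_text,
    }
    for line in script.splitlines():
        parts = shlex.split(line, comments=True)
        if not parts:
            continue  # blank line or comment-only line
        op, *args = parts
        if op not in handlers:
            raise ValueError(f"unknown command: {op}")
        handlers[op](*args)


# A noisy source lexicon: sense s2 hangs off the wrong entry, and h2 has a typo.
SOURCE = """
create e1 ENTRY root
create h1 HEADWORD e1
settext h1 "dog"
create s1 SENSE e1
settext s1 "perro"
create s2 SENSE e1
settext s2 "gato"
create e2 ENTRY root
create h2 HEADWORD e2
settext h2 "cta"
"""

# A hand-written correction script for these one-off errors.
FIXES = """
move s2 e2          # the sense belongs to the second entry
settext h2 "cat"    # repair the headword typo
"""

lexicon = Lexicon()
apply_dml(lexicon, SOURCE)
apply_dml(lexicon, FIXES)
```

Because corrections live in a script rather than in the data, the same `FIXES` text could be replayed against a fresh copy of the source, and an automatic error detector could emit lines in the same form as a human editor.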
Similar resources
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the growth of digital content on the internet and social media, sentiment analysis is an emerging field. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (grapheme, homophone, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Word Formation and the Validation of Lexical Resources
In the framework of Word Manager (WM), morphological dictionaries are produced by the classification of lexemes in terms of a rule database. The intricate structure of the resulting lexical resources, conceived primarily for flexible use, also offers novel opportunities for the validation of the lexical specification. Many of the inconsistencies and errors encountered in lexical specification i...
Waldayu and Waldayu Mobile: Modern digital dictionary interfaces for endangered languages
We introduce Waldayu and Waldayu Mobile, web and mobile front-ends for endangered language dictionaries. The Waldayu products are designed with the needs of novice users in mind – both novices in the language and technological novices – and work in tandem with existing lexicographic databases. We discuss some of the unique problems that endangered-language dictionary software products face, and ...
An Analysis towards the Development of Electronic Bilingual Dictionary (Manipuri-English) -A Report
Meaningful sentences are composed of meaningful words; any system that hopes to process natural languages as people do must have information about words, their meaning and relative words in another language. This information is traditionally provided through electronic dictionaries. But these dictionary entries evolved for the convenience of human readers, not for machines. So, machine readable...
Journal:
- CoRR
Volume: abs/1410.7787
Issue:
Pages: -
Publication date: 2011