Screffva: A Lexicographer's Workbench
Author
Abstract
This paper describes the implementation of Screffva, a computer system written in Prolog that uses a parallel corpus to generate bilingual dictionary entries automatically. Screffva provides a lemmatised interface between a parallel corpus and its bilingual dictionary, and has been trialled with a parallel corpus of Cornish-English bitext. The system can retrieve any given segment of text, and it uniquely identifies lexemes and the equivalences that hold between the lexical items in a bitext. It also copes with discontinuous multiword lexemes. It can therefore find glosses for individual lexical items or produce longer lexical entries that include part of speech, glosses and example sentences from the corpus. Processing proceeds in stages: the corpus is converted to a Prolog text database and lemmatised; equivalents are then aligned; finally, Prolog predicates are defined for retrieving glosses, part of speech and example sentences that illustrate usage. Lexemes, including discontinuous multiword lexemes, are indexed to their respective segments of the corpus. Because it identifies specific translation equivalents in the bitext, the system is a much more powerful research tool than existing concordancers such as ParaConc, WordSmith, XCorpus and Multiconcord. The system can automatically generate a bilingual dictionary, which can be exported and used as the basis for a paper dictionary. Alternatively, the system can be used directly as an electronic bilingual dictionary.
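The pipeline described in the abstract (a segmented bitext, lemmatised lexemes, aligned equivalents, and retrieval of glosses, part of speech and examples) can be illustrated with a minimal sketch. It is written in Python rather than the Prolog that Screffva uses, and it is not the paper's implementation: the names `Lexeme`, `Segment`, `CORPUS`, `glosses` and `entry` are hypothetical, and the toy German-English segments stand in for the Cornish-English corpus purely because they offer a simple discontinuous lexeme (`ruft ... an`, lemma `anrufen`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    pos: str
    positions: tuple  # token indices in the segment; non-adjacent if discontinuous


@dataclass(frozen=True)
class Segment:
    seg_id: int
    source: tuple  # source-language tokens
    target: tuple  # target-language tokens
    links: tuple   # aligned (source Lexeme, target Lexeme) pairs


# Hypothetical toy bitext (assumption: not data from the paper).
CORPUS = (
    Segment(
        1,
        ("Er", "ruft", "seine", "Mutter", "an"),
        ("He", "calls", "his", "mother"),
        (
            (Lexeme("er", "pron", (0,)), Lexeme("he", "pron", (0,))),
            # Discontinuous multiword lexeme: tokens 1 and 4 form one lemma.
            (Lexeme("anrufen", "verb", (1, 4)), Lexeme("call", "verb", (1,))),
            (Lexeme("sein", "det", (2,)), Lexeme("his", "det", (2,))),
            (Lexeme("Mutter", "noun", (3,)), Lexeme("mother", "noun", (3,))),
        ),
    ),
    Segment(
        2,
        ("Die", "Mutter", "schläft"),
        ("The", "mother", "sleeps"),
        (
            (Lexeme("der", "det", (0,)), Lexeme("the", "det", (0,))),
            (Lexeme("Mutter", "noun", (1,)), Lexeme("mother", "noun", (1,))),
            (Lexeme("schlafen", "verb", (2,)), Lexeme("sleep", "verb", (2,))),
        ),
    ),
)


def glosses(lemma):
    """Return the sorted target lemmas aligned to a source lemma."""
    return sorted(
        {tgt.lemma for seg in CORPUS for src, tgt in seg.links if src.lemma == lemma}
    )


def entry(lemma):
    """Build a dictionary entry: part of speech, glosses and example segments."""
    pos, found, examples = set(), set(), []
    for seg in CORPUS:
        for src, tgt in seg.links:
            if src.lemma != lemma:
                continue
            pos.add(src.pos)
            found.add(tgt.lemma)
            example = (seg.seg_id, " ".join(seg.source), " ".join(seg.target))
            if example not in examples:  # list each segment once
                examples.append(example)
    return {
        "lemma": lemma,
        "pos": sorted(pos),
        "glosses": sorted(found),
        "examples": examples,
    }


if __name__ == "__main__":
    print(entry("anrufen"))
    print(glosses("Mutter"))
```

Storing token positions on each lexeme is what lets a lemma cover non-adjacent tokens, and keeping the links inside each segment is one simple way to index lexemes to the segments they occur in. A Prolog version would presumably express the same records as facts and the lookups as predicates.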
Similar resources
An Evaluation of a Lexicographer's Workbench: building lexicons For Machine Translation
NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation (MT). We have developed the WASPBENCH, a tool that (1) presents a "word sketch", a summary of the corpus evi...
WASPBENCH: a lexicographer's workbench incorporating state-of-the-art word sense disambiguation
Human Language Technologies (HLT) need dictionaries, to tell them what words mean and how they behave. People making dictionaries (lexicographers) need HLT, to help them identify how words behave so they can make better dictionaries. Thus a potential for synergy exists across the range of lexical data in the construction of headword lists, for spelling correction, phonetics, morphology and synt...
An Evaluation of a Lexicographer's Workbench Incorporating Word Sense Disambiguation
NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation. We have developed the WASPBENCH, a tool that presents a "word sketch", a summary of the corpus evidence for a word to...
Development of a Corpus Workbench for the METU Turkish Corpus
We will introduce a corpus workbench designed and implemented for the METU Turkish Corpus. The workbench design introduces a number of useful features and the workbench itself is basically usable with any TEI and XML compliant corpus, provided that it can be indexed in the format required by the workbench.
Towards a workbench for schema-TAGs
In the following, the components of a workbench for the grammar formalism of Schema-Tree Adjoining Grammars (S-TAGs) are outlined. This workbench can also serve as a workbench for pure TAGs because it provides a component which transforms an arbitrary TAG into an S-TAG in a non-trivial manner. Another interesting property of the workbench is that it provides a parser, which is realized as a rev...