Multiword Units In An MT Lexicon
Author
Abstract
Multiword units contribute significantly to the robustness of MT systems because they reduce the ambiguity inherent in word-to-word matching. The paper focuses on a relatively little-studied kind of multiword (MW) unit that is partially fixed and partially productive. MW units are shown to form a continuum between completely frozen expressions, whose lexical elements are specified at the level of particular word forms, and expressions produced by syntactic rules defined in terms of general part-of-speech categories. The paper argues for using the local grammars proposed by Maurice Gross to capture the productive regularity of MW units and illustrates a uniform implementation of them in the NooJ grammar development framework.
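The continuum described in the abstract can be pictured with a small sketch. This is not NooJ syntax and not the paper's implementation; it is a hypothetical Python illustration in which each position of an MW pattern is constrained by a word form, a lemma, or a part-of-speech category. The names `Token`, `Slot`, and `find_matches`, the tag labels, and the hand-annotated sample sentence are all assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    form: str   # surface word form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag (hypothetical tag set)


class Slot(NamedTuple):
    """One position of a pattern; a constraint left as None is not checked."""
    form: Optional[str] = None
    lemma: Optional[str] = None
    pos: Optional[str] = None

    def accepts(self, tok: Token) -> bool:
        return ((self.form is None or self.form == tok.form.lower())
                and (self.lemma is None or self.lemma == tok.lemma)
                and (self.pos is None or self.pos == tok.pos))


def find_matches(pattern, tokens):
    """Return (start, end) spans where every slot accepts the aligned token."""
    n = len(pattern)
    return [(i, i + n)
            for i in range(len(tokens) - n + 1)
            if all(s.accepts(t) for s, t in zip(pattern, tokens[i:i + n]))]


# Frozen end of the continuum: every element is a fixed word form.
FROZEN = [Slot(form="by"), Slot(form="and"), Slot(form="large")]
# Partially fixed, partially productive: any numeral, any form of "year", fixed "old".
SEMI_FIXED = [Slot(pos="NUM"), Slot(lemma="year"), Slot(form="old")]
# Productive end: defined only by general part-of-speech categories.
PRODUCTIVE = [Slot(pos="NUM"), Slot(pos="NOUN")]

# Hand-annotated sample: "By and large she was five years old"
tokens = [
    Token("By", "by", "ADP"), Token("and", "and", "CCONJ"),
    Token("large", "large", "ADJ"), Token("she", "she", "PRON"),
    Token("was", "be", "VERB"), Token("five", "five", "NUM"),
    Token("years", "year", "NOUN"), Token("old", "old", "ADJ"),
]

print(find_matches(FROZEN, tokens))      # [(0, 3)]
print(find_matches(SEMI_FIXED, tokens))  # [(5, 8)]
print(find_matches(PRODUCTIVE, tokens))  # [(5, 7)]
```

All three patterns use the same slot representation, which loosely mirrors the idea of a uniform treatment across the continuum; a real local grammar would also need optional and repeated elements, which this sketch leaves out.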
Similar resources
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...
Unsupervised Multiword Segmentation of Large Corpora using Prediction-Driven Decomposition of n-grams
We present a new, efficient unsupervised approach to the segmentation of corpora into multiword units. Our method involves initial decomposition of common n-grams into segments which maximize within-segment predictability of words, and then further refinement of these segments into a multiword lexicon. Evaluating in four large, distinct corpora, we show that this method creates segments which c...
More Than Words: The Role of Multiword Sequences in Language Learning and Use
The ability to convey our thoughts using an infinite number of linguistic expressions is one of the hallmarks of human language. Understanding the nature of the psychological mechanisms and representations that give rise to this unique productivity is a fundamental goal for the cognitive sciences. A long-standing hypothesis is that single words and rules form the basic building blocks of lingui...
Managing Multiword Expressions in a Lexicon-Based Sentiment Analysis System for Spanish
This paper describes our approach to managing multiword expressions in Sentitext, a linguistically-motivated, lexicon-based Sentiment Analysis (SA) system for Spanish whose performance is largely determined by its coverage of MWEs. We defend the view that multiword constructions play a fundamental role in lexical Sentiment Analysis, in at least three ways. First, a significant proportion convey...
Dictionary of Multiword Expressions for Translation into highly Inflected Languages
Treatment of Multiword Expressions (MWEs) is one of the most complicated issues in natural language processing, especially in Machine Translation (MT). The paper presents a dictionary of MWEs for an English-Latvian MT system, demonstrating how MWEs could be handled for inflected languages with rich morphology and rather free word order. The proposed dictionary of MWEs consists of two constit...