Interlingua Approximation: A Generation-Heavy Approach
Authors
Abstract
To date, the construction of interlingual resources remains a labor-intensive process, often resulting in knowledge-based systems that lack robustness. Such systems may work well on certain types of phenomena, but their complex knowledge-based foundation makes them difficult to extend to new phenomena or languages. We adopt the view that it is possible to approximate the depth of knowledge-based interlingual systems by tapping into the richness of target-language (TL) resources (English, in our projects) and using this information to map the source-language (SL) input to the English output. A key feature of our approach is the use of some, but not all, components of an interlingual representation (e.g., the top-level primitives and basic argument structure) to map representations associated with a resource-poor language into those of a resource-rich language. The approach lends itself to generating multiple candidate sentences, which are then statistically pared down so that the most likely sentence under the constraints of the TL is produced. Consider the oft-cited Spanish example, “Yo le di puñaladas a John” (literally “I gave knife wounds to John”, i.e., “I stabbed John”). Such cases are traditionally handled in interlingual systems by means of decomposition into a conceptual representation (Dorr, 1993). We espouse a more economical approach that uses the structure of syntactic dependencies coupled with knowledge encoded in the Lexical Conceptual Structure Verb Database (LVD) of Dorr (2001). More specifically, rather than mapping the SL input into a representation with the full range of interlingual components, this simpler approach uses only the argument structure of the input dependency tree and top-level conceptual nodes (such as “CAUSE GO”) coupled with thematic-role information.
To produce a TL (English) sentence from this representation, the top-level conceptual nodes are first checked for possible matches. Conflated arguments (the STABN node below) may then be absorbed into other predicate positions, as long as there is a relation between the conflated argument and the new predicate node (in this case STABV), regardless of part of speech. This process is shown pictorially below.
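The two steps described above (matching on the top-level conceptual node, then absorbing a conflated argument into the predicate) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `VerbEntry` and `Arg` types, the two-entry `LEXICON`, the concept label `STAB`, and the log-probability scores are invented stand-ins, not the contents of the LVD or the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class VerbEntry:
    primitive: str           # top-level conceptual node, e.g. "CAUSE GO"
    surface: str             # English surface form (past tense, for simplicity)
    absorbs: Optional[str]   # concept conflated into the verb, if any


@dataclass(frozen=True)
class Arg:
    concept: str             # part-of-speech-neutral concept label
    surface: str             # English surface form


# Hypothetical toy lexicon; a real system would consult a verb database.
LEXICON = [
    VerbEntry("CAUSE GO", "gave", None),
    VerbEntry("CAUSE GO", "stabbed", "STAB"),
]


def generate(primitive: str, agent: Arg, theme: Arg, goal: Arg) -> List[str]:
    """Overgenerate English candidates for one predicate-argument structure."""
    candidates = []
    for entry in LEXICON:
        if entry.primitive != primitive:
            continue  # step 1: the top-level conceptual node must match
        if entry.absorbs is None:
            # No conflation: every argument is realized overtly.
            candidates.append(
                f"{agent.surface} {entry.surface} {theme.surface} to {goal.surface}"
            )
        elif entry.absorbs == theme.concept:
            # Step 2: the theme is related to the verb (same concept,
            # part of speech ignored), so the verb absorbs it.
            candidates.append(f"{agent.surface} {entry.surface} {goal.surface}")
    return candidates


def pick_best(candidates: List[str], score: Callable[[str], float]) -> str:
    """Keep the candidate that the (assumed) TL scoring function prefers."""
    return max(candidates, key=score)


# "Yo le di puñaladas a John", reduced to primitive + thematic roles.
cands = generate(
    "CAUSE GO",
    agent=Arg("I", "I"),
    theme=Arg("STAB", "knife wounds"),
    goal=Arg("JOHN", "John"),
)

# Invented log-probabilities standing in for a TL language model.
toy_scores = {"I gave knife wounds to John": -12.0, "I stabbed John": -4.0}
best = pick_best(cands, lambda s: toy_scores.get(s, float("-inf")))
```

Here `cands` holds both the literal and the conflated rendering, and `best` is whichever one the scoring function ranks higher. In the sketch the scores are hard-coded, whereas the approach described above relies on statistical information drawn from TL resources.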
Similar resources
Generating Arabic Text from Interlingua
In this paper, we describe a grammar-based generation approach for task-oriented interlingua-based spoken dialogue that transforms a shallow semantic interlingua representation called Interchange Format (IF) into Arabic text that corresponds to the intentions underlying the speakers' utterances. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating throu...
A Language-Independent System for Generating Feature Structures from Interlingua Representations
Two main problems in natural language generation are lexical selection and syntactic structure determination. In interlingua approach to machine translation, determining sentence structures becomes more difficult, especially when the interlingua does not contain any syntactic information. In this paper a knowledge based computational model which handles these two problems in interlingua approach is p...
Approximating an Interlingua in a Principled Way
We address the problem of constructing in a principled way an ontology of terms to be used in an interlingua for machine translation. Given our belief that a true language-neutral ontology of terms can only be approached asymptotically, the construction method outlined involves a step-wise folding in of one language at a time. This is effected in three steps: first building for each langua...
Approach to interchange-format based Chinese generation
Interlingua-based machine translation is an important approach to implement multi-lingual speech-to-speech (S2S) translation. The natural language generation (NLG) is one of the key components in the interlingua-based machine translation systems. This paper introduces our approach to Chinese generation based on the Interchange Format (IF) developed by the C-STAR organization. In our approach, t...
Capturing Language-Specific Semantic Distinctions in Interlingua-Based MT
We describe an interlingua-based approach to machine translation, in which a DRS representation of the source text is used as the interlingua representation. A target DRS is then created and used to construct the target text. We describe several advantages of this level of representation. We also argue that problems of translation mismatch and divergence should properly be viewed not as transla...