Interaction between syntax and semantics: The case of gerund translation

Author

  • Stephan Mehl
Abstract

Standard architectures favor sequential processes for the semantic and syntactic parts of natural language generation. Some semantic decisions, however, require information from the syntactic part, as is shown by the example of translating English gerunds into German. As a solution to this problem, a model is proposed in which syntactic and semantic modules work in parallel, computing preference values for different translation variants. On the basis of these preferences, one variant is selected and further elaborated. This model draws on a maximum of information, including stylistic considerations, while avoiding the computational load of backtracking processes.

1. The need for interaction

1.1 The role of lexical choice

Traditionally, the process of text generation is split into a strategic and a tactical phase. Both may be subdivided; among others, the tactical component contains two distinct phases of lexical choice and syntactic realization. This paper will challenge two features that usually come along with this division of labour: first, that there is usually not much choice involved in lexical choice; and second, that these phases are arranged in a strictly sequential way.

Fortunately, almost any content can be formulated in a number of different ways. These variants of saying the same thing in other words may differ lexically, syntactically, stylistically, or, to a certain degree, even semantically (e.g. by a slight change of focus that may be irrelevant in most contexts). Sometimes, there are pragmatic reasons for repeating a certain content in other words, i.e. paraphrasing it (cf. Lenke 1995). In most cases, however, the tactical component of a generation system will simply have to decide on one or the other way of formulating a given content. Since the syntactic part of generation largely depends on the syntactic properties of the words involved, it is the phase of lexical choice that carries most of the responsibility for the resulting text.
Many natural language generation systems, however, content themselves with producing one single form per content. Concepts are assigned to lexical items in a 1:1 fashion, with conceptual roles corresponding to case frames of verbs. (The term "lexical item" is intended to cover both single words and phrases, whether they are frozen or subject to morphosyntactic variation. This includes even descriptive phrases used in case of lexical gaps (cf. e.g. Sondheimer et al. 1990). The following discussion will focus on words, but can easily be extended to phrases.)

If there is some choice, i.e. a 1:n assignment of concepts to words, it is important to distinguish whether this choice is determined semantically or by syntactic and/or stylistic considerations. The first case typically appears if the concepts in use are so abstract that they do not specify semantic features that distinguish different lexemes of a particular language. The most extreme example of this is conceptual dependency theory as embodied in Goldman's (1975) generation system BABEL. However, this is not a real situation of choice, in the sense that the result would be acceptable whichever alternative one chooses. Instead, selecting a different word will yield a different content of the sentence. This does not mean that choice will be considered an entirely arbitrary matter here; since no two linguistic forms can be used equally well in all situations, there will always be good reasons for deciding on one or the other alternative. But our interpretation of lexical choice will be that of choosing between semantically (near-)equivalent words or phrases.

A number of researchers have recently suggested improvements on the classical 1:1 assignment. The following variants have been considered:

  • synonyms: the choice among synonymous or near-synonymous words and phrases
  • different combinations of concepts: A single word may cover a combination of concepts. An example of this approach is Horacek's (1990) model of zoom schemata. Since concepts might be combined in different ways, alternative formulations of the same content might arise.
  • mapping of concepts to syntactic functions: In some cases, parts of the conceptual representation might be rendered either by lexemes or by syntactic functions (as in Peter's car vs. The car owned by Peter; cf. Horacek 1990).

While conveying the same semantic content, these variants may differ with respect to their syntactic behavior. The focus of this paper will be on different syntactic features of these synonyms as a criterion for the choice among them. Other researchers have investigated different criteria; cf. e.g. the pragmatic features discussed by Hovy 1988. Furthermore, this paper will concentrate on the comparison of lexemes (namely, a verb and its nominalization) rather than phrases. The determination of nominalizations is taken for granted; research in the paradigm of Mel'cuk's Meaning-Text Theory has provided many details on this subject (see e.g. Iordanskaja et al. 1991). For further aspects of lexicalization, see the survey by Stede (forthcoming).

1.2 Syntax and semantics

Since near-synonyms may slightly deviate from the contents stipulated by the strategic component, the latter might have to sanction the choice of such a lexeme. This is a first indication that the classical sequential architecture should be replaced by some kind of interaction between the components. The aim of this paper is to investigate the need for such an interaction between the phases of lexical choice and syntactic generation.

In general, it is only possible to combine the chosen lexemes into a well-formed sentence if exactly one of them is a verb, and the others correspond to the pattern of optional and obligatory complements of this verb.
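The well-formedness condition just stated can be made concrete with a small check. The following is only a minimal sketch under simplifying assumptions, not the paper's implementation: the names `Lexeme`, `Frame` and `is_well_formed`, the role labels, and the toy frame for the German verb einfügen are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    form: str
    pos: str        # part of speech, e.g. "verb" or "noun"
    role: str = ""  # syntactic function the lexeme would fill (hypothetical labels)


@dataclass(frozen=True)
class Frame:
    """Government frame of a verb: obligatory and optional complements."""
    obligatory: frozenset
    optional: frozenset = frozenset()


def is_well_formed(lexemes, frames):
    """Exactly one verb, and the remaining lexemes fit that verb's frame."""
    verbs = [lex for lex in lexemes if lex.pos == "verb"]
    if len(verbs) != 1:
        return False
    frame = frames[verbs[0].form]
    roles = [lex.role for lex in lexemes if lex.pos != "verb"]
    if len(roles) != len(set(roles)):
        return False  # simplification: each role may be filled only once
    filled = set(roles)
    # All obligatory complements present, and nothing outside the frame.
    return frame.obligatory <= filled <= (frame.obligatory | frame.optional)


# Toy lexicon entry (assumed, not taken from the paper).
frames = {"einfügen": Frame(frozenset({"subject", "object"}),
                            frozenset({"location"}))}
sentence = [
    Lexeme("einfügen", "verb"),
    Lexeme("man", "noun", "subject"),
    Lexeme("Ausdruck", "noun", "object"),
]
print(is_well_formed(sentence, frames))      # True
print(is_well_formed(sentence[:2], frames))  # False: obligatory object missing
```

A real system would of course work with far richer frames (case, prepositions, modifiers); the sketch only shows that the condition depends on syntactic knowledge about the selected verb.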
It is part of the tactical component's duty to partition the contents to be verbalized into sentences in such a way that the lexemes assigned to form one sentence fulfil the above-mentioned conditions. Because of the usual correspondence between concepts and words as well as between conceptual roles, semantic case frames and syntactic government frames, this process is no problem in most generation systems. However, difficulties arise in the following two cases: a) in machine translation and in multilingual generation, since government frames are language-specific; b) in choosing between synonyms or near-synonyms with different syntactic features.

An obvious case for b) is the nominalization of verbs and adjectives. Most generation systems work on a restricted domain in which a simplistic view of parts of speech is employed: nouns correspond to objects, verbs correspond to actions or states, adjectives correspond to properties. Of course, this need not be the case in general; the lexical system of every language allows for a multitude of derivations, so that a concept denoting an action might just as well be represented by a noun or an adjective instead of a verb. This is a powerful instrument that might give the text planner ample possibilities for combining contents into one sentence, thus avoiding the production of tedious chains of sentences.

Unfortunately, these possibilities are constrained by a number of language-specific idiosyncrasies: a desired derivative is not always part of the lexical system, or if it is, it might be unusual or ambiguous. In addition, since most concepts are associated with role fillers, their lexical counterparts must be attached to the corresponding words as complements or modifiers, too. The resulting construction may be either ambiguous or clumsy (or both). Even worse, a derivative might only allow for a smaller number of complements than the original verb did.
In those cases, the lexical choice component must avoid using derivatives. However, it can only do so on the basis of syntactic knowledge; that is, a failure of the syntactic generation component must trigger a revision of earlier decisions.

In section 2, a detailed example for this catalogue of criteria will be given. Section 3 will discuss possibilities of designing the interface between syntax and semantics accordingly, as well as describe the implementation of a detailed model. Note that most of the above-mentioned criteria do not yield binary ratings that accept or completely discard a solution. Instead, there will be a tradeoff between better and worse values on different dimensions of evaluation. In section 3 below, it will be shown how the best result might evolve in the course of the generation process.

2. A case study: Translating gerunds

As an illustration of the complex of problems described in section 1, an example from the field of machine translation shall be used, namely, the translation of English gerund constructions into German. German does not permit gerund constructions; instead, either subordinate clauses or nominalizations have to be used. Both alternatives are not always possible, and if they are, they may not be equally felicitous. The following examples illustrate the factors mentioned in section 1.2; they are taken from Lyons 1977 (vol. I, II) and its translation into German (vol. I, 1980; vol. II, 1983; page numbers added in brackets):

1. availability of a nominal derivative

  Before embarking upon the discussion of this question, ... (429)

There is no German noun expressing the act of embarking (upon a discussion), hence the only translation possible is a subordinate clause with a finite verb:

  Bevor wir uns der Diskussion dieser Frage widmen, ... (II:59)

Note that the German subordinate clause requires a subject that is not explicitly mentioned in the English gerund clause.

2. ambiguity (in the respective context) and stylistic features of this noun

  For example, 'Abiogenesis is spontaneous generation' can be understood as expressing, indirectly, a proposition about 'abiogenesis' ... (417)

As a matter of fact, there is a German noun denoting the act of expressing, namely Ausdruck. However, this noun is ambiguous in a similar way to the English expression: it can mean an act (as in Take these flowers as an expression of our gratitude.) as well as an object (as in This is not a well-formed logical expression.). Using this noun as a description of a sentence will lead to an ambiguity that can be avoided easily by translating the gerund as a subordinate clause:

  Zum Beispiel kann der Satz 'Urzeugung ist selbsttätiges Entstehen von Leben' so verstanden werden, daß mit ihm indirekt eine Proposition über das Wort 'Urzeugung' ausgedrückt werden soll ... (II:48)

Note that not every lexical ambiguity will enforce this decision: readings that can easily be discarded in a particular context will not hinder the interpretation of an ambiguous word. In the same vein, rare words (or readings) as well as words that do not fit the stylistic register of the text will decrease its readability (see also Mehl 1994).

3. possibility to combine the noun with all complements

  The fact that the term 'expression' is in existence does not, of course, constitute sufficient reason for distinguishing it from 'lexeme', on the one hand, and from 'form', on the other. (23)

In this example, one complement of the gerund verb consists of a pronoun. The respective German noun, however, does not permit the use of a pronoun complement (*seine Unterscheidung von ...). This leaves us with a subordinate clause:

  Die Tatsache, daß der Terminus 'Ausdruck' existiert, stellt jedoch natürlich keinen genügenden Grund dafür dar, ihn von 'Lexem' einerseits und 'Form' andererseits zu unterscheiden. (I:36)

4. syntactic and semantic ambiguity of this combination

It is not always the case that pronouns exclude the use of a noun. The following example

  Having made this point and given it due emphasis, ... (12)

might be translated as

  Nach der Feststellung dieses Punktes und seiner gebührenden Betonung ...

However, seiner gebührenden Betonung might as well mean 'its having given emphasis'. This is a systematic ambiguity that always occurs when a constituent in the German genitive might fit different semantic roles. In such cases, the subordinate clause variant makes relations more explicit. (In this example, the translator chose to drop some of the original contents: Nachdem wir diesen Punkt gebührend betont haben ... (I:25)).

5. stylistic evaluation of both alternatives

Even if its interpretation is unambiguous, a noun phrase containing several modifiers and complements is hard to understand. In any case, constructions with multiply embedded noun phrases (which are easy to build in German) should be avoided. In the following example, a gerund verb with two complements, one of which is very complex, has actually been translated as a noun, yielding a clause in which the head (Einfügung [insertion]) and the second modifier (vor 'John' [before 'John']) have been torn apart by the first modifier:

  [...] it can be made clear by inserting the phrase 'the name', or some similar descriptive expression, before 'John'. (6)

  [...] dies kann durch die Einfügung des Ausdrucks 'der Name', oder eines ähnlichen beschreibenden Ausdrucks vor 'John' klar werden. (I:20)

Translating the gerund by a subordinate clause would have produced a text that is far easier to understand:

  [...] dies kann klar werden, wenn man den Ausdruck 'der Name' oder einen ähnlichen beschreibenden Ausdruck vor 'John' einfügt.
On the other hand, subordinate clauses may become equally confusing if too many of them are lined up or embedded into each other:

  One of the problems that arises in describing precisely the relationship that holds between lexemes and expressions [...] (25)

Instead of a series of subordinate clauses such as

  Eines der Probleme, das entsteht, wenn man die Beziehung, die zwischen Lexemen und Ausdrücken besteht, genau beschreibt, [...]

the translator chose a nominalization in addition to dropping a subordinate verb (hold):

  Eines der Probleme, das gerade bei der Beschreibung der Beziehung zwischen Lexemen und Ausdrücken [...] existiert [...] (I:38)

The next section will show how this complex of criteria can be brought to bear on the decision for a certain lexical item.
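To illustrate how the five criteria of this section could be combined into graded preference values rather than binary decisions, here is a rough sketch. It is not the model of section 3: the class `GerundContext`, the function names, the penalty weights and the tie-breaking rule are all assumptions made up for this illustration. Criteria 1 and 3 are treated as hard constraints on the nominalization, while criteria 2, 4 and 5 only lower a preference value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GerundContext:
    noun_available: bool      # criterion 1: a nominal derivative exists
    noun_ambiguous: bool      # criterion 2: the noun is ambiguous in context
    complements_fit: bool     # criterion 3: the noun can take all complements
    genitive_ambiguous: bool  # criterion 4: a genitive could fill several roles
    n_complements: int        # criterion 5: load on the resulting noun phrase
    clause_depth: int         # criterion 5: subordinate clauses already stacked


def nominalization_preference(ctx):
    """Preference in [0, 1] for the noun variant (weights are invented)."""
    if not (ctx.noun_available and ctx.complements_fit):
        return 0.0  # the variant cannot be realized at all
    score = 1.0
    if ctx.noun_ambiguous:
        score -= 0.3
    if ctx.genitive_ambiguous:
        score -= 0.3
    score -= 0.2 * max(0, ctx.n_complements - 1)
    return max(score, 0.0)


def clause_preference(ctx):
    """Preference in [0, 1] for the subordinate clause variant."""
    return max(1.0 - 0.3 * ctx.clause_depth, 0.0)


def choose_variant(ctx):
    """Return the preferred variant and all preference values."""
    prefs = {
        # Listed first so that ties go to the clause (an arbitrary choice).
        "subordinate_clause": clause_preference(ctx),
        "nominalization": nominalization_preference(ctx),
    }
    return max(prefs, key=prefs.get), prefs


# 'Before embarking upon ...': no German noun is available.
embarking = GerundContext(False, False, True, False, 1, 0)
# 'describing precisely the relationship that holds ...': clauses pile up.
describing = GerundContext(True, False, True, False, 1, 2)
print(choose_variant(embarking)[0])   # subordinate_clause
print(choose_variant(describing)[0])  # nominalization
```

With these invented weights, the 'inserting' example from criterion 5 (two complements, no surrounding subordinate clauses) would favor the subordinate clause, in line with the discussion above. The numbers themselves carry no empirical weight.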




Publication date: 1995