Collocations in Multilingual Generation
Abstract
We present a proposal for structuring collocation knowledge¹ in the lexicon of a multilingual generation system and show to what extent it can be used in lexical selection. The proposal is part of Polygloss, a new research project on multilingual generation, and was inspired by work carried out in the SEMSYN project (see e.g. [RÖSNER 1988])². The descriptive approach presented here combines results from recent lexicographical research with the application of Meaning-Text-Theory (MTT) (see e.g. [MEL'CUK et al. 1981], [MEL'CUK et al. 1984]). We first outline the overall structure of the dictionary system needed by a multilingual generator; section 2 gives an overview of the results of lexicographical work on collocations and compares them with "lexical functions" as used in Meaning-Text-Theory. Section 3 shows how we intend to integrate collocations in the generation dictionary and how "lexical functions" can be used in generation.
¹ We use the term "collocation" in the sense of [HAUSMANN 1985] to refer to constraints on the co-occurrence of two lexemes: the two elements are not combined completely freely, but one of them semantically determines the other. Examples include solve a problem, turn dark and expose someone to a risk. For a more detailed definition, see section 2.
² Research reported in this paper is supported by the German Bundesministerium für Forschung und Technologie (BMFT) under grant No. 08 B 3116 3. The views and conclusions contained herein are those of the authors and should not be interpreted as positions of the project as a whole.
1 Lexical knowledge for multilingual generation
Within a multilingual generation system, it seems necessary to keep the dictionary as modular as possible, separating information that pertains to different levels of linguistic description³.
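To make the notion of a collocation and of a lexical function concrete, the following Python sketch treats a lexical function as a lookup from a base lexeme to its collocate. Everything here is a hypothetical toy: the concept labels, the tables `BASES` and `COLLOCATES`, and the function `lexicalize_with_lf` are illustrative and are not taken from Polygloss or SEMSYN. Oper1 (support verb) and Magn (intensifier) are standard MTT lexical functions. The generator selects the base first, and the base then determines the collocate, mirroring the asymmetry in which one element of a collocation semantically determines the other.

```python
from typing import Dict, Optional, Tuple

# Hypothetical semantic lexicon: concept -> base lexeme per language.
BASES: Dict[str, Dict[str, str]] = {
    "DECISION": {"en": "decision", "de": "Entscheidung", "fr": "décision"},
    "RAIN": {"en": "rain", "de": "Regen", "fr": "pluie"},
}

# Hypothetical collocation entries keyed by (lexical function, language, base).
# Collocates are stored as lemmas; inflection is left to a morphological component.
COLLOCATES: Dict[Tuple[str, str, str], str] = {
    ("Oper1", "en", "decision"): "make",         # make a decision
    ("Oper1", "de", "Entscheidung"): "treffen",  # eine Entscheidung treffen
    ("Oper1", "fr", "décision"): "prendre",      # prendre une décision
    ("Magn", "en", "rain"): "heavy",             # heavy rain
    ("Magn", "de", "Regen"): "stark",            # starker Regen
    ("Magn", "fr", "pluie"): "fort",             # une forte pluie
}


def lexicalize_with_lf(concept: str, lf: str, lang: str) -> Tuple[str, Optional[str]]:
    """Select the base for a concept, then let the base determine its collocate.

    Returns (base, collocate); collocate is None if no collocation is listed.
    """
    base = BASES[concept][lang]
    collocate = COLLOCATES.get((lf, lang, base))
    return base, collocate


if __name__ == "__main__":
    for lang in ("en", "de", "fr"):
        print(lang, lexicalize_with_lf("DECISION", "Oper1", lang))
    # en ('decision', 'make')
    # de ('Entscheidung', 'treffen')
    # fr ('décision', 'prendre')
```

The same lexical function yields unrelated verbs in each language (make, treffen, prendre), which is why, in this sketch, collocates are looked up per language and per base rather than translated word by word.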
We assume that the system's lexical knowledge is stored in the following types of "specialized dictionaries":
• semantic: an inventory of the possible lexicalizations of a concept in a given language;
• syntactic: one inventory of realization classes per language, providing information about the number, type and realization of the arguments of a given lexeme;
• morphological: one inventory of inflectional classes per language.
Since none of these levels of description is completely independent, the dictionaries should be linked to each other by means of cross-references and references to class membership. Templates and mechanisms allowing for explicit inheritance of shared properties, e.g. redundancy rules, will be used within a
³ For more details on the dictionary structure see [HEID/MOMMA 1989].
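The three specialized dictionaries and their cross-references can be sketched as follows. This is a minimal illustration under stated assumptions, not the Polygloss design: the concept label `HAND-OVER`, the class names, and the functions `arguments` and `candidates` are hypothetical, only English is populated, and a simple parent chain between realization classes stands in for the template inheritance mentioned above (redundancy rules are not modeled).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class SyntacticClass:
    """Realization class; own_args are appended to the arguments inherited from parent."""
    parent: Optional[str]
    own_args: Tuple[str, ...]


@dataclass(frozen=True)
class Lexeme:
    lemma: str
    syn_class: str   # cross-reference into the syntactic dictionary
    infl_class: str  # cross-reference into the morphological dictionary


# Semantic dictionary: concept -> language -> possible lexicalizations.
SEMANTIC: Dict[str, Dict[str, List[Lexeme]]] = {
    "HAND-OVER": {"en": [Lexeme("hand", "ditransitive", "regular_verb"),
                         Lexeme("deliver", "transitive", "regular_verb")]},
}

# Syntactic dictionary: one inventory of realization classes per language.
SYNTACTIC: Dict[str, Dict[str, SyntacticClass]] = {
    "en": {
        "intransitive": SyntacticClass(None, ("subject",)),
        "transitive": SyntacticClass("intransitive", ("object",)),
        "ditransitive": SyntacticClass("transitive", ("indirect_object",)),
    },
}

# Morphological dictionary: one inventory of inflectional classes per language.
MORPHOLOGICAL: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {"regular_verb": {"base": "", "3sg": "s", "past": "ed"}},
}


def arguments(lang: str, class_name: str) -> Tuple[str, ...]:
    """Resolve a realization class by inheriting the arguments of its ancestors."""
    cls = SYNTACTIC[lang][class_name]
    inherited = arguments(lang, cls.parent) if cls.parent else ()
    return inherited + cls.own_args


def candidates(concept: str, lang: str, feature: str) -> List[Tuple[str, Tuple[str, ...]]]:
    """Follow each lexicalization's cross-references to its inflected form and argument frame."""
    result = []
    for lex in SEMANTIC[concept][lang]:
        suffix = MORPHOLOGICAL[lang][lex.infl_class][feature]
        result.append((lex.lemma + suffix, arguments(lang, lex.syn_class)))
    return result


if __name__ == "__main__":
    print(candidates("HAND-OVER", "en", "past"))
    # [('handed', ('subject', 'object', 'indirect_object')), ('delivered', ('subject', 'object'))]
```

Keeping the three inventories separate means that a new language only adds entries at each level, while the lookup in `candidates` stays the same; shared argument structure is stated once in the parent class rather than repeated per lexeme.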
Similar resources
Collocations in Multilingual Natural Language Generation: Lexical Functions meet Lexical Functional Grammar
In a collocation, the choice of one lexical item depends on the choice made for another. This poses a problem for simple approaches to lexicalisation in natural language generation systems. In the Meaning-Text framework, recurrent patterns of collocations have been characterised by lexical functions, which offer an elegant way of describing these relationships. Previous work has shown that usin...
A Recursive Treatment of Collocations
This article discusses the treatment of collocations in the context of a long-term project on the development of multilingual NLP tools. Besides “classical” two-word collocations, we will focus on the case of complex collocations (3 words or more) for which a recursive design is presented in the form of collocation of collocations. Although comparatively less numerous than two-word collocations...
FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
We introduce FipsCoView, an on-line interface for dictionary-like visualisation of collocations detected from parallel corpora using a syntactically-informed extraction method.
Extracting collocations and their translations from parallel corpora
Identifying collocations in a text (e.g., break record) and correctly translating them (battre record vs. *casser record) represent key issues in machine translation, notably because of their prevalence in language and their syntactic flexibility. This article describes a method for discovering translation equivalents for collocations from parallel corpora, aimed at increasing the lexical cover...
On-line Multilingual Linguistic Services
In this demo, we present our free on-line multilingual linguistic services, which allow users to analyze sentences or to extract collocations from a corpus directly on-line, or by uploading a corpus. They are available for 8 European languages and can also be accessed as web services by programs.