Pattern Dictionary Development Based on Non-compositional Language Model for Japanese Compound and Complex Sentences
Abstract
A large-scale sentence pattern dictionary (SP-dictionary) for Japanese compound and complex sentences has been developed, compiled on the basis of the non-compositional language model. Sentences with 2 or 3 predicates were extracted from a Japanese-to-English parallel corpus of 1 million sentences, and the compositional constituents within them were generalized to produce an SP-dictionary containing a total of 215,000 pattern pairs. In evaluation tests, the SP-dictionary achieved a syntactic coverage of 92% and a semantic coverage of 70%.
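The generalization step described in the abstract can be illustrated with a minimal sketch. It assumes that aligned constituent pairs are already available for a sentence pair and replaces each one with a shared variable on both sides. The function name `generalize`, the `<N1>` variable notation, and the example sentence are illustrative assumptions; the abstract does not specify the dictionary's actual notation or alignment procedure.

```python
def generalize(ja, en, aligned):
    """Turn a parallel sentence pair into a pattern pair.

    `aligned` is a list of (japanese_part, english_part) tuples that are
    assumed to be compositional constituents (hypothetical input format).
    """
    for i, (ja_part, en_part) in enumerate(aligned, start=1):
        # Skip constituents that cannot be located on both sides.
        if ja_part not in ja or en_part not in en:
            continue
        var = f"<N{i}>"  # hypothetical variable notation
        ja = ja.replace(ja_part, var, 1)
        en = en.replace(en_part, var, 1)
    return ja, en


# A sentence with two predicates and one generalizable noun phrase.
ja_pattern, en_pattern = generalize(
    "雨が降ったので試合は中止された",
    "The game was canceled because it rained",
    [("試合", "The game")],
)
print(ja_pattern)
print(en_pattern)
```

The parts left as literal text stand for the non-compositional remainder of the sentence, which the pattern keeps intact on both the Japanese and English sides.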
Similar resources
Non-Compositional Language Model and Pattern Dictionary Development for Japanese Compound and Complex Sentences
To realize high quality machine translation, we proposed a Non-Compositional Language Model, and developed a sentence pattern dictionary of 226,800 pattern pairs for Japanese compound and complex sentences consisting of 2 or 3 clauses. In pattern generation from a parallel corpus, Compositional Constituents that could be generalized were 74% of independent words, 24% of phrases and only 15% of ...
Development of Semantic Pattern Dictionary for Non-linear Structures of Complex and Compound Sentences
A Semantically Classified Sentence Pattern Dictionary has been compiled on the basis of Semantic Typology in order to develop an Analogical Mapping Method for MT. This dictionary includes 221,563 Semantic Patterns which have been generated from Japanese compound and complex sentences. The patterns have been made up in the semi-automatic manner using a set of variables (of full words) and function...
Analogical Mapping Method and Semantic Categorization of Japanese Compound and Complex Sentence Patterns
To overcome the limit of the conventional machine translation (MT) method based on compositional semantics, we proposed an Analogical Mapping (AM) method based on Semantic Typology and built a semantic category system for Japanese compound and complex sentences. The AM-method maps linguistic expressions into other expressions with the same meaning with semantic categorization (based on concepts...
Stress Pattern System in Central Sarawani Balochi
The present article investigates the stress pattern system of Central Sarawani Balochi (CSB), spoken in Sarawan located in Sistan and Baluchestan province of the Islamic Republic of Iran, based on metrical theory as developed in Hayes (1995). Correspondingly, the present research illustrates the position of primary and secondary stress in mono-morphemic words, verbal paradigms, compound words, ...
Analyzing and Aligning German compound nouns
In this paper, we present and evaluate an approach for the compositional alignment of compound nouns using comparable corpora from technical domains. The task of term alignment consists in relating a source language term to its translation in a list of target language terms with the help of a bilingual dictionary. Compound splitting allows to transform a compound into a sequence of components w...
Journal title:
- Int. J. Comput. Proc. Oriental Lang.
Volume 20
Publication date: 2006