An Experiment in Word Alignment with a Parallel Corpus
Abstract
This report documents a word alignment experiment using a parallel, sentence-aligned corpus. The languages are English and Japanese, and the corpus is derived from the editorials of the Asahi Shinbun daily newspaper. The aims of the experiment are: (1) to find out how accurate word alignment is with simple pattern matching; (2) to find out how useful a conventional English-Japanese lexicon is; and (3) to observe how well English and Japanese newspaper editorials correspond at the lexical level of transfer. The results show approximately 80% word alignment success using a small bilingual lexicon of 81 English words with a mean of 17.4 translations per word. The reasons for error are analysed, and it is concluded that gaps in the senses of words in the lexicon and stylistic differences between the English and Japanese texts are responsible. The shallow level of analysis is not a major problem, although it is expected to become more important as coverage is increased.
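As a rough illustration of what lexicon-based alignment by simple pattern matching might look like, the sketch below looks up each English word in a small bilingual lexicon and accepts the first listed translation that occurs as a substring of the paired Japanese sentence. This is not the procedure from the report, whose details the abstract does not give; the function name `align_words`, the toy lexicon entries, and the example sentence pair are all hypothetical.

```python
def align_words(english_tokens, japanese_sentence, lexicon):
    """Align English tokens to Japanese strings by plain substring matching.

    Returns a dict mapping each English token covered by the lexicon to the
    first listed translation found in the Japanese sentence, or None when no
    listed translation occurs. Tokens missing from the lexicon are skipped.
    """
    alignments = {}
    for token in english_tokens:
        key = token.lower()
        translations = lexicon.get(key)
        if translations is None:
            continue  # word not covered by the lexicon
        match = None
        for candidate in translations:
            if candidate in japanese_sentence:  # simple pattern match
                match = candidate
                break
        alignments[key] = match
    return alignments


# Hypothetical toy lexicon: each English word lists several translations.
lexicon = {
    "government": ["政府", "行政"],
    "economy": ["経済", "景気"],
}

# Hypothetical sentence-aligned pair (not taken from the corpus).
english = "The government must revive the economy".split()
japanese = "政府は景気を回復させなければならない"

result = align_words(english, japanese, lexicon)
```

In this sketch a word fails to align when none of its listed translations appears in the Japanese sentence, which corresponds to the kind of lexicon gap the abstract names as a source of error.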
Similar resources
Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...
Phrase-based alignment combining corpus cooccurrences and linguistic knowledge
This paper introduces a phrase alignment strategy that seeks phrase and word links in two stages using cooccurrence measures and linguistic information. In the first stage, the algorithm finds high-precision links involving a linguistically derived set of phrases, leaving word alignment to be performed in the second stage. Experiments have been carried out for an English-Spanish parallel corpus, an...
Flow Network Models for Word Alignment and
This paper presents a new model for word alignments between parallel sentences, which allows one to accurately estimate different parameters in a computationally efficient way. An application of this model to bilingual terminology extraction, where terms are identified in one language and guessed, through the alignment process, in the other one, is also described. An experiment conducted on a sm...
Collocation Extraction Using Monolingual Word Alignment Method
Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to a monolingual scenario to extract collocations from a monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...
Automatic creation of WordNets from parallel corpora
In this paper we present the evaluation results for the creation of WordNets for five languages (Spanish, French, German, Italian and Portuguese) using an approach based on parallel corpora. We have used three very large parallel corpora for our experiments: DGT-TM, EMEA and ECB. The English part of each corpus is semantically tagged using Freeling and UKB. After this step, the process of WordN...
Publication date: 1995