Search results for: corpora

Number of results: 19685

Journal: Theory and Practice in Language Studies 2022

Chinese Word Sketch (CWS) provides a tool to identify the semantic distinctions of near synonyms in natural language use situations. This study takes the case of 注意 Zhùyì and 專心 Zhuānxīn, with Gigaword_CNA and Gigaword_XIN, two main sub-corpora of CWS, as corpora for research. Based on a comparison of the senses and part of speech (POS) from Net dictionaries, it is found that there are some disagreements in POS between the ones pre...

2014
Jian Zhang

This paper describes Dublin City University’s (DCU) submission to the WMT 2014 Medical Summary task. We report our results on the test data set in the French to English translation direction. We also report statistics collected from the corpora used to train our translation system. We conducted our experiment on the Moses 1.0 phrase-based translation system framework. We performed a variety of ...

Journal: Computer Speech & Language 2023

This work addresses the cross-corpora generalization issue for the low-resourced spoken language identification (LID) problem. We have conducted experiments in the context of Indian LID and identified strikingly poor cross-corpora generalization due to corpora-dependent non-lingual biases. Our contribution in this work is twofold. First, we propose domain diversification, which diversifies limited training data using different audio augme...

2006
Matthias Richter Uwe Quasthoff Erla Hallsteinsdóttir Christian Biemann

In this paper the Leipzig Corpora Collection is introduced as a contribution to the idea that there is need for standardization of multilingual language resources. We explain the steps of building, processing and presenting corpora of comparable sizes and in a uniform format. Results from intra- and interlingual comparisons of corpora are given and methods that can build upon these corpora

2014
Ajay Srinivasamurthy Gopala K. Koduri Sankalp Gulati Vignesh Ishwar Xavier Serra

Research corpora are representative collections of data and are essential to develop data-driven approaches in Music Information Research (MIR). We address the problem of building research corpora for MIR in Indian art music traditions of Hindustani and Carnatic music, considering several relevant criteria for building such corpora. We also discuss a methodology to assess the corpora based on t...

Journal: Prague Bull. Math. Linguistics 2010
Mark Fishel Heiki-Jaan Kaalep

This work introduces a method and tool for handling overlapping parallel corpora, i.e. corpora that are based on the same source material. The method is insensitive to minor changes in the text, different segmentation levels of the corpora and omitted material from either corpus. The aim is to detect matching sentence pairs and either produce combinations of the overlapping corpora or compare ...

2003
Chris Callison-Burch Miles Osborne

We present two methods for the automatic creation of parallel corpora. Whereas previous work into the automatic construction of parallel corpora has focused on harvesting them from the web, we examine the use of existing parallel corpora to bootstrap data for new language pairs. First, we extend existing parallel corpora using co-training, wherein machine translations are selectively added to t...

2008
Tuomas Talvensaari

Aligned corpora are often-used resources in CLIR systems. The three qualities of translation corpora that most dramatically affect the performance of a corpus-based CLIR system are: (1) topical nearness to the translated queries, (2) the quality of the alignments, and (3) the size of the corpus. In this paper, the effects of these factors are studied and evaluated. Topics of two different domai...

Journal: CoRR 2016
Yong Cheng Wei Xu Zhongjun He Wei He Hua Wu Maosong Sun Yang Liu

While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems rely only on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT model...

2016
Amir Hazem Emmanuel Morin

Comparable corpora are the main alternative to the use of parallel corpora to extract bilingual lexicons. Although it is easier to build comparable corpora, specialized comparable corpora are often of modest size in comparison with corpora issued from the general domain. Consequently, the observations of word co-occurrences which are the basis of context-based methods are unreliable. We propose...
