Search results for: vocabulary coverage

Number of results: 111577

2007
Kerstin Bach Alexandre Hanft

In this paper we will introduce a measure of saturation for unstructured texts of unknown domains. Therefore we will present the Textual Coverage Rate (TCR), a method to determine the IE coverage of unstructured texts using a given vocabulary. We advance efficiency while building vocabulary repositories tailored for given problems and ensure a certain quality of representation. Our approach, wh...

2015
Amittai Axelrod Philip Resnik Xiaodong He Mari Ostendorf

We present a method that improves data selection by combining a hybrid word/part-of-speech representation for corpora, with the idea of distinguishing between rare and frequent events. We validate our approach using data selection for machine translation, and show that it maintains or improves BLEU and TER translation scores while substantially improving vocabulary coverage and reducing data se...

2007
Mathias Creutz Teemu Hirsimäki Mikko Kurimo Antti Puurula Janne Pylkkönen Vesa Siivola Matti Varjokallio Ebru Arisoy Murat Saraclar Andreas Stolcke

We analyze subword-based language models (LMs) in large-vocabulary continuous speech recognition across four “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. By estimating n-gram LMs over sequences of morphs instead of words, better vocabulary coverage and reduced data sparsity are obtained. Standard word LMs suffer from high out-of-vocabulary (OOV) r...

2011
Paul Maergner Ian Lane Alex Waibel

In this work, we investigate methods to automatically adapt our simultaneous lecture translation systems to the diverse topics that occur in educational lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates lecture-specific models and vocabularies b...

2015
Ming Sun Yun-Nung Chen Alexander I. Rudnicky

Ensuring language coverage in dialog systems can be a challenge, since the language in a domain may drift over time, creating a mismatch between the original training data and current input. This in turn degrades performance by increasing misunderstanding and eventually leading to task failure. Without the capability of adapting the vocabulary and the language model based on certain domains or ...

2002
Helin Dutagaci Levent M. Arslan

This paper compares three language models proposed as alternatives to the word-based language model for large-vocabulary speech recognition of Turkish. Turkish is an agglutinative language with high morphological productivity. This results in a huge vocabulary size and a large number of out-of-vocabulary words for unseen test data. The solution is to parse the words, in order to get smal...

2009
Dong Yang Yi-Cheng Pan Sadaoki Furui

Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. The generation of Chinese abbreviations is much more complex than that of English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...

Journal: Traektoriâ nauki, 2022

The article deals with the interactive relationship between common vocabulary and medical terminology. In recent years, terminology, against the background of current global events, has actively begun to move into language. With this fact in mind, the study attempts to justify the process from a theoretical point of view. Descriptive-comparative and empirical methods of scientific analysis have been used in the research. It i...

2008
George R S Weir Toshiaki Ozasa

In this paper we describe our analysis of vocabulary across three sets of Japanese ESL texts. We focus upon frequency analysis of individual words and multiword sequences (n-grams), giving cross comparisons of 2, 3 and 4-gram multiword sequences. In addition, we consider the degree of emphasis on multiword vocabulary that is evident in each textbook corpus. This is derived from analysis of the ...
