Search results for: word building (Persian: واژه سازی)

Number of results: 440,903

2015
Krishna N. Kaliannan, Adam Kapelner, Krishna Kaliannan, Dean Foster, Lyle Ungar

We identified features that drive differential accuracy in word sense disambiguation (WSD) by building regression models using 10,000 coarse-grained WSD instances which were labeled on Mturk. Features predictive of accuracy include properties of the target word (word frequency, part of speech, and number of possible senses), the example context (length), and the Turker’s engagement with our tas...

Journal: CoRR, 2017
Willie Boag, Hassan Kané

In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of words. Clinical natural language processing datasets, however, tend to be much smaller. Even the largest publicly-available dataset of medical notes is three o...

2006
Mikko Kurimo, Antti Puurula, Ebru Arisoy, Vesa Siivola, Teemu Hirsimäki, Janne Pylkkönen, Tanel Alumäe, Murat Saraclar

It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections this leads to millions of different, but still frequent word forms. Due to inflections, ambig...

2008
Brendan T. Johns, Michael N. Jones

We propose a method to derive predictions for single-word retrieval times from a semantic space model trained on text corpora. In Experiment 1 we present a large corpus analysis demonstrating that it is the number of unique semantic contexts a word appears in across language, rather than simply the number of contexts or the frequency of the word, that is the most salient predictor of lexical de...
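The contrast between raw frequency and the number of distinct contexts can be illustrated with a small sketch. It assumes that each document counts as one "context" (the simple document-count notion of contextual diversity), which is a stand-in and not the semantic-space model the abstract describes; the function name `frequency_and_diversity` is hypothetical.

```python
from collections import Counter

def frequency_and_diversity(documents):
    """Return (frequency, contextual diversity) counters over all words.

    Frequency counts every token occurrence. Contextual diversity counts
    the number of distinct documents in which a word appears at least once
    (documents are an assumed proxy for "contexts").
    """
    freq = Counter()
    diversity = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        freq.update(tokens)          # every occurrence counts
        diversity.update(set(tokens))  # at most once per document
    return freq, diversity

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets opened",
]
freq, cd = frequency_and_diversity(docs)
print(freq["the"], cd["the"])  # 4 2
print(freq["cat"], cd["cat"])  # 2 2
```

Here "the" is twice as frequent as "cat" but occurs in the same number of contexts, which is the kind of dissociation the abstract's corpus analysis relies on.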

2004
Jia Xu, Richard Zens, Hermann Ney

In Chinese texts, words are not separated by white spaces. This is problematic for many natural language processing tasks. The standard approach is to segment the Chinese character sequence into words. Here, we investigate Chinese word segmentation for statistical machine translation. We pursue two goals: the first one is the maximization of the final translation quality; the second is the mini...
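As a point of reference for the "standard approach" of segmenting a character sequence into words, here is a minimal dictionary-based baseline, greedy forward maximum matching. This is a swapped-in, commonly used technique and not the translation-oriented method the abstract investigates; the toy lexicon and the `max_len` parameter are illustrative assumptions.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy forward maximum-matching segmentation (a dictionary baseline)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit;
        # a single character is always accepted as a fallback.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words

lexicon = {"北京", "大学", "北京大学", "学生"}
print(max_match("北京大学学生", lexicon))  # ['北京大学', '学生']
```

Greedy matching is fast but ignores downstream needs, which is precisely why the abstract optimizes segmentation for translation quality instead.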

1989
Peter Norvig

In this paper I present a number of studies of individual lexical items demonstrating the incredible variety of idiosyncratic properties that occur in the lexicon of English. While many current theories ignore this level of detail, I argue that a complete on-line lexicon suitable for a variety of linguistic and computational uses must include much more information than is available in any exist...

2008
Mohammad Bahrani, Hossein Sameti, Nazila Hafezi, Saeedeh Momtazi

In this paper a new method for automatic word clustering is presented. We used this method for building n-gram language models for Persian continuous speech recognition (CSR) systems. In this method, each word is specified by a feature vector that represents the statistics of parts of speech (POS) of that word. The feature vectors are clustered by k-means algorithm. Using this method causes a r...
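The clustering idea in this abstract (describe each word by its part-of-speech statistics, then run k-means) can be sketched as follows. The normalization to relative frequencies, seeding with the first k words, Euclidean distance, and the tiny two-tag corpus are all assumptions for illustration; `pos_vectors` and `kmeans` are hypothetical names.

```python
import math

def pos_vectors(tagged_corpus, tagset):
    """Map each word to the relative frequency of each POS tag it receives."""
    counts = {}
    for word, tag in tagged_corpus:
        vec = counts.setdefault(word, [0] * len(tagset))
        vec[tagset.index(tag)] += 1
    return {w: [c / sum(v) for c in v] for w, v in counts.items()}

def kmeans(vectors, k, iters=20):
    """Plain k-means (Lloyd's algorithm), seeded with the first k words."""
    words = list(vectors)
    centroids = [vectors[w] for w in words[:k]]
    assign = {}
    for _ in range(iters):
        # Assignment step: each word goes to its nearest centroid.
        assign = {w: min(range(k), key=lambda j: math.dist(vectors[w], centroids[j]))
                  for w in words}
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [vectors[w] for w in words if assign[w] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

tagset = ["N", "V"]
corpus = [("book", "N"), ("book", "N"), ("book", "V"),
          ("run", "V"), ("run", "V"), ("run", "N"),
          ("city", "N"), ("eat", "V")]
clusters = kmeans(pos_vectors(corpus, tagset), k=2)
print(clusters)  # {'book': 0, 'run': 1, 'city': 0, 'eat': 1}
```

Mostly-noun words end up together and mostly-verb words together; class-based n-gram models can then share statistics within each cluster.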

2011
Khin Thandar Nwet

Word alignment in bilingual corpora has been an active research topic in the Machine Translation research groups. In this paper, we describe an alignment system that aligns English-Myanmar texts at word level in parallel sentences. Essential for building parallel corpora is the alignment of translated segments with source segments. Since word alignment research on Myanmar and English languages ...
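The abstract does not say which alignment model the system uses. As a generic illustration of statistical word alignment, the sketch below trains IBM Model 1 with expectation maximization and links each target word to its most probable source word. IBM Model 1 is a swapped-in standard baseline, the NULL source word is omitted for brevity, and the toy English-German sentence pairs are assumptions rather than English-Myanmar data.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate t(f | e) with IBM Model 1 EM; pairs = [(source_tokens, target_tokens)]."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                # E-step: share f's alignment mass among the source words.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def align(es, fs, t):
    """Link each target position j to the source position i maximizing t(f_j | e_i)."""
    return [(j, max(range(len(es)), key=lambda i: t[(f, es[i])]))
            for j, f in enumerate(fs)]

pairs = [(["the", "house"], ["das", "Haus"]),
         (["the", "book"], ["das", "Buch"]),
         (["a", "book"], ["ein", "Buch"])]
t = ibm_model1(pairs)
print(align(["the", "house"], ["das", "Haus"], t))  # [(0, 0), (1, 1)]
```

Co-occurrence across sentence pairs is what disambiguates: "das" appears with "the" twice, so EM gradually shifts probability mass toward that pairing.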

2009
Ariya Rastrow, Abhinav Sethy, Bhuvana Ramabhadran, Frederick Jelinek

This paper presents the advantages of augmenting a word-based system with sub-word units as a step towards building open-vocabulary speech recognition systems. We show that a hybrid system which combines words and data-driven, variable-length sub-word units has better phone accuracy than word-only systems. In addition the hybrid system is better in detecting Out-Of-Vocabulary (OOV) terms and ...

2001
Wuu Yang, Pin-Chia Feng

When learning a new foreign language, a non-native speaker encounters more problems in using a word than in understanding its meaning. The Monona system is a useful tool for learners of English that provides the user with many sample sentences containing a queried word. Monona builds a database of high-quality articles that are freely available on the internet, together with an accompanying index. S...
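The abstract does not describe how Monona's index is organized. One plausible structure, shown here purely as an assumption, is an inverted index that maps each word to the IDs of the sentences containing it, so a query returns sample sentences directly. The names `build_index` and `sample_sentences`, the regex sentence splitter, and the `limit` parameter are hypothetical.

```python
import re
from collections import defaultdict

def build_index(articles):
    """Split articles into sentences and map each lowercased word to sentence IDs."""
    sentences = []
    index = defaultdict(set)
    for article in articles:
        # Naive sentence split after ., ! or ? followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", article.strip()):
            if not sent:
                continue
            sid = len(sentences)
            sentences.append(sent)
            for word in re.findall(r"[a-z']+", sent.lower()):
                index[word].add(sid)
    return sentences, index

def sample_sentences(word, sentences, index, limit=5):
    """Return up to `limit` sentences containing `word`, in corpus order."""
    ids = sorted(index.get(word.lower(), ()))
    return [sentences[i] for i in ids[:limit]]

articles = ["She made a decision quickly. The decision was final.",
            "Decisions take time. He will decide tomorrow."]
sentences, index = build_index(articles)
print(sample_sentences("decision", sentences, index))
# ['She made a decision quickly.', 'The decision was final.']
```

Because lookup is by exact word form, "decisions" is not returned for "decision"; a real learner's tool would likely add lemmatization.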
