Search results for: grams

Number of results: 8929

2005
Helmer Strik Diana Binnenpoorte Catia Cucchiarini

In this study, we examined the pronunciation characteristics of multiword expressions (MWEs). We first drew up an inventory of frequently occurring N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. For about 10% of these N-grams, phonetic transcriptions were available, and these were examined. Our results show that the pronunciation ...

Journal: Algorithms 2009
Raphael André Bauer Kristian Rother Peter Moor Knut Reinert Thomas Steinke Janusz M. Bujnicki Robert Preissner

This work presents a generalized approach for the fast structural alignment of thousands of macromolecular structures. The method uses string representations of a macromolecular structure and a hash table that stores n-grams of a certain size for searching. To this end, macromolecular structure-to-string translators were implemented for protein and RNA structures. A query against the index is p...
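The idea of indexing string representations by their n-grams in a hash table can be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function names, the n-gram size of 3, the scoring by shared n-gram count, and the toy strings are all assumptions.

```python
from collections import defaultdict

def build_ngram_index(strings, n=3):
    """Map every length-n substring to the ids of the strings containing it.

    `strings` is a dict {id: string}; a Python dict stands in for the hash table.
    """
    index = defaultdict(set)
    for sid, s in strings.items():
        for i in range(len(s) - n + 1):
            index[s[i:i + n]].add(sid)
    return index

def query_index(index, q, n=3):
    """Rank indexed strings by how many of the query's n-grams they contain."""
    hits = defaultdict(int)
    for i in range(len(q) - n + 1):
        for sid in index.get(q[i:i + n], ()):
            hits[sid] += 1
    # Highest count first; ties broken by id for a stable order.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical structure strings (letters are placeholders, not a real encoding).
structures = {"a": "HHHEEELLL", "b": "EEELLLHHH", "c": "CCCCCC"}
ranking = query_index(build_ngram_index(structures), "HHEEEL")
```

A lookup touches only the buckets for the query's own n-grams, so the cost depends on the query length and bucket sizes rather than on scanning every stored string.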

Journal: International Journal of Medical Informatics 2009

Journal: Open Computer Science 2021

Despite the modern boom in technology, we are still faced with the fact that people write texts without diacritics. There are two main reasons for this. The first, historical reason stems from the past, when the use of diacritics was troublesome and people would write text without them. The second one is speed - typing without diacritics is usually faster. Text without diacritics is easy for people to understand, but in some types of documents, missing diacritics can cause a problem. This a...

2010
David Guthrie Mark Hepple

We present three novel methods of compactly storing very large n-gram language models. These methods use substantially less space than all known approaches and allow n-gram probabilities or counts to be retrieved in constant time, at speeds comparable to modern language modeling toolkits. Our basic approach generates an explicit minimal perfect hash function that maps all n-grams in a model to...

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium 2007
Ira Goldstein Anna Arzumtsyan Özlem Uzuner

We describe and evaluate three systems for automatically predicting the ICD-9-CM codes of radiology reports from short excerpts of text. The first system benefits from an open source search engine, Lucene, and takes advantage of the relevance of reports to one another based on individual words. The second uses BoosTexter, a boosting algorithm based on n-grams (sequences of consecutive words) an...
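As a minimal sketch of what "n-grams (sequences of consecutive words)" means in practice, the snippet below extracts word n-grams from a short text. It is not BoosTexter's feature extraction: the whitespace tokenization, the lowercasing, the function name, and the sample phrase are assumptions made for illustration.

```python
def word_ngrams(text, n):
    """Return all runs of n consecutive words in `text`, as tuples."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical excerpt, not taken from the paper's data.
bigrams = word_ngrams("No acute chest disease", 2)
```

A text with fewer than n words simply yields an empty list.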

Journal: Computer Speech & Language 2012
Koen Deschacht Jan De Belder Marie-Francine Moens

Statistical language models have found many applications in information retrieval since their introduction almost three decades ago. Currently the most popular models are n-gram models, which are known to suffer from serious sparseness issues, which is a result of the large vocabulary size |V| of any given corpus and of the exponential nature of n-grams, where potentially |V|^n n-grams can occu...
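The sparseness argument can be made concrete with a little arithmetic: a vocabulary of size |V| admits up to |V|^n distinct n-grams, while a corpus of N tokens contains at most N - n + 1 n-gram occurrences. The sketch below compares the two; the vocabulary and corpus sizes are hypothetical figures chosen for illustration, not values from the paper.

```python
def possible_ngrams(vocab_size, n):
    """Upper bound on distinct n-grams over a vocabulary: |V| ** n."""
    return vocab_size ** n

def max_observed_ngrams(corpus_tokens, n):
    """Upper bound on n-gram occurrences in a corpus of that many tokens."""
    return max(corpus_tokens - n + 1, 0)

# Hypothetical sizes: 10,000 word types, 1,000,000 running tokens.
possible = possible_ngrams(10_000, 3)
observed = max_observed_ngrams(1_000_000, 3)
coverage = observed / possible  # fraction of the trigram space the corpus could cover
```

Even under these generous assumptions the corpus could cover less than one millionth of the possible trigrams, which is why most n-grams are never observed.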

2012
SRILAXMI CHEETI

There is an increasing amount of user-generated information in online documents, including user opinions on various topics and products such as movies, DVDs, kitchen appliances, etc. To make use of such opinions, it is useful to identify the polarity of the opinion, in other words, to perform sentiment classification. The goal of sentiment classification is to classify a given text/document as ...

2006
Kallirroi Georgila James Henderson Oliver Lemon

We propose the “advanced” n-grams as a new technique for simulating user behaviour in spoken dialogue systems, and we compare it with two methods used in our prior work, i.e. linear feature combination and “normal” n-grams. All methods operate on the intention level and can incorporate speech recognition and understanding errors. In the linear feature combination model user actions (lists of 〈 ...
