Search results for: grams

Number of results: 8929

2011
Mohammad Sadoghi Hans-Arno Jacobsen

String tokenization using q-grams maps the string into a high-dimensional vector space model, in which the domain of each dimension is binary. The size of this space is exponential in the length of the q-grams. For instance, q-grams of size three result in a space with 26^3 dimensions. The vector space model representation of a tokenized string (e.g., {‘str’, ‘tri’, ‘rin’, ‘ing’}) can be expressed b...
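
The tokenization described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a lowercase 26-letter alphabet, under which q-grams of size three span 26^3 = 17,576 possible dimensions, and the helper names `qgrams`, `dimension_count`, and `qgram_index` are hypothetical.

```python
import string


def qgrams(s, q=3):
    """Return the set of overlapping q-grams of s (binary presence, no counts)."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def dimension_count(alphabet_size=26, q=3):
    """Number of dimensions in the q-gram space: alphabet_size ** q."""
    return alphabet_size ** q


def qgram_index(gram, alphabet=string.ascii_lowercase):
    """Map a q-gram to a dimension index via base-len(alphabet) positional encoding.

    Assumes every character of gram is in the alphabet.
    """
    idx = 0
    for ch in gram:
        idx = idx * len(alphabet) + alphabet.index(ch)
    return idx


tokens = qgrams("string")
# Indices of the dimensions set to 1; every other dimension is 0.
active_dimensions = sorted(qgram_index(g) for g in tokens)
```

Storing only the active indices instead of the full binary vector keeps the representation small, since a string of length n touches at most n - q + 1 of the dimensions.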

2014
Yurii Palkovskii Alexei Belov

This paper describes the approaches used for the Plagiarism Detection task during the PAN 2014 International Competition on Uncovering Plagiarism, Authorship, and Social Software Misuse, which took 1st place with a plagdet score of 0.907 on test corpus no. 3 and 3rd place with a score of 0.868 on test corpus no. 2. In this work we aggregated all the previously researched experience from PAN12 and PAN 13 resea...

2011
Etter Vincent

We first present our work in machine translation, during which we used aligned sentences to train a neural network to embed n-grams of different languages into a d-dimensional space, such that n-grams that are translations of each other are close with respect to some metric. Good n-gram-to-n-gram translation results were achieved, but full-sentence translation is still problematic. We re...

1997
Anand Natrajan Allison L. Powell James C. French

Retrieval systems based on N-grams have been used as alternatives to word-based systems. N-grams offer a language-independent technique that allows retrieval based on portions of words. A query that contains misspellings or differences in transliteration can defeat word-based systems. N-gram systems are more resistant to these problems. We present a retrieval system based on N-grams that uses a...
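
The resistance to misspellings mentioned in this abstract can be illustrated with a small sketch. It is not the retrieval system the authors describe: it assumes character trigrams with one space of padding on each side and uses the Dice coefficient as the overlap measure, and the function names `char_ngrams` and `dice` are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of text, padded with one space per side."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=3):
    """Dice coefficient between the character n-gram sets of a and b."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# A word-based match fails on the misspelling, but the n-gram sets still overlap.
exact_word_match = "retrieval" == "retreival"
ngram_similarity = dice("retrieval", "retreival")
```

A transposed pair of letters changes only the few n-grams that span it, so most of the query's n-grams still match the correctly spelled term.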

2010
David Wible Nai-Lung Tsao

We describe and motivate the design of a lexico-grammatical knowledgebase called StringNet and illustrate its significance for research into constructional phenomena in English. StringNet consists of a massive archive of what we call hybrid n-grams. Unlike traditional n-grams, hybrid n-grams can consist of any co-occurring combination of POS tags, lexemes, and specific word forms. Further, we d...

2015
Upendra Sapkota Steven Bethard Manuel Montes-y-Gómez Thamar Solorio

Character n-grams have been identified as the most successful feature in both single-domain and cross-domain Authorship Attribution (AA), but the reasons for their discriminative value were not fully understood. We identify subgroups of character n-grams that correspond to linguistic aspects commonly claimed to be covered by these features: morphosyntax, thematic content and style. We evaluate t...

2004
Sergio Grau

In this work, we make a contribution to natural speech dialogue act detection. We focus our attention on dialogue act classification using a Bayesian approach. Our classifier is tested on two corpora, the Switchboard and the Basurde tasks. A combination of a naive Bayes classifier and n-grams is used. The impact of different smoothing methods (Laplace and Witten-Bell) and n-grams in classif...
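
The combination of naive Bayes and n-grams named in this abstract can be sketched roughly as below. This is an illustrative toy under stated assumptions, not the author's classifier: it uses word bigrams with sentence-boundary markers and Laplace (add-alpha) smoothing only, Witten-Bell smoothing is not shown, and the class name `NgramNaiveBayes`, the training utterances, and the labels are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(tokens, n):
    """Return the list of word n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Naive Bayes over word n-gram features with Laplace (add-alpha) smoothing."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def _features(self, utterance):
        tokens = ["<s>"] + utterance.lower().split() + ["</s>"]
        return word_ngrams(tokens, self.n)

    def fit(self, utterances, labels):
        for utterance, label in zip(utterances, labels):
            self.class_counts[label] += 1
            for feature in self._features(utterance):
                self.feature_counts[label][feature] += 1
                self.vocab.add(feature)
        return self

    def predict(self, utterance):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for feature in self._features(utterance):
                # Unseen n-grams receive probability alpha / denom.
                score += math.log((self.feature_counts[label][feature] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data; the utterances and labels are made up for illustration.
model = NgramNaiveBayes(n=2).fit(
    ["is it raining", "do you agree", "it is raining", "i agree with you"],
    ["question", "question", "statement", "statement"],
)
prediction = model.predict("is it cold")
```

Without smoothing, a single unseen n-gram would drive a class probability to zero, which is why the choice of smoothing method matters for this kind of classifier.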

2012
Ariya Rastrow Sanjeev Khudanpur Mark Dredze

Statistical language models used in deployed systems for speech recognition, machine translation and other human language technologies are almost exclusively n-gram models. They are regarded as linguistically naïve, but estimating them from any amount of text, large or small, is straightforward. Furthermore, they have doggedly matched or outperformed numerous competing proposals for syntactical...

Journal: Computer Speech & Language 2005
Diana Binnenpoorte Catia Cucchiarini Lou Boves Helmer Strik

The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained...

2006
Samir Abdou

In information retrieval, Chinese and Japanese present many challenging problems. Unlike in most European languages, the lack of explicit word boundaries represents one of the most important issues for indexing. For this reason, many works have proposed different approaches to index documents or requests written in these languages. This article presents a comparison of the common indexing strategies. Mo...
