n grams

Search results for: n grams

Number of results: 982,486

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition

2016

Babak Damavandi, Shankar Kumar, Noam Shazeer, Antoine Bruguier

We present NN-grams, a novel, hybrid language model integrating n-grams and neural networks (NN) for speech recognition. The model takes as input both word histories and n-gram counts. Thus, it combines the memorization capacity and scalability of an n-gram model with the generalization ability of neural networks. We report experiments where the model is trained on 26B words. NN-grams ar...

N-Gram Language Modeling for Robust Multi-Lingual Document Classification

2004

Jörg Steffen

Statistical n-gram language modeling is used in many domains like speech recognition, language identification, machine translation, character recognition and topic classification. Most language modeling approaches work on n-grams of terms. This paper reports on ongoing research in the MEMPHIS project, which employs models based on character-level n-grams instead of term n-grams. The models ar...
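The difference between term n-grams and character-level n-grams can be illustrated with a short Python sketch. The function names `term_ngrams` and `char_ngrams` are hypothetical. Lowercasing, whitespace tokenization and keeping spaces inside character n-grams are illustrative assumptions, not details taken from the MEMPHIS project.

```python
def term_ngrams(text: str, n: int) -> list[tuple[str, ...]]:
    """Hypothetical helper: word (term) n-grams over whitespace tokens."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text: str, n: int) -> list[str]:
    """Hypothetical helper: character n-grams over the raw string.

    Spaces are kept (an assumption), so n-grams may span word boundaries.
    """
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(term_ngrams("The cat sat", 2))      # [('the', 'cat'), ('cat', 'sat')]
print(char_ngrams("The cat sat", 3)[:4])  # ['the', 'he ', 'e c', ' ca']
```

Character n-grams need no tokenizer, which is one reason they are attractive for multi-lingual text.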

Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval

2010

David Guthrie, Mark Hepple

We present three novel methods of compactly storing very large n-gram language models. These methods use substantially less space than all known approaches and allow n-gram probabilities or counts to be retrieved in constant time, at speeds comparable to modern language modeling toolkits. Our basic approach generates an explicit minimal perfect hash function, that maps all n-grams in a model to...
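The idea of addressing n-grams through a minimal perfect hash function can be sketched in Python with a generic hash-and-displace construction. This is not Guthrie and Hepple's method, and the abstract above is truncated before the details. The names `build_mphf`, `lookup` and the BLAKE2-based `_h` are hypothetical.

A minimal perfect hash maps each of the n stored keys to a distinct slot in `0..n-1`, so values such as counts can sit in a plain array. It gives no signal for keys outside the set, so a real n-gram store would need an extra check to reject unseen n-grams.

```python
import hashlib


def _h(seed: int, key: str) -> int:
    # Hypothetical seeded 64-bit hash built on BLAKE2 (illustrative choice).
    digest = hashlib.blake2b(f"{seed}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def build_mphf(keys: list[str]) -> list[int]:
    """Hash-and-displace MPHF sketch over distinct keys; returns table G."""
    n = len(keys)
    buckets: list[list[str]] = [[] for _ in range(n)]
    for k in keys:
        buckets[_h(0, k) % n].append(k)
    G = [0] * n
    taken = [False] * n
    # Place the largest buckets first: they are the hardest to fit.
    order = sorted(range(n), key=lambda b: len(buckets[b]), reverse=True)
    i = 0
    while i < n and len(buckets[order[i]]) > 1:
        bucket = buckets[order[i]]
        d = 1
        while True:  # search for a seed d that sends the bucket to free, distinct slots
            slots = [_h(d, k) % n for k in bucket]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                break
            d += 1
        G[order[i]] = d
        for s in slots:
            taken[s] = True
        i += 1
    # Singleton buckets take a free slot directly, stored as -(slot + 1).
    free = [s for s in range(n) if not taken[s]]
    while i < n and len(buckets[order[i]]) == 1:
        G[order[i]] = -(free.pop() + 1)
        i += 1
    return G


def lookup(G: list[int], key: str) -> int:
    """Slot index for a key that was in the build set."""
    d = G[_h(0, key) % len(G)]
    return -d - 1 if d < 0 else _h(d, key) % len(G)


# Usage: store hypothetical trigram counts in a flat array.
trigrams = {"the cat sat": 3, "cat sat on": 2, "sat on the": 5}
G = build_mphf(list(trigrams))
counts = [0] * len(G)
for ng, c in trigrams.items():
    counts[lookup(G, ng)] = c
print(counts[lookup(G, "cat sat on")])  # 2
```

Processing the largest buckets first is the usual heuristic in hash-and-displace schemes, because big buckets become hard to place once the table fills up.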

Language-independent text categorization by word N-gram using an automatic acquisition of words

2013

Makoto Suzuki, Naohide Yamagishi, Yi-Ching Tsai, Masayuki Goto

We previously proposed the accumulation method, a language-independent text classification method that is based on character N-grams. The accumulation method does not depend on the language structure because this method uses character N-grams to form...

Using x-gram for efficient speech recognition

1998

Antonio Bonafonte, José B. Mariño

X-grams are a generalization of n-grams in which the number of previous conditioning words differs from case to case and is decided from the training data. X-grams reduce perplexity with respect to trigrams and need fewer parameters. In this paper, the representation of x-grams using finite state automata is considered. This representation leads to a new model, the non-deterministi...
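The toy model below gives a loose illustration of a conditioning context whose length varies case by case. It is a sketch under stated assumptions, not Bonafonte and Mariño's x-gram estimation or their finite-state representation. The backoff rule (use the longest context seen at least `min_count` times) and the names `VariableContextModel`, `max_order` and `min_count` are hypothetical. The probabilities are unsmoothed relative frequencies.

```python
from collections import Counter, defaultdict


class VariableContextModel:
    """Hypothetical toy model: context length is chosen per history from training counts."""

    def __init__(self, max_order: int = 3, min_count: int = 2):
        self.max_order = max_order
        self.min_count = min_count  # assumed reliability threshold
        # context tuple -> Counter of the words that followed it
        self.next_counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)

    def train(self, sentences: list[list[str]]) -> None:
        for words in sentences:
            for i, w in enumerate(words):
                # Record contexts of length 0 .. max_order - 1 preceding w.
                for k in range(self.max_order):
                    if i - k < 0:
                        break
                    self.next_counts[tuple(words[i - k:i])][w] += 1

    def context_for(self, history: list[str]) -> tuple[str, ...]:
        # Longest suffix of the history seen often enough; else the empty context.
        for k in range(min(self.max_order - 1, len(history)), 0, -1):
            ctx = tuple(history[-k:])
            if sum(self.next_counts.get(ctx, Counter()).values()) >= self.min_count:
                return ctx
        return ()

    def prob(self, history: list[str], word: str) -> float:
        counts = self.next_counts.get(self.context_for(history), Counter())
        total = sum(counts.values())
        return counts[word] / total if total else 0.0
```

With this rule, a frequent history such as `["a", "b"]` keeps a two-word context, while a rarer one falls back to a single word or to no context.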

Authorship Attribution in Portuguese Using Character N-grams

2017

Ilia Markov, Jorge Baptista, Obdulia Pichardo-Lagunas

For the Authorship Attribution (AA) task, character n-grams are considered among the best predictive features. In the English language, it has also been shown that some types of character n-grams perform better than others. This paper tackles the AA task in Portuguese by examining the performance of different types of character n-grams, and various combinations of them. The paper also experimen...
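The sketch below is a simple reference point for character n-gram authorship attribution. It assigns an unknown text to the candidate whose pooled character trigram profile has the highest cosine similarity to it. This nearest-profile baseline, the function names and the sample texts are assumptions for illustration. The sketch does not implement the typed character n-grams or the classifiers evaluated in the paper.

```python
import math
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> Counter:
    """Hypothetical helper: counts of lowercase character n-grams."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())  # missing keys count as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(unknown: str, candidates: dict[str, list[str]], n: int = 3) -> str:
    """Baseline (assumed): pick the author whose pooled profile is most similar."""
    target = char_ngram_profile(unknown, n)
    scores = {}
    for author, texts in candidates.items():
        profile = Counter()
        for t in texts:
            profile.update(char_ngram_profile(t, n))
        scores[author] = cosine(target, profile)
    return max(scores, key=scores.get)


# Hypothetical toy corpus.
authors = {
    "A": ["o gato dorme no sofá", "o gato come peixe"],
    "B": ["the dog runs fast", "the dog barks loudly"],
}
print(attribute("o gato dorme", authors))  # A
```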

Multiword expressions in spoken language: An exploratory study on pronunciation variation

Journal: Computer Speech & Language, 2005

Diana Binnenpoorte, Catia Cucchiarini, Lou Boves, Helmer Strik

The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained...
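An inventory of frequent word N-grams like the one described above can be drawn up with a few lines of Python. The `frequent_ngrams` helper, the chosen N values, the `min_count` threshold and the sample utterances are hypothetical. The study's actual extraction criteria are not given in the truncated abstract.

```python
from collections import Counter


def frequent_ngrams(utterances: list[str], n_values=(2, 3), min_count: int = 2):
    """Hypothetical helper: word N-grams occurring at least min_count times."""
    counts = Counter()
    for utt in utterances:
        words = utt.lower().split()  # orthographic words, whitespace-tokenized
        for n in n_values:
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    # Most frequent first; candidates for multiword expressions.
    return [(ng, c) for ng, c in counts.most_common() if c >= min_count]


transcripts = ["I don't know", "well I don't know", "you know I don't"]
print(frequent_ngrams(transcripts))
```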

Detection of New Malicious Code Using N-grams Signatures

2004

Tony Abou-Assaleh, Nick Cercone, Vlado Keselj, Ray Sweidan

Signature-based malicious code detection is the standard technique in all commercial anti-virus software. This method can detect a virus only after the virus has appeared and caused damage. Signature-based detection performs poorly when attempting to identify new viruses. Motivated by the standard signature-based technique for detecting viruses, and a recent successful text classification metho...
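The truncated excerpt does not name the text classification method it builds on. As one hedged illustration of n-gram-based code classification, the sketch below builds byte n-gram profiles and compares them with the Common N-Gram (CNG) dissimilarity. Each profile keeps the L most frequent n-grams with normalized frequencies. The dissimilarity is given in its usual form: the sum of (2(f1 - f2)/(f1 + f2))^2 over the union of both profiles. Whether the paper uses exactly this setup is an assumption, as are the defaults n = 4 and L = 500 and the function names.

```python
from collections import Counter


def byte_ngram_profile(data: bytes, n: int = 4, top_l: int = 500) -> dict[bytes, float]:
    """Hypothetical helper: top-L byte n-grams with normalized frequencies."""
    grams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in grams.most_common(top_l)}


def cng_dissimilarity(p1: dict[bytes, float], p2: dict[bytes, float]) -> float:
    """CNG-style dissimilarity; each n-gram in the union contributes at most 4."""
    d = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        d += (2 * (f1 - f2) / (f1 + f2)) ** 2
    return d


def classify(sample: bytes, class_profiles: dict[str, dict[bytes, float]],
             n: int = 4, top_l: int = 500) -> str:
    """Assign the class whose profile is least dissimilar to the sample's."""
    prof = byte_ngram_profile(sample, n, top_l)
    return min(class_profiles, key=lambda c: cng_dissimilarity(prof, class_profiles[c]))
```

In practice each class profile would be built from many training files rather than from a single byte string.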

Automatically identifying code features for software defect prediction: Using AST N-grams

Journal: Information and Software Technology, 2019
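The listing gives no abstract for this entry, so the sketch below makes no claim about how the paper defines AST n-grams. It shows one plausible reading under stated assumptions. Source code is parsed into an abstract syntax tree, node-type names are listed in pre-order, and contiguous n-grams of those names are counted. Python's standard `ast` module is used only for convenience, and the helper names are hypothetical.

```python
import ast
from collections import Counter


def preorder_types(node: ast.AST) -> list[str]:
    """Hypothetical helper: node-type names in depth-first pre-order."""
    out = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        out.extend(preorder_types(child))
    return out


def ast_ngrams(source: str, n: int = 3) -> Counter:
    """Count contiguous n-grams of node types (an assumed feature definition)."""
    types = preorder_types(ast.parse(source))
    return Counter(tuple(types[i:i + n]) for i in range(len(types) - n + 1))


print(preorder_types(ast.parse("x = 1")))
```

Counts like these could serve as feature vectors for a defect-prediction classifier. The traversal order and whether context nodes such as `Store` are kept are design choices.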

ASOBEK at SemEval-2016 Task 1: Sentence Representation with Character N-gram Embeddings for Semantic Textual Similarity

2016

Asli Eyecioglu, Bill Keller

A growing body of research has recently been conducted on semantic textual similarity using a variety of neural network models. While recent research focuses on word-based representation for phrases, sentences and even paragraphs, this study considers an alternative approach based on character n-grams. We generate embeddings for character n-grams using a continuous-bag-of-n-grams neural network...
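The abstract is truncated before it explains how sentence representations are built from character n-gram embeddings. The minimal sketch below assumes a sentence vector is the sum of its character trigram vectors and measures similarity by cosine. The pseudo-random vectors from `ngram_vector` are placeholders for the embeddings that, per the abstract, would be learned with a continuous-bag-of-n-grams network. `DIM` and all function names are hypothetical.

```python
import math
import random
import zlib

DIM = 128  # hypothetical embedding size


def ngram_vector(gram: str) -> list[float]:
    # Placeholder for a trained embedding: a fixed pseudo-random vector per n-gram.
    rng = random.Random(zlib.crc32(gram.encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def sentence_vector(sentence: str, n: int = 3) -> list[float]:
    """Assumed composition: sum of the sentence's character n-gram vectors."""
    text = sentence.lower()
    vec = [0.0] * DIM
    for i in range(len(text) - n + 1):
        for j, x in enumerate(ngram_vector(text[i:i + n])):
            vec[j] += x
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Cosine similarity ignores vector length, so summing and averaging the n-gram vectors give the same scores.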