Search results for: n word

Number of results: 1070826

Journal: Electr. J. Comb. 2017
Amy Glen, Jamie Simpson, William F. Smyth

In this paper, we determine the maximum number of distinct Lyndon factors that a word of length n can contain. We also derive formulas for the expected total number of Lyndon factors in a word of length n on an alphabet of size σ, as well as the expected number of distinct Lyndon factors in such a word. The minimum number of distinct Lyndon factors in a word of length n is 1 and the minimum tot...
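A Lyndon word is one that is strictly smaller, lexicographically, than all of its proper rotations. As a minimal illustrative sketch (a naive enumeration, not the counting formulas derived in the paper), the distinct Lyndon factors of a word can be collected directly from this definition:

```python
def is_lyndon(s):
    # A nonempty word is Lyndon if it is strictly smaller
    # (lexicographically) than every one of its proper rotations.
    n = len(s)
    return n > 0 and all(s < s[i:] + s[:i] for i in range(1, n))

def distinct_lyndon_factors(w):
    # Enumerate every factor (substring) of w and keep the Lyndon ones.
    factors = {w[i:j] for i in range(len(w)) for j in range(i + 1, len(w) + 1)}
    return {f for f in factors if is_lyndon(f)}

print(sorted(distinct_lyndon_factors("aab")))  # → ['a', 'aab', 'ab', 'b']
```

A word such as "aaaa" has only the single distinct Lyndon factor "a", matching the minimum of 1 stated in the abstract.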

2006
Izhak Shafran, Keith B. Hall

This paper presents a corrective model for speech recognition of inflected languages. The model, based on a discriminative framework, incorporates word n-gram features as well as factored morphological features, providing error reduction over a model based solely on word n-gram features. Experiments on a large vocabulary task, namely the Czech portion of the MALACH corpus, demonstrate perform...

Journal: CoRR 2016
Shufeng Xiong

Most existing work learns sentiment-specific word representations for improving Twitter sentiment classification, encoding both n-gram and distantly supervised tweet sentiment information in the learning process. These approaches assume all words within a tweet have the same sentiment polarity as the whole tweet, which ignores each word's own sentiment polarity. To address this problem, we propose to lear...

Journal: CoRR 2015
Xinchi Chen, Xipeng Qiu, Jingxiang Jiang, Xuanjing Huang

Recently, word representations have attracted increasing attention for their excellent properties in capturing word semantics. Previous works mainly suffer from the problem of polysemy. To address this problem, most previous models represent words as multiple distributed vectors. However, this cannot reflect the rich relations between words by representing words as points in the em...

2016
Rasoul Samad Zadeh Kaljahi, Jennifer Foster

We investigate the application of kernel methods to representing both structural and lexical knowledge for predicting the polarity of opinions in consumer product reviews. We introduce any-gram kernels, which model lexical information in a significantly faster way than traditional n-gram features, while capturing all possible orders of n-grams (n) in a sequence without the need to explicitly prese...

1995
Thorsten Brants

A technique for reducing a tagset used for n-gram part-of-speech disambiguation is introduced and evaluated in an experiment. The technique ensures that all information that is provided by the original tagset can be restored from the reduced one. This is crucial, since we are interested in the linguistically motivated tags for part-of-speech disambiguation. The reduced tagset needs fewer parame...

2007
Sarvnaz Karimi, Falk Scholer, Andrew Turpin

We propose a novel algorithm for English to Persian transliteration. Previous methods proposed for this language pair apply a word alignment tool for training. By contrast, we introduce an alignment algorithm particularly designed for transliteration. Our new model improves the English to Persian transliteration accuracy by 14% over an n-gram baseline. We also propose a novel back-transliterati...

2017
Pablo Gamallo, Iván Rodríguez-Torres, Marcos García

This article describes a semantic system based on distributional models obtained from a chronologically structured language resource, namely Google Books Syntactic Ngrams. The models were created using dependency-based contexts and a strategy for reducing the vector space, which consists of selecting the most informative and relevant word contexts. The system allows linguists to analiz...

Journal: CoRR 2017
Rasoul Samad Zadeh Kaljahi, Jennifer Foster

Any-gram kernels are a flexible and efficient way to employ bag-of-n-gram features when learning from textual data. They are also compatible with the use of word embeddings so that word similarities can be accounted for. While the original any-gram kernels are implemented on top of tree kernels, we propose a new approach which is independent of tree kernels and is more efficient. We also propos...
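The feature space underlying any-gram kernels can be pictured with a naive quadratic sketch: count shared n-grams of every order between two token sequences. This is only an illustration of the bag-of-all-n-grams dot product, not the authors' efficient kernel algorithm (and it omits the word-embedding similarity extension the abstract mentions):

```python
from collections import Counter

def all_ngrams(tokens):
    # Collect n-grams of every order n = 1 .. len(tokens).
    grams = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            grams[tuple(tokens[i:j])] += 1
    return grams

def anygram_kernel(a, b):
    # Naive kernel value: a dot product of the two sequences'
    # n-gram count vectors, summed over all orders at once.
    ga, gb = all_ngrams(a), all_ngrams(b)
    return sum(c * gb[g] for g, c in ga.items())

print(anygram_kernel("the cat sat".split(), "the cat ran".split()))  # → 3
```

The two example sequences share the unigrams "the" and "cat" and the bigram "the cat", hence the kernel value 3.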

1997
Eluned S. Parris, Harvey Lloyd-Thomas, Michael J. Carey, Jeremy H. Wright

This paper describes a number of techniques for language verification based on acoustic processing and n-gram language modelling. A new technique is described which uses anti-models to model the general class of languages. These models are then used to normalise the acoustic score giving a 34% reduction in the error rate of the system. An approach to automatically generate discriminative subwor...
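The anti-model normalisation idea can be sketched as a log-likelihood ratio: score the input under the target-language model and subtract its score under a general "anti" model of all languages. The toy unigram character models below are assumptions for illustration only, not the acoustic n-gram models used in the paper:

```python
import math

def log_likelihood(text, probs, floor=1e-6):
    # Sum of log-probabilities under a unigram character model;
    # unseen characters fall back to a small floor probability.
    return sum(math.log(probs.get(ch, floor)) for ch in text)

def verification_score(text, lang_model, anti_model):
    # Normalise the target-language score by the anti-model score
    # (a log-likelihood ratio); accept the language if positive.
    return log_likelihood(text, lang_model) - log_likelihood(text, anti_model)

# Toy, hypothetical distributions for illustration.
english = {"e": 0.5, "t": 0.3, "a": 0.2}
anti    = {"e": 0.25, "t": 0.25, "a": 0.25, "x": 0.25}

print(verification_score("eta", english, anti) > 0)  # → True
```

In practice the decision is made against a tuned threshold rather than zero, and the models are estimated from training data.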
