n model

A hierarchical language model incorporating class-dependent word models for OOV words recognition

2000

Koichi Tanigaki Hirofumi Yamamoto Yoshinori Sagisaka

A new language model is proposed to cope with the demands for recognizing out-of-vocabulary (OOV) words not registered in the lexicon. This language model is a class N-gram incorporating a set of word models that reflect the statistical characteristics of the phonotactics, which depend on the lexical classes. Utilization of class-dependency enhances recognition accuracy and enables identificati...


Fast n-gram language model look-ahead for decoders with static pronunciation prefix trees

2008

Marijn Huijbregts Roeland Ordelman Franciska de Jong

Decoders that make use of token-passing restrict their search space by various types of token pruning. With use of the Language Model Look-Ahead (LMLA) technique it is possible to increase the number of tokens that can be pruned without loss of decoding precision. Unfortunately, for token passing decoders that use single static pronunciation prefix trees, full n-gram LMLA increases the needed n...


Expanding the Language model in a low-resource hybrid MT system

2014

George Tambouratzis Sokratis Sofianopoulos Marina Vassiliou

The present article investigates the fusion of different language models to improve translation accuracy. A hybrid MT system, recently developed in the European Commission-funded PRESEMT project that combines example-based MT and Statistical MT principles is used as a starting point. In this article, the syntactically-defined phrasal language models (NPs, VPs etc.) used by this MT system are supp...


Learning to Embed Songs and Tags for Playlist Prediction

2012

Joshua L. Moore Shuo Chen Thorsten Joachims Douglas Turnbull

Automatically generated playlists have become an important medium for accessing and exploring large collections of music. In this paper, we present a probabilistic model for generating coherent playlists by embedding songs and social tags in a unified metric space. We show how the embedding can be learned from example playlists, providing the metric space with a probabilistic meaning for song/s...


Contextual Dependencies in Unsupervised Word Segmentation

2006

Sharon Goldwater Thomas L. Griffiths Mark Johnson

Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment speech. We propose two new Bayesian word segmentation methods that assume unigram and bigram models of word dependencies respectively. The bigram model greatly outperforms the unigram model (and previous probabilistic...


A Semi Supervised Dialog Act Tagging for Telugu

2015

Suman Dowlagar Radhika Mamidi

In a task-oriented domain, recognizing the intention of a speaker is important so that the conversation can proceed in the correct direction. This is possible only if there is a way of labeling the utterance with its proper intent. One such labeling technique is Dialog Act (DA) tagging. This work focuses on discussing various n-gram DA tagging techniques. In this paper, a new method is propose...


Weakly supervised parsing with rules

2013

Christophe Cerisara Alejandra Lorenzo Pavel Král

This work proposes a new research direction to address the lack of structures in traditional n-gram models. It is based on a weakly supervised dependency parser that can model speech syntax without relying on any annotated training corpus. Labeled data is replaced by a few hand-crafted rules that encode basic syntactic knowledge. Bayesian inference then samples the rules, disambiguating and com...


Exploring Asymmetric Clustering for Statistical Language Modeling

2002

Jianfeng Gao Joshua Goodman Guihong Cao Hang Li

The n-gram model is a stochastic model, which predicts the next word (predicted word) given the previous words (conditional words) in a word sequence. The cluster n-gram model is a variant of the n-gram model in which similar words are classified in the same cluster. It has been demonstrated that using different clusters for predicted and conditional words leads to cluster models that are super...
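
As an illustration of the two model families named in this abstract, the following is a minimal sketch, not the authors' method. It assumes maximum-likelihood estimates with no smoothing, a bigram history, and a single hand-written clustering (`cluster_of`, hypothetical) shared by predicted and conditional words; the asymmetric variant studied in the paper would use a separate clustering for each role.

```python
from collections import Counter

def train_bigram(tokens):
    """Return a function giving the maximum-likelihood P(word | prev)."""
    history_counts = Counter(tokens[:-1])          # counts of each token as a history
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob

def train_cluster_bigram(tokens, cluster_of):
    """Return P(word | prev) approximated as P(c(word) | c(prev)) * P(word | c(word))."""
    classes = [cluster_of[t] for t in tokens]
    class_bigram = train_bigram(classes)           # bigram model over cluster labels
    word_counts = Counter(tokens)
    class_counts = Counter(classes)

    def prob(prev, word):
        c_prev, c_word = cluster_of[prev], cluster_of[word]
        return class_bigram(c_prev, c_word) * word_counts[word] / class_counts[c_word]

    return prob

# Toy corpus and clustering, both invented for this example.
tokens = "the cat sat on the mat the dog sat on the rug".split()
cluster_of = {
    "the": "DET", "cat": "NOUN", "dog": "NOUN", "mat": "NOUN",
    "rug": "NOUN", "sat": "VERB", "on": "PREP",
}

word_model = train_bigram(tokens)
cluster_model = train_cluster_bigram(tokens, cluster_of)

# "mat sat" never occurs in the corpus, so the word bigram assigns it zero,
# while the cluster model still gives it mass through NOUN -> VERB.
print(word_model("mat", "sat"), cluster_model("mat", "sat"))
```

In this toy setting the cluster model assigns nonzero probability to a word pair the word bigram has never seen, which is the usual motivation for clustering similar words.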


A statistical text-to-phone function using ngrams and rules

1999

William M. Fisher

Adopting concepts from statistical language modeling and rule-based transformations can lead to effective and efficient text-to-phone (TTP) functions. We present here the methods and results of one such effort, resulting in a relatively compact and fast set of TTP rules that achieves 94.5% segmental phonemic accuracy.


Class-based language model adaptation using mixtures of word-class weights

2000

Gareth Moore Steve J. Young

This paper describes the use of a weighted mixture of class-based n-gram language models to perform topic adaptation. By using a fixed class n-gram history and variable word-given-class probabilities we obtain large improvements in the performance of the class-based language model, giving it similar accuracy to a word n-gram model, and an associated small but statistically significant improvemen...
