Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models
نویسندگان
چکیده
We present a system for computing similarity between pairs of words. Our system is based on Pair Hidden Markov Models, a variation on Hidden Markov Models that has been used successfully for the alignment of biological sequences. The parameters of the model are automatically learned from training data that consists of word pairs known to be similar. Our tests focus on the identification of cognates — words of common origin in related languages. The results show that our system outperforms previously proposed techniques.
منابع مشابه
Evaluation Of Several Phonetic Similarity Algorithms On The Task Of Cognate Identification
We investigate the problem of measuring phonetic similarity, focusing on the identification of cognates, words of the same origin in different languages. We compare representatives of two principal approaches to computing phonetic similarity: manually-designed metrics, and learning algorithms. In particular, we consider a stochastic transducer, a Pair HMM, several DBN models, and two constructe...
متن کاملFast and unsupervised methods for multilingual cognate clustering
In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing similarity between two words. We tested our online systems on geographically spread sixteen different language groups of the world and show that the Online PMI system (Pointwise Mutual Information) outperforms a HMM ...
متن کاملWord Recognition and Learning based on Associative Memories and Hidden Markov Models
A word recognition architecture based on a network of neural associative memories and hidden Markov models has been developed. The input stream, composed of subword-units like wordinternal triphones consisting of diphones and triphones, is provided to the network of neural associative memories by hidden Markov models. The word recognition network derives words from this input stream. The archit...
متن کاملIntroducing Busy Customer Portfolio Using Hidden Markov Model
Due to the effective role of Markov models in customer relationship management (CRM), there is a lack of comprehensive literature review which contains all related literatures. In this paper the focus is on academic databases to find all the articles that had been published in 2011 and earlier. One hundred articles were identified and reviewed to find direct relevance for applying Markov models...
متن کاملLearning Text Pair Similarity with Context-sensitive Autoencoders
We present a pairwise context-sensitive Autoencoder for computing text pair similarity. Our model encodes input text into context-sensitive representations and uses them to compute similarity between text pairs. Our model outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task. For retrieval, our unsupervised approach that merely ranks input...
متن کامل