Aligning words using matrix factorisation
Authors
Abstract
Aligning words from sentences which are mutual translations is an important problem in different settings, such as bilingual terminology extraction, Machine Translation, or projection of linguistic features. Here, we view word alignment as matrix factorisation. In order to produce proper alignments, we show that factors must satisfy a number of constraints such as orthogonality. We then propose an algorithm for orthogonal non-negative matrix factorisation, based on a probabilistic model of the alignment data, and apply it to word alignment. This is illustrated on a French-English alignment task from the Hansard.
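The idea of treating alignment as a factorisation with orthogonal factors can be illustrated with a minimal sketch. It is not the algorithm proposed in the paper: it assumes plain multiplicative-update NMF under a squared-error objective, followed by a hard assignment of each word to its strongest latent group, so that the resulting binary factors have orthogonal columns. The names `nmf`, `harden`, `align`, the toy matrix `M`, and the choice of `k` are all hypothetical.

```python
import numpy as np

def nmf(M, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF, M ~ F @ E.T, via multiplicative updates (squared error).

    This is a stand-in for the probabilistic model mentioned in the
    abstract, not a reproduction of it.
    """
    rng = np.random.default_rng(seed)
    F = rng.random((M.shape[0], k)) + eps   # source words x groups
    E = rng.random((M.shape[1], k)) + eps   # target words x groups
    for _ in range(n_iter):
        F *= (M @ E) / (F @ (E.T @ E) + eps)
        E *= (M.T @ F) / (E @ (F.T @ F) + eps)
    return F, E

def harden(W):
    """Assign each row (word) to exactly one group: its largest entry.

    A binary matrix with a single 1 per row has orthogonal columns.
    """
    H = np.zeros_like(W)
    H[np.arange(W.shape[0]), W.argmax(axis=1)] = 1.0
    return H

def align(M, k, **kwargs):
    """Return binary factors and the alignment matrix they imply."""
    F, E = nmf(M, k, **kwargs)
    Fh, Eh = harden(F), harden(E)
    return Fh, Eh, Fh @ Eh.T  # entry (i, j) is 1 if words i and j share a group

# Toy association matrix (assumed): 4 source words x 4 target words.
M = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 6.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
    [0.0, 0.0, 0.0, 4.0],
])
Fh, Eh, A = align(M, k=3)
```

Hardening after the fact is the simplest way to obtain orthogonal factors; a method that enforces the constraint during optimisation, as the abstract describes, may give different alignments.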
Similar resources
Tracking Time Evolution of Collective Attention Clusters in Twitter: Time Evolving Nonnegative Matrix Factorisation
Micro-blogging services, such as Twitter, offer opportunities to analyse user behaviour. Discovering and distinguishing behavioural patterns in micro-blogging services is valuable. However, it is difficult and challenging to distinguish users, and to track the temporal development of collective attention within distinct user groups in Twitter. In this paper, we formulate this problem as trackin...
Automatically learning the units of speech by non-negative matrix factorisation
We present an unsupervised technique to discover the (word-sized) speech units into which a corpus of utterances can be decomposed. First, a fixed-length high-dimensional vector representation of the utterances is obtained. Then, the resulting matrix is decomposed in terms of additive units by applying the non-negative matrix factorisation algorithm. On a small vocabulary task, the obtained basis ...
Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation
Nonnegative matrix factorisation (NMF) and tri-factorisation (NMTF) methods decompose a given matrix R into two or three smaller matrices so that R ≈ UV^T or R ≈ FSG^T, respectively. Schmidt, Winther and Hansen (2009) introduced a Bayesian version of nonnegative matrix factorisation (left), which we extend to matrix tri-factorisation (right)...
Probabilistic non-negative matrix factorisation and extensions
Matrix factorisation models have grown explosively in popularity over the last decade, owing to their usefulness in clustering and missing-value prediction. We review the main literature on matrix factorisation, focusing on nonnegative matrix factorisation and probabilistic approaches. We also consider several extensions: matrix tri-factorisation, tensor factorisation, T...
Bayesian Hybrid Matrix Factorisation for Data Integration
1 Models; 1.1 Matrix factorisation models; 1.2 Matrix factorisation with ARD and importance values; 1.3 Hybrid matrix factorisation model; 1.3.1 Model definition; 1.3.2 Gibbs sampler ...