An Unsupervised Bayesian Distance Measure

نویسندگان

  • Petri Kontkanen
  • Jussi Lahtinen
  • Petri Myllymäki
  • Henry Tirri
چکیده

We introduce a distance measure based on the idea that two vectors are considered similar if they lead to similar predictive probability distributions. The suggested approach avoids the scaling problem inherent to many alternative techniques as the method automatically transforms the original attribute space to a probability space where all the numbers lie between 0 and 1. The method is also exible in the sense that it allows diierent attribute types (discrete or continuous) in the same consistent framework. To study the validity of the suggested measure , we ran a series of experiments with publicly available data sets. The empirical results demonstrate that the unsupervised distance measure is sensible in the sense that it can be used for discovering the hidden clustering structure of the data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker segmentation and clustering for simultaneously presented speech

This paper proposes a new scheme used to segment and cluster speech segments on an unsupervised basis in cases where multiple speakers are presented simultaneously at different SNRs. The new elements in our work are in the development of new feature for segmenting and clustering simultaneously-presented speech, the procedure for identifying a candidate set of possible speaker-change points, and...

متن کامل

Unsupervised Coreference Resolution in a Nonparametric Bayesian Model

We present an unsupervised, nonparametric Bayesian approach to coreference resolution which models both global entity identity across a corpus as well as the sequential anaphoric structure within each document. While most existing coreference work is driven by pairwise decisions, our model is fully generative, producing each mention from a combination of global entity properties and local atten...

متن کامل

A Similarity Measure for Unsupervised Semantic Disambiguation

This paper presents an unsupervised method for the resolution of lexical ambiguity of nouns. The method relies on the topological structure of the noun taxonomy of WordNet where a notion of semantic distance is defined. An unsupervised semantic tagger, based on the above measure, is evaluated over an hand-annotated portion of the British National Corpus and compared with a supervised approach b...

متن کامل

Unsupervised Empirical Bayesian Multiple Testing with External Covariates

In an empirical Bayesian setting, we provide a new multiple testing method, useful when an additional covariate is available, that influences the probability of each null hypothesis being true. We measure the posterior significance of each test conditionally on the covariate and the data, leading to greater power. Using covariate-based prior information in an unsupervised fashion, we produce a ...

متن کامل

An Equivalent Pseudoword Solution to Chinese Word Sense Disambiguation

This paper presents a new approach based on Equivalent Pseudowords (EPs) to tackle Word Sense Disambiguation (WSD) in Chinese language. EPs are particular artificial ambiguous words, which can be used to realize unsupervised WSD. A Bayesian classifier is implemented to test the efficacy of the EP solution on Senseval-3 Chinese test set. The performance is better than state-of-the-art results wi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000