Revisiting Fisher Kernels for Document Similarities

Authors

  • Martin Nyffenegger
  • Jean-Cédric Chappelier
  • Éric Gaussier
Abstract

This paper presents a new metric to compute similarities between textual documents, based on the Fisher information kernel as proposed by T. Hofmann. By considering a new point-of-view on the embedding vector space and proposing a more appropriate way of handling the Fisher information matrix, we derive a new form of the kernel that yields significant improvements on an information retrieval task. We apply our approach to two different models: Naive Bayes and PLSI.
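
For orientation, the general Fisher kernel construction that the abstract builds on can be sketched for the simplest case, a single multinomial (unigram) model over the vocabulary. This is a minimal sketch of the standard construction, not the new kernel derived in the paper. It assumes the square-root parameterization rho_w = 2*sqrt(theta_w) and approximates the Fisher information matrix by the identity. The function name `fisher_kernel_unigram` and its arguments are hypothetical, chosen only for illustration.

```python
import numpy as np

def fisher_kernel_unigram(counts_a, counts_b, theta):
    """Fisher kernel between two documents under a unigram model.

    Assumptions (not taken from the paper): square-root parameterization
    rho_w = 2*sqrt(theta_w), Fisher information matrix approximated by
    the identity, and every theta_w strictly positive (e.g. smoothed).
    """
    counts_a = np.asarray(counts_a, dtype=float)
    counts_b = np.asarray(counts_b, dtype=float)
    theta = np.asarray(theta, dtype=float)

    # Length-normalised word frequencies P^(w|d) for each document.
    p_a = counts_a / counts_a.sum()
    p_b = counts_b / counts_b.sum()

    # Fisher score: gradient of sum_w P^(w|d) * log(theta_w) with respect
    # to rho_w, which equals P^(w|d) / sqrt(theta_w).
    u_a = p_a / np.sqrt(theta)
    u_b = p_b / np.sqrt(theta)

    # With an identity information matrix the kernel is a plain dot product,
    # i.e. sum_w P^(w|a) * P^(w|b) / theta_w.
    return float(u_a @ u_b)

# Toy vocabulary of three words with hypothetical collection probabilities.
theta = [0.5, 0.25, 0.25]
doc_a = [2, 1, 1]
doc_b = [0, 2, 2]
similarity = fisher_kernel_unigram(doc_a, doc_b, theta)
```

Under these assumptions, words that are rare in the collection (small theta_w) contribute more to the similarity than frequent ones. How the Fisher information matrix is handled is exactly the point the abstract says the paper revisits, so the identity approximation here should be read as a placeholder.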

Similar resources

Fisher Kernels and Probabilistic Latent Semantic Models (Thèse No 4647, 2010, École Polytechnique Fédérale de Lausanne)

Tasks that rely on semantic content of documents, notably Information Retrieval and Document Classification, can benefit from a good account of document context, i.e. the semantic association between documents. To this effect, the scheme of latent semantics blends individual words appearing throughout a document collection into latent topics, thus providing a way to handle documents that is les...

Using Fisher Kernels and Hidden Markov Models for the Identification of Famous Composers from their Sheet Music

We present a novel application of Fisher kernels to the problem of identifying famous composers from their sheet music. The characteristics of the composers' writing style are obtained from note changes on a basic beat level, combined with the notes' hidden harmony. We are able to extract this information by the application of a Hidden Markov Model to learn the underlying probabilistic structure ...

Information Diffusion Kernels

A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with non-parametric discr...

Diffusion Kernels on Statistical Manifolds

A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomia...

Deriving TF-IDF as a Fisher Kernel

The Dirichlet compound multinomial (DCM) distribution has recently been shown to be a good model for documents because it captures the phenomenon of word burstiness, unlike standard models such as the multinomial distribution. This paper investigates the DCM Fisher kernel, a function for comparing documents derived from the DCM. We show that the DCM Fisher kernel has components that are similar...


Publication date: 2006