String Kernels, Fisher Kernels and Finite State Automata

Authors

  • Craig Saunders
  • John Shawe-Taylor
  • Alexei Vinokourov
Abstract

In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be re-constructed. The Fisher kernel view gives a more flexible insight into the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabilistic modelling approach suggests extending the Markov process to consider sub-sequences of varying length, rather than the standard fixed-length approach used in the string kernel. We give a procedure for determining which sub-sequences are informative features and hence generate a Finite State Machine model, which can again be used to obtain a Fisher kernel. By adjusting the parametrisation we can also influence the weighting received by the features. In this way we are able to obtain a logarithmic weighting in a Fisher kernel. Finally, experiments are reported comparing the different kernels using the standard Bag of Words kernel as a baseline.
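The abstract's central idea is a Fisher kernel derived from a k-stage Markov model of a document. A minimal sketch of that construction follows. It models a document over a fixed alphabet with parameters P(a | u) for every length-k context u. The parameters are estimated from a training corpus with add-alpha smoothing, which is an assumption for this sketch. Each document is then mapped to one feature per (k+1)-gram ua. The feature form tf(ua, s) / P(a | u) - sum_b tf(ub, s) is an assumed way of handling the sum-to-one constraint, not necessarily the paper's exact parametrisation. Following common practice, the sketch replaces the Fisher information matrix with the identity. The function names `ngram_counts`, `fit_markov`, `fisher_score` and `fisher_kernel` are hypothetical.

```python
from collections import Counter
from itertools import product


def ngram_counts(s, n):
    """Count every contiguous substring of length n in s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def fit_markov(corpus, k, alphabet, alpha=1.0):
    """Estimate theta[u][a] = P(a | u) for each length-k context u.

    Add-alpha smoothing is an assumption of this sketch, used so that
    no probability is zero (the score below divides by it).
    """
    counts = Counter()
    for doc in corpus:
        counts.update(ngram_counts(doc, k + 1))
    theta = {}
    for ctx in product(alphabet, repeat=k):
        u = "".join(ctx)
        total = sum(counts[u + a] for a in alphabet) + alpha * len(alphabet)
        theta[u] = {a: (counts[u + a] + alpha) / total for a in alphabet}
    return theta


def fisher_score(s, theta, k):
    """Map s to one feature per (k+1)-gram ua (assumed form):
    tf(ua, s) / P(a | u) - sum_b tf(ub, s).
    Symbols outside the model's alphabet are simply ignored.
    """
    grams = ngram_counts(s, k + 1)
    ctx_totals = Counter()  # how often each context u is followed by any symbol
    for g, c in grams.items():
        ctx_totals[g[:k]] += c
    return {u + a: grams[u + a] / p - ctx_totals[u]
            for u, dist in theta.items() for a, p in dist.items()}


def fisher_kernel(s, t, theta, k):
    """Inner product of score vectors, with the Fisher information
    approximated by the identity matrix."""
    fs, ft = fisher_score(s, theta, k), fisher_score(t, theta, k)
    return sum(fs[g] * ft[g] for g in fs)
```

As a usage example, `theta = fit_markov(corpus, k=2, alphabet="abc")` followed by `fisher_kernel(s, t, theta, k=2)` compares two documents through their trigram counts. Each count is divided by the model probability of that continuation, so continuations that are rare in the training corpus receive more weight. This illustrates how the model's parameters reflect corpus statistics in the kernel.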


Similar resources

Learning state machine-based string edit kernels

During the past few years, several works have been done to derive string kernels from probability distributions. For instance, the Fisher kernel uses a generative model M (e.g. a hidden Markov model) and compares two strings according to how they are generated by M. On the other hand, the marginalized kernels allow the computation of the joint similarity between two instances by summing condit...

String Kernels, Fisher Kernels

In this paper we show how the generation of documents can be thought of as a k-stage Markov process, which leads to a Fisher kernel from which the n-gram and string kernels can be reconstructed. The Fisher kernel view gives a more flexible insight into the string kernel and suggests how it can be parametrised in a way that reflects the statistics of the training corpus. Furthermore, the probabili...

Positive Definite Rational Kernels

Kernel methods are widely used in statistical learning techniques. We recently introduced a general kernel framework based on weighted transducers or rational relations, rational kernels, to extend kernel methods to the analysis of variable-length sequences or more generally weighted automata. These kernels are efficient to compute and have been successfully used in applications such as spoken-...

Weighted Automata Kernels – General F

Kernel methods have found in recent years wide use in statistical learning techniques due to their good performance and their computational efficiency in high-dimensional feature space. However, text or speech data cannot always be represented by the fixed-length vectors that the traditional kernels handle. We recently introduced a general kernel framework based on weighted transducers, rationa...

Weighted automata kernels - general framework and algorithms

Kernel methods have found in recent years wide use in statistical learning techniques due to their good performance and their computational efficiency in high-dimensional feature space. However, text or speech data cannot always be represented by the fixed-length vectors that the traditional kernels handle. We recently introduced a general kernel framework based on weighted transducers, rationa...

Publication date: 2002