minimum description length

Search results for: minimum description length

Number of results: 718390

Workshop Notes of the ECML / MLnet Workshop on Empirical Learning of Natural Language Processing Tasks

1997

Miles Osborne

When applied to probabilistic categorial grammar learning, the Minimum Description Length principle outperforms Maximum Likelihood Estimation. Smoothing does not bridge the gap between the two approaches.

Bloat Control and Generalization Pressure Using the Minimum Description Length Principle for a Pittsburgh Approach Learning Classifier System

2005

Jaume Bacardit, Josep Maria Garrell i Guiu

Bloat control and generalization pressure are very important issues in the design of Pittsburgh Approach Learning Classifier Systems (LCS), in order to achieve simple and accurate solutions in a reasonable time. In this paper we propose a method to achieve these objectives based on the Minimum Description Length (MDL) principle. This principle is a metric which combines in a smart way the accur...

UNGRADE: UNsupervised GRAph DEcomposition

2009

Bruno Golénia, Sebastian Spiegler, Peter A. Flach

This article presents an unsupervised algorithm for word decomposition called UNGRADE (UNsupervised GRAph DEcomposition) to segment any word list of any language. UNGRADE assumes that each word consists of prefixes, a stem, and suffixes, with no limit on the number of prefixes or suffixes. The UNGRADE algorithm works in three steps and is language independent. Firstly, a pse...

Agglomerative Grouping of Observations by Bounding Entropy Variation

2005

Christian Beder

An information-theoretic framework for grouping observations is proposed. The entropy change incurred by new observations is analyzed using the Kalman filter update equations. It is found that the entropy variation is caused by a positive similarity term and a negative proximity term. Bounding the similarity term in the spirit of the minimum description length principle and the proximity term ...

An Evaluation of Discretization Methods for Learning Rules from Biomedical Datasets

2008

Jonathan L. Lustgarten, Shyam Visweswaran, Himanshu Grover, Vanathi Gopalakrishnan

Rule learning has the major advantage of understandability by human experts when performing knowledge discovery within the biomedical domain. Many rule learning algorithms require discrete data in order to learn the IF-THEN rule sets. This requirement makes the selection of a discretization technique an important step in rule learning. We compare the performance of one standard technique, Fayya...

Text Segmentation by Language Using Minimum Description Length

2012

Hiroshi Yamaguchi, Kumiko Tanaka-Ishii

The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solutio...

Technology Extraction for Future Generations from Process Time Series Data Reflecting Expert Operator Skills

2006

Setsuya Kurahashi, Takao Terano

This paper proposes a novel method to develop a process response model from continuous time-series data. The main contribution of the research is to establish a method to mine a set of meaningful control rules from a Learning Classifier System using the Minimum Description Length criterion. The proposed method has been applied to an actual process of a biochemical plant and has shown the validity ...

An MDL-based approach to extracting subword units for grapheme-to-phoneme conversion

2010

Sravana Reddy, John A. Goldsmith

We address a key problem in grapheme-to-phoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose learning chunks, or subwords, to reduce ambiguity. This can be interpreted as learning a lexicon of subwords that has minimum description length. We implement an algorithm to build such a lexicon, as well as a simple d...

Language acquisition in the MDL framework

1992

Jorma Rissanen, Eric Sven Ristad

The Minimum Description Length (MDL) principle provides guidance on the fundamental question of determining what a given set of observed data tells us about the underlying data-generating machinery. Hence, in the broadest sense the MDL principle relates to the central question of all science, although its most useful applications have been to the more practical problem of fitting statistical mode...
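
As an illustration of the model-fitting use of MDL mentioned in this abstract, the following is a minimal sketch of two-part MDL model selection for a non-empty binary sequence. It is not taken from the paper: the function names (`data_bits`, `two_part_mdl`, `select_model`), the two candidate models, and the (1/2) log2 n parameter cost (a common asymptotic approximation) are all assumptions chosen for the example.

```python
import math


def data_bits(seq, p):
    """Code length in bits of a binary sequence under a Bernoulli(p) model."""
    bits = 0.0
    for x in seq:
        q = p if x == 1 else 1.0 - p
        if q == 0.0:
            # The model assigns probability zero to an observed symbol.
            return math.inf
        bits -= math.log2(q)
    return bits


def two_part_mdl(seq):
    """Total description length (model bits + data bits) for two toy models.

    Assumes seq is a non-empty list of 0s and 1s.
    """
    n = len(seq)
    # Model "fair": p fixed at 0.5, no free parameters, so no model cost.
    fair = data_bits(seq, 0.5)
    # Model "biased": one fitted parameter, charged (1/2) * log2(n) bits
    # (an assumed, commonly used approximation of the parameter cost).
    p_hat = sum(seq) / n
    biased = 0.5 * math.log2(n) + data_bits(seq, p_hat)
    return {"fair": fair, "biased": biased}


def select_model(seq):
    """Return the name of the model with the shortest total description."""
    costs = two_part_mdl(seq)
    return min(costs, key=costs.get)
```

Under these assumptions, a heavily skewed sequence such as ninety 1s and ten 0s is described more briefly by the fitted model, while a balanced sequence is cheaper under the fixed fair model because the extra parameter does not pay for itself.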

A New Measure for the Accuracy of a Bayesian Network

2002

Alexandros Pappas, Duncan Fyfe Gillies

A Bayesian Network is a construct that is used to model a given joint probability distribution. In order to assess the quality of an inference, or to choose between competing networks modelling the same data, we need methods to estimate the accuracy of a Bayesian network. Although the accuracy of a Bayesian network can be easily defined in theory, it is rarely possible to compute it in practice...
