Learning Tree Languages from Text

نویسنده

  • Henning Fernau
چکیده

We study the problem of learning regular tree languages from text. We show that the framework of function distinguishability as introduced by the author in Theoretical Computer Science, 290:1679–1711, 2003, can be generalized from the case of string languages towards tree languages. This provides a large source of identifiable classes of regular tree languages. Each of these classes can be characterized in various ways. Moreover, we present a generic inference algorithm with polynomial update time and prove its correctness. In this way, we generalize previous works of Angluin, Sakakibara and ourselves. Moreover, we show that this way all regular tree languages can be approximately identified.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey on text categorization of Indian and non-Indian languages using supervised learning techniques

Categorization of text plays an important role in the text mining field. Text categorization is the process in which documents are categorized into its predefined category. Automatic text categorization is an important task due to large amount of electronic documents. This paper presents a survey of Text categorization of Indian and non-Indian languages. There is very less work done in text cat...

متن کامل

iSentenizer-μ: Multilingual Sentence Boundary Detection Model

Sentence boundary detection (SBD) system is normally quite sensitive to genres of data that the system is trained on. The genres of data are often referred to the shifts of text topics and new languages domains. Although new detection models can be retrained for different languages or new text genres, previous model has to be thrown away and the creation process has to be restarted from scratch...

متن کامل

Extracting a Tree Adjoining Grammar from the Penn Arabic Treebank

Much progress in natural language processing (NLP) over the last decade has come from the combination of using corpora of annotated naturally occurring text along with machine learning algorithms. Following this trend, corpora have been created for other languages, such as the Penn Arabic Treebank (PATB) (Maamouri et al.2003). However, the corpora almost invariably need to reinterpreted for the...

متن کامل

Multidimensional fuzzy finite tree automata

This paper introduces the notion of multidimensional fuzzy finite tree automata (MFFTA) and investigates its closure properties from the area of automata and language theory. MFFTA are a superclass of fuzzy tree automata whose behavior is generalized to adapt to multidimensional fuzzy sets. An MFFTA recognizes a multidimensional fuzzy tree language which is a regular tree language so that for e...

متن کامل

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • ITA

دوره 41  شماره 

صفحات  -

تاریخ انتشار 2002