ABL: Alignment-Based Learning

Author

  • Menno van Zaanen
Abstract

This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25 % non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.

1 Introduction

Unsupervised learning of syntactic structure is one of the hardest problems in NLP. Although people are adept at learning grammatical structure, it is difficult to model this process and therefore it is hard to make a computer learn
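
The alignment learning step described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: it swaps in the standard library's `difflib.SequenceMatcher` in place of an edit-distance alignment, and the function names `align_hypotheses` and `bracket`, the label `X1`, and the example sentences are all hypothetical. The selection learning step is not covered.

```python
from difflib import SequenceMatcher


def align_hypotheses(sent_a, sent_b):
    """Return pairs of (span_a, span_b) word spans that differ between two sentences.

    Each pair marks parts that occupy the same position between shared words,
    so they are hypothesized to be constituents of the same type.
    Spans are (start, end) word indices, end exclusive.
    """
    matcher = SequenceMatcher(None, sent_a, sent_b, autojunk=False)
    opcodes = matcher.get_opcodes()
    # The method only applies to sentence pairs sharing at least one word.
    if not any(tag == "equal" for tag, *_ in opcodes):
        return []
    return [((a0, a1), (b0, b1))
            for tag, a0, a1, b0, b1 in opcodes
            if tag != "equal"]


def bracket(words, span, label="X1"):
    """Render a sentence with one hypothesized constituent bracketed and labelled."""
    start, end = span
    if start == end:  # empty span (pure insertion or deletion): nothing to mark
        return " ".join(words)
    inner = " ".join(words[start:end])
    return " ".join(words[:start] + [f"[{inner}]_{label}"] + words[end:])


if __name__ == "__main__":
    a = "show me the flights from Boston".split()
    b = "show me the fares from Boston".split()
    for span_a, span_b in align_hypotheses(a, b):
        print(bracket(a, span_a))
        print(bracket(b, span_b))
```

In this sketch both differing parts receive the same label, mirroring the idea that interchangeable parts are possible constituents of the same type. A real system would still need to choose among overlapping hypotheses, which is the role the abstract assigns to selection learning.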

Similar resources

Alignment-Based Learning versus Data-Oriented Parsing

This chapter will briefly describe the Alignment-Based Learning (ABL) framework and relate it to Data-Oriented Parsing from different viewpoints. Firstly, ABL can be used to bootstrap an initial treebank, which can then be used by DOP. Secondly, ABL can be used to enhance the robustness of DOP. Thirdly, a DOP model can be used to disambiguate ambiguous syntactic structures found during the lear...

String Alignment in Grammatical Inference

This thesis is concerned with unsupervised learning of syntactic structure from plain text corpora by aligning sentences. Based on Harris’ (1951) linguistic notion of substitutability, sentences in a plain text corpus can be compared to each other and those parts that have similar context and in addition can be substituted for each other without resulting in ungrammatical sentences are consider...

String Alignment in Grammatical Inference: what Suffix Trees can do

This thesis is concerned with unsupervised learning of syntactic structure from plain text corpora by aligning sentences. Based on Harris’ (1951) linguistic notion of substitutability, sentences in a plain text corpus can be compared to each other and those parts that have similar context and in addition can be substituted for each other without resulting in ungrammatical sentences are consider...

Grammatical Inference Using Suffix Trees

The goal of the Alignment-Based Learning (ABL) grammatical inference framework is to structure plain (natural language) sentences as if they are parsed according to a context-free grammar. The framework produces good results even when simple techniques are used. However, the techniques used so far have computational drawbacks, resulting in limitations with respect to the amount of language data...

Bootstrapping structure into language: alignment-based learning

"... refined and abstract meanings largely grow out of more concrete meanings." (Bloomfield, 1933) This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris’s (1951) notion of substitutability. Instances of the framework can be applied to an untagged, unstructured corpus of natural language sentences, r...

Publication date: 2000