Non-Contiguous Tree Parsing

Authors

  • Mark Dras
  • Chung-hye Han
Abstract

Pairing structural descriptions in machine translation (MT), syntax-semantics interfaces and similar settings becomes more difficult the more the languages involved differ structurally. Such pairing involves, implicitly or explicitly, a process of 'tree parsing', in which a structural description is split into smaller component trees so that transfer rules can be applied. Recent work has looked at the construction of transfer rules, using both symbolic and statistical approaches, that require the pairing of groups of several contiguous nodes in structural descriptions. We look at the case where pairings of groups of non-contiguous nodes are necessary, and present an efficient dynamic programming algorithm, based on Tree Adjoining Grammar (TAG) and drawing on compiler theory, for decomposing a structural description into appropriate groupings. We then examine the formal properties of this algorithm, and show that it is linear in the number of nodes in the tree and has the same complexity as existing algorithms requiring only groupings of contiguous nodes.
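To make the notion of 'tree parsing' concrete, the following is a minimal sketch of the simpler, contiguous case only, in the style of the bottom-up dynamic programming used for tree pattern matching in compilers. It is not the TAG-based algorithm of the paper and does not handle non-contiguous groupings. The names `Node`, `HOLE`, `match` and `min_cover`, the tuple encoding of patterns, and the unit cost per pattern are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

# Hypothetical pattern encoding: a nested tuple (label, child_pattern, ...).
# The string "*" is a hole: it matches any subtree, which must then be
# covered by further patterns. Every pattern must be rooted at a label.
HOLE = "*"

def match(pattern, node, holes):
    """Return True if pattern matches at node, collecting subtrees under holes."""
    if pattern == HOLE:
        holes.append(node)
        return True
    label, *kids = pattern
    if label != node.label or len(kids) != len(node.children):
        return False
    return all(match(p, c, holes) for p, c in zip(kids, node.children))

def min_cover(root, patterns):
    """Fewest patterns needed to cover the whole tree, or None if impossible."""
    best = {}  # id(node) -> minimal number of patterns for that subtree

    def visit(node):
        # Post-order: children are solved before their parent.
        for child in node.children:
            visit(child)
        cost = None
        for pattern in patterns:
            holes = []
            if not match(pattern, node, holes):
                continue
            sub = [best[id(h)] for h in holes]
            if any(s is None for s in sub):
                continue  # some subtree under a hole cannot be covered
            total = 1 + sum(sub)
            if cost is None or total < cost:
                cost = total
        best[id(node)] = cost

    visit(root)
    return best[id(root)]

def np():
    return Node("NP", [Node("D"), Node("N")])

# Toy structural description: S(NP(D, N), VP(V, NP(D, N)))
tree = Node("S", [np(), Node("VP", [Node("V"), np()])])

patterns = [
    ("S", HOLE, HOLE),
    ("NP", ("D",), ("N",)),
    ("VP", ("V",), HOLE),
    ("VP", ("V",), ("NP", ("D",), ("N",))),  # larger group of contiguous nodes
]

print(min_cover(tree, patterns))  # 3
```

Each node is visited once and tried against a fixed set of bounded-size patterns, so for a fixed pattern set the work in this sketch grows linearly with the number of nodes. Every pattern here is a connected fragment of the tree; groupings of non-contiguous nodes, the subject of the paper, are outside what this sketch can express.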

Similar resources

A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation

The tree sequence based translation model allows the violation of syntactic boundaries in a rule to capture non-syntactic phrases, where a tree sequence is a contiguous sequence of subtrees. This paper goes further to present a translation model based on non-contiguous tree sequence alignment, where a non-contiguous tree sequence is a sequence of sub-trees and gaps. Compared with the contiguous...

Introducing Non-Syntactic Phrases into a Syntax-Based Machine Translation System

The dominance of traditional phrase-based statistical machine translation (SMT) models (Koehn, Och, and Marcu, 2003) has recently been challenged by the development and improvement of a number of newer translation models that explicitly take into account the syntax of the sentences being translated. One simple approach to incorporating syntax is to limit the phrases learned by a standard SMT tra...

Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units

The availability of contiguous and non-contiguous multiword lexical units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing precision, helps attachment decisions, improves indexing in information retrieval (IR) systems, reinforces information extraction (IE) and text mining, among other applications. Unfortunately, their acquisition has long been a significant problem in NLP, ...

Non-Contiguous Pattern Avoidance in Binary Trees

In this paper we consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by computing closed formulas for the number of trees avoiding a single binary tree pattern with 4 or fewer leaves and compare these results to analogous work for contiguous tree patterns. Next, we give an explicit generating function that counts binary trees avoiding a single non-con...

Non-Projective Dependency Parsing using Spanning Tree Algorithms

We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for searching over all projective trees in O(n^3) time. More surprisingly, the representation is extended naturally to non-projective parsing using Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorit...

Publication date: 2004