Technical Correspondence: Parsing Discontinuous Constituents In Dependency Grammar
Abstract
Discontinuous constituents (for example, a noun and its modifying adjective separated by words unrelated to them) are common in variable-word-order languages; Figure 1 shows examples. But phrase structure grammars, including ID/LP grammars, require each constituent to be a contiguous series of words. Insofar as standard parsing algorithms are based on phrase structure rules, they are inadequate for parsing such languages. The algorithm presented here, however, does not require constituents to be continuous, but merely prefers them so. It can therefore parse languages in which conventional parsing techniques do not work. At the same time, because of its preference for nearby attachments, it makes constituents continuous when more than one analysis is possible. The new algorithm has been used successfully to parse Russian and Latin (Covington 1988, 1990).

This algorithm uses dependency grammar. That is, instead of breaking the sentence into phrases and subphrases, it establishes links between individual words. Each link connects a word (the "head") with one of its "dependents" (an argument or modifier). Figure 2 shows how this works. The arrows point from head to dependent; a head can have many dependents, but each dependent can have only one head. Of course the same word can be the head in one link and the dependent in another. Dependency grammar is equivalent to an X-bar theory with only one phrasal bar level (Figure 3): the dependents of a word are the heads of its sisters. Thus dependency grammar captures the increasingly recognized importance of headship in syntax. At the same time, the absence of phrasal nodes from the dependency representation streamlines the search process during parsing.

The parser presupposes a grammar that specifies which words can depend on which. In the prototype, the grammar consists of unification-based dependency rules (called D-rules) such as:

[category:noun, person:X, number:Y, case:nominative] ← [category:verb, person:X, number:Y]

This rule sanctions a dependency relation between any two words whose features unify with the structures shown, in this case the verb and its subject in a language such as Russian or Latin. The arrow means "can depend on" and the word order is not specified. X and Y are variables. D-rules take the place of the phrase structure rules used by Shieber (1986) and others; semantic information can easily be added to them, and the whole power of unification-based grammar is available. The parser accepts words from the input string and …
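The D-rule above can be illustrated with a minimal Python sketch. This is not the prototype described in the abstract: the flat dictionary encoding of features, the names `Var`, `unify`, `can_depend_on`, and `nearest_head`, and the Latin sample words are all assumptions made for illustration. `nearest_head` only mimics the stated preference for nearby attachments by trying candidate heads in order of distance; it is not the paper's parsing procedure, which the abstract does not spell out.

```python
class Var:
    """A named variable in a D-rule, such as X or Y (hypothetical encoding)."""
    def __init__(self, name):
        self.name = name


def unify(pattern, word, bindings):
    """Unify a flat feature pattern with a word's features.

    Returns the extended variable bindings, or None if any feature clashes.
    Features that the word leaves unspecified are treated as compatible.
    """
    bindings = dict(bindings)
    for feature, value in pattern.items():
        if feature not in word:
            continue
        actual = word[feature]
        if isinstance(value, Var):
            if value.name in bindings and bindings[value.name] != actual:
                return None  # variable already bound to a different value
            bindings[value.name] = actual
        elif value != actual:
            return None  # constant features disagree
    return bindings


# The subject-verb D-rule from the text, as (dependent pattern, head pattern).
SUBJECT_RULE = (
    {"category": "noun", "person": Var("X"), "number": Var("Y"),
     "case": "nominative"},
    {"category": "verb", "person": Var("X"), "number": Var("Y")},
)


def can_depend_on(dependent, head, rule):
    """True if the rule sanctions `dependent` depending on `head`.

    Word order plays no role, matching the D-rule's definition.
    """
    dep_pattern, head_pattern = rule
    bindings = unify(dep_pattern, dependent, {})
    if bindings is None:
        return False
    return unify(head_pattern, head, bindings) is not None


def nearest_head(index, words, rules):
    """Return the position of the closest word that may head words[index].

    Candidates are tried in order of increasing distance, so an adjacent
    head wins when available, but a non-adjacent one is still accepted.
    """
    others = [j for j in range(len(words)) if j != index]
    for j in sorted(others, key=lambda j: abs(j - index)):
        if any(can_depend_on(words[index], words[j], r) for r in rules):
            return j
    return None


# Illustrative Latin word entries (feature values assumed for the example).
PUELLA = {"form": "puella", "category": "noun", "person": 3,
          "number": "singular", "case": "nominative"}
PUELLAM = {"form": "puellam", "category": "noun", "person": 3,
           "number": "singular", "case": "accusative"}
BONA = {"form": "bona", "category": "adjective",
        "number": "singular", "case": "nominative"}
CANTAT = {"form": "cantat", "category": "verb", "person": 3,
          "number": "singular"}
CANTANT = {"form": "cantant", "category": "verb", "person": 3,
           "number": "plural"}
```

With these entries, `can_depend_on(PUELLA, CANTAT, SUBJECT_RULE)` succeeds, while the accusative `PUELLAM` and the plural verb `CANTANT` both fail to unify. In the three-word list `[PUELLA, BONA, CANTAT]`, `nearest_head(0, ...)` skips the intervening adjective and links the noun to the verb, which is the kind of non-adjacent attachment a contiguity-based phrase structure rule would not allow.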
Similar resources
Improving the Efficiency of Parsing Discontinuous Constituents
A prominent tradition within the framework of Head-Driven Phrase Structure Grammar (HPSG, Pollard and Sag 1994) has argued on linguistic grounds for analyses which license so-called discontinuous constituents (Reape 1993; Kathol 1995; Richter and Sailer 2001; Müller 1999a; Penn 1999; Donohue and Sag 1999; Bonami et al. 1999), joining researchers in other linguistic frameworks, including Depende...
Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity
It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discont...
Parsing String Generating Hypergraph Grammars
A string generating hypergraph grammar is a hyperedge replacement grammar where the resulting language consists of string graphs, i.e. hypergraphs modeling strings. With the help of these grammars, string languages like a^n b^n c^n can be modeled that cannot be generated by context-free grammars for strings. They are well suited to model discontinuous constituents in natural languages, i.e. constitu...
Parsing with Discontinuous Constituents
By generalizing the notion of location of a constituent to allow discontinuous locations, one can describe the discontinuous constituents of non-configurational languages. These discontinuous constituents can be described by a variant of definite clause grammars, and these grammars can be used in conjunction with a proof procedure to create a parser for non-configurational languages.
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the...