Parsing in the Minimalist Program: On SOV Languages and Relativization

Author

  • Sandiway Fong
Abstract

This paper examines computational issues in the processing of SOV languages in the probe-goal framework, a theory within the Minimalist Program (MP). A generative theory that seeks to minimize search, such as the probe-goal model, provides a strong linguistic basis for the investigation of efficient parsing architecture. For parsing, two main design challenges present themselves: (1) how to limit search while incrementally recovering structure from input without the benefit of a pre-determined lexical array, and (2) how to design a system that not only resolves parsing ambiguities correctly, in accordance with empirical data, but does so with mechanisms that are architecturally justified. We take as our starting point an existing probe-goal parser with features that allow it to compute syntactic representation without recourse to searching the derivation history. We extend this parser to handle pre-nominal relative clauses of the sort found in SOV languages. In doing so, we tie together and provide a unified computational account of facts on possessor (and non-possessor) relativization and processing preferences in Turkish, Japanese and Korean.[1]

[1] The author gratefully acknowledges discussion and data from Nobuko Hasegawa, Yuki Hirose, Cağlar Iskender, So Young Kang, Shigeru Miyagawa and Kyung Sook Shin. Parts of this paper have been presented at the Conference on Interfaces, July 30-August 1, 2004, Pescara, Italy, and the MIT IAP Computational Linguistics Fest, January 14, 2005.

Introduction

Recent proposals in the framework of the Minimalist Program (MP), e.g. (Chomsky 1998, 1999), have highlighted the role of efficient, locally deterministic computation in the assembly of phrase structure. From a generative standpoint, phrase structure assembly proceeds in bottom-up fashion using the primitive combinatory operations MERGE and MOVE, selecting from a domain of pre-determined lexical items known as a lexical array (LA). The set of available LA items, together with their lexical properties and features, limits the combinatory options and hence the possible phrase structures. Further limits on phrase structure result from the interaction of heads known as probes and goals. In Chomsky's formulation of the Case-agreement system, probes, e.g. functional heads such as T and v*, target and agree with goals, e.g. referential and expletive nominals, within their c-command domain and value their uninterpretable Case features. Within this system, Case agreement can be long-distance and does not necessarily trigger movement, e.g. in the case of there-expletive constructions and Icelandic Quirky Case.

In the case of parsing systems that aim to implement MP models of the kind outlined above, we can identify two major design challenges that should be met. First, the proposed parser architecture should support design goals similar to those of the original (generative) model, in the sense that computation should be driven by lexical properties and features and be locally deterministic where possible. However, the fact that the generative model is not a directly viable model of parsing (in a sense to be made clear below) means that the efficient recovery of structure is not guaranteed. More specifically, if we assume that a parser should process input from left to right and incrementally build phrase structure, the two operations, MERGE and MOVE, that lie at the heart of the (bottom-up) generative model cannot be employed directly. Moreover, for a parser there exists no pre-determined LA.
It must attempt to efficiently reconstruct the participating lexical and functional elements (possibly covert) solely on the basis of overt input and its knowledge of the grammar. Finally, the parser architecture needs to support efficient computation of probe-goal agreement relations. In an ideal model, a probe would identify its goal (or goals) without invoking search, i.e. without sifting through the derivation history represented by already-constructed phrase structure.

The second design challenge concerns temporary ambiguities encountered in the course of the recovery of structure. Temporary ambiguities manifest themselves as computational choice points. Numerous architectural options are available to the parser designer. However, in the ideal case, a parser should always resolve temporary ambiguities in favor of the (locally) least expensive computational option.[2]

[2] Note that this does not necessarily imply that the parser need select the globally least expensive option. More to the point, we are not advocating a return to a parsing model based on some metric from a modern formulation of the derivational theory of complexity hypothesis (Miller and Chomsky, 1963).

In this paper, we focus on the computational issues involved in meeting the second design challenge. We will present a parsing model and provide cross-linguistic empirical support for its proper operation. We take as our starting point a left-to-right, incremental parser in the MP framework (Fong 2005). This implemented parser is designed to recover phrase structure in accordance with Chomsky's probe-goal model of the Case-agreement system as described in (Chomsky 1998). The parser includes architectural features that allow search to be minimized in the computation of probe-goal relations, thus facilitating efficient computation in the sense of the first design challenge identified above. We propagate this design efficiency into the realm of parsing preferences by appealing to computational cost reduction and simplicity. We extend the probe-goal architecture, paying special attention to SOV language data in the area of possessor (and non-possessor) relativization. More specifically, we show how a parser that seeks to minimize search when faced with temporary ambiguity can account for, and tie together, independent facts on relativization with respect to bare (i.e. non-Case-marked) noun phrases (BNPs) in Turkish and object scrambling in Japanese and Korean. We propose that the same (possibly universal) mechanism that resolves subject-object ambiguity in the case of Turkish BNPs is also at work in the case of (Case-marked) object scrambling in Japanese and Korean.

The remainder of the paper is organized as follows. First, we will briefly review and highlight relevant design features of the probe-goal parser described in (Fong 2005). Next, we will describe how the basic system can be adapted to accommodate the head-final nature of SOV languages such as Turkish and Japanese. We will then extend the model with a bottom-up component necessary to accommodate pre-nominal relative clauses in these languages. Finally, we will describe a (non-language-particular) mechanism of relativization motivated by the desire to avoid search, and document the empirical support for the proposed model.

Probe-Goal Parser Design
In this section, we will provide an overview of a parser that implements Chomsky's probe-goal model.[3] We will discuss the layout of the lexicon, lay out the basic computational procedure, and highlight architectural features introduced for minimizing search in probe-goal agreement.

[3] For the full details and step-by-step worked examples, see (Fong 2005).

The Lexicon

We begin with the lexicon, which lies at the heart of the generative model. Bottom-up computation via MERGE and MOVE is driven by lexical properties such as selection and by the need to eliminate uninterpretable features within narrow syntax. We assume the parser operates with, and is propelled by, the same set of properties and features as the generative theory; i.e. the parser does not come with its own set of parsing-specific uninterpretable features. An illustrative sample of lexical items and their properties is given in Figure 1 below.[4]

[4] Variants of the categories are not shown here. For example, v comes in several flavors containing different subsets of the properties and features of transitive v*. Both unaccusative and unergative v lack uninterpretable φ-features and the EPP option, and lack the ability to value accusative Case (shown as acc). T comes either with a full set of uninterpretable φ-features and the ability to value nominative Case (shown as nom), or in a defective version ⟨T⟩ lacking several uninterpretable φ-features and Case-valuing ability.

The property of selection, denoted by select(X) where X is a category, forms the basis for a top-down, selection-driven model. For example, sentence parsing begins with the complementizer c at the top. c selects for tense T, which in turn selects for v* and a specifier position (shown as spec(select(N))). v* selects for V plus a sentential subject in specifier position. Finally, V selects for an object nominal N in the simple transitive case. An example of the tree recovered by the parser for the basic transitive sentence "John saw Mary" is given in Figure 2. Note that the sequence of selection steps (outlined above) extends the derivation from left to right, in a fashion similar to (Phillips 1995). Note also that a strict interpretation of bottom-up MERGE and MOVE would result in a right-to-left parse.

Figure 1: A Sample Lexicon

  Lexical Item (LI)             | Properties                                   | Uninterpretable φ-features | Other uninterpretable | Interpretable features
  v* (transitive)               | select(V), spec(select(N)), value(case(acc)) | per(_), num(_), gen(_)     | (EPP)                 |
  V (transitive / unaccusative) | select(N) / select(⟨T⟩)                      |                            |                       |
  T                             | select(v), value(case(nom))                  | per(_), num(_), gen(_)     | epp                   |
  c                             | select(T)                                    |                            |                       |
  N (referential)               |                                              |                            | case(_)               | per(P), num(N), gen(G)

Feature matching is an important component of probe-goal agreement and of the parsing process, and it too is lexically driven. For example, the uninterpretable φ-features of the probe v* (representing person, number and gender) must be matched, valued and cancelled by the parallel interpretable φ-features of a (nominal) goal in v*'s c-command domain. (Uninterpretable features can be viewed as features with unvalued slots, depicted here using '_'.) At the same time, the uninterpretable Case feature belonging to the relevant nominal will be valued and cancelled, provided the probe has the property of valuing Case. A (valid) parse tree is one that obeys the selectional properties of the lexical items involved, covers the entire input, and leaves no uninterpretable feature uncancelled.
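To make the feature machinery concrete, the following Python sketch is our own illustrative reconstruction (all identifiers, e.g. LexicalItem and agree, are ours, not the parser's actual implementation). It encodes two Figure 1-style entries and a simplified matching step in which the probe's unvalued φ-features are valued from the goal's interpretable φ-features and the goal's Case feature is valued by the probe.

    # Illustrative sketch, not Fong's implementation: Figure 1-style feature
    # bundles and a simplified probe-goal matching/valuation step.
    from __future__ import annotations
    from dataclasses import dataclass, field


    @dataclass
    class LexicalItem:
        """A lexical entry in the style of Figure 1 (hypothetical encoding)."""
        name: str
        select: list = field(default_factory=list)        # e.g. ["V"] for v*
        spec_select: list = field(default_factory=list)   # e.g. ["N"] for v*'s specifier
        value_case: str | None = None                      # Case a probe can value: "acc", "nom"
        u_phi: dict = field(default_factory=dict)          # uninterpretable phi-features, unvalued (None)
        i_phi: dict = field(default_factory=dict)          # interpretable, valued phi-features
        u_case: bool = False                               # nominal's unvalued Case feature
        case: str | None = None                            # Case value once assigned


    # Two sample entries modelled on Figure 1.
    v_star = LexicalItem("v*", select=["V"], spec_select=["N"], value_case="acc",
                         u_phi={"per": None, "num": None, "gen": None})
    mary = LexicalItem("N(Mary)", i_phi={"per": 3, "num": "sg", "gen": "fem"}, u_case=True)


    def agree(probe: LexicalItem, goal: LexicalItem) -> bool:
        """Value the probe's uninterpretable phi-features from the goal's
        interpretable ones, and value the goal's Case if the probe can.
        Returns True if every phi-feature on the probe ends up valued."""
        if not probe.u_phi or not goal.i_phi:
            return False
        for f in probe.u_phi:                      # match feature by feature
            if f in goal.i_phi:
                probe.u_phi[f] = goal.i_phi[f]     # valued, hence (in effect) cancelled
        if goal.u_case and probe.value_case:
            goal.case = probe.value_case           # e.g. v* values accusative Case
            goal.u_case = False
        return all(v is not None for v in probe.u_phi.values())


    if __name__ == "__main__":
        # v* probes its c-command domain and finds the object nominal as its goal.
        assert agree(v_star, mary)
        print(v_star.u_phi)   # {'per': 3, 'num': 'sg', 'gen': 'fem'}
        print(mary.case)      # acc

Under the sample lexicon, the same valuation step would apply when T probes the subject nominal and values nominative Case.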
Figure 2: Basic Phrase Structure (the parse tree recovered for "John saw Mary")

Computation with Elementary Trees

Since MERGE and MOVE cannot form the basis of a left-to-right parser model, (Fong 2005) adopts a system driven by elementary tree (ET) composition with respect to a range of heads in the extended verbal projection (v*, V, c and T). ETs are underspecified phrases with structural options determined by lexical properties. They contain open positions to be filled by input and by movement during the course of parsing. Examples of the ETs implied by the lexicon of Figure 1 are given in Figure 3.

Figure 3: Basic Elementary Trees: (a) c, (b) T, (c) v*, (d) V
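As a schematic illustration of this left-to-right, selection-driven composition, the sketch below is our own simplification (the names ET and parse_john_saw_mary are hypothetical, not the (Fong 2005) ET machinery). It walks the c, T, v*, V chain for "John saw Mary", filling open positions either from the input or by a copy of an already-parsed nominal (movement).

    # Schematic sketch under our own simplifying assumptions: left-to-right
    # composition of underspecified elementary trees for "John saw Mary".
    from __future__ import annotations
    from dataclasses import dataclass


    @dataclass
    class ET:
        """An underspecified elementary tree: a head with open positions."""
        head: str
        spec: str | None = None    # specifier position, open until filled
        comp: str | None = None    # complement position (a head or a nominal)


    def parse_john_saw_mary() -> list[ET]:
        """Compose ETs left to right along the selection chain c > T > v* > V,
        filling open positions from the input or by movement (a copy)."""
        words = ["John", "saw", "Mary"]
        c = ET("c", comp="T")                       # c selects T
        t = ET("T", spec=words.pop(0), comp="v*")   # surface subject fills spec of T; T selects v*
        v_star = ET("v*",                           # subject's base position is filled by
                    spec=f"<copy of {t.spec}>",     # movement: a copy, not new input
                    comp="V")                       # v* selects V
        verb = words.pop(0)                         # "saw" lexicalizes the V head
        v = ET(f"V({verb})", comp=words.pop(0))     # V select(N): object nominal "Mary"
        return [c, t, v_star, v]


    if __name__ == "__main__":
        for et in parse_john_saw_mary():
            print(et)
        # ET(head='c', spec=None, comp='T')
        # ET(head='T', spec='John', comp='v*')
        # ET(head='v*', spec='<copy of John>', comp='V')
        # ET(head='V(saw)', spec=None, comp='Mary')

Filling the v* specifier with a copy of the nominal already parsed in spec of T, rather than searching back through the derivation history, is meant to be in the spirit of the search-minimizing design described above.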


Similar Articles

Parsing Minimalist Languages with Interpreted Regular Tree Grammars

Minimalist Grammars (MGs) (Stabler, 1997) are a formalisation of Chomsky’s minimalist program (Chomsky, 1995), which currently dominates much of mainstream syntax. MGs are simple and intuitive to work with, and are mildly context sensitive (Michaelis, 1998), putting them in the right general class for human language (Joshi, 1985). Minimalist Grammars are known to be more succinct than their Mu...


Minimalist Parsing of Subjects Displaced from Embedded Clauses in Free Word Order Languages

In Sayeed and Szpakowicz (2004), we proposed a parser inspired by some aspects of the Minimalist Program. This incremental parser was designed specifically to handle discontinuous constituency phenomena for NPs in Latin. We take a look at the application of this parser to a specific kind of apparent island violation in Latin involving the extraction of constituents, including subjects, from ten...


Developing a Minimalist Parser for Free Word Order Languages with Discontinuous Constituency

We propose a parser based on ideas from the Minimalist Programme. The parser supports free word order languages and simulates a human listener who necessarily begins sentence analysis before all the words in the sentence have become available. We first sketch the problems that free word order languages pose. Next we discuss an existing framework for minimalist parsing, and show how it is diffic...


Maximizing Processing in an SOV Language

Head-driven parser models predict that SOV languages are harder to process than SVO languages, since the parser has to hold both S and O until it reaches V, instead of just S as in an SVO language. However, since no reading time differences have been attested between SOV and SVO languages, we hypothesize that either these models are wrong or SOV languages have strategies to compensate for the l...


The Effect of Morphemes on Persian Dependency Parsing (تأثیر ساخت‌واژه‌ها در تجزیه وابستگی زبان فارسی)

Data-driven systems can be adapted to different languages and domains easily. This trend led to the introduction of data-driven approaches to dependency parsing. The existence of appropriate corpora containing sentences and their associated dependency trees is the only prerequisite for data-driven approaches. Despite highly accurate results for the dependency parsing task in English langu...



Journal:
Volume:   Issue:
Pages: -
Publication date: 2005