Robust Models Of Human Parsing

Author

Frank Keller

Abstract

A striking property of the human parser is its efficiency and robustness. For the vast majority of sentences, the parser will effortlessly and rapidly deliver the correct analysis. In doing so, it is robust to noise, i.e., it can provide an analysis even if the input is distorted, e.g., by ungrammaticalities. Furthermore, the human parser achieves broad coverage: it deals with a wide variety of syntactic constructions, and is not restricted by the domain, genre, or modality of the input. Current research on human parsing rarely investigates the issues of efficiency, robustness, and broad coverage, as pointed out by Crocker and Brants (2000). Instead, most researchers have focussed on the difficulties that the human parser has with certain types of sentences. Based on the study of garden path sentences (which involve a local ambiguity that makes the sentence hard to process), theories have been developed that successfully explain how the human parser deals with ambiguities in the input. However, garden path sentences are arguably a pathological case for the parser; garden paths are not representative of naturally occurring text. This means that the corresponding processing theories face a scaling problem: it is not clear how they can explain the normal behavior of the human parser, where sentence processing is highly efficient and very robust (see Crocker and Brants 2000 for details on this scalability argument). This criticism applies to most existing theories of human parsing, including the classical garden path model advanced by Frazier and Rayner (1982) and Frazier (1989), and more recent lexicalist parsing frameworks, of which MacDonald et al. (1994) and MacDonald (1994) are representative examples. Both the garden path model and the lexicalist model are designed to deal with idealized input, i.e., with input that is (locally) ambiguous, but fully well-formed.
A real-life parser, however, has to cope with a large amount of noise, which often renders the input ungrammatical or fragmentary, due to errors such as typographical mistakes in the case of text, or slips of the tongue, disfluencies, or repairs in the case of speech. A quick search in the Penn Treebank (Marcus et al., 1993) shows that about 17% of all sentences contain parentheticals or other sentence fragments, interjections, or unbracketable constituents. Note that this figure holds for carefully edited newspaper text; the figure is likely to be much higher for speech. The human parser is robust to such noise, i.e., it is able to assign an (approximate) analysis to a sentence even if it is ungrammatical or fragmentary.

Similar resources

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...

Sample Selection for Statistical Parsing

Corpus-based statistical parsing relies on using large quantities of annotated text as training examples. Building this kind of resource is expensive and labor-intensive. This work proposes to use sample selection to find helpful training examples and reduce human effort spent on annotating less informative ones. We consider several criteria for predicting whether unlabeled data might be a help...

A hybrid metaheuristic algorithm for the robust pollution-routing problem

Emissions resulting from transportation activities may have dangerous effects on the environment and on human health. In line with sustainability principles, researchers have in recent years attempted to account for the environmental burden of logistics activities in traditional logistics problems such as vehicle routing problems (VRPs). The pollution-routing problem (PRP) is an extension of the VRP...

A comparative study of the effect of part-of-speech tagging on parsing in automatic processing of the Persian language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Robust portfolio selection with polyhedral ambiguous inputs

Ambiguity in model inputs is typical, especially in the portfolio selection problem, where the true distribution of the random variables is usually unknown. Here we use a robust optimization approach to address the ambiguity in the conditional-value-at-risk minimization model. We obtain explicit models of the robust conditional-value-at-risk minimization for polyhedral and correlated polyhedral am...
