Document Parsing: Towards Realistic Syntactic Analysis
نویسندگان
چکیده
In this work we take a view of syntactic analysis as processing ‘raw’, running text instead of idealised, pre-segmented inputs—a task we dub document parsing. We observe the state of the art in sentence boundary detection and tokenisation, and their effects on syntactic parsing (for English), observing that common evaluation metrics are ill-suited for the comparison of an ‘end-to-end’ syntactic analysis pipeline. To provide a more informative assessment of performance levels and error propagation throughout the full pipeline, we propose a unified evaluation framework and gauge document parsing accuracies for common processors and data sets.
منابع مشابه
Feature extraction in opinion mining through Persian reviews
Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...
متن کاملAn improved joint model: POS tagging and dependency parsing
Dependency parsing is a way of syntactic parsing and a natural language that automatically analyzes the dependency structure of sentences, and the input for each sentence creates a dependency graph. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers do the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
متن کاملبرچسبزنی خودکار نقشهای معنایی در جملات فارسی به کمک درختهای وابستگی
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching correct semantic roles to them, may lead to improvement in many natural language processing tasks including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
متن کاملAnalysis of Discourse Structure with Syntactic Dependencies and Data-Driven Shift-Reduce Parsing
We present an efficient approach for discourse parsing within and across sentences, where the unit of processing is an entire document, and not a single sentence. We apply shift-reduce algorithms for dependency and constituent parsing to determine syntactic dependencies for the sentences in a document, and subsequently a Rhetorical Structure Theory (RST) tree for the entire document. Our result...
متن کاملبررسی مقایسهای تأثیر برچسبزنی مقولات دستوری بر تجزیه در پردازش خودکار زبان فارسی
In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
متن کامل