Syntactic Wordclass Tagging

Authors

  • Hans van Halteren
  • Adwait Ratnaparkhi
Abstract

Part-of-speech (POS) tagging is one of the most popular and thoroughly researched tasks in the field of natural language processing, particularly since it is a prerequisite for a wide variety of more complex tasks. The book Syntactic Wordclass Tagging is a multiauthor collection of articles giving advice on how to use and implement a POS tagger. Part I of the book is entitled "The User's View" and is geared towards novices and researchers who are interested in the POS annotation that taggers produce. Part II, entitled "The Implementer's View," is more technical and is written for researchers who want to understand the advantages of the various computational techniques used for POS tagging; it includes an introductory chapter by Hans van Halteren.

After an introductory chapter by Atro Voutilainen, Part I begins with "A Short History of Tagging," also by Voutilainen, which describes some of the notable developments in both data-driven and linguistic approaches to tagging. Then, in "The Use of Tagging," Geoffrey Leech and Nicholas Smith argue that POS tagging is useful for almost every task in corpus linguistics, and has made its way into a number of practical applications as well, such as information retrieval, spelling correction, and machine-aided translation. Leech and Smith also point out that syntactic parsing is arguably the central task of natural language processing, since it is a prerequisite for any kind of semantic analysis of text, and claim that tagging, by being a prerequisite to parsing, is effectively an "entry to the most central area of corpus processing" (p. 27).

Jan Cloeren then describes tagsets for wordclass annotation, and discusses the different levels of linguistic detail (morphological, syntactic, semantic, discoursal) captured by various tagsets. He also discusses how certain phenomena, such as multiunit tokens, multitoken units, wordclass underspecification, and wordclass ambiguity, present challenges to the design of a tagset. He further describes a proposal by the Text Encoding Initiative (TEI) that specifies how to encode wordclass tags with SGML.

"Standards for Tagsets" by Geoffrey Leech and Andrew Wilson discusses the guidelines for POS annotation standards developed by the Expert Advisory Group on Language Engineering Standards, or EAGLES. The chapter describes the obligatory, recommended, and optional attribute-value sets for describing wordclasses across languages and tasks. The intent is that the widely recognized attribute-value pairs (major parts of speech, gender, etc.) would be obligatory or recommended, whereas the task- or language-specific attribute-value pairs would be optional. In "Performance of Taggers," Hans van Halteren discusses the relationship between correctness, ambiguity, precision, and recall, which are the most commonly

Similar resources

Improving Accuracy in Wordclass Tagging through Combination of Machine Learning Systems

We examine how differences in language models, learned by different data driven systems performing the same NLP task, can be exploited to yield a higher accuracy than the best individual system. We do this by means of experiments involving the task of morpho-syntactic wordclass tagging, on the basis of three different tagged corpora. Four well-known tagger generators (Hidden Markov Model, Memor...

ACL-COLING 1998, Montreal, Canada, 491-497, 1998 Improving Data Driven

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum...

ACL-COLING 1998, Montreal, Canada, 491-497, 1998 Improving Data

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum E...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...

A system for creating and manipulating generalized wordclass transition matrices from large labelled text-corpora

This paper deals with the training phase of a Markov-type linguistic model that is based on transition probabilities between pairs and triplets of syntactic categories. To determine the optimal level of detail for a set of syntactic classes we developed a system that uses a set-theoretical formalism to define such sets and has some measures to compare and optimize them individually. In secti...

Publication date: 2002