The Tanl tagger for Named Entity Recognition on Transcribed Broadcast News

Authors

  • Giuseppe Attardi
  • Giacomo Berardi
  • Stefano Dei Rossi
  • Maria Simi
Abstract

The Tanl tagger is a configurable tagger based on a Maximum Entropy classifier, which uses dynamic programming to select the best sequence of tags. We applied it to the NER tagging task, customizing the set of features used and including features derived from dictionaries extracted from the training corpus. The tagger's final accuracy is further improved by applying simple heuristic rules.
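The abstract combines three ingredients: a Maximum Entropy classifier that scores each tag given its local context, features drawn from dictionaries built from the training corpus, and dynamic programming over whole tag sequences. The following is a minimal Python sketch of how these pieces can fit together, assuming a first-order conditional Markov model decoded with the Viterbi algorithm. It is not the Tanl implementation: the tag set, the feature names, the helpers `build_dictionary`, `features`, `log_probs` and `viterbi`, and the hand-set weights are all hypothetical (a real MaxEnt model would learn its weights from the annotated corpus).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical BIO tag set; the real task uses its own entity classes.
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
Weights = Dict[Tuple[str, str], float]  # (feature, tag) -> weight


def build_dictionary(corpus: Sequence[Sequence[Tuple[str, str]]]) -> Dict[str, str]:
    """Map each token seen inside an entity in the training corpus to its class."""
    dictionary: Dict[str, str] = {}
    for sentence in corpus:
        for token, tag in sentence:
            if tag != "O":
                dictionary[token.lower()] = tag[2:]  # strip the "B-"/"I-" prefix
    return dictionary


def features(tokens: Sequence[str], i: int, prev_tag: str,
             dictionary: Dict[str, str]) -> List[str]:
    """Hypothetical feature templates for position i given the previous tag."""
    word = tokens[i]
    feats = ["bias", f"w={word.lower()}", f"prev={prev_tag}"]
    if word[:1].isupper():
        feats.append("cap")
    if word.lower() in dictionary:  # dictionary-derived feature
        feats.append(f"dict={dictionary[word.lower()]}")
    return feats


def log_probs(feats: List[str], weights: Weights) -> Dict[str, float]:
    """MaxEnt distribution: log p(tag | context) = score(tag) - log Z."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in TAGS}
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {t: s - log_z for t, s in scores.items()}


def viterbi(tokens: Sequence[str], weights: Weights,
            dictionary: Dict[str, str]) -> List[str]:
    """Dynamic programming over tag sequences: best path through the local MaxEnt scores."""
    if not tokens:
        return []
    # best[i][tag] = (log-prob of best path ending in tag at i, previous tag)
    best: List[Dict[str, Tuple[float, str]]] = []
    for i in range(len(tokens)):
        column: Dict[str, Tuple[float, str]] = {}
        prev_tags = list(best[i - 1]) if i else ["<s>"]
        for prev in prev_tags:
            prev_score = best[i - 1][prev][0] if i else 0.0
            local = log_probs(features(tokens, i, prev, dictionary), weights)
            for tag, lp in local.items():
                cand = prev_score + lp
                if tag not in column or cand > column[tag][0]:
                    column[tag] = (cand, prev)
        best.append(column)
    # Backtrack from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Usage with a toy corpus and hand-set (hypothetical) weights.
CORPUS = [[("Mario", "B-PER"), ("Rossi", "I-PER"), ("parla", "O")]]
DICTIONARY = build_dictionary(CORPUS)
WEIGHTS: Weights = {
    ("bias", "O"): 1.0,
    ("dict=PER", "B-PER"): 2.0,
    ("dict=PER", "I-PER"): 2.0,
    ("prev=B-PER", "I-PER"): 2.0,
    ("prev=B-PER", "B-PER"): -2.0,
    ("prev=<s>", "I-PER"): -5.0,
}
print(viterbi(["Mario", "Rossi", "parla"], WEIGHTS, DICTIONARY))
# prints ['B-PER', 'I-PER', 'O']
```

Because each local distribution is conditioned on the previous tag, picking the most probable tag at each position independently can lock in an early mistake; the dynamic program instead keeps the best-scoring path into every tag at every position and recovers the globally best sequence by backtracking.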

Similar resources

The Tanl Tagger for Named Entity Recognition on Transcribed Broadcast News at Evalita 2011

The Tanl tagger is a flexible sequence labeller based on Conditional Markov Model that can be configured to use different classifiers and to extract features according to feature templates expressed through patterns provided in a configuration file. The Tanl Tagger was applied to the task of Named Entity Recognition (NER) on Transcribed Broadcast News of Evalita 2011. The goal of the task was t...

Named entity extraction from Japanese broadcast news

This paper describes a method for named entity extraction from Japanese broadcast news. Our proposed named entity tagger gives entity categories for every character in order to deal with unknown words and entities correctly. This character-based tagger has models designed by maximum entropy modeling. We discuss the efficiency of the proposed tagger by comparison with a conventional word-based t...

EVALITA 2011: Description and Results of the Named Entity Recognition on Transcribed Broadcast News Task

This report describes features and outcomes of the Named Entity Recognition on Transcribed Broadcast News task at EVALITA 2011. This task represented a change with respect to previous editions of the NER task within the EVALITA evaluation campaign because it was based on automatic transcription of broadcast news. Four participants took part in the task and submitted a total of 9 runs. In this p...

Named Entity Recognition in Broadcast News Using Similar Written Texts

We propose a new approach to improving named entity recognition (NER) in broadcast news speech data. The approach proceeds in two key steps: (1) we detect block alignments between highly similar blocks of the speech data and corresponding written news data that are easily obtainable from the Web, (2) we employ term expansion techniques commonly used in information retrieval to recover named ent...

1998 Hub-4 Information Extraction Evaluation

This paper documents the Information Extraction Named-Entity Evaluation (IE-NE), one of the new spokes added to the DARPA-sponsored 1998 Hub-4 Broadcast News Evaluation. This paper discusses the information extraction task as posed for the 1998 Broadcast News Evaluation. This paper reviews the evaluation metrics, the scoring process, and the test corpus that was used for the evaluation. Finally...


Publication date: 2011