initially named barbat

Structured Named Entities in two distinct press corpora: Contemporary Broadcast News and Old Newspapers

2012

Sophie Rosset Cyril Grouin Karën Fort Olivier Galibert Juliette Kahn Pierre Zweigenbaum

This paper compares the reference annotation of structured named entities in two corpora with different origins and properties. It addresses two questions linked to such a comparison. On the one hand, what specific issues were raised by reusing the same annotation scheme on a corpus that differs from the first in terms of media and that predates it by more than a century? On the other hand, wha...


Improving Named Entity Recognition in Tweets via Detecting Non-Standard Words

2015

Chen Li Yang Liu

Most previous work of text normalization on informal text made a strong assumption that the system has already known which tokens are non-standard words (NSW) and thus need normalization. However, this is not realistic. In this paper, we propose a method for NSW detection. In addition to the information based on the dictionary, e.g., whether a word is out-of-vocabulary (OOV), we leverage novel i...


Pattern-based Aggregation of Named Entity Extractors

2011

T. Lemmond

Despite significant advances in named entity extraction technologies, state-of-the-art extraction tools achieve insufficient accuracy rates for practical use in many operational settings. However, they are not all prone to the same types of error, suggesting that substantial improvements may be achieved via appropriate combinations of existing tools, provided their behavior can be accurately ch...


Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data

2003

Lluís Màrquez i Villodre Adrià de Gispert Xavier Carreras Lluís Padró

This work studies Named Entity Classification (NEC) for Catalan without making use of large annotated resources of this language. Two views are explored and compared, namely exploiting solely the Catalan resources, and a direct training of bilingual classification models (Spanish and Catalan), given that a large collection of annotated examples is available for Spanish. The empirical results ob...


Chemical Named Entity Recognition: Improving Recall Using a Comprehensive List of Lexical Features

2014

Andre Lamurias João D. Ferreira Francisco M. Couto

As the number of published scientific papers grows every day, there is also an increasing necessity for automated named entity recognition (NER) systems capable of identifying relevant entities mentioned in a given text, such as chemical entities. Since high precision values are crucial to deliver useful results, we developed a NER method, Identifying Chemical Entities (ICE), which was tuned for ...


Global Health Monitor - A Web-based System for Detecting and Mapping Infectious Diseases

2008

Son Doan Hung Quoc Ngo Ai Kawazoe Nigel Collier

We present the Global Health Monitor, an online Web-based system for detecting and mapping infectious disease outbreaks that appear in news stories. The system analyzes English news stories from news feed providers, classifies them for topical relevance and plots them onto a Google map using geo-coding information, helping public health workers to monitor the spread of diseases in a geo-tempora...


Relation detection between named entities: report of a shared task

2009

Cláudia Freitas Diana Santos Cristina Mota Hugo Gonçalo Oliveira Paula Carvalho

In this paper we describe the first evaluation contest (track) for Portuguese whose goal was to detect and classify relations between named entities in running text, called ReRelEM. Given a collection annotated with named entities belonging to ten different semantic categories, we marked all relationships between them within each document. We used the following fourfold relationship classificat...


A Golden Resource for Named Entity Recognition in Portuguese

2006

Diana Santos Nuno Cardoso

This paper presents a collection of texts manually annotated with named entities in context, which was used for HAREM, the first evaluation contest for named entity recognizers for Portuguese. We discuss the options taken and the originality of our approach compared with previous evaluation initiatives in the area. We document the choice of categories, their quantitative weight in the overall c...


Extracting Named Entities and Relating Them over Time Based on Wikipedia

Journal: Informatica (Slovenia) 2007

Abhijit Bhole Blaz Fortuna Marko Grobelnik Dunja Mladenic

This paper presents an approach to mining information relating people, places, organizations and events extracted from Wikipedia and linking them on a time scale. The approach consists of two phases: (1) identifying relevant pages categorizing the articles as containing people, places or organizations; (2) generating timeline linking named entities and extracting events and their time frame. We...


Second HAREM: Advancing the State of the Art of Named Entity Recognition in Portuguese

2010

Cláudia Freitas Cristina Mota Diana Santos Hugo Gonçalo Oliveira Paula Carvalho

In this paper, we present Second HAREM, the second edition of an evaluation campaign for Portuguese, addressing named entity recognition (NER). This second edition also included two new tracks: the recognition and normalization of temporal entities (proposed by a group of participants, and hence not covered in this paper) and ReRelEM, the detection of semantic relations between named entities. ...
