Clique-Based Clustering for Improving Named Entity Recognition Systems

2009

Julien Ah-Pine, Guillaume Jacquet

We propose a system which builds, in a semi-supervised manner, a resource that aims at helping a NER system to annotate corpus-specific named entities. This system is based on a distributional approach which uses syntactic dependencies for measuring similarities between named entities. The specificity of the presented method, however, is to combine a clique-based approach and a clustering techni...

Seeded Discovery of Base Relations in Large Corpora

2008

Nicholas Andrews, Naren Ramakrishnan

Relationship discovery is the task of identifying salient relationships between named entities in text. We propose novel approaches for two sub-tasks of the problem: identifying the entities of interest, and partitioning and describing the relations based on their semantics. In particular, we show that term frequency patterns can be used effectively instead of supervised NER, and that the pmedi...

UNIMIB@NEEL-IT: Named Entity Recognition and Linking of Italian Tweets

2016

Flavio Massimiliano Cecchini, Elisabetta Fersini, Pikakshi Manchanda, Enza Messina, Debora Nozza, Matteo Palmonari, Cezar Sas

This paper describes the framework proposed by the UNIMIB Team for the task of Named Entity Recognition and Linking of Italian tweets (NEEL-IT). The proposed pipeline, which represents an entry-level system, is composed of three main steps: (1) Named Entity Recognition using Conditional Random Fields, (2) Named Entity Linking by considering both Supervised and Neural-Network Language m...

SIR-NERD: A Chinese Named Entity Recognition and Disambiguation System using a Two-Stage Method

2012

Zehuan Peng, Le Sun, Xianpei Han

This paper presents our SIR-NERD system for the Chinese named entity recognition and disambiguation task in the CIPS-SIGHAN joint conference on Chinese language processing (CLP2012). Our system uses a two-stage method and some key techniques to deal with the named entity recognition and disambiguation (NERD) task. Experimental results on the test data show that the proposed system, which incor...

Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities

2006

Soto Montalvo, Raquel Martínez-Unanue, Arantza Casillas, Víctor Fresno-Fernández

This paper presents an approach for Multilingual Document Clustering in comparable corpora. The algorithm is heuristic in nature and uses, as its only evidence for clustering, the identification of cognate named entities between both sides of the comparable corpora. One of the main advantages of this approach is that it does not depend on bilingual or multilingual resources. However, it depends ...

Bilingual News Clustering Using Named Entities and Fuzzy Similarity

2007

Soto Montalvo, Raquel Martínez-Unanue, Arantza Casillas, Víctor Fresno-Fernández

This paper focuses on discovering bilingual news clusters in a comparable corpus. In particular, we deal with the news representation and with the calculation of the similarity between documents. We use the cognate named entities that the news items contain as their representative features. One of our main goals is to test whether using only named entities is a good source of knowledge f...

Multilingual News Document Clustering: Two Algorithms Based on Cognate Named Entities

2006

Soto Montalvo, Raquel Martínez-Unanue, Arantza Casillas, Víctor Fresno-Fernández

This paper presents an approach for Multilingual News Document Clustering in comparable corpora. We have implemented two heuristic algorithms that follow the approach. They use, as their only evidence for clustering, the identification of cognate named entities between both sides of the comparable corpora. In addition, no information about the right number of clusters has to be provided to ...

UBC Entity Linking at TAC-KBP 2013: random forests for high accuracy

2013

Ander Barrena, Eneko Agirre, Aitor Soroa

This paper describes our systems and the different runs submitted for the Entity Linking task at TAC-KBP 2013. We developed two systems: one is a generative entity linking model, and the other is a supervised system that reuses the scores of the previous model using random forests. Our main research interest is the Named Entity Disambiguation task, and we thus performed a very naive clustering of NIL instance...

Robust Multilingual Named Entity Recognition with Shallow Semi-supervised Features (Extended Abstract)

2017

Rodrigo Agerri, German Rigau

We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamless...

A Hybrid Approach to Features Representation for Fine-grained Arabic Named Entity Recognition

2014

Fahd Alotaibi, Mark G. Lee

Despite considerable research on the topic of Arabic Named Entity Recognition (NER), almost all efforts focus on a traditional set of semantic classes, features and token representations. In this work, we advance previous research in a systematic manner and devise a novel method to represent these features, relying on a dependency-based structure to capture further evidence within the sentence....