Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Authors

A. Akkasi Department of Computer Engineering, Bandar Abbas Branch, Islamic Azad University, Bandar Abbas, Iran

E. Varoglu Computer Engineering Department, Eastern Mediterranean University, Famagusta, North Cyprus, Via Mersin 10, Turkey.

Abstract:

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of molecule names and their properties. Improving the performance of such systems may therefore improve the quality of the subsequent tasks. Chemical text, from which the data for named entity recognition is extracted, is naturally imbalanced, since chemical entities are fewer than the other segments of the text. In this paper, the class imbalance problem in chemical named entity recognition is studied, and a version of random undersampling adapted to NER data is used to generate a pool of classifiers. To keep the class distribution balanced within each sentence, the well-known random undersampling method is modified into a sentence-based version in which samples are removed at random within each sentence rather than across the dataset as a whole. In addition, to take advantage of combining diverse predictors, an ensemble is built from classifiers trained on the different training sets produced by sentence-based undersampling. The proposed approach is developed and tested on the ChemDNER corpus released by BioCreative IV. Results show that the proposed method improves the classification performance of the baseline classifiers, mainly as a result of an increase in recall. Furthermore, the combination of high-performing classifiers trained on undersampled training data outperforms all of the single best classifiers as well as the combination of classifiers trained on the full data.
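
The sentence-based undersampling and pool-building steps described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes each token is one training sample labeled "B", "I" or "O", that "O" (non-entity) is the majority class, and that token features were already computed on the full sentence before any tokens are removed. The function names and the `ratio` parameter are hypothetical. Majority voting is used only as a simple stand-in for the combination step; the abstract does not say which combination rule the paper uses.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A token sample is (word, label); "O" marks the majority (non-entity) class.
Token = Tuple[str, str]
Sentence = List[Token]


def undersample_sentence(sentence: Sentence, ratio: float, rng: random.Random) -> Sentence:
    """Randomly drop "O" tokens inside one sentence, keeping every entity token.

    At most ceil(ratio * max(n_entity, 1)) "O" tokens are kept, so sentences with
    no entity tokens still contribute a few negatives (a hypothetical choice).
    """
    pos_idx = [i for i, (_, label) in enumerate(sentence) if label != "O"]
    neg_idx = [i for i, (_, label) in enumerate(sentence) if label == "O"]
    n_keep = min(len(neg_idx), math.ceil(ratio * max(len(pos_idx), 1)))
    keep = set(pos_idx) | set(rng.sample(neg_idx, n_keep))
    # Sorting the kept indices preserves the original token order.
    return [sentence[i] for i in sorted(keep)]


def sentence_based_undersample(corpus: Sequence[Sentence], ratio: float, seed: int) -> List[Sentence]:
    """Apply undersampling sentence by sentence instead of over the whole dataset."""
    rng = random.Random(seed)
    return [undersample_sentence(s, ratio, rng) for s in corpus]


def build_training_pool(corpus: Sequence[Sentence], ratio: float, n_sets: int) -> List[List[Sentence]]:
    """One undersampled training set per seed; each would train one pool member."""
    return [sentence_based_undersample(corpus, ratio, seed) for seed in range(n_sets)]


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token label sequences from several classifiers by majority vote.

    Ties go to the label proposed by the earliest classifier in the list.
    """
    voted = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        voted.append(next(label for label in column if counts[label] == best))
    return voted


if __name__ == "__main__":
    corpus = [
        [("Aspirin", "B"), ("inhibits", "O"), ("COX", "O"), ("enzymes", "O"), (".", "O")],
        [("The", "O"), ("samples", "O"), ("were", "O"), ("heated", "O"), (".", "O")],
    ]
    for training_set in build_training_pool(corpus, ratio=1.0, n_sets=3):
        print(training_set)
    # Three hypothetical classifiers' outputs for a three-token sentence.
    print(majority_vote([["B", "O", "O"], ["B", "I", "O"], ["O", "I", "O"]]))  # ['B', 'I', 'O']
```

Seeding each training set differently is what makes the pool diverse in this sketch: every classifier sees all entity tokens but a different random subset of "O" tokens from each sentence. Undersampling the dataset as a whole could instead leave some sentences with almost none of their "O" tokens while others keep most of them.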

Similar resources

Named Entity Recognition through Classifier Combination

This paper presents a classifier-combination experimental framework for named entity recognition in which four diverse classifiers (robust linear classifier, maximum entropy, transformation-based learning, and hidden Markov model) are combined under different conditions. When no gazetteer or other additional training resources are used, the combined system attains a performance of 91.6F on the ...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...

Dutch Named Entity Recognition using Classifier Ensembles

Named Entity Recognition (NER) is the task of automatically identifying names within text and classifying them into categories, such as persons, locations and organizations. A variety of machine learning algorithms has been applied to the task, with research often aimed at feature selection and parameter optimization to improve a single classifier’s performance. However, finding the optimal fea...

Named Entity Recognition in Persian Text using Deep Learning

Named entity recognition is a fundamental task in the field of natural language processing. It is also regarded as a subtask of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

Memory-Based Named Entity Recognition

We apply a memory-based learner to the CoNLL-2002 shared task: language-independent named entity recognition. We use three additional techniques for improving the base performance of the learner: cascading, feature selection and system combination. The overall system is trained with two types of features: words and substrings of words which are relevant for this particular task. It is tested on...

Named Entity Recognition as a House of Cards: Classifier Stacking

This paper presents a classifier stacking-based approach to the named entity recognition task (NER henceforth). Transformation-based learning (Brill, 1995), Snow (sparse network of winnows (Muñoz et al., 1999)) and a forward-backward algorithm are stacked (the output of one classifier is passed as input to the next classifier), yielding considerable improvement in performance. In addition, in a...

Journal title

Journal of Artificial Intelligence and Data Mining

volume 7 issue 2

pages 311-319

publication date 2019-04-01

Keywords

Chemical Named Entity Recognition; Class Imbalance Problem; Random Undersampling; Classifier Combination