Mixing Statistical and Symbolic Approaches for Chemical Names Recognition

Authors

  • Florian Boudin
  • Juan-Manuel Torres-Moreno
  • Marc El-Bèze
Abstract

This paper investigates the problem of automatic chemical Term Recognition (TR) and proposes to tackle it by fusing symbolic and statistical techniques. Unlike other solutions described in the literature, which rely only on complex and costly human-made rule-based matching algorithms, we show that the combination of a seven-rule matching algorithm and a naïve Bayes classifier achieves high performance. Through experiments performed on different kinds of available Organic Chemistry texts, we show that our hybrid approach is also consistent across different data sets.
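The general idea of pairing a small rule set with a naïve Bayes classifier can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three regular-expression rules are hypothetical stand-ins (the abstract does not list the paper's seven rules), the character-trigram features and the tiny training set are invented, and the fusion step (rules propose a candidate, the classifier confirms it) is only one plausible way to combine the two components, not necessarily the authors' scheme.

```python
import math
import re
from collections import Counter

# Hypothetical symbolic rules (NOT the paper's seven rules): common
# organic-chemistry prefixes, suffixes, and locants such as "2,3-".
RULES = [
    re.compile(r"^(meth|eth|prop|but|pent|hex)", re.I),
    re.compile(r"(ane|ene|yne|ol|one|oate|amine|oxide|yl)$", re.I),
    re.compile(r"\d+(,\d+)*-"),
]


def rule_match(token):
    """Return True if any of the illustrative rules fires on the token."""
    return any(rule.search(token) for rule in RULES)


def trigrams(token):
    """Character trigrams of the lowercased token, padded with ^ and $."""
    padded = f"^{token.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class NaiveBayes:
    """Multinomial naive Bayes over character trigrams, add-one smoothing."""

    def __init__(self):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        self.vocab = set()

    def fit(self, tokens, labels):
        for token, label in zip(tokens, labels):
            grams = trigrams(token)
            self.docs[label] += 1
            self.counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_score(self, token, label):
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for gram in trigrams(token):
            score += math.log(
                (self.counts[label][gram] + 1) / (total + len(self.vocab))
            )
        return score

    def predict(self, token):
        return self.log_score(token, True) > self.log_score(token, False)


def is_chemical(token, model):
    """Assumed fusion: a rule must fire AND the classifier must agree."""
    return rule_match(token) and model.predict(token)


# Toy training data, invented for this sketch.
chemical = ["ethanol", "methanol", "propanol", "butanone", "benzene"]
ordinary = ["table", "method", "paper", "result", "window"]
model = NaiveBayes().fit(
    chemical + ordinary,
    [True] * len(chemical) + [False] * len(ordinary),
)

for word in ["pentanol", "method", "window"]:
    print(word, is_chemical(word, model))
```

In this toy setting the word "method" triggers the hypothetical prefix rule but is rejected by the classifier, which suggests why a statistical filter can compensate for a deliberately small rule set.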

Related papers

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

Integrating Symbolic And Statistical Approches In Speech And Natural Language Applications

Symbolic and statistical approaches have traditionally been kept separate and applied to very different problems. Symbolic techniques apply best where we have a priori knowledge of the language or the domain and where the application of a theory or study of selected examples can help leverage and extend our knowledge. Statistical approaches apply best where the results of decisions can be repre...

Multi-class Protein Fold Recognition Through a Symbolic-Statistical Framework

Protein fold recognition is an important problem in molecular biology. Machine learning symbolic approaches have been applied to automatically discover local structural signatures and relate these to the concept of fold in SCOP. However, most of these methods cannot handle uncertainty being therefore not able to solve multiple prediction problems. In this paper we present an application of the ...

A Comparison of Statistical Approaches to Symbolic Genre Recognition

Previous work in genre recognition and characterization from symbolic sources (melodies extracted from MIDI files) carried out by our group pointed our research to study how the different utilized approaches perform and how their different abilities can be used together in order to improve both the accuracy and robustness of their decisions. Results for a corpus of Jazz and Classical music piec...

Combining Machine Learning with Dictionary Lookup for Chemical Compound and Drug Name Recognition Task

Following the interest taken into Name Entity Recognition in academic literature in the Gene Mention recognition task of BioCreative I and II, the BioCreative IV hopes to make the implementation of the system in the field of detecting mentions of chemical compounds and drugs. Considering that the machine learning methods have obtained great success in the correct identification of gene and prot...

Publication date: 2008