An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature
Authors
Abstract
The wealth of interaction information provided in biomedical articles has motivated the implementation of text mining approaches to automatically extract biomedical relations. This paper presents an unsupervised method for biomedical relation extraction based on pattern clustering and sentence parsing. The pattern clustering algorithm is based on the polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing, and phrase structure parsing rules. We evaluated the approaches on two tasks: (1) protein-protein interaction extraction and (2) gene-suicide association extraction. The evaluation of task (1) on the benchmark dataset (AImed corpus) showed that our proposed unsupervised approach outperformed three supervised methods, which are rule-based, SVM-based, and kernel-based, respectively. The proposed semi-supervised approach is superior to the existing semi-supervised methods. The evaluation of gene-suicide association extraction on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than the co-occurrence-based method.
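To make the comparison baseline concrete, the following is a minimal sketch of a co-occurrence-based extractor and the F-score used to evaluate it. It is not the authors' implementation: the function names, the sentence and entity identifiers, and the toy gold annotations are all hypothetical, and it assumes the common definition of the baseline, in which every pair of distinct entities mentioned in the same sentence is predicted to be related.

```python
from itertools import combinations


def cooccurrence_pairs(sentences):
    """Predict a relation for every unordered pair of distinct entities
    that appear in the same sentence (hypothetical baseline definition)."""
    predicted = set()
    for sent_id, entities in sentences.items():
        # Deduplicate and sort so each pair has one canonical order.
        for a, b in combinations(sorted(set(entities)), 2):
            predicted.add((sent_id, a, b))
    return predicted


def f_score(predicted, gold):
    """Balanced F-score (harmonic mean of precision and recall)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
sentences = {
    "s1": ["P1", "P2", "P3"],  # three entities -> three candidate pairs
    "s2": ["P4", "P5"],        # two entities -> one candidate pair
}
gold = {("s1", "P1", "P2"), ("s2", "P4", "P5")}

predicted = cooccurrence_pairs(sentences)
score = f_score(predicted, gold)
print(f"predicted pairs: {len(predicted)}, F-score: {score:.3f}")
```

On this toy input the baseline recovers every gold pair but also predicts two spurious ones, which illustrates why co-occurrence tends to give high recall and low precision, and why methods that use interaction words and parse structure can reach higher F-scores.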
Similar resources
Extraction of Drug-Drug Interaction from Literature through Detecting Linguistic-based Negation and Clause Dependency
Extracting biomedical relations such as drug-drug interaction (DDI) from text is an important task in biomedical NLP. Due to the large number of complex sentences in biomedical literature, researchers have employed some sentence simplification techniques to improve the performance of the relation extraction methods. However, due to the difficulty of the task, there is no noteworthy improvement in t...
Filtering large-scale event collections using a combination of supervised and unsupervised learning for event trigger classification
BACKGROUND Biomedical event extraction is one of the key tasks in biomedical text mining, supporting various applications such as database curation and hypothesis generation. Several systems, some of which have been applied at a large scale, have been introduced to solve this task. Past studies have shown that the identification of the phrases describing biological processes, also known as trig...
Extraction of Gene/Protein Interaction from Text Documents with Relation Kernel
Even though there are many databases for gene/protein interactions, most such data still exist only in the biomedical literature. They are spread in biomedical literature written in natural languages and they require much effort such as data mining for constructing well-structured data forms. As genomic research advances, knowledge discovery from a large collection of scientific papers is becom...
Simple tricks for improving pattern-based information extraction from the biomedical literature
BACKGROUND Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns. RESULTS We propose several techniques for filtering sets of automatically generated patterns a...
Painless Relation Extraction with Kindred
Relation extraction methods are essential for creating robust text mining tools to help researchers find useful knowledge in the vast published literature. Easy-to-use and generalizable methods are needed to encourage an ecosystem in which researchers can easily use shared resources and build upon each other's methods. We present the Kindred Python package for relation extraction. It builds upo...