PIELG: A Protein Interaction Extraction System using a Link Grammar Parser from Biomedical Abstracts
Abstract
Due to the ever-growing number of publications on protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents PIELG, a Protein Interaction Extraction System that uses a Link Grammar Parser on biomedical abstracts. PIELG uses the linkage produced by the Link Grammar Parser to start a case-based analysis of the contents of the various syntactic roles, as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome preposition combination problems. Recall and precision are 74.4% and 62.65%, respectively. Experimental comparison with two other state-of-the-art extraction systems indicates that PIELG achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The results show that the performance is promising.
Keywords: Link Grammar Parser, interaction extraction, protein-protein interaction, natural language processing.
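The reported recall and precision can be combined into a single F1 score (their harmonic mean). The abstract does not report an F-score, so the value computed below is derived arithmetic from the two published figures, not a result stated by the authors; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, converted from percentages to fractions.
recall = 0.744
precision = 0.6265

# Derived value (an assumption-free calculation, but not reported in the abstract).
f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")
```

With these inputs the harmonic mean comes out at roughly 68%, which sits closer to the lower of the two figures, as expected for a harmonic mean.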
Similar resources
BioPPIExtractor: A protein-protein interaction extraction system for biomedical literature
Automatically extracting protein–protein interaction information from biomedical literature can help to build protein relation networks, predict protein function and design new drugs. This paper presents BioPPIExtractor, a protein–protein interaction extraction system for biomedical literature. The system applies a Conditional Random Fields model to tag protein names in biomedical text, then uses a li...
Analysis of Link Grammar on Biomedical Dependency Corpus Targeted at Protein-Protein Interactions
In this paper, we present an evaluation of the Link Grammar parser on a corpus consisting of sentences describing protein-protein interactions. We introduce the notion of an interaction subgraph, which is the subgraph of a dependency graph expressing a protein-protein interaction. We measure the performance of the parser for recovery of dependencies, fully correct linkages and interaction subgr...
Integrating querying and retrieval for biomedical information extraction
Biomedical natural language processing (BioNLP) involves the automatic processing of documents in the biomedical domain for the purpose of extracting information of interest from them. A common approach to BioNLP is through sequential application of a selection of modules (such as a named entity recognizer and a grammar parser followed by extraction) where the aggregated output is stored in an ...
Towards Effective Sentence Simplification for Automatic Processing of Biomedical Text
The complexity of sentences characteristic of biomedical articles poses a challenge to natural language parsers, which are typically trained on large-scale corpora of non-technical text. We propose a text simplification process, bioSimplify, that seeks to reduce the complexity of sentences in biomedical abstracts in order to improve the performance of syntactic parsers on the processed sentence...
Extracting gene pathway relations using a hybrid grammar: the Arizona Relation Parser
MOTIVATION: Text-mining research in the biomedical domain has been motivated by the rapid growth of new research findings. Improving the accessibility of findings has the potential to speed hypothesis generation. RESULTS: We present the Arizona Relation Parser that differs from other parsers in its use of a broad coverage syntax-semantic hybrid grammar. While syntax grammars have generally been tes...