Semi-supervised learning of the hidden vector state model for extracting protein-protein interactions

Authors

  • Deyu Zhou
  • Yulan He
  • Chee Keong Kwoh
Abstract

OBJECTIVE: The hidden vector state (HVS) model is an extension of the basic discrete Markov model in which context is encoded as a stack-oriented state vector. It has been applied successfully to the extraction of protein-protein interactions. However, as a statistically based approach, the HVS model requires large-scale annotated corpora to estimate its parameters reliably, and such corpora are normally difficult to obtain in practical applications. METHODS AND MATERIALS: In this paper, we present two novel semi-supervised learning approaches, one based on classification and the other on expectation-maximization, to train the HVS model from both annotated and un-annotated corpora. RESULTS AND CONCLUSION: Experimental results show improved performance over the baseline system, which uses the HVS model trained solely on the annotated corpus, supporting the feasibility and efficiency of our approaches.
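The stack-oriented state vector can be illustrated with a minimal Python sketch. It assumes, following the related abstract listed on this page, that each transition is a stack shift (popping some labels) followed by the push of one new preterminal label. The function names and the labels `SS`, `PROTEIN`, and `ACTIVATE` are hypothetical, chosen only for illustration; the sketch omits the transition probabilities and is not the authors' implementation.

```python
from typing import List, Tuple

# A state is a stack of labels: root first, current preterminal last.
State = Tuple[str, ...]


def transition(state: State, pop_count: int, new_label: str) -> State:
    """Pop `pop_count` labels off the stack, then push one new preterminal label."""
    if not 0 <= pop_count <= len(state):
        raise ValueError("pop_count must be between 0 and the current stack depth")
    kept = state[: len(state) - pop_count]
    return kept + (new_label,)


def apply_moves(start: State, moves: List[Tuple[int, str]]) -> List[State]:
    """Apply a sequence of (pop_count, new_label) moves and return every visited state."""
    states = [start]
    for pop_count, label in moves:
        states.append(transition(states[-1], pop_count, label))
    return states


if __name__ == "__main__":
    # Hypothetical labels: a sentence-start root, then protein and interaction tags.
    for s in apply_moves(("SS",), [(0, "PROTEIN"), (1, "ACTIVATE"), (0, "PROTEIN")]):
        print(s)
```

Because each state carries its whole stack, the nesting context of a word is available without building a full parse tree. A trained model would additionally assign a probability to each pop count and each pushed label.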


Similar resources

Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering

Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper proposes an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...


Effective reranking for extracting protein-protein interactions from biomedical literature

A semantic parser based on the hidden vector state (HVS) model has been proposed for extracting protein-protein interactions. The HVS model is an extension of the basic discrete hidden Markov model, in which context is encoded as a stack-oriented state vector and state transitions are factored into a stack shift operation followed by the push of a new preterminal category label. In this paper, ...


Extracting Protein-Protein Interaction based on Discriminative Training of the Hidden Vector State Model

Knowledge about gene clusters and protein interactions is important for biological researchers seeking to unveil the mechanisms of life. However, a large quantity of this knowledge is hidden in the literature, such as journal articles, reports, and books. Many approaches focusing on extracting information from unstructured text, such as pattern matching, shallow and deep parsing, have been prop...


Learning to Extract Proteins and their Interactions from Medline Abstracts

We present results from a variety of learned information extraction systems for identifying human protein names in Medline abstracts and subsequently extracting interactions between the proteins. We demonstrate that machine learning approaches using support vector machines and hidden Markov models are able to identify human proteins with higher accuracy than several previous approaches. We also...


Extracting Protein-Protein Interactions from MEDLINE using the Hidden Vector State Model

Protein-protein interactions, which refer to the associations of protein molecules, are crucial for many biological functions. A major challenge in text mining for biomedicine is automatically extracting protein-protein interactions from the vast amount of biomedical literature, since most knowledge about them is still hidden in biomedical publications. We have constructed an information extraction syst...



Journal title:
  • Artificial intelligence in medicine

Volume 41, Issue 3

Pages: -

Publication date: 2007