Research and applications: Induced lexico-syntactic patterns improve information extraction from online medical forums

Authors

  • Sonal Gupta
  • Diana L. MacLean
  • Jeffrey Heer
  • Christopher D. Manning
Abstract

OBJECTIVE To reliably extract two entity types, symptoms and conditions (SCs) and drugs and treatments (DTs), from patient-authored text (PAT) by learning lexico-syntactic patterns from data annotated with seed dictionaries.

BACKGROUND AND SIGNIFICANCE Despite the increasing quantity of PAT (e.g., online discussion threads), tools for identifying medical entities in PAT are limited. When applied to PAT, existing tools either fail to identify specific entity types or perform poorly. Identifying SC and DT terms in PAT would enable exploration of efficacy and side effects not only for pharmaceutical drugs but also for home remedies and components of daily care.

MATERIALS AND METHODS We use SC and DT term dictionaries compiled from online sources to label several discussion forums from MedHelp (http://www.medhelp.org). We then iteratively induce lexico-syntactic patterns that correspond strongly to each entity type and use them to extract new SC and DT terms.

RESULTS Our system extracts symptom descriptions and treatments absent from our original dictionaries, such as 'LADA', 'stabbing pain', and 'cinnamon pills'. On two forums from MedHelp, it extracts DT terms with a 58-70% F1 score and SC terms with a 66-76% F1 score. We show improvements over MetaMap, OBA, a conditional random field-based classifier, and a previous pattern learning approach.

CONCLUSIONS Our entity extractor based on lexico-syntactic patterns is a successful and preferable technique for identifying specific entity types in PAT. To the best of our knowledge, this is the first paper to extract SC and DT entities from PAT. We demonstrate learning of informal terms that are often used in PAT but missing from typical dictionaries.
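The iterative loop described under Materials and Methods (label text with a seed dictionary, induce patterns that correlate with labeled terms, extract new terms, repeat) can be illustrated with a minimal sketch. This is not the authors' system: it assumes a much simpler pattern form (the two tokens preceding a term) in place of real lexico-syntactic patterns, and the function names, thresholds, and toy sentences are all hypothetical.

```python
from collections import defaultdict


def induce_patterns(sentences, known, min_support=2, min_precision=0.5):
    """Keep two-token left contexts that mostly precede known terms.

    ASSUMPTION: a "pattern" here is just the pair of preceding tokens;
    the thresholds are illustrative, not taken from the paper.
    """
    hits = defaultdict(int)   # pattern -> matches that are known terms
    total = defaultdict(int)  # pattern -> all matches
    for sent in sentences:
        toks = sent.lower().split()
        for i in range(2, len(toks)):
            pattern = (toks[i - 2], toks[i - 1])
            total[pattern] += 1
            if toks[i] in known:
                hits[pattern] += 1
    return {
        p for p in total
        if hits[p] >= min_support and hits[p] / total[p] >= min_precision
    }


def apply_patterns(sentences, patterns, known):
    """Collect tokens that follow an accepted pattern and are not yet known."""
    new_terms = set()
    for sent in sentences:
        toks = sent.lower().split()
        for i in range(2, len(toks)):
            if (toks[i - 2], toks[i - 1]) in patterns and toks[i] not in known:
                new_terms.add(toks[i])
    return new_terms


def bootstrap(sentences, seeds, iterations=2):
    """Alternate pattern induction and term extraction, starting from seeds."""
    known = set(seeds)
    for _ in range(iterations):
        patterns = induce_patterns(sentences, known)
        new_terms = apply_patterns(sentences, patterns, known)
        if not new_terms:
            break  # nothing new was learned, so stop early
        known |= new_terms
    return known - set(seeds)


# Toy, made-up forum sentences and a two-entry seed dictionary.
sentences = [
    "i was prescribed with metformin last year",
    "she was prescribed with insulin yesterday",
    "he was prescribed with cinnamon daily",
    "we went walking with friends today",
]
seeds = {"metformin", "insulin"}
print(bootstrap(sentences, seeds))  # {'cinnamon'}
```

In this toy run, the context "prescribed with" precedes two seed terms out of three matches, so it is accepted as a pattern and yields the unseen term "cinnamon", while "walking with" never precedes a seed and contributes nothing.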

Related resources

Extraction of Semantic Relationships from Academic Papers using Syntactic Patterns

Integrating concept and citation networks on a specific research subject can help researchers focus their own work or use methods described in prior works. In this paper, we propose a method to extract semantic relations from concepts and citation in the descriptions of related work. Specifically, we examined (i) topic-paper relations between research topics and reference papers and (ii) method...

Using Lexico-Syntactic Ontology Design Patterns for Ontology Creation and Population

In this paper we discuss the use of information extraction techniques involving lexico-syntactic patterns to generate ontological information from unstructured text and either create a new ontology from scratch or augment an existing ontology with new entities. We refine the patterns using a term extraction tool and some semantic restrictions derived from WordNet and VerbNet, in order to preven...

Evaluating Various Linguistic Features on Semantic Relation Extraction

Machine learning approaches for Information Extraction use different types of features to acquire semantically related terms from free text. These features may contain several kinds of linguistic knowledge: from orthographic or lexical to more complex features, like PoStags or syntactic dependencies. In this paper we select four main types of linguistic features and evaluate their performance i...

Insight to Hyponymy Lexical Relation Extraction in the Patent Genre Versus Other Text Genres

Due to the large amount of available patent data, it is no longer feasible for industry actors to manually create their own terminology lists and ontologies. Furthermore, domain specific thesauruses are rarely accessible to the research community. In this paper we present extraction of hyponymy lexical relations conducted on patent text using lexico-syntactic patterns. We explore the lexico-syn...

Methodology to Build Medical Ontology from Textual Resources

In the medical field, it is now established that the maintenance of unambiguous thesauri goes through ontologies. Our research task is to help pneumologists code acts and diagnoses with a software that represents medical knowledge through a domain ontology. In this paper, we describe our general methodology aimed at knowledge engineers in order to build various types of medical ontologies based...

Journal:
  • Journal of the American Medical Informatics Association: JAMIA

Volume 21, Issue 5

Publication date: 2014