SyCo: A Probabilistic Machine Learning Method for Classifying Chief Complaints into Symptom and Syndrome Categories

Authors

  • Jeremy U. Espino
  • John Dowling
  • John Levander
  • Peter Sutovsky
  • Michael M. Wagner
  • Gregory F. Cooper
Abstract

OBJECTIVE Design, build, and evaluate a symptom-based probabilistic chief complaint classifier for the Real-time Outbreak and Disease Surveillance System (RODS).

BACKGROUND Scientists have used many chief complaint (CC) classification techniques in biosurveillance, including keyword search, weighted keyword search, and naïve Bayes. These techniques may use CC-to-syndrome or CC-to-symptom-to-syndrome classification approaches. In the former approach, we classify a CC directly into syndrome categories. In the latter approach, we first classify a CC into symptom categories. Then we use a syndrome definition, a combination of one or more symptoms, to determine whether or not a chief complaint belongs in a particular syndrome category. One approach to CC-to-symptom-to-syndrome classification uses manually weighted keyword search and Boolean operations to build syndrome classifiers. Limitations of this approach are that it does not address uncertainty in the data and that the system is manually parameterized. A CC-to-symptom-to-syndrome approach that is both probabilistic and uses machine learning addresses these limitations.

METHODS We constructed SyCo, a CC-to-symptom-to-syndrome probabilistic chief complaint classifier. SyCo learns a naïve Bayes model of the relationship between words and symptoms given a training set of labeled chief complaints. To perform a classification, SyCo first computes the posterior probability of each symptom using the odds formulation of Bayes' rule. SyCo can compute the posterior probability traditionally or in single word mode. When single word mode is enabled, SyCo uses only the likelihood ratio of the word (given a symptom) that maximizes the posterior probability. Finally, SyCo uses the posterior probabilities from the first step to compute the posterior probability of a syndrome given a chief complaint. A syndrome is defined as any combination of symptoms and Boolean operations.
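The symptom step described above can be illustrated with a minimal sketch. This is not SyCo's actual implementation: the whitespace tokenizer, the add-one smoothing, the use of word presence only, and the names `train` and `posterior` are all assumptions made for illustration, and the training examples are invented toy data.

```python
from collections import Counter


def train(examples, smoothing=1.0):
    """Learn prior odds and per-word likelihood ratios for one symptom.

    examples: list of (chief_complaint_text, has_symptom) pairs.
    The smoothing constant is an assumption, not taken from the abstract.
    """
    pos_docs = [set(text.lower().split()) for text, label in examples if label]
    neg_docs = [set(text.lower().split()) for text, label in examples if not label]
    pos_counts = Counter(word for doc in pos_docs for word in doc)
    neg_counts = Counter(word for doc in neg_docs for word in doc)
    n_pos, n_neg = len(pos_docs), len(neg_docs)

    # Prior odds: P(symptom) / P(not symptom), smoothed.
    prior_odds = (n_pos + smoothing) / (n_neg + smoothing)

    # Likelihood ratio per word: P(word | symptom) / P(word | not symptom).
    likelihood_ratio = {}
    for word in set(pos_counts) | set(neg_counts):
        p_word_pos = (pos_counts[word] + smoothing) / (n_pos + 2 * smoothing)
        p_word_neg = (neg_counts[word] + smoothing) / (n_neg + 2 * smoothing)
        likelihood_ratio[word] = p_word_pos / p_word_neg
    return prior_odds, likelihood_ratio


def posterior(text, prior_odds, likelihood_ratio, single_word=False):
    """Posterior P(symptom | chief complaint) via the odds form of Bayes' rule."""
    ratios = [likelihood_ratio[w] for w in set(text.lower().split())
              if w in likelihood_ratio]
    if single_word:
        # Single word mode: keep only the ratio that maximizes the posterior.
        odds = prior_odds * max(ratios, default=1.0)
    else:
        # Traditional mode: multiply the ratios of all known words.
        odds = prior_odds
        for ratio in ratios:
            odds *= ratio
    return odds / (1.0 + odds)


# Invented toy data for a hypothetical "fever" symptom.
examples = [
    ("cough and fever", True),
    ("fever chills", True),
    ("ankle pain", False),
    ("chest pain", False),
]
prior_odds, likelihood_ratio = train(examples)
print(posterior("fever pain", prior_odds, likelihood_ratio))
print(posterior("fever pain", prior_odds, likelihood_ratio, single_word=True))
```

In this toy example the word "pain" pulls the traditional posterior down, while single word mode relies on "fever" alone, which shows how the two modes can disagree on the same chief complaint.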
SyCo supports the operations AND, OR, and NOT by using the rules of conjunction, disjunction, and negation of independent events. For example, P(A AND B) = P(A) × P(B). A board-certified infectious disease physician [JD] read 16718 chief complaints and indicated the presence or absence of seventeen symptoms for each chief complaint. We measured the performance of SyCo when classifying seventeen individual symptoms and three syndromes, with and without the single word mode, using leave-one-out cross-validation. We measured the area under the curve (AUC) of the resultant receiver operating characteristic (ROC) curves and established 90% confidence intervals using 100 iterations of non-parametric bootstrapping.

RESULTS The AUC without and with the single word mode ranged from 0.785 to 0.9918 and from 0.7442 to 0.9916, respectively. The single word mode improved performance significantly in 7 of the 20 cases and degraded performance in 2 of the 20 cases.

CONCLUSION SyCo is a symptom-based probabilistic chief complaint classifier that has excellent discriminatory ability for classifying chief complaints into symptom categories and syndromes. We have made SyCo available in RODS Version 4.2.

ACKNOWLEDGEMENTS This research was supported in part by DARPA/Mellon/Pitt grant F30602-01-2-0550, NSF grant IIS-0325581, and PA Department of Health grant ME-01-737.
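The Boolean syndrome step from the methods can be sketched in a few lines, assuming the symptom posteriors are treated as probabilities of independent events. The function names, the example syndrome definition, and the probability values below are hypothetical and serve only to illustrate the conjunction, disjunction, and negation rules.

```python
def p_and(*probabilities):
    """Conjunction of independent events: the product of their probabilities."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def p_not(p):
    """Negation: the complement of the probability."""
    return 1.0 - p


def p_or(*probabilities):
    """Disjunction of independent events: 1 minus P(none of them occur)."""
    return 1.0 - p_and(*(p_not(p) for p in probabilities))


# Hypothetical symptom posteriors for one chief complaint.
symptom_posterior = {"cough": 0.8, "shortness_of_breath": 0.5, "trauma": 0.1}

# Hypothetical syndrome definition:
# (cough OR shortness_of_breath) AND NOT trauma
syndrome_probability = p_and(
    p_or(symptom_posterior["cough"], symptom_posterior["shortness_of_breath"]),
    p_not(symptom_posterior["trauma"]),
)
print(syndrome_probability)
```

Because each operator returns a probability, definitions can be nested to any depth, which matches the abstract's statement that a syndrome is any combination of symptoms and Boolean operations.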


Related articles

Evaluation of preprocessing techniques for chief complaint classification

OBJECTIVE To determine whether preprocessing chief complaints before automatically classifying them into syndromic categories improves classification performance. METHODS We preprocessed chief complaints using two preprocessors (CCP and EMT-P) and evaluated whether classification performance increased for a probabilistic classifier (CoCo) or for a keyword-based classifier (modification of the...


Classifying free-text triage chief complaints into syndromic categories with natural language processing

OBJECTIVE Develop and evaluate a natural language processing application for classifying chief complaints into syndromic categories for syndromic surveillance. INTRODUCTION Much of the input data for artificial intelligence applications in the medical field are free-text patient medical records, including dictated medical reports and triage chief complaints. To be useful for automated systems...


A Symptom Profile Analysis of Depression in a Sample of Iranian Patients

Background: In some cultures, including ours, direct expression of the inner psychic world is inhibited and stigmatized, so patients find alternative modes of expression. The aim of this cross-sectional study was to assess the frequency of somatization in depressed patients. Methods: The present study comprised 500 patients referred to the outpatient clinic of Shiraz University of Med...


Syndromic surveillance on the Victorian chief complaint data set using a hybrid statistical and machine learning technique

Emergency Department Chief Complaints have been used to detect the size and the spread of disease outbreaks in the past. Chief complaints are readily available in digital formats and provide a good data source for syndromic surveillance. This paper reports our findings on the identification of the distribution of a few syndromes over time using the Victorian Syndromic Surveillance (SynSurv) dat...


Multilingual chief complaint classification for syndromic surveillance: An experiment with Chinese chief complaints

PURPOSE Syndromic surveillance is aimed at early detection of disease outbreaks. An important data source for syndromic surveillance is free-text chief complaints (CCs), which may be recorded in different languages. For automated syndromic surveillance, CCs must be classified into predefined syndromic categories to facilitate subsequent data aggregation and analysis. Despite the fact that syndr...



Publication date: 2007