Development and Performance of Text-Mining Algorithms to Extract Socioeconomic Status from De-Identified Electronic Health Records
Abstract
Socioeconomic status (SES) is a fundamental contributor to health, and a key factor underlying racial disparities in disease. However, SES data are rarely included in genetic studies, due in part to the difficulty of collecting these data when studies were not originally designed for that purpose. The emergence of large clinic-based biobanks linked to electronic health records (EHRs) provides research access to large patient populations with longitudinal phenotype data captured in structured fields as billing codes, procedure codes, and prescriptions. SES data, however, are often not explicitly recorded in structured fields, but rather recorded in the free text of clinical notes and communications. The content and completeness of these data vary widely by practitioner. To enable gene-environment studies that consider SES as an exposure, we sought to extract SES variables from racial/ethnic minority adult patients (n=9,977) in BioVU, the Vanderbilt University Medical Center biorepository linked to de-identified EHRs. We developed several measures of SES using information available within the de-identified EHR, including broad categories of occupation, education, insurance status, and homelessness. Two hundred patients were randomly selected for manual review to develop a set of seven algorithms for extracting SES information from de-identified EHRs. The algorithms consist of 15 categories of information, with 830 unique search terms. SES data extracted from manual review of 50 randomly selected records were compared to data produced by the algorithm, resulting in positive predictive values of 80.0% (education), 85.4% (occupation), 87.5% (unemployment), 63.6% (retirement), 23.1% (uninsured), 81.8% (Medicaid), and 33.3% (homelessness), suggesting some categories of SES data are easier to extract in this EHR than others. The SES data extraction approach developed here will enable future EHR-based genetic studies to integrate SES information into statistical analyses.
Ultimately, incorporation of measures of SES into genetic studies will help elucidate the impact of the social environment on disease risk and outcomes.
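The general idea of keyword-based extraction scored against manual review can be sketched as follows. This is a minimal illustration, not the authors' implementation: the search terms, record IDs, note texts, and manual labels below are all hypothetical, since the abstract does not list the 830 terms or the per-category counts. It assumes one regular expression per SES category and computes the positive predictive value (PPV) as the fraction of algorithm-flagged records that manual review confirms.

```python
import re

# Hypothetical search terms; the actual 830 terms are not given in the abstract.
SEARCH_TERMS = {
    "unemployment": ["unemployed", "out of work"],
    "medicaid": ["medicaid"],
    "homelessness": ["homeless", "lives in a shelter"],
}


def compile_patterns(terms_by_category):
    """Build one case-insensitive, whole-word pattern per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in terms_by_category.items()
    }


def flag_categories(note, patterns):
    """Return the set of categories whose search terms appear in a note."""
    return {cat for cat, pat in patterns.items() if pat.search(note)}


def positive_predictive_value(flagged, confirmed):
    """PPV = confirmed flagged records / all flagged records (None if none flagged)."""
    if not flagged:
        return None
    return len(flagged & confirmed) / len(flagged)


# Hypothetical de-identified notes and manual-review labels.
notes = {
    "r1": "Patient is currently unemployed and homeless.",
    "r2": "Denies being homeless; lives with daughter.",
    "r3": "Insurance: Medicaid.",
}
manual_review = {
    "unemployment": {"r1"},
    "medicaid": {"r3"},
    "homelessness": {"r1"},
}

patterns = compile_patterns(SEARCH_TERMS)
flagged_by_category = {cat: set() for cat in SEARCH_TERMS}
for record_id, text in notes.items():
    for cat in flag_categories(text, patterns):
        flagged_by_category[cat].add(record_id)

ppv = {
    cat: positive_predictive_value(flagged_by_category[cat], manual_review[cat])
    for cat in SEARCH_TERMS
}
```

In this toy data the negated mention in record r2 is flagged for homelessness but not confirmed by manual review, which is one way a plain keyword match can lower the PPV for a category.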
Journal: Pacific Symposium on Biocomputing
Volume: 22
Publication date: 2017