Symbolic rule-based classification of lung cancer stages from free-text pathology reports
نویسندگان
چکیده
OBJECTIVE To classify automatically lung tumor-node-metastases (TNM) cancer stages from free-text pathology reports using symbolic rule-based classification. DESIGN By exploiting report substructure and the symbolic manipulation of systematized nomenclature of medicine-clinical terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors relating to the staging guidelines. Post-coordinated SNOMED CT expressions based on templates were defined and populated by concepts in reports, and tested for subsumption by staging factors. The subsumption results were used to build logic according to the staging guidelines to calculate the TNM stage. MEASUREMENTS The accuracy measure and confusion matrices were used to evaluate the TNM stages classified by the symbolic rule-based system. The system was evaluated against a database of multidisciplinary team staging decisions and a machine learning-based text classification system using support vector machines. RESULTS Overall accuracy on a corpus of pathology reports for 718 lung cancer patients against a database of pathological TNM staging decisions were 72%, 78%, and 94% for T, N, and M staging, respectively. The system's performance was also comparable to support vector machine classification approaches. CONCLUSION A system to classify lung TNM stages from free-text pathology reports was developed, and it was verified that the symbolic rule-based approach using SNOMED CT can be used for the extraction of key lung cancer characteristics from free-text reports. Future work will investigate the applicability of using the proposed methodology for extracting other cancer characteristics and types.
منابع مشابه
Assessing the Utility of Automatic Cancer Registry Notifications Data Extraction from Free-Text Pathology Reports
Cancer Registries record cancer data by reading and interpreting pathology cancer specimen reports. For some Registries this can be a manual process, which is labour and time intensive and subject to errors. A system for automatic extraction of cancer data from HL7 electronic free-text pathology reports has been proposed to improve the workflow efficiency of the Cancer Registry. The system is c...
متن کاملAutomated Cancer Stage Classification from Free-text Histology Reports
Objectives: This paper describes a system to automatically classify the stage of a lung cancer patient based on text analysis of their histology reports. Methods: The system uses machine learning techniques to train a statistical classifier, specifically a support vector machine, for each stage category based on word occurrences in a corpus of histology reports for staged patients. New reports ...
متن کاملApplication of Information Technology: Collection of Cancer Stage Data by Classifying Free-text Medical Reports
Cancer staging provides a basis for planning clinical management, but also allows for meaningful analysis of cancer outcomes and evaluation of cancer care services. Despite this, stage data in cancer registries is often incomplete, inaccurate, or simply not collected. This article describes a prototype software system (Cancer Stage Interpretation System, CSIS) that automatically extracts cancer...
متن کاملAutomated classification of free-text pathology reports for registration of incident cases of cancer.
OBJECTIVE Our study aimed to construct and evaluate functions called "classifiers", produced by supervised machine learning techniques, in order to categorize automatically pathology reports using solely their content. METHODS Patients from the Poitou-Charentes Cancer Registry having at least one pathology report and a single non-metastatic invasive neoplasm were included. A descriptor weight...
متن کاملEvaluating classification power of linked admission data sources with text mining
Lung cancer is a leading cause of death in developed countries. This paper presents a text mining system using Support Vector Machines for detecting lung cancer admissions. Performance of the system using different clinical data sources is evaluated. We use radiology reports as an initial data source and add other sources, such as pathology reports, patient demographic information and hospital ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of the American Medical Informatics Association : JAMIA
دوره 17 4 شماره
صفحات -
تاریخ انتشار 2010