Evaluating Methods for Identifying Cancer in Free-Text Pathology Reports Using Various Machine Learning and Data Preprocessing Approaches
نویسندگان
چکیده
Automated detection methods can address delays and incompleteness in cancer case reporting. Existing automated efforts are largely dependent on complex dictionaries and coded data. Using a gold standard of manually reviewed pathology reports, we evaluated the performance of alternative input formats and decision models on a convenience sample of free-text pathology reports. Results showed that the input format significantly impacted performance, and specific algorithms yielded better results for presicion, recall and accuracy. We conclude that our approach is sufficiently accurate for practical purposes and represents a generalized process.
منابع مشابه
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
متن کاملAutomatic Classification of Cancer Notifiable Death Certificates
The timely notification of cancer cases is crucial for cancer monitoring and prevention. However, the abstraction and classification of cancer from the free-text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. In this paper we investigate approaches for the automatic detection of cases where the cause of death is a notifi...
متن کاملPrediction of Breast Tumor Malignancy Using Neural Network and Whale Optimization Algorithms (WOA)
Introduction: Breast cancer is the most prevalent cause of cancer mortality among women. Early diagnosis of breast cancer gives patients greater survival time. The present study aims to provide an algorithm for more accurate prediction and more effective decision-making in the treatment of patients with breast cancer. Methods: The present study was applied, descriptive-analytical, based on the ...
متن کاملSymbolic rule-based classification of lung cancer stages from free-text pathology reports
OBJECTIVE To classify automatically lung tumor-node-metastases (TNM) cancer stages from free-text pathology reports using symbolic rule-based classification. DESIGN By exploiting report substructure and the symbolic manipulation of systematized nomenclature of medicine-clinical terms (SNOMED CT) concepts in reports, statements in free text can be evaluated for relevance against factors relati...
متن کاملEvaluating machine learning methods and satellite images to estimate combined climatic indices
The reflections recorded on satellite images have been affected by various environmental factors. In these images, some of these factors are combined with other environmental factors that cannot be distinguished. Therefore, it seems wise to model these environmental phenomena in the form of hybrid indicators. In this regard, satellite imagery and machine learning methods can play a unique role ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Studies in health technology and informatics
دوره 216 شماره
صفحات -
تاریخ انتشار 2015