A New Challenge for Text Mining: Cancer Risk Assessment

Authors

  • Ian Lewin
  • Ilona Silins
  • Anna Korhonen
  • Johan Högberg
  • Ulla Stenius
Abstract

Motivation: Cancer Risk Assessment (RA) of chemicals is an important and challenging multi-step task which requires combining scientific expertise with elaborate literature search and review. Due to the rapidly growing volume of RA literature, the increasing complexity of experimental evidence, and the accelerating need for chemical assessment, the task is becoming increasingly difficult to manage manually. Text Mining (TM) technology specifically tailored to the needs of the task could lead to considerably more systematic and efficient RA. In this paper we present the first steps taken towards the development of such technology.

Results: We have downloaded a corpus of 830 abstracts from PubMed and manually annotated the abstracts according to their relevance and the type of evidence they provide for cancer RA of selected test chemicals. The result is a taxonomy which classifies the key types of scientific evidence required for RA. The taxonomy can aid manual RA and is a starting point for the development of an approach based on TM. Using the annotated corpus we have demonstrated that supervised machine learning of large portions of the taxonomy and overall document relevance yields high accuracy and can be useful for the first step of cancer RA: finding the articles relevant for the task. We are now installing the automated classifier into the pipeline so that we can assess its impact on the RA process as a whole.
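The relevance-classification step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which learning algorithm or features were used, so the multinomial Naive Bayes model, the whitespace tokenizer, the label names and the four training strings below are all assumptions made for illustration; the strings are invented toy examples, not abstracts from the annotated corpus.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace, keeping alphabetic tokens only."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data (invented for this sketch).
train_texts = [
    "chemical exposure induced liver tumours in rats",
    "dna damage and mutations observed after chemical exposure",
    "survey of hospital staffing and patient satisfaction",
    "economic analysis of hospital funding policy",
]
train_labels = ["relevant", "relevant", "irrelevant", "irrelevant"]

clf = NaiveBayes().fit(train_texts, train_labels)
prediction = clf.predict("tumours in rats after exposure")
```

A real system would train on the manually annotated abstracts and would need one such classifier (or a multi-label model) per taxonomy class, in addition to the overall relevance decision shown here.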


Similar resources

CRAB 2.0: A text mining tool for supporting literature review in chemical cancer risk assessment

Chemical cancer risk assessment is a literature-dependent task which could greatly benefit from text mining support. In this paper we describe CRAB – the first publicly available tool for supporting the risk assessment workflow. CRAB, currently at version 2.0, facilitates the gathering of relevant literature via PubMed queries as well as semantic classification, statistical analysis and efficie...


User-Driven Development of Text Mining Resources for Cancer Risk Assessment

One of the most neglected areas of biomedical Text Mining (TM) is the development of systems based on carefully assessed user needs. We investigate the needs of an important task yet to be tackled by TM — Cancer Risk Assessment (CRA) — and take the first step towards the development of TM for the task: identifying and organizing the scientific evidence required for CRA in a taxonomy. The taxono...


Text Mining for Literature Review and Knowledge Discovery in Cancer Risk Assessment and Research

Research in biomedical text mining is starting to produce technology which can make information in biomedical literature more accessible for bio-scientists. One of the current challenges is to integrate and refine this technology to support real-life scientific tasks in biomedicine, and to evaluate its usefulness in the context of such tasks. We describe CRAB - a fully integrated text mining to...


A Model for Extracting Information from Textual Documents, Based on Text Mining in the E-Learning Domain

As computer networks become the backbones of science and the economy, enormous quantities of documents become available. Text mining techniques have therefore been used to extract useful information from textual data. Text Mining has become an important research area that discovers unknown information, facts or new hypotheses by automatically extracting information from different written documents. T...


Health Risk Assessment of Heavy Metals in Soil from the Iron Mines of Itakpe and Agbaja, Kogi State, Nigeria

The study evaluates the health risks that heavy metals in the soil pose to inhabitants of two mining areas of Nigeria. To do so, it collects and analyses nine homogenous soil samples for their lead, copper, cadmium, zinc, and chromium levels, using AAS. The samples are then used to calculate health risks to adults and children. For the adult population in Agbaja community, the calculated hazard...




Publication date: 2008