Classification of arrayCGH data using fused SVM

نویسندگان

  • Franck Rapaport
  • Emmanuel Barillot
  • Jean-Philippe Vert
چکیده

MOTIVATION Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of bacterial artificial chromosomes along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules. RESULTS We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome. AVAILABILITY All data and algorithms are publicly available.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification of arrayCGH data using a fused SVM

Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profi...

متن کامل

A Comparative Study of SVM and RF Methods for Classification of Alteration Zones Using Remotely Sensed Data

Identification and mapping of the significant alterations are the main objectives of the exploration geochemical surveys. The field study is time-consuming and costly to produce the classified maps. Therefore, the processing of remotely sensed data, which provide timely and multi-band (multi-layer) data, can be substituted for the field study. In this study, the ASTER imagery is used for altera...

متن کامل

A hypergraph-based learning algorithm for classifying gene expression and arrayCGH data with prior knowledge

MOTIVATION Incorporating biological prior knowledge into predictive models is a challenging data integration problem in analyzing high-dimensional genomic data. We introduce a hypergraph-based semi-supervised learning algorithm called HyperPrior to classify gene expression and array-based comparative genomic hybridization (arrayCGH) data using biological knowledge as constraints on graph-based ...

متن کامل

Integrating Textural and Spectral Features to Classify Silicate-Bearing Rocks Using Landsat 8 Data

Texture as a measure of spatial features has been useful as supplementary information to improve image classification in many areas of research fields. This study focuses on assessing the ability of different textural vectors and their combinations to aid spectral features in the classification of silicate rocks. Texture images were calculated from Landsat 8 imagery using a fractal dimension me...

متن کامل

Application of Fusion with Sar and Optical Images in Land Use Classification Based on Svm

As the increment of remote sensing data with multi-space resolution, multi-spectral resolution and multi-source, data fusion technologies have been widely used in geological fields. Synthetic Aperture Radar (SAR) and optical camera are two most common sensors presently. The multi-spectral optical images express spectral features of ground objects, while SAR images express backscatter informatio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 24  شماره 

صفحات  -

تاریخ انتشار 2008