Incidence of missing values in hierarchical clustering of microarrays data

Authors

  • Alexandre G. de Brevern
  • Serge Hazout
  • Alain Malpertuy
Abstract

Microarrays make it possible to determine which genes are expressed in a given cell type, at a given time and under particular experimental conditions. These experiments are performed on a large scale and produce large volumes of data. Methods such as Hierarchical Clustering (HC) [1] or the Self-Organizing Map [2] are often used to identify co-expressed genes. The data often contain missing values (MV) caused by experimental difficulties. These missing data are a drawback for certain quantitative analyses, and their presence may markedly alter the results. Users usually either remove the genes with MV or replace each MV with a constant (zero or the average expression level of the entire experiment). Troyanskaya et al. showed that the weighted K-Nearest Neighbours (KNN) method appears to be the most accurate way to estimate MV in microarray data [3]. First, we analysed the quantity of missing values in several public microarray datasets for Saccharomyces cerevisiae. Second, we assessed the effects of MV and of the chosen metric on the HC results. Finally, we tested the effect of KNN imputation on HC using our different datasets.
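As an illustration of the weighted KNN idea mentioned in the abstract, the following is a minimal sketch and not the authors' or Troyanskaya et al.'s implementation. It assumes a genes x conditions matrix stored as nested lists with `None` marking a missing value, a root-mean-square distance computed only over conditions observed in both genes, and inverse-distance weights. The function name `knn_impute` and the parameter `k` are hypothetical.

```python
import math


def knn_impute(matrix, k=2):
    """Return a copy of a genes x conditions matrix in which each None entry
    is estimated from the k most similar genes (inverse-distance weighting).

    Assumptions of this sketch: distances use only conditions observed in
    both genes, and neighbours are always compared on the original values.
    """
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_idx, other in enumerate(matrix):
                # A neighbour must be another gene with a value in column j.
                if other_idx == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if not candidates:
                continue  # no usable neighbour: leave the value missing
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            # The small constant avoids division by zero for identical genes.
            weights = [1.0 / (d + 1e-6) for d, _ in nearest]
            weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
            filled[i][j] = weighted_sum / sum(weights)
    return filled
```

The completed matrix can then be passed to any hierarchical clustering routine. Comparing the resulting tree with the one obtained after zero or mean replacement is one way to explore the effect the abstract describes.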

Similar resources

Missing value estimation methods for DNA microarrays

MOTIVATION Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete matrix of gene array values as input. For example, methods such as hierarchical clustering and K-means clustering are not robust to missing data, and may lose effectiveness even with a few missing values....

Application of clustering methods in DNA microarrays

Background: DNA microarray technology has paved the way for investigators to study the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation

1- INTRODUCTION The assessment of watershed sediment load is necessary for controlling soil erosion and reducing the potential for sediment production. Differing estimates of sediment amounts, along with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Therefore, the observed data of suspended sediment load could be used to ...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When dealing with huge volumes of data, analysis is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it gives no way to select the number of clusters and the number of members ...

Missing data imputation in multivariable time series data

Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic and needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...

Publication date: 2003