Biweight Correlation as a Measure of Distance between Genes on a Microarray
Abstract
Motivation: The underlying goal of microarray experiments is to identify genetic patterns across different experimental conditions. Genes that belong to a particular pathway, or that respond similarly to experimental conditions, should be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses, we can partition genes of interest into groups, clusters, or modules based on measures of similarity. Typically, Pearson correlation is used to measure distance (or similarity) before a clustering algorithm is applied. However, Pearson correlation is quite susceptible to outliers, an unfortunate property when dealing with microarray data, which are well known to be noisy.
Results: We propose a resistant similarity metric based on Tukey's biweight estimate of multivariate location and scale. The resistant metric is simply the correlation obtained from a resistant estimate of the covariance matrix. We give results demonstrating that our correlation metric is much more resistant than Pearson correlation while being more efficient than nonparametric measures of correlation such as Spearman correlation. Additionally, our method provides a systematic gene-flagging procedure, which is useful when dealing with large amounts of noisy data.
Availability: R code is available from the authors.
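The abstract describes the metric only in words: estimate a resistant covariance matrix, then read the correlation off it. The Python sketch below illustrates that idea for a single pair of genes. It is a simplification under stated assumptions, not the authors' R code: it uses a plain iteratively reweighted estimator with Tukey biweight weights on Mahalanobis distances, where distances are divided by their median and the tuning constant `c = 4.0` is a hypothetical choice, rather than the full biweight S-estimation procedure. The function names `biweight_correlation` and `pearson` are illustrative.

```python
import math
import statistics


def pearson(x, y):
    """Ordinary (non-resistant) Pearson correlation, for comparison."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def biweight_correlation(x, y, c=4.0, max_iter=50, tol=1e-10):
    """Correlation read off a biweight-reweighted 2x2 covariance matrix.

    Simplified sketch (assumption): iteratively reweighted location and
    scatter with Tukey biweight weights on Mahalanobis distances. Distances
    are scaled by c times their median, so the weights do not depend on the
    overall scale of the covariance matrix (correlation is scale-free anyway).
    Returns (r, weights); observations with weight 0 were rejected as outliers.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length >= 3")
    if c <= 1:
        raise ValueError("c must exceed 1 so at least half the points keep weight")
    # Robust starting values: coordinatewise medians and squared MADs.
    mx, my = statistics.median(x), statistics.median(y)
    madx = statistics.median([abs(v - mx) for v in x])
    mady = statistics.median([abs(v - my) for v in y])
    if madx == 0 or mady == 0:
        raise ValueError("MAD is zero; data too degenerate for this sketch")
    sxx, syy, sxy = madx ** 2, mady ** 2, 0.0
    r_prev, r, w = None, 0.0, [1.0] * n
    for _ in range(max_iter):
        det = sxx * syy - sxy ** 2
        if det <= 0:
            break  # (near-)singular scatter matrix: keep the last estimate
        # Mahalanobis distance of each point from the current location.
        d = []
        for a, b in zip(x, y):
            dx, dy = a - mx, b - my
            q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
            d.append(math.sqrt(max(q, 0.0)))
        scale = c * statistics.median(d)
        if scale == 0:
            break
        # Tukey biweight: w = (1 - u^2)^2 for u < 1, else 0, with u = d / scale.
        w = [(1 - (di / scale) ** 2) ** 2 if di < scale else 0.0 for di in d]
        sw = sum(w)
        # Weighted location and weighted covariance (scatter) matrix.
        mx = sum(wi * a for wi, a in zip(w, x)) / sw
        my = sum(wi * b for wi, b in zip(w, y)) / sw
        sxx = sum(wi * (a - mx) ** 2 for wi, a in zip(w, x)) / sw
        syy = sum(wi * (b - my) ** 2 for wi, b in zip(w, y)) / sw
        sxy = sum(wi * (a - mx) * (b - my) for wi, a, b in zip(w, x, y)) / sw
        if sxx <= 0 or syy <= 0:
            break
        r = sxy / math.sqrt(sxx * syy)
        if r_prev is not None and abs(r - r_prev) < tol:
            break
        r_prev = r
    return r, w


if __name__ == "__main__":
    # Toy data: y tracks x with alternating +/-1 noise, then one wild point.
    x = list(range(1, 21))
    y = [i + (1 if i % 2 else -1) for i in x]
    x_out, y_out = x + [0], y + [100]
    print("Pearson, clean:        ", round(pearson(x, y), 3))
    print("Pearson, with outlier: ", round(pearson(x_out, y_out), 3))
    r, w = biweight_correlation(x_out, y_out)
    print("Biweight, with outlier:", round(r, 3), "outlier weight:", w[-1])
```

On this toy data a single wild point drags the Pearson correlation from about 0.98 to slightly below zero, while the biweight sketch gives that point zero weight and stays close to the clean value. Observations that end with zero weight are the ones the sketch treats as outliers; using them as flags is only a loose analogue of the gene-flagging procedure mentioned above, whose exact rule is not given here. To feed a clustering algorithm, a correlation r is often turned into a dissimilarity such as 1 - r; that conversion is a common convention, assumed here rather than taken from the abstract.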
Similar resources
Biweight Correlation as a Measure of Distance between Genes on a Microarray
The underlying goal of microarray experiments is to identify genetic patterns across different experimental conditions. Genes contained in a particular pathway or that respond similarly to experimental conditions should be coregulated and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses, we can partition genes of interest...
The Advantages of a Biweight Metric in Clustering Microarray Data
Distance metrics are often the backbone of clustering algorithms. Yet certain distance metrics, such as one based on Pearson's correlation, are sensitive to outliers. Microarray data tend to have outlying data points. Hence, one might intuitively expect that metrics such as one based on Pearson's correlation are not appropriate for clustering microarray data. Hardin et al. (2007) show a metric based ...
Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data
Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...
Inferring gene-gene interactions from microarray data series using unbiased pattern detection
Motivation: In recent years microarray technology has provided biologists with an unprecedented wealth of experimental data. The analysis and interpretation of this data, usually for the purpose of identifying biologically significant genes or clusters of interacting genes, presents a highly complex challenge, and has been one of the most prominent areas of bioinformatics and systems biology. R...
Ranking Genes from DNA Microarray Data of Cervical Cancer by a local Tree Comparison
The major objective of this paper is to introduce a new method for selecting genes from DNA microarray data. As a criterion for selecting genes, we suggest measuring the local changes in the correlation graph of each gene and selecting those genes whose local changes are largest. More precisely, we calculate the correlation networks from DNA microarray data of cervical cancer whereas each network represe...