Identifying Interesting Genes with siggenes
Abstract
A common and important task in microarray experiments is the identification of genes whose expression values differ substantially between groups or conditions. Finding such differentially expressed genes requires methods that can deal with multiple testing problems in which thousands or even tens of thousands of hypotheses are tested simultaneously. Usually, a statistic appropriate for testing whether the expression levels are associated with a covariate of interest, together with the corresponding p-value, is computed for each gene. Afterwards, these raw p-values are adjusted for multiplicity such that a Type I error rate is strongly controlled at a pre-specified level of significance. The classical example of such an error rate is the family-wise error rate (FWER), i.e., the probability of at least one false positive. This error rate, however, might be too conservative in a situation in which thousands of hypotheses are tested and several tens of genes should be identified. In the analysis of microarray data, another error rate has therefore become very popular: the false discovery rate (FDR), which is, loosely speaking, the expected proportion of false positives among all rejected null hypotheses, i.e., among all identified genes.

There are, however, other ways to adjust for multiplicity; for example, QQ plots or the Bayesian framework can be employed for this purpose. If the observed test statistics are plotted against the values of the test statistics that would be expected under the null hypothesis, most of the points will lie approximately on the diagonal. Those points that differ substantially from this line correspond to genes that are most likely differentially expressed. The Significance Analysis of Microarrays (SAM) proposed by Tusher et al. (2001) can be used to specify what "differ substantially" means. While Tusher et al. (2001) base their analysis on a moderated t statistic, Schwender et al. (2003) compare this approach with a SAM version based on Wilcoxon rank sums. Efron et al. (2001) use an empirical Bayes analysis (EBAM) to model the distribution of the observed test statistics as a mixture of two components, one for the differentially expressed genes and the other for the genes that are not differentially expressed. Following their analysis, a gene is called differentially expressed if the corresponding posterior probability is larger than 0.9.

Both SAM and EBAM are implemented in the Bioconductor package siggenes. In this article, however, we concentrate on SAM. In the following, we briefly describe the SAM procedure, its implementation in siggenes (for more details, see Schwender et al. (2003)), and the test statistics already available in this package. Afterwards, we show how you can write your own function for other testing situations. Finally, we give an example of how the function sam can be applied to gene expression data.
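The abstract contrasts FWER control with FDR control without naming a specific adjustment procedure. As a minimal illustration that is not part of SAM or siggenes, the Python sketch below implements two standard procedures: the Bonferroni correction, which controls the FWER, and the Benjamini-Hochberg step-up procedure, which controls the FDR. The function names and the ten p-values are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m; controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the FDR at level alpha
    (under independence or positive dependence of the test statistics)."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k (1-based) with p_(k) <= k * alpha / m; 0 if there is none.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Ten made-up p-values, used only for illustration.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    print(sum(bonferroni(p)))          # 1 (only p <= 0.005)
    print(sum(benjamini_hochberg(p)))  # 2 (p_(2) = 0.008 <= 2 * 0.05 / 10)
```

On these p-values, Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which illustrates why FDR control is less conservative when many hypotheses are tested at once.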
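The comparison of observed and expected test statistics that SAM formalizes can be sketched in a few lines of Python. This is a simplified illustration, not the siggenes implementation: it uses a moderated t statistic d_i = (mean_1i - mean_2i) / (s_i + s0) with a pooled standard error s_i, estimates the expected order statistics by averaging the sorted statistics over random label permutations, and calls every gene at least as extreme as the innermost observed order statistic whose distance from its expected value is at least delta. The choice of s0 (median of the s_i), the prior proportion of unchanged genes (set to 1), the use of the median number of permuted false positives, the function names, and the simulated data are all assumptions of this sketch.

```python
import numpy as np


def _diff_and_se(x, labels):
    """Per-gene mean difference (group 1 minus group 0) and pooled standard error."""
    a, b = x[:, labels == 1], x[:, labels == 0]
    n1, n2 = a.shape[1], b.shape[1]
    diff = a.mean(axis=1) - b.mean(axis=1)
    pooled_var = ((n1 - 1) * a.var(axis=1, ddof=1)
                  + (n2 - 1) * b.var(axis=1, ddof=1)) / (n1 + n2 - 2)
    return diff, np.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def sam_sketch(x, labels, delta, n_perm=100, seed=0):
    """Simplified two-class SAM. Returns (boolean mask of called genes, estimated FDR).

    x: genes x samples matrix; labels: 0/1 NumPy array with one entry per sample.
    """
    rng = np.random.default_rng(seed)
    diff, se = _diff_and_se(x, labels)
    s0 = np.median(se)  # ASSUMPTION: simple fudge factor, not the rule used by siggenes
    d = diff / (se + s0)

    # Statistics under random relabelings of the samples (same s0 throughout).
    perm_d = np.empty((n_perm, x.shape[0]))
    for b in range(n_perm):
        p_diff, p_se = _diff_and_se(x, rng.permutation(labels))
        perm_d[b] = p_diff / (p_se + s0)
    # Expected order statistics: average of the sorted permuted statistics.
    d_expected = np.sort(perm_d, axis=1).mean(axis=0)

    # Compare observed and expected order statistics (the SAM "QQ plot").
    d_sorted = np.sort(d)
    gap = d_sorted - d_expected
    up = d_sorted[(gap >= delta) & (d_sorted > 0)]
    low = d_sorted[(gap <= -delta) & (d_sorted < 0)]
    cut_up = up.min() if up.size else np.inf
    cut_low = low.max() if low.size else -np.inf

    called = (d >= cut_up) | (d <= cut_low)
    # Falsely called genes per permutation: permuted statistics beyond the cutoffs.
    n_false = ((perm_d >= cut_up) | (perm_d <= cut_low)).sum(axis=1)
    # ASSUMPTION: prior proportion of unchanged genes set to 1 (conservative).
    fdr = float(np.median(n_false)) / max(int(called.sum()), 1)
    return called, fdr


if __name__ == "__main__":
    # Hypothetical data: 1000 genes, 5 + 5 samples, first 50 genes shifted by 4.
    rng = np.random.default_rng(1)
    x = rng.normal(size=(1000, 10))
    labels = np.array([0] * 5 + [1] * 5)
    x[:50, labels == 1] += 4.0
    called, fdr = sam_sketch(x, labels, delta=1.0)
    print(called.sum(), called[:50].sum(), round(fdr, 3))
```

Increasing delta moves the cutoffs outwards, so fewer genes are called and the estimated FDR typically decreases; in this sense delta is the tuning parameter that makes "differ substantially" precise.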
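The two-component mixture behind EBAM can be written as f(z) = p0 f0(z) + (1 - p0) f1(z), where f0 and f1 are the densities of the scores of unchanged and differentially expressed genes, so that the posterior probability of differential expression is p1(z) = 1 - p0 f0(z) / f(z). The Python sketch below applies the 0.9 rule from the text. It is only an illustration under assumptions: Efron et al. (2001) estimate these quantities in their own way, whereas here f0 is taken to be the standard normal density, f is estimated by a Gaussian kernel density estimate, p0 defaults to the conservative value 1, and all names and data are hypothetical.

```python
import numpy as np


def normal_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * np.square(z)) / np.sqrt(2.0 * np.pi)


def kde(points, data):
    """Gaussian kernel density estimate of `data`, evaluated at `points`.

    ASSUMPTION: bandwidth from the normal-reference rule h = 1.06 * sd * n^(-1/5).
    """
    data = np.asarray(data, dtype=float)
    h = 1.06 * data.std(ddof=1) * data.size ** (-0.2)
    u = (np.asarray(points, dtype=float)[:, None] - data[None, :]) / h
    return normal_pdf(u).mean(axis=1) / h


def ebam_posterior(z, p0=1.0):
    """Posterior probability p1(z) = 1 - p0 * f0(z) / f(z) of differential expression.

    f0: ASSUMED standard normal null density of the scores.
    f:  density of all observed scores (the two-component mixture), estimated by KDE.
    p0: prior probability that a gene is not differentially expressed; 1.0 is
        conservative because a larger p0 can only make the posterior smaller.
    """
    z = np.asarray(z, dtype=float)
    post = 1.0 - p0 * normal_pdf(z) / kde(z, z)
    return np.clip(post, 0.0, 1.0)  # clip estimation noise into [0, 1]


if __name__ == "__main__":
    # Hypothetical scores: 100 shifted genes followed by 1900 null genes.
    rng = np.random.default_rng(2)
    z = np.concatenate([rng.normal(4.0, 1.0, 100), rng.normal(0.0, 1.0, 1900)])
    called = ebam_posterior(z) > 0.9  # the 0.9 rule from the text
    print(called[:100].sum(), called[100:].sum())
```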
Similar resources
Identifying differentially expressed genes with siggenes
In this vignette, we show how the functions contained in the R package siggenes can be used to perform both the Significance Analysis of Microarrays (SAM) proposed by Tusher et al. (2001) and the Empirical Bayes Analysis of Microarrays (EBAM) suggested by Efron et al. (2001).
Expression analysis of the estrogen receptor target genes in renal cell carcinoma
The aim of the present study was to investigate the differentially expressed genes (DEGs) and target genes of the estrogen receptor (ER) in renal cell carcinoma. The data (GSE12090) were downloaded from the gene expression omnibus database. Data underwent preprocessing using the affy package for Bioconductor software, then the DEGs were selected via the significance analysis of microarray algor...
Using the Protein-protein Interaction Network to Identifying the Biomarkers in Evolution of the Oocyte
Background: Oocyte maturity includes nuclear and cytoplasmic maturity, both of which are important for embryo fertilization. The development of oocyte is not limited to the period of follicular growth, and starts from the embryonic period and continues throughout life. In this study, for the purpose of evaluating the effect of the FSH hormone on the expression of genes, GEO access codes for this...
Identification of Alzheimer disease-relevant genes using a novel hybrid method
Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...
[The genetic determinism of polygenic diseases].
Progress in molecular biology has opened the way to identifying genes involved in predisposition to multigene diseases. The two methods currently used for this purpose--analysis of candidate genes and systematic genomic screening--have given interesting but only very partial results. The problem is complicated by the large number of genes involved, their low penetrance, and linkage disequilibrium.