Bayesian shrinkage estimation of the relative abundance of mRNA transcripts using SAGE.

Authors

  • Jeffrey S Morris
  • Keith A Baggerly
  • Kevin R Coombes
Abstract

Serial analysis of gene expression (SAGE) is a technology for quantifying gene expression in biological tissue. It yields count data that can be modeled by a multinomial distribution with two characteristics: skewness in the relative frequencies and small sample size relative to the dimension. As a result of these characteristics, a given SAGE sample may fail to capture a large number of expressed mRNA species present in the tissue. Empirical estimators of mRNA species' relative abundance effectively ignore these missing species, and as a result tend to overestimate the abundance of the scarce observed species, which comprise a vast majority of the total. We have developed a new Bayesian estimation procedure that quantifies our prior information about these characteristics, yielding a nonlinear shrinkage estimator with efficiency advantages over the maximum likelihood estimator (MLE). Our prior is a mixture of Dirichlets, whereby species are stochastically partitioned into abundant and scarce classes, each with its own multivariate prior. Simulation studies reveal that our estimator has lower integrated mean squared error (IMSE) than the MLE for the SAGE scenarios simulated, and yields relative abundance profiles closer in Euclidean distance to the truth for all samples simulated. We apply our method to a SAGE library of normal colon tissue, and discuss its implications for assessing differential expression.
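
The overestimation problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure and does not implement their mixture-of-Dirichlets estimator; it only shows how the empirical estimator (the MLE) behaves on a skewed multinomial with few tags relative to the number of species. All names and numbers (`simulate_sage`, `summarize_mle`, 2000 species, 50 abundant species holding 40% of the mass, 5000 tags) are hypothetical choices made for illustration.

```python
import random


def simulate_sage(n_species=2000, n_abundant=50, n_tags=5000,
                  abundant_mass=0.4, seed=0):
    """Draw one hypothetical SAGE library from a skewed multinomial.

    Assumed truth (illustrative only): a few abundant species share
    `abundant_mass`; the many scarce species share the remainder.
    """
    rng = random.Random(seed)
    n_scarce = n_species - n_abundant
    truth = ([abundant_mass / n_abundant] * n_abundant
             + [(1.0 - abundant_mass) / n_scarce] * n_scarce)
    # Sampling tags with replacement is a multinomial draw of size n_tags.
    draws = rng.choices(range(n_species), weights=truth, k=n_tags)
    counts = [0] * n_species
    for species in draws:
        counts[species] += 1
    return truth, counts


def mle(counts):
    """Empirical relative abundances: count divided by library size."""
    total = sum(counts)
    return [c / total for c in counts]


def summarize_mle(truth, counts, n_abundant=50):
    """Compare MLE mass and true mass over the observed scarce species."""
    est = mle(counts)
    scarce = range(n_abundant, len(counts))
    observed_scarce = [i for i in scarce if counts[i] > 0]
    return {
        "unobserved": sum(1 for c in counts if c == 0),
        "mle_mass_observed_scarce": sum(est[i] for i in observed_scarce),
        "true_mass_observed_scarce": sum(truth[i] for i in observed_scarce),
    }


if __name__ == "__main__":
    truth, counts = simulate_sage()
    for key, value in summarize_mle(truth, counts).items():
        print(key, value)
```

In this setup the MLE assigns zero abundance to every unobserved species, so the probability mass those species actually hold is redistributed to the observed ones. That redistribution is the overestimation the abstract attributes to empirical estimators, and it is the behavior a shrinkage estimator is meant to correct.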

Related articles

E-Bayesian Approach in A Shrinkage Estimation of Parameter of Inverse Rayleigh Distribution under General Entropy Loss Function

Whenever approximate, initial information about the unknown parameter of a distribution is available, the shrinkage estimation method can be used to estimate it. In this paper, first the $E$-Bayesian estimate of the parameter of the inverse Rayleigh distribution under the general entropy loss function is obtained. Then, the shrinkage estimate of the inverse Rayleigh distribution parameter i...


Shrinkage Estimation for SAGE Data using a Mixture Dirichlet Prior

Serial Analysis of Gene Expression (SAGE) is a technique for estimating the gene expression profile of a biological sample. Any efficient inference in SAGE must be based upon efficient estimates of these gene expression profiles, which consist of the estimated relative abundances for each mRNA species present in the sample. The data from SAGE experiments are counts for each observed mRNA specie...


Classic and Bayes Shrinkage Estimation in Rayleigh Distribution Using a Point Guess Based on Censored Data

Introduction: In classical methods of statistics, the parameter of interest is estimated from a random sample using natural estimators such as maximum likelihood or unbiased estimators (sample information). In practice, the researcher often has prior information about the parameter in the form of a point guess value. The information in the guess value is called nonsample information. Thomp...


P-82: Effect of SCNT Steps on Relative mRNA Abundances of Sheep Oocytes

Background: The oocyte is a unique cell committed to reprogramming the fertilizing sperm and supporting the early stages of embryonic development until the species-specific stage of zygote genome activation, which occurs around the second to third cell cycle in sheep embryos. Given the large number of oocyte transcripts, we selected some candidate genes based on their roles of regulating di...


Serial Analysis of Gene Expression: Applications in Malaria Parasite, Yeast, Plant, and Animal Studies

The serial analysis of gene expression (SAGE) method is based on the isolation of unique sequence tags from individual transcripts and concatenation of tags serially into long DNA molecules. SAGE is an innovative technique that offers the potential of cataloging both the identity and relative frequencies of mRNA transcripts in a given RNA preparation. It can quantify low-abundance transcripts a...



Journal:
  • Biometrics

Volume 59, Issue 3

Pages: -

Publication date: 2003