Bayesian evaluation of effect size after replicating an original study
Authors
Abstract
The vast majority of published results in the literature are statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and the Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. Both the original study and the replication were statistically significant in 36.1% of the replicated studies in RPP and in 68.8% in EE-RP, suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and that quantifies the amount of evidence in favor of a zero, small, medium, and large effect. The method computes posterior model probabilities for each of these four effect sizes and adjusts for publication bias by taking into account that the original study is statistically significant. We first approximate the method's performance analytically and demonstrate that controlling for the original study's significance is necessary for evidence to accumulate in favor of a true zero effect. We then apply the method to the RPP and EE-RP data, showing that the underlying effect sizes of the studies included in EE-RP are generally larger than those in RPP, but that the sample sizes, especially of the studies included in RPP, are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication, akin to power analysis in null hypothesis significance testing, and present an easy-to-use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method.
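The abstract does not spell out the computation, so the following is only a minimal sketch of how posterior model probabilities with a publication-bias adjustment could be computed. The assumptions are ours, not necessarily the authors': correlations as the effect size measure, analysed on Fisher's z scale with standard error 1/sqrt(n - 3); the four effects set to rho = 0, 0.1, 0.3, 0.5; equal prior probabilities; and the original study's likelihood truncated at the critical value of a two-tailed test at alpha = .05 in the positive direction. The second function mimics the sample-size use by searching for the smallest replication n whose posterior for a hypothesized true effect reaches a target, under the simplifying assumption that the replication observes exactly that effect. The names `snapshot_posteriors` and `required_replication_n` are hypothetical and are not the interface of the web application or the authors' R code.

```python
import math
from statistics import NormalDist

# Assumed effect-size grid on the correlation scale (Cohen's benchmarks).
EFFECTS = {"zero": 0.0, "small": 0.1, "medium": 0.3, "large": 0.5}


def snapshot_posteriors(r_orig, n_orig, r_rep, n_rep, alpha=0.05, adjust=True):
    """Posterior model probabilities for each effect in EFFECTS (equal priors).

    Hypothetical helper: correlations are analysed on Fisher's z scale with
    standard error 1/sqrt(n - 3).
    """
    y_o, se_o = math.atanh(r_orig), 1 / math.sqrt(n_orig - 3)
    y_r, se_r = math.atanh(r_rep), 1 / math.sqrt(n_rep - 3)
    # Smallest Fisher z that is significant in the predicted (positive)
    # direction under a two-tailed test at level alpha.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2) * se_o
    if adjust and y_o <= z_crit:
        raise ValueError("the original study must be statistically significant")

    weights = {}
    for label, rho in EFFECTS.items():
        theta = math.atanh(rho)
        orig = NormalDist(theta, se_o)
        lik_o = orig.pdf(y_o)
        if adjust:
            # Condition on significance: renormalise the density over y > z_crit.
            lik_o /= 1 - orig.cdf(z_crit)
        lik_r = NormalDist(theta, se_r).pdf(y_r)
        weights[label] = lik_o * lik_r  # equal priors cancel when normalising

    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def required_replication_n(r_orig, n_orig, true_effect="zero", target=0.75,
                           n_max=10_000):
    """Smallest replication n giving a posterior >= target for true_effect.

    Simplifying assumption of this sketch: the replication observes exactly
    the hypothesised true effect.
    """
    r_rep = EFFECTS[true_effect]
    for n_rep in range(4, n_max + 1):  # n_rep > 3 keeps the standard error finite
        post = snapshot_posteriors(r_orig, n_orig, r_rep, n_rep)
        if post[true_effect] >= target:
            return n_rep
    return None  # target not reachable within n_max


if __name__ == "__main__":
    post = snapshot_posteriors(r_orig=0.35, n_orig=50, r_rep=0.05, n_rep=100)
    print({label: round(p, 3) for label, p in post.items()})
    print(required_replication_n(r_orig=0.35, n_orig=50))
```

Setting `adjust=False` drops the truncation and treats the original study as if it were not selected for significance. Comparing the two runs on the same data shows, in this sketch, the direction of the effect described in the abstract: conditioning on significance shifts posterior mass toward the zero effect.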