Summarizing studies of diagnostic test performance.

Authors

  • Victor M Montori
  • Gordon H Guyatt
Abstract

In this issue of Clinical Chemistry, Brown et al. (1) report a metaanalysis of studies of the test characteristics of the latex turbidimetric D-dimer test for the diagnosis of pulmonary embolism. In this editorial, we will discuss the importance of conducting systematic reviews of diagnostic evidence and their contribution to the practice of evidence-based laboratory medicine.

Evidence-based practitioners complement, or at times substitute, diagnostic intuition with the explicit use of the best available quantitative evidence about the power of symptoms, signs, and laboratory tests to increase or decrease the probabilities associated with alternative diagnoses. Summarizing studies that have high validity will yield unbiased results, and pooling across studies will reduce the random error associated with individual smaller studies. In addition to generating more precise, accurate summaries, pooling across different patient groups will, if tests perform similarly in those groups, yield results that apply to a broader population than the individual studies. Thus, systematic summaries of valid diagnostic evidence are at the top of the hierarchy of diagnostic evidence.

Summaries of evidence will yield misleading results if they try to combine results across patient groups or test methods that are too heterogeneous; if they assemble an incomplete, biased sample of potentially available studies; or if they use results from studies that are themselves methodologically weak and very susceptible to bias. To avoid these sources of error, authors of systematic reviews should (a) ask a sensible question; (b) conduct a detailed and exhaustive search for relevant studies; (c) if possible, focus on studies of high methodologic quality; and (d) use reproducible approaches to assess the limitations in the methodologic quality of the studies on which they focus (2).

Brown et al. (1) asked a narrowly focused and sensible question and translated their review question into appropriate eligibility criteria. Using procedures to minimize bias (e.g., use of two reviewers working independently), they applied these criteria to studies identified through a systematic search for published and unpublished evidence. These researchers also assessed eligible articles for the extent to which they included safeguards against two threats to validity: spectrum of disease and verification of test results with the reference standard. As we review below, empirical evidence suggests that these two criteria constitute important markers of bias in diagnostic studies.

Clinicians are rarely interested in the ability of a test to sort out definitely ill patients from apparently healthy volunteers. Studies that choose the severely affected as their target positive population and apparently healthy individuals as the target negative are likely to overestimate the power of the test when used in the right patients. The right patients are those in whom, before obtaining the test results, clinicians were unsure whether the patients did or did not have the target condition. To determine whether the estimates of diagnostic accuracy are unbiased, clinicians should therefore judge whether the population studied really represents a group in which genuine diagnostic uncertainty existed. In an evaluation of 184 studies of diagnostic tests, Lijmer et al. (3) quantified the effect of spectrum bias, which arises when clinicians enroll very ill patients and compare their results with those from healthy controls. Studies with inadequate disease spectrum overestimated diagnostic performance threefold [relative diagnostic odds ratio (RDOR) 3.0; 95% confidence interval (CI), 2.0–4.5] relative to those that recruited patients in whom genuine diagnostic uncertainty existed (3).

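For readers unfamiliar with the measure, the diagnostic odds ratio (DOR) is conventionally defined from sensitivity and specificity as sketched below, and a relative DOR can be read as the ratio of the DOR in studies that have a given design feature to the DOR in studies that lack it. This is the standard textbook definition, offered as an aid to interpretation; it is not a formula quoted from Lijmer et al., and the subscripts are illustrative labels.

```latex
\[
\mathrm{DOR}
  = \frac{\mathrm{sensitivity} / (1 - \mathrm{sensitivity})}
         {(1 - \mathrm{specificity}) / \mathrm{specificity}},
\qquad
\mathrm{RDOR}
  = \frac{\mathrm{DOR}_{\mathrm{with\ feature}}}
         {\mathrm{DOR}_{\mathrm{without\ feature}}}
\]
```

On this reading, an RDOR of 3.0 would mean that studies with an inadequate disease spectrum reported a diagnostic odds ratio about three times as large as studies that enrolled patients with genuine diagnostic uncertainty.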
Studies of diagnostic test properties can also yield biased estimates when investigators do not conduct, in all patients, a blind comparison of test results with an independent reference standard. By blind we mean that those judging the results of the reference standard are unaware of the results of the test under evaluation and vice versa. By independent we mean that information from the test under evaluation should not affect the interpretation of the reference standard. Finally, the reference standard should be applied to all patients regardless of the results of the test under evaluation. Lijmer et al. (3) found that lack of blinding and verification bias (the error of using a different reference standard depending on the test result) overestimated test performance [RDOR 1.3 (95% CI, 1.0–1.9) and 2.2 (95% CI, 1.5–3.3), respectively].

Brown et al. (1) determined the methodologic quality of the included studies and found that all were free from verification bias, that two had enrolled patients with less than ideal spectrum of disease, and that two had not optimally blinded the individual judging the reference standard. In summary, the authors of this systematic review asked a clinically sensible question, conducted a detailed and thorough search, and included studies of high methodologic quality. Thus, clinicians can draw strong inferences from this review.

Statistical pooling of results from individual studies, also called metaanalysis, can provide a single best estimate of diagnostic test performance (4). Brown et al. (1) pooled the sensitivities and specificities from the studies by use of a random-effects model, a statistical technique that yields conservative estimates (i.e., wider confidence intervals) because it takes into account between-study differences. The metaanalysis yielded very precise estimates for sensitivity (93%; 95% CI, 89–96%) and specificity (51%; 95% CI, 42–59%).

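To make the idea of random-effects pooling concrete, here is a minimal sketch of one common approach, the DerSimonian-Laird estimator applied to logit-transformed proportions. The text does not say which random-effects method Brown et al. used, so this is an assumption chosen for illustration; the function name and the per-study counts are hypothetical and are not data from the review.

```python
import math


def pool_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] / totals[i] is one study's proportion (e.g., true positives
    among diseased patients for sensitivity). Needs at least two studies.
    Returns (pooled, ci_low, ci_high, tau2).
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite at 0% or 100%.
        a, b = e + 0.5, n - e + 0.5
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))

    # Between-study variance tau^2 (method of moments, truncated at zero).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(logits) - 1)) / c)

    # Random-effects weights add tau^2 to each within-study variance,
    # which widens the confidence interval when studies disagree.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (expit(pooled_logit),
            expit(pooled_logit - 1.96 * se),
            expit(pooled_logit + 1.96 * se),
            tau2)


# Hypothetical counts: test-positive patients among those with the condition.
sens, sens_low, sens_high, tau2 = pool_random_effects(
    events=[45, 88, 30, 60], totals=[50, 92, 35, 62]
)
print(f"pooled sensitivity {sens:.2f} (95% CI {sens_low:.2f} to {sens_high:.2f})")
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights; otherwise the interval is wider, which is the sense in which the editorial calls the approach conservative.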
Clinicians obtain diagnostic tests to lower the probability of the target condition below the testing threshold (i.e., stopping testing for it and eliminating it from further consideration) or to increase this probability above the treatment threshold (i.e., stopping testing for it and initiating appropriate therapy). The likelihood ratio (LR) best captures the direction and magnitude of this change from
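As a rough illustration of how a likelihood ratio moves a probability toward either threshold, the sketch below applies the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity to the pooled estimates quoted above (93% and 51%), then converts a pretest probability to a post-test probability through odds. The 20% pretest probability and the function names are illustrative assumptions, not values or methods taken from the review.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Pooled point estimates quoted in the editorial.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.93, specificity=0.51)

# Assumed pretest probability of 20%, chosen only for illustration.
pretest = 0.20
after_negative = post_test_probability(pretest, lr_neg)
after_positive = post_test_probability(pretest, lr_pos)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"post-test probability after a negative result: {after_negative:.3f}")
print(f"post-test probability after a positive result: {after_positive:.3f}")
```

Under these assumptions a negative result lowers the probability from 20% to roughly 3%, whereas a positive result raises it only to about 32%, so with these point estimates the test is far more informative for ruling the condition out than for ruling it in.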

Similar resources

Diverse types of review studies based on their approach to retrieving and summarizing original findings

 Background: We are living in an era in which different branches of science are growing very rapidly. Therefore, retrieving and summarizing all new valid findings on a specific subject is one of the most important priorities of scientists. The aim of the present article is to categorize different review studies within the health domain based on their approach to retrieving and summarizing ...

Diagnostic Utility of miRNAs in Cancer

Cancer is one of the most prevalent and leading causes of death in the world. Current advancements in technology improve the understanding of the pathogenesis and pathology of cancers. But rising mortality rates, poor prognosis, and gaps in early predictive clinical biomarkers provide important momentum to investigate novel early diagnostic/prognostic markers and spec...

A random-sum Wilcoxon statistic and its application to analysis of ROC and LROC data.

The Wilcoxon-Mann-Whitney statistic is commonly used for a distribution-free comparison of two groups. One requirement for its use is that the sample sizes of the two groups are fixed. This is violated in some of the applications such as medical imaging studies and diagnostic marker studies; in the former, the violation occurs since the number of correctly localized abnormal images is random, w...

Blood-Based Protein Signatures for Early Detection of Colorectal Cancer: A Systematic Review

OBJECTIVES Blood-based proteins might be an attractive option for early detection of colorectal cancer (CRC), but individually they are unlikely to achieve the diagnostic performance required for population based screening. We aimed at summarizing current evidence of diagnostic performance of signatures based on multiple proteins for early detection of CRC. METHODS A systematic literature rev...

Journal title:
  • Clinical Chemistry

Volume 49, Issue 11

Pages: -

Publication date: 2003