Much Ado About Nothing: Statistical Methods for Meta-analysis with Rare
Events
Jonathan Deeks, Michael Bradburn, Warren Bilker, Russell Localio, Jesse
Berlin, Centre for Statistics in Medicine, Institute of Health Sciences,
Oxford, UK
Objective: To evaluate the
performance of standard methods of meta-analysis of binary data when event
rates are very low.
Background: An issue that
arises when studying uncommon outcomes in trials is that a substantial
proportion of studies may report no events in either the treated group, or the
control group, or both. Zero cells in
contingency tables cause problems, both in terms of the validity of the methods
when numbers are small, and the practicalities of coping with possible divisions
by zero. Most traditional methods based
on ratio measures, such as odds ratios (OR), ignore studies in which no events
occur in either group.
Methods: We conducted a
series of statistical simulations in which data for two study groups, with
known, low event probabilities, were generated. Relative risks of 1.0, 0.75 and 0.5 were considered corresponding
to varying strengths of treatment benefit.
Meta-analyses of 5 and 20 studies were simulated, the sample sizes being
based on the results of real Cochrane reviews, with baseline (control group)
event rates of 5%, 1%, 0.5% and 0.1%.
The performance of a selection of the statistical methods available in
RevMan (Mantel-Haenszel (MH) OR with RBG variance, Peto OR, D&L random
effects OR, and the MH and D&L risk difference methods) together with other
methods not currently available in RevMan (Poisson regression models, inverse
variance and exact methods) was evaluated. At each combination of relative risk
and baseline event rate 10,000 meta-analyses were simulated. Each method was
evaluated with respect to bias and statistical power.
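As an illustration of two of the pooling methods named above, the following minimal sketch computes the Peto one-step odds ratio and the Mantel-Haenszel odds ratio from 2x2 counts. It is not the authors' simulation code: the study counts, the tuple layout and the function names are hypothetical, and variance estimation (for example the RBG variance) is omitted.

```python
import math

# Each study is (events_treated, n_treated, events_control, n_control).
# The counts are hypothetical, chosen only so that zero cells occur.
STUDIES = [
    (0, 100, 1, 100),  # no events in the treated arm
    (1, 200, 2, 200),
    (0, 50, 0, 50),    # no events in either arm
]


def peto_odds_ratio(studies):
    """Peto one-step pooled odds ratio: exp(sum(O - E) / sum(V))."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2
        m = a + c              # total events in the study
        expected = n1 * m / n  # expected treated-arm events under no effect
        # Hypergeometric variance of the treated-arm event count.
        variance = n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    if sum_v == 0:
        raise ValueError("no study contains any events")
    return math.exp(sum_o_minus_e / sum_v)


def mantel_haenszel_odds_ratio(studies):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/N) / sum(b*c/N)."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n2 in studies:
        b, d, n = n1 - a, n2 - c, n1 + n2
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these counts")
    return numerator / denominator


if __name__ == "__main__":
    print("Peto OR:", round(peto_odds_ratio(STUDIES), 3))
    print("MH OR:  ", round(mantel_haenszel_odds_ratio(STUDIES), 3))
```

In this sketch the study with no events in either arm adds zero to every sum, so it leaves both pooled estimates unchanged, which is the behaviour of ratio measures described in the Background.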
Results: For datasets
generated assuming a fixed underlying effect, contrary to current expectations,
the Peto method provided the least biased and most powerful method of pooling
study results among the methods available in RevMan. Other methods tended to underestimate treatment effects by
between 25% and 35% when the event rate was 0.1% and the relative risk was
0.5. The bias increased with decreasing baseline rates, increasing treatment
effects and increasing imbalance in trial group sizes.
Conclusions: Several of the
statistical methods available in RevMan perform very poorly when event rates
are low, and tend to underestimate treatment effects. The Peto method may be
the best method in these circumstances unless trial group sizes are severely
imbalanced, and will provide a good approximation to the relative risk.
Baltimore 1998, Oral 24
Half-dead or half-alive? Which way
should events be coded for meta-analyses of risk ratios?
Jonathan J Deeks, ICRF/NHS Centre for Statistics in Medicine, Oxford,
United Kingdom
Objective: Meta-analyses of risk
ratios can give different results for the same review if the definitions of the
“event” and “non-event” are switched.
This empirical study aimed to investigate the suitability of rules for
the coding of “events” and “non-events” in meta-analyses of risk ratios by
comparing the consistency of the trial results with the two meta-analytical
estimates obtained by switching the selection of the event.
Methods: Meta-analyses were
selected if they were first published on the Cochrane Library between
1998-2000, their first outcome was binary and they pooled results from at least
5 trials. Interventions were classified
as preventative, if the intervention aimed to prevent the patient moving to a
worse health state, or therapeutic, if the intervention aimed to move the
patient to a better health state.
Meta-analyses of risk ratios of (a) the good event, and (b) the bad
event were performed using the Mantel-Haenszel risk ratio method, consistency
being measured using Cochran's Q statistic.
Additional comparisons were also made according to the selection of the
event yielding the lowest mean control group event rate.
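The comparison described above can be sketched as follows: pool the risk ratio with the Mantel-Haenszel method for the bad event, switch the coding to the good event, and compare Cochran's Q. This is a minimal sketch under stated assumptions, not the author's analysis code. The trial counts and function names are hypothetical, Q is computed with inverse-variance weights around the Mantel-Haenszel estimate, and trials with zero cells are assumed absent.

```python
import math

# Each trial is (events_treated, n_treated, events_control, n_control),
# with the "bad" outcome counted as the event. Counts are hypothetical.
TRIALS_BAD = [
    (10, 100, 20, 100),
    (30, 100, 40, 100),
    (45, 100, 50, 100),
]


def switch_event(trials):
    """Recode each trial so that the non-event becomes the event."""
    return [(n1 - a, n1, n2 - c, n2) for a, n1, c, n2 in trials]


def mh_risk_ratio(trials):
    """Mantel-Haenszel pooled risk ratio: sum(a*n2/N) / sum(c*n1/N)."""
    numerator = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in trials)
    denominator = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in trials)
    return numerator / denominator


def cochran_q(trials):
    """Cochran's Q: weighted squared deviations of log RRs from the pooled log RR."""
    log_pooled = math.log(mh_risk_ratio(trials))
    q = 0.0
    for a, n1, c, n2 in trials:
        log_rr = math.log((a / n1) / (c / n2))
        variance = 1 / a - 1 / n1 + 1 / c - 1 / n2  # needs a > 0 and c > 0
        q += (log_rr - log_pooled) ** 2 / variance
    return q


if __name__ == "__main__":
    trials_good = switch_event(TRIALS_BAD)
    print("RR (bad event): ", round(mh_risk_ratio(TRIALS_BAD), 3))
    print("RR (good event):", round(mh_risk_ratio(trials_good), 3))
    print("Q (bad event):  ", round(cochran_q(TRIALS_BAD), 2))
    print("Q (good event): ", round(cochran_q(trials_good), 2))
```

With these hypothetical counts the pooled risk ratio of the good event is not the reciprocal of that of the bad event, and the two codings give different values of Q, which is the asymmetry the study sets out to examine.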
Results: 114 reviews were included
in the analysis: 69 of preventative
interventions and 45 of therapeutic interventions. Twelve of the preventative interventions showed discordance in
the significance of heterogeneity: 11 favoured use of the bad outcome and 1
favoured use of the good outcome (P=0.006).
Fifteen of the therapeutic interventions showed discordance: 8 favoured
use of the bad outcome and 7 favoured use of the good outcome (P=1.0). Among the therapeutic trials the use of the
outcome giving the lowest mean control group event rate did not significantly
improve consistency (P=0.12).
Conclusions: For preventative
interventions, risk ratios of the bad outcome yield the most consistent
estimates, supporting the use of Glasziou and Irwig's model for
individualisation of treatment effects.
The situation for therapeutic interventions is more complex, no
universal preference for either outcome being supported.
Reference: Glasziou PP, Irwig
LM. An evidence-based approach to
individualising treatment. BMJ 1995;
311: 1356-1359.
Lyon 2001, P-009
Agreement between randomized and non-randomized studies – the effects of
bias and confounding
David Henry, Annette Moxey, Dianne O'Connell
School of Population Health Sciences, Faculty of Medicine and Health
Sciences, The University of Newcastle, Australia.
Background: Formal comparisons of
the results of randomized and non-randomized studies have led to conflicting
conclusions. The number of potential
topics for study is huge, and levels of agreement vary with selection of interventions
and clinical settings.
Objective: To determine the extent
to which some important sources of bias affecting randomized trials (RCT) and
observational studies (OS) influence the levels of agreement between these
designs.
Methods: We performed systematic
reviews of RCTs and OS for 7 intervention/outcome pairs chosen because of the
likelihood of bias and confounding.
These were: interventions to
minimize the need for allogeneic blood transfusion (unblinded, poor
randomisation, subjective outcomes); the impact of laparoscopic cholecystectomy
(lap chole), compared with open or mini-laparotomy, on post-operative
infections and bile duct injury (variable quality trials, operator skill
dependent); the impact of antioxidants on death from malignancy and
cardiovascular disease (different interventions, healthy cohort effect); and
the effects of hormone replacement therapy (HRT) on cardiovascular and overall
mortality (healthy cohort effect).
Articles identified through electronic and bibliographic searches were
reviewed independently by two raters.
Adjusted and unadjusted RRs were pooled using inverse variance weights,
and Metaview 4.1 was used to pool crude RRs.
We assessed qualitative agreement as +/+ when RCTs and OS (respectively)
agreed on the direction of a statistically significant effect, -/- when there
was agreement on the absence of such an effect, and +/- or -/+ when there was
disagreement. We classified
quantitative agreement as 0.1 or 0.2 if pooled estimates of RR differed by no
more than these values, and >0.2 if they differed by more.
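The grading scheme above can be expressed as a short function. This is a minimal sketch and not the authors' procedure: the function names and input layout are hypothetical, statistical significance is assumed to mean that the 95% confidence interval of the pooled RR excludes 1, and two significant effects in opposite directions are rejected because the scheme does not say how to grade them.

```python
def significance(ci_low, ci_high):
    """Return '+' if the confidence interval excludes RR = 1, otherwise '-'."""
    return "+" if ci_high < 1.0 or ci_low > 1.0 else "-"


def grade_agreement(rct, obs):
    """Grade agreement between pooled RCT and OS results.

    rct and obs are (rr, ci_low, ci_high) tuples for the pooled relative risk.
    Returns a label such as '+/+ 0.1' or '-/+ >0.2' (RCTs first, then OS).
    """
    rct_rr, rct_low, rct_high = rct
    obs_rr, obs_low, obs_high = obs
    rct_sign = significance(rct_low, rct_high)
    obs_sign = significance(obs_low, obs_high)

    # '+/+' requires agreement on the direction of the effect.
    if rct_sign == "+" and obs_sign == "+" and (rct_rr < 1.0) != (obs_rr < 1.0):
        raise ValueError("significant effects in opposite directions are not graded")

    # Round to avoid floating-point noise at the 0.1 and 0.2 boundaries.
    gap = round(abs(rct_rr - obs_rr), 9)
    if gap <= 0.1:
        band = "0.1"
    elif gap <= 0.2:
        band = "0.2"
    else:
        band = ">0.2"
    return f"{rct_sign}/{obs_sign} {band}"


if __name__ == "__main__":
    # Hypothetical pooled estimates: (RR, lower 95% limit, upper 95% limit).
    print(grade_agreement((0.60, 0.50, 0.72), (0.68, 0.60, 0.77)))
    print(grade_agreement((1.02, 0.90, 1.15), (0.70, 0.60, 0.82)))
```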
Results: The greatest level of
agreement between RCTs and OS was seen with blood-sparing techniques (4
comparisons, 59 RCTs and 104 OS) with agreement ranging from +/+ 0.1 to +/+
0.2. Antioxidants (4 comparisons, 9
RCTs and 13 OS) gave mixed results with agreement graded as -/- >0.2 and -/- 0.2 for CVD mortality with
beta-carotene and lung cancer with Vit E, and -/+ >0.2 for CVD mortality
with Vit E and lung cancer with beta-carotene.
For HRT (2 comparisons, 3 RCTs and 15 OS) agreement was poor for all
cause mortality and CVD mortality (-/+ >0.2 for both). Likewise, the agreement was poor for lap
chole (2 comparisons, 11 RCTs, 29 OS): -/+ >0.2 for both outcomes.
Conclusions: In this series RCTs
and OS agreed when the RCTs were of poor quality and evaluated
subjective outcome measures. The high
quality surgical trials had results closer to the null, but were of
insufficient size to quantify adverse effects.
Healthy cohort effects probably explain the discrepant findings between
RCTs and OS of antioxidants and HRT. To
some extent discrepancies between the results of randomized and non-randomized
studies can be anticipated from a knowledge of likely sources of bias and
confounding. However, agreement between
the study designs may indicate that they are both giving inaccurate results.
Lyon 2001, P-013
Empirical evidence of bias? The
hazard of ignoring heterogeneity in meta-epidemiology
Jonathan AC Sterne, Christopher Bartlett, Peter Jüni, Matthias Egger
MRC Health Services Research Collaboration, Department of Social Medicine,
University of Bristol, United Kingdom
Objective: Bias in meta-analysis
may be examined by considering collections of meta-analyses in which component
trials are classified according to characteristics such as study quality. The landmark study by Schulz et al (JAMA
1995) demonstrated, for 250 trials included in 33 meta-analyses, that
inadequate concealment of randomisation yielded exaggerated estimates of
treatment effect. Schulz et al assumed
that the effect of bias is constant across meta-analyses but heterogeneity
within and between meta-analyses may affect results. We examined different methods for assessing the influence of
trial characteristics on estimated treatment effects.
Methods: We studied the influence of
unpublished trials and trials published in languages other than English on
treatment effect estimates from 142 meta-analyses. The data were analysed using fixed-effects and random-effects models
within and between meta-analyses. We
also used standard logistic regression and logistic regression using the
“information sandwich” to estimate robust standard errors (results not
shown). The results are presented as
ratios of odds ratios (RORs). A ROR of
1 indicates that there is no difference in estimated treatment effects between
these groups of trials whereas a ratio above 1 indicates, for example, that the
unpublished trials showed less beneficial treatment effects than the published
trials.
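To make the contrast between fixed and random effects across meta-analyses concrete, the sketch below pools log RORs, one per meta-analysis, first with inverse-variance fixed-effect weights and then with a DerSimonian-Laird random-effects model. It is an illustration under assumptions, not the authors' models: the choice of the DerSimonian-Laird estimator, the function names and the input values are hypothetical.

```python
import math

# One (log_ror, standard_error) pair per meta-analysis. Values are hypothetical.
LOG_RORS = [(0.0, 0.1), (0.5, 0.1), (-0.3, 0.1), (0.4, 0.1)]


def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling; returns (pooled log ROR, SE)."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)


def pool_random_dl(estimates):
    """DerSimonian-Laird random-effects pooling; returns (pooled, SE, tau^2)."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    fixed, _ = pool_fixed(estimates)
    # Cochran's Q across meta-analyses.
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, estimates))
    c = total - sum(w ** 2 for w in weights) / total
    k = len(estimates)
    # Between-meta-analysis variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    star = [1 / (se ** 2 + tau2) for _, se in estimates]
    star_total = sum(star)
    pooled = sum(w * y for w, (y, _) in zip(star, estimates)) / star_total
    return pooled, math.sqrt(1 / star_total), tau2


if __name__ == "__main__":
    fixed, fixed_se = pool_fixed(LOG_RORS)
    rand, rand_se, tau2 = pool_random_dl(LOG_RORS)
    print("Fixed ROR: ", round(math.exp(fixed), 3), "SE of log ROR:", round(fixed_se, 3))
    print("Random ROR:", round(math.exp(rand), 3), "SE of log ROR:", round(rand_se, 3))
    print("tau^2:", round(tau2, 4))
```

When the log RORs vary across meta-analyses, the random-effects standard error is larger than the fixed-effect one, so ignoring that heterogeneity overstates the precision of the pooled ROR.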
Results: The table (not given
here) shows the effects of publication status and language of trials on
treatment effect estimates, using fixed and random effects within and between
meta-analyses.
There was clear evidence of between-meta-analysis heterogeneity both for
analyses of publication status and language.
Taking into account or ignoring heterogeneity within and between
meta-analyses had important effects on overall results.
Conclusions: Our
“meta-meta-analytic” approach provides a natural way to examine the importance
of different trial characteristics on treatment effect estimates. Fixed-effects logistic regression models,
which have been used in most previous studies, may not always be
appropriate. Improved understanding of
the circumstances in which bias is likely to undermine a review should inform
attempts to prevent bias in the future.
Lyon 2001, P-011
Quality assessment of mammography screening trials
Ole Olsen, Peter C Gotzsche
The Nordic Cochrane Centre, Copenhagen, Denmark
Objective: To assess the quality of
mammography screening trials.
Methods: We first used the Cochrane
criteria developed for treatment trials: 1) randomisation methods, 2)
exclusions after randomisation, and 3) blinding in outcome assessment; we based
our first assessment on the information in the main publications from the
trials. The screening trials, however,
turned out to be highly atypical and we therefore carried out a more detailed
assessment which was based on more than 200 published papers and on repeated
communications with the authors.
Results: 1) Randomisation. Each of the 7 trials used different
randomisation methods, which involved individual and cluster randomisation,
various matching criteria, computer randomisation, allocation by date of birth,
by order of attendance, or by centralised flip of a coin, varying randomisation
ratios by birth year, and with or without concealment of allocation. Date of randomisation was not always clear
and not always identical to the date of entry.
Date of entry was not always defined in the same way in the two trial
arms. 2) Exclusions. The number of randomised women often varied
from report to report, and it was often difficult to obtain exact numbers,
timings and reasons for exclusions of women.
Usually, more women were missing or had been excluded from the screening
arm. 3) Blinding. Breast cancer mortality was the main outcome
in the trials. However, we found good
evidence of biased assessment of cause of death, even when performed by blinded
panels of independent researchers. We
therefore consider breast cancer mortality a misleading surrogate outcome
measure. No reduction in total
mortality has been reported, relative risk 1.00, 95% confidence interval
0.96-1.05, for the two most reliable trials, and 1.00, 95% confidence interval
0.98-1.02 for the 4 Swedish trials after adjustment for age imbalances that had
occurred despite attempts at randomisation.
Conclusions: The usual quality
criteria for treatment trials seem inadequate for mass screening trials. This is consistent with the observation that
“published meta-analyses of screening are often deficient in their reporting of
methodology” (Walter & Jadad, Stat Med 1999). Additional or alternative quality criteria need to be developed
for screening trials.
Lyon 2001, O-032
Obtaining Published Errata to Randomised Controlled Trials: Is it Worth the Effort?
Pamela Royle, Wessex Institute for Health Research and Development,
University of Southampton, Southampton, United Kingdom
Objective: To do a pilot study to
determine the frequency and nature of published errata linked to RCTs in the
MEDLINE database, and to estimate the proportion that are worthwhile obtaining.
Methods: MEDLINE (Silver Platter)
was searched from 1995-2001/06 for records that had both
‘randomized-controlled-trial’ in the publication type field and ‘erratum’ in
the comments field. (When a citable
correction is published for an article that has a citation in MEDLINE, a
reference to the published correction is added.) Records from four journals (Lancet, BMJ, New England Journal of
Medicine and JAMA) were downloaded,
and 100 were randomly selected. The
full articles and errata of the 100 records were examined by the author and
assigned to one of three categories: 1. Yes – worthwhile acquiring,
2. Maybe – possibly worthwhile, 3. No – not worthwhile.
Also investigated were the number of errors mentioned per published
erratum, the number of citations to each RCT and its erratum in the Science
Citation Index, the time between publication of RCTs and errata, and indexing of
the errata in the Cochrane Controlled Trials Register (CCTR).
Results: The 666 errata to RCTs
were published in 295 different journals.
130 errata (19.5%) were published in just four journals: Lancet, BMJ,
NEJM and JAMA. The average percentage
of RCTs assigned errata was 1.2% over all journals, but 7.8% for these four
journals. All RCTs and errata were in
CCTR, but 77% of RCTs had one or more duplicate records in CCTR without the
errata information. The Lancet, BMJ,
NEJM and JAMA all provided free electronic access to errata, but many other
journals did not.
The classification of errata gave:
1. Yes = 74%. Mostly in tables or figures.
2. Maybe = 9%. Mostly in the introduction or discussion.
3. No = 17%. Mostly errors in authorship.
Table 1. Characteristics of RCTs and errata (not shown here)
Conclusions:
Errata were published for about 8% of RCTs in four major general medical
journals, and most (74%) appeared to contain information important enough to be
worthwhile obtaining. However, most
errata were never cited, which suggests they were largely ignored. To increase their visibility and accessibility, it
is recommended that all journals provide free electronic access to errata and that
the duplicate RCT records without errata information be removed from CCTR. This should facilitate access to more
complete and accurate data for those using RCTs.
Stavanger 2002, P6
Quality Assessment in Cochrane Reviews:
Do We Practice What We Preach?
Telaro Elena, D’Amico Roberto, Moja Pasquale, Battaglia Alessandro, Bianco
Elvira, Calderan Alessandro, Colli Agostino, Di Pietrantonj Carlo, Ferri
Marica, Fraquelli Mirella, Girolami Bruno, Marchioni Enrico, Mezza Elisabetta,
Piccoli Giorgina, Vignatelli Luca, Liberati Alessandro (members of “Milano
Master Course in Systematic Reviews”)
Italian Cochrane Centre, Milano, Italy
Background: One of the most
important steps in a systematic review (SR) is the critical appraisal of the
quality of primary studies. Many
studies have been published on the use of checklists and scales to assess
methodological quality. The Cochrane
Collaboration Handbook provides general principles that should be followed by
the reviewers in an attempt to assess the quality of the primary studies. Nevertheless, there is still discussion
about some issues such as:
a) what is the most valid scale/checklist for the quality assessment?
b) what should be the use of quality assessment in the context of SRs?
Objectives: The aim of this project is to review the
approaches used in the quality assessment in Cochrane Reviews.
Methods: A sample of SRs based on
50% of those published on the Cochrane Library Issue 1, 2002, stratified by
Collaborative Review Groups (CRGs) and type of intervention (drugs,
rehabilitation, prevention/screening, surgery/radiotherapy, intervention,
communication/organisational/educational, other) was eligible for this
study. Each CRG module and review has
been reviewed by two people independently using two checklists specifically
developed for this project.
Results: Preliminary results based
on a sample of 10% of all SRs show that ad hoc developed scales are the most
frequently used tools for the assessment of the quality of studies, while 28%
of the reviews use either Jadad or Schulz scales. The items that are most widely considered are allocation
concealment (78%) and completeness of follow-up (64%). Very often, an operational and reproducible
definition of these items is not clearly reported in either the methods or the
methodological quality section. In 52%
of the reviews, the items that are used do not correspond to the items which were stated
a priori in the methods section. 64
reviews (55%) do not report how they intend to use the results of the quality
assessment and 13% of these will use it either as a criterion for study
exclusion or for sensitivity analyses.
In the majority of cases (69%) quality assessment is only reported in
the “methodological quality section” and it is not linked to the results of the
analyses.
Conclusions: The results of this
preliminary analysis show that despite the attention given to quality
assessment of primary studies in the Cochrane Collaboration there is still
substantial variation in the way quality of studies is assessed. Moreover, the way this assessment is
eventually used in the analysis and interpretation of the results suffers from
several inconsistencies. The poster will
present full results and discuss the association between type of quality
assessment and characteristics of the SRs.
It will also address the opportunities for improving the consistency of
Cochrane reviews in these important aspects.
Stavanger 2002, P30