Much Ado About Nothing: Statistical Methods for Meta-analysis with Rare Events

Jonathan Deeks, Michael Bradburn, Warren Bilker, Russell Localio, Jesse Berlin, Centre for Statistics in Medicine, Institute of Health Sciences, Oxford, UK

 

Objective:  To evaluate the performance of standard methods of meta-analysis of binary data when event rates are very low.

Background:  An issue that arises when studying uncommon outcomes in trials is that a substantial proportion of studies may report no events in either the treated group, or the control group, or both.  Zero cells in contingency tables cause problems, both in terms of the validity of the methods when numbers are small, and the practicalities of coping with possible divisions by zero.  Most traditional methods based on ratio measures, such as odds ratios (OR), ignore studies in which no events occur in either group.

Methods:  We conducted a series of statistical simulations in which data for two study groups, with known, low event probabilities, were generated.  Relative risks of 1.0, 0.75 and 0.5 were considered corresponding to varying strengths of treatment benefit.  Meta-analyses of 5 and 20 studies were simulated, the sample sizes being based on the results of real Cochrane reviews, with baseline (control group) event rates of 5%, 1%, 0.5% and 0.1%.  The performance of a selection of the statistical methods available in RevMan (Mantel-Haenszel (MH) OR with RBG variance, Peto OR, D&L random effects OR, and the MH and D&L risk difference methods) together with other methods not currently available in RevMan (Poisson regression models, inverse variance and exact methods) was evaluated. At each combination of relative risk and baseline event rate 10,000 meta-analyses were simulated. Each method was evaluated with respect to bias and statistical power.
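As a minimal sketch of two of the pooling methods compared above, the following computes the Peto and Mantel-Haenszel odds ratios from 2x2 tables. The function names and the tuple layout (events and group size per arm) are illustrative assumptions, not the authors' simulation code; the formulas are the standard textbook ones. The sketch also shows why a study with no events in either arm carries no weight in these ratio methods.

```python
import math

# Each study is (events_treated, n_treated, events_control, n_control).
# This layout is an assumption made for illustration only.

def peto_odds_ratio(studies):
    """Peto one-step odds ratio: exp(sum(O - E) / sum(V))."""
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for a, n1, c, n2 in studies:
        n = n1 + n2
        m = a + c  # total events in the study
        expected = n1 * m / n
        # Hypergeometric variance; zero when the study has no events at all.
        variance = n1 * n2 * m * (n - m) / (n ** 2 * (n - 1))
        sum_o_minus_e += a - expected
        sum_v += variance
    if sum_v == 0:
        raise ValueError("no study contains any events")
    log_or = sum_o_minus_e / sum_v
    return math.exp(log_or), 1 / math.sqrt(sum_v)  # (OR, SE of log OR)

def mantel_haenszel_odds_ratio(studies):
    """Mantel-Haenszel pooled odds ratio, without continuity correction."""
    num = 0.0
    den = 0.0
    for a, n1, c, n2 in studies:
        b, d, n = n1 - a, n2 - c, n1 + n2
        num += a * d / n
        den += b * c / n
    if den == 0:
        raise ValueError("pooled odds ratio is undefined")
    return num / den
```

With these definitions, appending a study such as `(0, 50, 0, 50)` leaves both pooled estimates unchanged, and a study with zero events in one arm only still contributes to the Peto estimate without any continuity correction.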

Results:  For datasets generated assuming a fixed underlying effect, contrary to current expectations, the Peto method provided the least biased and most powerful method of pooling study results among the methods available in RevMan.  Other methods tended to underestimate treatment effects by between 25% and 35% when event rates were 0.1% and the treatment effect was an RR of 0.5. The bias increased with decreasing baseline rates, increasing treatment effects and increasing imbalance in trial group sizes.

Conclusions:  Several of the statistical methods available in RevMan perform very poorly when event rates are low, and tend to underestimate treatment effects. The Peto method may be the best method in these circumstances unless trial group sizes are severely imbalanced, and will provide a good approximation to the relative risk.

Baltimore 1998, Oral 24


 

Half-dead or half-alive?  Which way should events be coded for meta-analyses of risk ratios?

Jonathan J Deeks, ICRF/NHS Centre for Statistics in Medicine, Oxford, United Kingdom

Objective:  Meta-analyses of risk ratios can give different results for the same review if the definitions of the “event” and “non-event” are switched.  This empirical study aimed to investigate the suitability of rules for the coding of “events” and “non-events” in meta-analyses of risk ratios by comparing the consistency of the trial results with the two meta-analytical estimates obtained by switching the selection of the event.

Methods:  Meta-analyses were selected if they were first published on the Cochrane Library between 1998 and 2000, their first outcome was binary and they pooled results from at least 5 trials.  Interventions were classified as preventative, if the intervention aimed to prevent the patient moving to a worse health state, or therapeutic, if the intervention aimed to move the patient to a better health state.  Meta-analyses of risk ratios of (a) the good event, and (b) the bad event were performed using the Mantel-Haenszel risk ratio method, consistency being measured using the Cochran Q statistic.  Additional comparisons were also made according to the selection of the event yielding the lowest mean control group event rate.
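The effect of switching the event definition can be illustrated with a small sketch. The two trials below are hypothetical data, not taken from the reviews studied, and the Q statistic here is computed around the inverse-variance pooled log risk ratio, which is a simplifying assumption about how consistency was measured. Trials with a zero cell would need separate handling.

```python
import math

# Each study is (events_treated, n_treated, events_control, n_control).

def mh_risk_ratio(studies):
    """Mantel-Haenszel pooled risk ratio."""
    num = sum(a * n2 / (n1 + n2) for a, n1, c, n2 in studies)
    den = sum(c * n1 / (n1 + n2) for a, n1, c, n2 in studies)
    return num / den

def cochran_q(studies):
    """Cochran Q on the log risk ratio scale (assumes no zero cells)."""
    logs, weights = [], []
    for a, n1, c, n2 in studies:
        logs.append(math.log((a * n2) / (c * n1)))
        weights.append(1 / (1 / a - 1 / n1 + 1 / c - 1 / n2))
    pooled = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, logs))

def switch_event(studies):
    """Recode each trial so that the non-event becomes the event."""
    return [(n1 - a, n1, n2 - c, n2) for a, n1, c, n2 in studies]

# Hypothetical trials: the bad outcome is halved in both, at different baselines.
bad_outcome = [(10, 100, 20, 100), (40, 100, 80, 100)]
good_outcome = switch_event(bad_outcome)

print(mh_risk_ratio(bad_outcome), cochran_q(bad_outcome))
print(mh_risk_ratio(good_outcome), cochran_q(good_outcome))
```

In this constructed example the trials are perfectly consistent for the bad outcome but heterogeneous for the good outcome, and the pooled risk ratio of the good outcome is not the reciprocal of the pooled risk ratio of the bad outcome.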

Results:  114 reviews were included in the analysis:  69 of preventative interventions, 45 of therapeutic interventions.  Twelve of the preventative interventions showed discordance in the significance of heterogeneity, 11 favoured use of the bad outcome, 1 favoured use of the good outcome (P=0.006).  Fifteen of the therapeutic interventions showed discordance, 8 favoured use of the bad outcome, 7 favoured use of the good outcome (P=1.0).  Among the therapeutic trials the use of the outcome giving the lowest mean control group event rate did not significantly improve consistency (P=0.12).
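The abstract does not name the test behind these P values. As an assumption, a two-sided exact binomial (sign) test on the discordant reviews reproduces both reported figures, and can be sketched as follows.

```python
from math import comb

def sign_test_p(k, n):
    """Two-sided exact binomial test of k 'successes' out of n against p = 0.5."""
    tail = min(k, n - k)
    one_sided = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, 2 * one_sided)

print(round(sign_test_p(11, 12), 3))  # 0.006 (preventative: 11 vs 1)
print(sign_test_p(8, 15))             # 1.0   (therapeutic: 8 vs 7)
```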

Conclusions:  For preventative interventions, risk ratios of the bad outcome yield the most consistent estimates, supporting the use of Glasziou and Irwig's model for individualisation of treatment effects.  The situation for therapeutic interventions is more complex, no universal preference for either outcome being supported.

Reference:  Glasziou PP, Irwig LM.  An evidence-based approach to individualising treatment.  BMJ 1995; 311: 1356-1359.

Lyon 2001, P-009


 

Agreement between randomized and non-randomized studies – the effects of bias and confounding

David Henry, Annette Moxey, Dianne O'Connell

School of Population Health Sciences, Faculty of Medicine and Health Sciences, The University of Newcastle, Australia.

Background:  Formal comparisons of the results of randomized and non-randomized studies have led to conflicting conclusions.  The number of potential topics for study is huge, and levels of agreement vary with selection of interventions and clinical settings.

Objective:  To determine the extent to which some important sources of bias affecting randomized trials (RCT) and observational studies (OS) influence the levels of agreement between these designs.

Methods:  We performed systematic reviews of RCTs and OS for 7 intervention/outcome pairs chosen because of the likelihood of bias and confounding.  These were:  interventions to minimize the need for allogeneic blood transfusion (unblinded, poor randomisation, subjective outcomes); the impact of laparoscopic cholecystectomy (lap chole), compared with open or mini-laparotomy, on post-operative infections and bile duct injury (variable quality trials, operator skill dependent); the impact of antioxidants on death from malignancy and cardiovascular disease (different interventions, healthy cohort effect); and the effects of hormone replacement therapy (HRT) on cardiovascular and overall mortality (healthy cohort effect).  Articles identified through electronic and bibliographic searches were reviewed independently by two raters.  Adjusted and unadjusted RRs were pooled using inverse variance weights, and Metaview 4.1 was used to pool crude RRs.  We assessed qualitative agreement as +/+ when RCTs and OS (respectively) agreed on the direction of a statistically significant effect, -/- when there was agreement on the absence of such an effect, and +/- or -/+ when there was disagreement.  We classified quantitative agreement as 0.1 or 0.2 if pooled estimates of RR differed by no more than these values, and >0.2 if they did.
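The agreement grading described above can be written as a short sketch. The function names, the use of a 95% confidence interval excluding 1 as the marker of statistical significance, and the handling of two significant effects in opposite directions (a case the abstract does not cover) are all assumptions made for illustration.

```python
def significance(ci_low, ci_high):
    """'+' if the confidence interval excludes RR = 1, otherwise '-'."""
    return "+" if ci_high < 1.0 or ci_low > 1.0 else "-"

def quantitative(rr_rct, rr_obs, eps=1e-9):
    """Grade the absolute difference between the two pooled RRs."""
    diff = abs(rr_rct - rr_obs)
    if diff <= 0.1 + eps:
        return "0.1"
    if diff <= 0.2 + eps:
        return "0.2"
    return ">0.2"

def agreement(rct, obs):
    """rct and obs are (pooled_rr, ci_low, ci_high); RCT sign is listed first."""
    s_rct = significance(rct[1], rct[2])
    s_obs = significance(obs[1], obs[2])
    if s_rct == "+" and s_obs == "+" and (rct[0] - 1) * (obs[0] - 1) < 0:
        # Not defined in the abstract: both significant, opposite directions.
        return "opposite directions " + quantitative(rct[0], obs[0])
    return f"{s_rct}/{s_obs} {quantitative(rct[0], obs[0])}"

# Hypothetical pooled estimates, for illustration only.
print(agreement((0.60, 0.45, 0.80), (0.65, 0.55, 0.77)))
print(agreement((0.98, 0.80, 1.20), (0.65, 0.55, 0.77)))
```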

Results:  The greatest level of agreement between RCTs and OS was seen with blood-sparing techniques (4 comparisons, 59 RCTs and 104 OS) with agreement ranging from +/+ 0.1 to +/+ 0.2.  Antioxidants (4 comparisons, 9 RCTs and 13 OS) gave mixed results with agreement graded as -/- >0.2  and -/- 0.2 for CVD mortality with beta-carotene and lung cancer with Vit E, and -/+ >0.2 for CVD mortality with Vit E and lung cancer with beta-carotene.  For HRT (2 comparisons, 3 RCTs and 15 OS) agreement was poor for all cause mortality and CVD mortality (-/+ >0.2 for both).  Likewise, the agreement was poor for lap chole (2 comparisons, 11 RCTs, 29 OS): -/+ >0.2 for both outcomes.

Conclusions:  In this series RCTs and OS agreed when the RCTs were of poor quality and evaluated subjective outcome measures.  The high quality surgical trials had results closer to the null, but were of insufficient size to quantify adverse effects.  Healthy cohort effects probably explain the discrepant findings between RCTs and OS of antioxidants and HRT.  To some extent discrepancies between the results of randomized and non-randomized studies can be anticipated from a knowledge of likely sources of bias and confounding.  However, agreement between the study designs may indicate that they are both giving inaccurate results.

Lyon 2001, P-013

Empirical evidence of bias?  The hazard of ignoring heterogeneity in meta-epidemiology

Jonathan AC Sterne, Christopher Bartlett, Peter Jüni, Matthias Egger

MRC Health Services Research Collaboration, Department of Social Medicine, University of Bristol, United Kingdom

Objective:  Bias in meta-analysis may be examined by considering collections of meta-analyses in which component trials are classified according to characteristics such as study quality.  The landmark study by Schulz et al (JAMA 1995) demonstrated, for 250 trials included in 33 meta-analyses, that inadequate concealment of randomisation yielded exaggerated estimates of treatment effect.  Schulz et al assumed that the effect of bias is constant across meta-analyses but heterogeneity within and between meta-analyses may affect results.  We examined different methods for assessing the influence of trial characteristics on estimated treatment effects.

Methods:  We studied the influence of unpublished trials and of trials published in languages other than English on treatment effect estimates from 142 meta-analyses.  The data were analysed using fixed-effects and random-effects models within and between meta-analyses.  We also used standard logistic regression and logistic regression using the “information sandwich” to estimate robust standard errors (results not shown).  The results are presented as ratios of odds ratios (RORs).  A ROR of 1 indicates that there is no difference in estimated treatment effects between these groups of trials whereas a ratio above 1 indicates, for example, that the unpublished trials showed less beneficial treatment effects than the published trials.
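A minimal sketch of the ratio of odds ratios and of fixed versus random-effects pooling across meta-analyses follows. It assumes inverse-variance weights and the DerSimonian-Laird estimate of between-meta-analysis variance; the abstract does not state which estimators the authors used, and the function names are illustrative.

```python
import math

def log_ror(log_or_a, se_a, log_or_b, se_b):
    """Log ratio of odds ratios (group a versus group b) and its standard error."""
    return log_or_a - log_or_b, math.sqrt(se_a ** 2 + se_b ** 2)

def pool_fixed(estimates):
    """Inverse-variance fixed-effect pooling of (log_estimate, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    return pooled, math.sqrt(1 / total)

def pool_random(estimates):
    """DerSimonian-Laird random-effects pooling; returns (pooled, se, tau2)."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    fixed, _ = pool_fixed(estimates)
    q = sum(w * (y - fixed) ** 2 for w, (y, _) in zip(weights, estimates))
    c = total - sum(w ** 2 for w in weights) / total
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0
    re_weights = [1 / (se ** 2 + tau2) for _, se in estimates]
    re_total = sum(re_weights)
    pooled = sum(w * y for w, (y, _) in zip(re_weights, estimates)) / re_total
    return pooled, math.sqrt(1 / re_total), tau2
```

For example, with odds ratios below 1 indicating benefit, an unpublished-trial OR of 0.9 against a published-trial OR of 0.6 gives a ROR of 1.5. When the log RORs differ markedly between meta-analyses, the random-effects standard error is wider than the fixed-effect one, which is the sense in which ignoring heterogeneity can change the overall result.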

Results:   The table (not given here) shows the effects of publication status and language of trials on treatment effect estimates, using fixed and random effects within and between meta-analyses.

There was clear evidence of between-meta-analysis heterogeneity both for analyses of publication status and language.  Taking into account or ignoring heterogeneity within and between meta-analyses had important effects on overall results.

Conclusions:  Our “meta-meta-analytic” approach provides a natural way to examine the importance of different trial characteristics on treatment effect estimates.  Fixed-effects logistic regression models, which have been used in most previous studies, may not always be appropriate.  Improved understanding of the circumstances in which bias is likely to undermine a review should inform attempts to prevent bias in the future.

Lyon 2001, P-011


Quality assessment of mammography screening trials

Ole Olsen, Peter C Gotzsche

The Nordic Cochrane Centre, Copenhagen, Denmark

Objective:  To assess the quality of mammography screening trials.

Methods:  We first used the Cochrane criteria developed for treatment trials: 1) randomisation methods, 2) exclusions after randomisation, and 3) blinding in outcome assessment; we based our first assessment on the information in the main publications from the trials.  The screening trials, however, turned out to be highly atypical and we therefore carried out a more detailed assessment which was based on more than 200 published papers and on repeated communications with the authors.

Results:  1) Randomisation.  Each of the 7 trials used different randomisation methods, which involved individual and cluster randomisation, various matching criteria, computer randomisation, allocation by date of birth, by order of attendance, or by centralised flip of a coin, varying randomisation ratios by birth year, and with or without concealment of allocation.  Date of randomisation was not always clear and not always identical to the date of entry.  Date of entry was not always defined in the same way in the two trial arms.  2) Exclusions.  The number of randomised women often varied from report to report, and it was often difficult to obtain exact numbers, timings and reasons for exclusions of women.  Usually, more women were missing or had been excluded from the screening arm.  3) Blinding.  Breast cancer mortality was the main outcome in the trials.  However, we found good evidence of biased assessment of cause of death, even when performed by blinded panels of independent researchers.  We therefore consider breast cancer mortality a misleading surrogate outcome measure.  No reduction in total mortality has been reported, relative risk 1.00, 95% confidence interval 0.96-1.05, for the two most reliable trials, and 1.00, 95% confidence interval 0.98-1.02 for the 4 Swedish trials after adjustment for age imbalances that had occurred despite attempts at randomisation.

Conclusions:  The usual quality criteria for treatment trials seem inadequate for mass screening trials.  This is consistent with the observation that “published meta-analyses of screening are often deficient in their reporting of methodology” (Walter & Jadad, Stat Med 1999).  Additional or alternative quality criteria need to be developed for screening trials.

Lyon 2001, O-032


Obtaining Published Errata to Randomised Controlled Trials:  Is it Worth the Effort?

Pamela Royle, Wessex Institute for Health Research and Development, University of Southampton, Southampton, United Kingdom

Objective:  To do a pilot study to determine the frequency and nature of published errata linked to RCTs in the MEDLINE database, and to estimate the proportion that are worthwhile obtaining.

Methods:  MEDLINE (Silver Platter) was searched from 1995-2001/06 for records that had both ‘randomized-controlled-trial’ in the publication type field and ‘erratum’ in the comments field.  (When a citable correction is published for an article that has a citation in MEDLINE, a reference to the published correction is added.)  Records from four journals (Lancet, BMJ, New England Journal of Medicine and JAMA) were downloaded, and 100 were randomly selected.  The full articles and errata of the 100 records were examined by the author and assigned to one of three categories:  1. Yes – worthwhile acquiring, 2. Maybe – possibly worthwhile, 3. No – not worthwhile.

Also investigated were the number of errors mentioned per published erratum, the number of citations to the RCTs and their errata in the Science Citation Index, the time between publication of the RCTs and of their errata, and indexing of the errata in the Cochrane Controlled Trials Register (CCTR).

Results:  The 666 errata to RCTs were published in 295 different journals.  130 errata (19.5%) were published in just four journals: Lancet, BMJ, NEJM and JAMA.  The average percentage of RCTs assigned errata was 1.2% over all journals, but 7.8% for these four journals.  All RCTs and errata were in CCTR, but 77% of RCTs had one or more duplicate records in CCTR without the errata information.  The Lancet, BMJ, NEJM and JAMA all provided free electronic access to errata, but many other journals did not.

The classification of errata gave:

1.  Yes = 74%.  Mostly in tables or figures.

2.  Maybe = 9%.  Mostly in the introduction or discussion.

3.  No = 17%.  Mostly errors in authorship.

Table 1.  Characteristics of RCTs and errata (not shown here)

Conclusions:  Errata were published for about 8% of RCTs in four major general medical journals, and most (74%) appeared to contain information important enough to be worthwhile obtaining.  However, as most errata were never cited, this suggested they were largely ignored.  To increase their visibility and access, it is recommended all journals provide free electronic access to errata and that the duplicate RCTs without errata information be removed from CCTR.  This should facilitate access to more complete and accurate data for those using RCTs.

Stavanger 2002, P6


Quality Assessment in Cochrane Reviews:  Do We Practice What We Preach?

Telaro Elena, D’Amico Roberto, Moja Pasquale, Battaglia Alessandro, Bianco Elvira, Calderan Alessandro, Colli Agostino, Di Pietrantonj Carlo, Ferri Marica, Fraquelli Mirella, Girolami Bruno, Marchioni Enrico, Mezza Elisabetta, Piccoli Giorgina, Vignatelli Luca, Liberati Alessandro (members of “Milano Master Course in Systematic Reviews”)

Italian Cochrane Centre, Milano, Italy

Background:  One of the most important steps in a systematic review (SR) is the critical appraisal of the quality of primary studies.  Many studies have been published on the use of checklists and scales to assess methodological quality.  The Cochrane Collaboration Handbook provides general principles that should be followed by the reviewers in an attempt to assess the quality of the primary studies.  Nevertheless, there is still discussion about some issues such as:

a)    what is the most valid scale/checklist for the quality assessment?

b)    what should be the use of quality assessment in the context of SRs?

Objectives:  The aim of this project is to review the approaches used in the quality assessment in Cochrane Reviews.

Methods:  A sample of SRs comprising 50% of those published on the Cochrane Library Issue 1, 2002, stratified by Collaborative Review Groups (CRGs) and type of intervention (drugs, rehabilitation, prevention/screening, surgery/radiotherapy, intervention, communication/organisational/educational, other), was eligible for this study.  Each CRG module and review was reviewed by two people independently using two checklists specifically developed for this project.

Results:  Preliminary results based on a sample of 10% of all SRs show that ad hoc developed scales are the most frequently used tools for the assessment of the quality of studies, while 28% of the reviews use either Jadad or Schulz scales.  The items that are most widely considered are: allocation concealment (78%) and completeness of follow-up (64%).  Very often, an operational and reproducible definition of these items is clearly reported neither in the methods nor in the methodological quality section.  In 52% of the reviews items that are used do not correspond to items which were stated a priori in the methods section.  64 reviews (55%) do not report how they intend to use the results of the quality assessment and 13% of these will use it either as a criterion for study exclusion or for sensitivity analyses.  In the majority of cases (69%) quality assessment is only reported in the “methodological quality section” and it is not linked to the results of the analyses.

Conclusions:  The results of this preliminary analysis show that despite the attention given to quality assessment of primary studies in the Cochrane Collaboration there is still substantial variation in the way quality of studies is assessed.  Moreover, the way this assessment is eventually used in the analysis and interpretation of the results suffers from several inconsistencies.  The poster will present full results and discuss the association between type of quality assessment and characteristics of the SRs.  It will also address the opportunities for improving the consistency of Cochrane reviews in these important aspects.

Stavanger 2002, P30