Comparing effect estimates of randomized controlled trials and observational studies

Researchers and organizations often use evidence from randomized controlled trials (RCTs) to determine the efficacy of a treatment or intervention under ideal conditions, while observational studies are used to measure the effectiveness of an intervention in non-experimental, 'real world' scenarios. Sometimes, RCTs and observational studies addressing the same question report different results. This review explores whether these differences are related to the study design itself or to other study characteristics.

This review summarizes the results of methodological reviews that compare the outcomes of observational studies with randomized trials addressing the same question, as well as methodological reviews that compare the outcomes of different types of observational studies.

The main objectives of the review are to assess the impact of study design (RCTs versus observational study designs, and different observational designs such as cohort versus case-control) on the effect measures estimated, and to explore methodological variables that might explain any differences.

We searched multiple electronic databases and reference lists of relevant articles to identify systematic reviews that were designed as methodological reviews to compare quantitative effect size estimates measuring efficacy or effectiveness of interventions of trials with observational studies or different designs of observational studies. We assessed the risks of bias of the included reviews.

Our results provide little evidence for significant effect estimate differences between observational studies and RCTs, regardless of specific observational study design, heterogeneity, inclusion of pharmacological studies, or use of propensity score adjustment. Factors other than study design per se need to be considered when exploring reasons for a lack of agreement between results of RCTs and observational studies.

Authors' conclusions: 

Our results across all reviews (pooled ROR 1.08) are very similar to results reported by similarly conducted reviews. As such, we have reached similar conclusions: on average, there is little evidence for significant effect estimate differences between observational studies and RCTs, regardless of specific observational study design, heterogeneity, or inclusion of studies of pharmacological interventions. Factors other than study design per se need to be considered when exploring reasons for a lack of agreement between results of RCTs and observational studies. Our results underscore that it is important for review authors to consider not only study design, but also the level of heterogeneity in meta-analyses of RCTs or observational studies. A better understanding of how these factors influence study effects might yield estimates reflective of true effectiveness.

Background: 

Researchers and organizations often use evidence from randomized controlled trials (RCTs) to determine the efficacy of a treatment or intervention under ideal conditions. Studies of observational designs are often used to measure the effectiveness of an intervention in 'real world' scenarios. Numerous study designs and modifications of existing designs, including both randomized and observational, are used for comparative effectiveness research in an attempt to give an unbiased estimate of whether one treatment is more effective or safer than another for a particular population.

A systematic analysis of study design features, risk of bias, parameter interpretation, and effect size for all types of randomized and non-experimental observational studies is needed to identify specific differences in design types and potential biases. This review summarizes the results of methodological reviews that compare the outcomes of observational studies with randomized trials addressing the same question, as well as methodological reviews that compare the outcomes of different types of observational studies.

Objectives: 

To assess the impact of study design (including RCTs versus observational study designs) on the effect measures estimated.

To explore methodological variables that might explain any differences identified.

To identify gaps in the existing research comparing study designs.

Search strategy: 

We searched seven electronic databases, from January 1990 to December 2013.

Along with MeSH terms and relevant keywords, we used the sensitivity-specificity balanced version of a validated strategy to identify reviews in PubMed, augmented with one term ("review" in article titles) so that it better targeted narrative reviews. No language restrictions were applied.

Selection criteria: 

We examined systematic reviews that were designed as methodological reviews to compare quantitative effect size estimates measuring efficacy or effectiveness of interventions tested in trials with those tested in observational studies. Comparisons included RCTs versus observational studies (including retrospective cohorts, prospective cohorts, case-control designs, and cross-sectional designs). Reviews were not eligible if they compared randomized trials with other studies that had used some form of concurrent allocation.

Data collection and analysis: 

In general, outcome measures included relative risks or rate ratios (RR), odds ratios (OR), and hazard ratios (HR). Using results from observational studies as the reference group, we examined the published estimates to see whether the ratio of odds ratios (ROR) indicated a relatively larger or smaller effect.
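As a rough illustration of an ROR with observational studies as the reference group, the sketch below divides a pooled RCT estimate by a pooled observational estimate on the log scale. The function names and the input values are hypothetical and are not taken from any included review; the sketch also assumes that the two estimates are independent and that each standard error can be recovered from a 95% confidence interval.

```python
import math


def log_se_from_ci(lower, upper, z=1.96):
    """Approximate the standard error of a log ratio from its 95% CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ratio_of_ratios(rct, rct_ci, obs, obs_ci, z=1.96):
    """Return (ROR, lower, upper) with observational studies as the reference.

    Assumes the RCT and observational estimates are independent, so their
    variances add on the log scale.
    """
    log_ror = math.log(rct) - math.log(obs)
    se = math.sqrt(log_se_from_ci(*rct_ci) ** 2 + log_se_from_ci(*obs_ci) ** 2)
    return (
        math.exp(log_ror),
        math.exp(log_ror - z * se),
        math.exp(log_ror + z * se),
    )


# Hypothetical pooled estimates, for illustration only.
ror, lower, upper = ratio_of_ratios(0.80, (0.70, 0.92), 0.75, (0.68, 0.83))
print(f"ROR = {ror:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under this convention, an ROR of 1 means the two designs gave the same pooled estimate. Whether a value above or below 1 corresponds to a larger or smaller treatment effect depends on the direction of the underlying outcome.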

Within each identified review, if an estimate comparing results from observational studies with RCTs was not provided, we pooled the estimates for observational studies and RCTs. Then, we estimated the ratio of ratios (risk ratio or odds ratio) for each identified review using observational studies as the reference category. Across all reviews, we synthesized these ratios to get a pooled ROR comparing results from RCTs with results from observational studies.
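The synthesis step across reviews can be sketched as an inverse-variance pooling of log RORs. The abstract does not state which pooling model was used, so the DerSimonian-Laird random-effects estimator below is an assumption made for illustration, and the function name and inputs are hypothetical.

```python
import math


def pool_random_effects(log_rors, ses, z=1.96):
    """Pool log RORs with a DerSimonian-Laird random-effects model.

    Returns (pooled ROR, lower, upper, tau2). The choice of estimator is an
    assumption for this sketch, not a method confirmed by the abstract.
    """
    weights = [1 / s ** 2 for s in ses]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_rors)) / sum_w

    # Cochran's Q measures dispersion around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_rors))
    df = len(log_rors) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight each review by its within- plus between-review variance.
    weights_re = [1 / (s ** 2 + tau2) for s in ses]
    sum_re = sum(weights_re)
    pooled = sum(w * y for w, y in zip(weights_re, log_rors)) / sum_re
    se = math.sqrt(1 / sum_re)
    return (
        math.exp(pooled),
        math.exp(pooled - z * se),
        math.exp(pooled + z * se),
        tau2,
    )


# Hypothetical per-review RORs and standard errors (log scale).
rors = [1.10, 0.95, 1.30, 1.02]
ses = [0.10, 0.15, 0.20, 0.08]
pooled, lower, upper, tau2 = pool_random_effects([math.log(r) for r in rors], ses)
print(f"Pooled ROR = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Working on the log scale keeps the ratios symmetric around 1, so an ROR of 0.5 and an ROR of 2.0 pull the pooled value by the same amount in opposite directions.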

Main results: 

Our initial search yielded 4406 unique references. Fifteen reviews met our inclusion criteria, 14 of which were included in the quantitative analysis.

The included reviews analyzed data from 1583 meta-analyses that covered 228 different medical conditions. The mean number of included studies per paper was 178 (range 19 to 530).

Eleven (73%) reviews had low risk of bias for explicit criteria for study selection, nine (60%) were low risk of bias for investigators' agreement for study selection, five (33%) included a complete sample of studies, seven (47%) assessed the risk of bias of their included studies, seven (47%) controlled for methodological differences between studies, eight (53%) controlled for heterogeneity among studies, nine (60%) analyzed similar outcome measures, and four (27%) were judged to be at low risk of reporting bias.

Our primary quantitative analysis, including 14 reviews, showed that the pooled ROR comparing effects from RCTs with effects from observational studies was 1.08 (95% confidence interval (CI) 0.96 to 1.22). Of 14 reviews included in this analysis, 11 (79%) found no significant difference between observational studies and RCTs. One review suggested observational studies had larger effects of interest, and two reviews suggested observational studies had smaller effects of interest.
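To see why a pooled ROR of 1.08 with a 95% CI of 0.96 to 1.22 is read as showing no significant difference, the interval can be converted back into an approximate standard error and z statistic. This is a sketch that assumes the interval is symmetric on the log scale and based on a normal approximation; the function name is hypothetical.

```python
import math


def z_and_p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate SE, z and a two-sided p-value from a ratio and its 95% CI.

    Assumes a normal approximation on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values reported for the primary analysis in the text above.
se, z, p = z_and_p_from_ci(1.08, 0.96, 1.22)
print(f"SE(log ROR) = {se:.3f}, z = {z:.2f}, p = {p:.2f}")
```

Because the interval contains 1, the back-calculated two-sided p-value is well above 0.05, which is consistent with the conclusion stated in the text.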

Similar to the effect across all included reviews, effects from reviews comparing RCTs with cohort studies had a pooled ROR of 1.04 (95% CI 0.89 to 1.21), with substantial heterogeneity (I² = 68%). Three reviews compared effects of RCTs and case-control designs (pooled ROR: 1.11 (95% CI 0.91 to 1.35)).
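The heterogeneity statistic quoted above is conventionally derived from Cochran's Q and its degrees of freedom. The sketch below shows that standard formula; the Q and degrees-of-freedom values are hypothetical, since the abstract reports only the resulting percentage.

```python
def i_squared(q, df):
    """Return I-squared (%) from Cochran's Q and its degrees of freedom.

    I-squared is the share of total variation attributed to between-study
    heterogeneity rather than chance; negative values are truncated to 0.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical values for illustration, not taken from the review.
print(f"I-squared = {i_squared(30.0, 12):.0f}%")
```

When Q is no larger than its degrees of freedom, the statistic is reported as 0%, meaning the observed spread is no more than chance alone would be expected to produce.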

No significant differences in point estimates across heterogeneity, pharmacological intervention, or propensity score adjustment subgroups were noted. No reviews had compared RCTs with observational studies that used two of the most common causal inference methods, instrumental variables and marginal structural models.