Chapter 7 of 8
Meta-analysis
This chapter assumes you have already decided that a quantitative pooled summary is appropriate for at least one outcome—see the Evidence synthesis chapter for when meta-analysis is justified. What follows is a concise tour of the mechanics: effect sizes, heterogeneity, model choice, and how you present and scrutinize results.
Phase 1 — Building blocks: effect sizes
Before you can pool results, studies need to be on a common scale. If one study reports a mean difference and another reports Cohen’s d, you face a “Tower of Babel” problem: the numbers are not yet commensurable (Guilera et al., 2022). Harmonization usually means converting or back-calculating to a standard effect metric, ideally with documented formulas and sensitivity checks.
Continuous outcomes
Use a standardized mean difference (SMD) or Hedges’ g when instruments or scales differ across trials (for example different depression questionnaires). That puts mean changes on a comparable footing when raw units are not the same.
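To make the conversion concrete, here is a minimal sketch of computing Hedges' g from two-group summary statistics. The function name and argument layout are illustrative, not from the chapter; the formula is the standard pooled-SD Cohen's d with Hedges' small-sample correction factor J.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with Hedges' small-sample correction.

    Illustrative helper: pools the two group SDs, forms Cohen's d,
    then shrinks it by the correction factor J = 1 - 3 / (4N - 9).
    """
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp            # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)     # Hedges' correction factor
    return j * d
```

With two groups of n = 30, means 10 vs. 8, and a common SD of 2, d is 1.0 and g is slightly smaller because of the correction.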
Binary outcomes
Use odds ratios (OR) or relative risks (RR), consistent with your question, study designs, and reporting standards. Be explicit about which contrast you modeled (e.g. event vs. non-event definitions).
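Being explicit about the contrast is easiest when the 2×2 table is written out. A minimal sketch (cell labels a–d are an assumed convention, not defined in the chapter):

```python
def odds_ratio(a, b, c, d):
    """OR from a 2x2 table.

    Assumed layout: a = events in treatment, b = non-events in treatment,
                    c = events in control,   d = non-events in control.
    """
    return (a / b) / (c / d)

def risk_ratio(a, b, c, d):
    """RR from the same 2x2 layout: risk in treatment over risk in control."""
    return (a / (a + b)) / (c / (c + d))
```

Note that the two metrics diverge as events become common: with 10/100 events in the treatment arm and 20/100 in control, the RR is 0.50 while the OR is about 0.44, so stating which contrast you modeled matters.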
Phase 2 — Signal vs. noise: heterogeneity
Meta-analysis is not only about the average effect; it is about whether averaging is coherent. Heterogeneity describes how much effect sizes scatter beyond what sampling error alone would predict. A large pooled estimate means little if studies are effectively answering different questions.
Cochran’s Q
Tests whether there is more between-study dispersion than expected by chance under a fixed-effect framing. It is useful but has low power when there are few studies or small samples, so a non-significant Q does not rule out heterogeneity.
I² statistic
Describes approximately what fraction of total variation in point estimates is due to heterogeneity rather than within-study error (not a measure of how far effects differ on the outcome scale).
Interpretation (rule-of-thumb): Values of I² from about 0% to 40% are often treated as low heterogeneity, while values above roughly 75% are often described as considerable (Higgins et al., 2003). Treat these thresholds as guidance, not hard rules: clinical and methodological differences between studies matter as much as the number.
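Both statistics fall out of the same inverse-variance arithmetic. A minimal sketch, assuming each study supplies an effect estimate and its sampling variance (the function name is illustrative):

```python
def cochran_q_and_i2(effects, variances):
    """Cochran's Q and I^2 under inverse-variance (fixed-effect) weighting.

    Illustrative helper: Q is the weighted sum of squared deviations from
    the fixed-effect pooled estimate; I^2 = max(0, (Q - df) / Q) * 100.
    """
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2
```

For example, three studies with effects 0.2, 0.3, and 0.8 and a common variance of 0.04 give Q of about 5.17 on 2 degrees of freedom and I² of roughly 61%, which the rule of thumb above would flag as substantial.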
Phase 3 — Choosing your model
The pooling model encodes what you believe about how true effects vary across studies—this is one of the most consequential analysis choices in the review.
| Model | Assumption | Typical use |
|---|---|---|
| Fixed-effect | One shared “true” effect underlies all studies; observed differences reflect sampling error only. | Uncommon in contemporary reviews. Consider only when studies are nearly identical in design, population, and conduct—rare outside specialized settings. |
| Random-effects | True effects differ across studies (e.g. settings, populations, interventions). The model estimates both within-study and between-study variance (DerSimonian & Laird, 1986; many implementations now use refined estimators of τ²). | Default for most reviews—it reflects that each study estimates its own context-specific effect while still allowing an average (and prediction intervals where appropriate). |
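The classic DerSimonian–Laird estimator cited in the table can be sketched in a few lines; modern software usually offers refined τ² estimators (e.g. REML) and Hartung–Knapp adjustments, so treat this as an illustration of the mechanics, not a recommended implementation. The function name and the 1.96 normal-approximation CI are assumptions for the example.

```python
import math

def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.

    Illustrative sketch: estimate between-study variance tau^2 from
    Cochran's Q, then re-pool with weights 1 / (v_i + tau^2).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    pooled_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - pooled_fe) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)            # truncated at zero
    w_re = [1 / (v + tau2) for v in variances]
    pooled_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled_re, tau2, (pooled_re - 1.96 * se, pooled_re + 1.96 * se)
```

Because τ² is added to every study's variance, the random-effects weights are more nearly equal than the fixed-effect weights, and the confidence interval is wider whenever heterogeneity is present.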
Phase 4 — Forest plot and bias detection
The forest plot is the central figure for meta-analysis: each study’s effect and confidence interval is drawn on a common axis, with a diamond (or similar) summarizing the pooled estimate when you combine studies.
Visual inference: For ratios (OR, RR), a null effect is often drawn at 1.0; for mean differences and SMD, at 0. If the pooled diamond does not cross that null line, the summary is conventionally statistically significant at the chosen alpha—always pair that visual with the numeric estimate, CI width, and heterogeneity statistics.
Publication bias and small-study effects: A funnel plot plots effect size against precision; asymmetry can suggest missing small or “negative” studies, though many non-bias mechanisms also distort funnels. Complement plots with formal tests where appropriate (for example Egger’s regression test for funnel asymmetry in some settings) and interpret results cautiously—especially with few studies.
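Egger's test mentioned above is just a weighted regression idea: regress each study's standardized effect (effect divided by its standard error) on its precision (one over the standard error); in the absence of small-study effects the intercept should be near zero. A minimal ordinary-least-squares sketch (function name illustrative; real implementations also report a t-test on the intercept, omitted here):

```python
def egger_intercept(effects, ses):
    """Egger's regression intercept as a small-study-effects diagnostic.

    Illustrative sketch: regress z_i = y_i / se_i on x_i = 1 / se_i by OLS;
    an intercept far from zero suggests funnel-plot asymmetry.
    """
    z = [y / s for y, s in zip(effects, ses)]
    x = [1 / s for s in ses]
    n = len(x)
    xbar = sum(x) / n
    zbar = sum(z) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
    return zbar - slope * xbar   # intercept
```

With perfectly symmetric data (every study estimating the same effect), the intercept is zero; remember the chapter's caveat that the test is unreliable with few studies and that asymmetry has non-bias explanations.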