HIPPOKRATIA 2010, 14 (Suppl 1): 29-37
Meta-analysis in medical research
Haidich AB
Department of Hygiene and Epidemiology, Aristotle University of Thessaloniki School of Medicine, Thessaloniki, Greece
Abstract
The objectives of this paper are to provide an introduction to meta-analysis and to discuss the rationale for this type of research and other general considerations. Methods used to produce a rigorous meta-analysis are highlighted, and some aspects of the presentation and interpretation of meta-analysis are discussed. Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess previous research studies to derive conclusions about that body of research. Outcomes from a meta-analysis may include a more precise estimate of the effect of treatment or risk factor for disease, or other outcomes, than any individual study contributing to the pooled analysis. The examination of variability or heterogeneity in study results is also a critical outcome. The benefits of meta-analysis include a consolidated and quantitative review of a large, and often complex, sometimes apparently conflicting, body of literature. The specification of the outcome and hypotheses that are tested is critical to the conduct of meta-analyses, as is a sensitive literature search. A failure to identify the majority of existing studies can lead to erroneous conclusions; however, there are methods of examining data to identify the potential for studies to be missing, for example, the use of funnel plots. Rigorously conducted meta-analyses are useful tools in evidence-based medicine. The need to integrate findings from many studies makes meta-analytic research desirable, and the large body of research now being generated makes the conduct of this research feasible. Hippokratia 2010; 14 (Suppl 1): 29-37
Key words: meta-analysis, systematic review, randomized clinical trial, bias, quality, evidence-based medicine
Corresponding author: Anna-Bettina Haidich, Department of Hygiene and Epidemiology, Aristotle University of Thessaloniki, School of Medicine, 54124 Thessaloniki, Greece, Tel: +302310-999143, Fax: +302310-999701
Important medical questions are typically studied more than once, often by different research teams in different locations. In many instances, the results of these multiple small studies of an issue are diverse and conflicting, which makes clinical decision-making difficult. The need to arrive at decisions affecting clinical practice fostered the momentum toward “evidence-based medicine”1-2. Evidence-based medicine may be defined as the systematic, quantitative, preferentially experimental approach to obtaining and using medical information. Therefore, meta-analysis, a statistical procedure that integrates the results of several independent studies, plays a central role in evidence-based medicine. In fact, in the hierarchy of evidence (Figure 1), where clinical evidence is ranked according to its freedom from the various biases that beset medical research, meta-analyses are at the top. In contrast, animal research, laboratory studies, case series and case reports have little clinical value as proof and are therefore at the bottom.
Meta-analyses did not begin to appear regularly in the medical literature until the late 1970s, but since then a plethora of meta-analyses have emerged, and their number has grown exponentially over time (Figure 2)3. Moreover, it has been shown that meta-analyses are the most frequently cited form of clinical research4. The merits and perils of the somewhat mysterious procedure of meta-analysis, however, continue to be debated in the medical community5-8. The objectives of this paper are to introduce meta-analysis and to discuss the rationale for this type of research and other general considerations.
Meta-Analysis and Systematic Review
Glass first defined meta-analysis in the social science literature as “The statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings”9. Meta-analysis is a quantitative, formal, epidemiological study design used to systematically assess the results of previous research to derive conclusions about that body of research. Typically, but not necessarily, the study is based on randomized, controlled clinical trials. Outcomes from a meta-analysis may include a more precise estimate of the effect of treatment or risk factor for disease, or other outcomes, than any individual study contributing to the pooled analysis. Identifying sources of variation in responses (that is, examining the heterogeneity of a group of studies) and the generalizability of responses can lead to more effective treatments or modifications of management. Examination of heterogeneity is perhaps the most important task in meta-analysis. The Cochrane Collaboration has been a long-standing, rigorous, and innovative leader in developing methods in the field10. Major contributions include the development of protocols that provide structure for literature search methods, and new and extended analytic and diagnostic methods for evaluating the output of meta-analyses. Use of the methods outlined in the Cochrane handbook should provide a consistent approach to the conduct of meta-analysis. Moreover, a useful guide to improving the reporting of systematic reviews and meta-analyses is the PRISMA (Preferred Reporting Items for Systematic reviews and Meta-analyses) statement, which replaced the QUOROM (QUality Of Reporting of Meta-analyses) statement11-13.
Figure 1: Hierarchy of evidence.
Meta-analyses are a subset of systematic reviews. A systematic review attempts to collate empirical evidence that fits prespecified eligibility criteria to answer a specific research question. The key characteristics of a systematic review are a clearly stated set of objectives with predefined eligibility criteria for studies; an explicit, reproducible methodology; a systematic search that attempts to identify all studies that meet the eligibility criteria; an assessment of the validity of the findings of the included studies (e.g., through the assessment of risk of bias); and a systematic presentation and synthesis of the attributes and findings of the studies used. Systematic methods are used to minimize bias, thus providing more reliable findings, from which conclusions can be drawn and decisions made, than traditional review methods14,15. Systematic reviews need not contain a meta-analysis, since there are times when one is not appropriate or possible; however, many systematic reviews contain meta-analyses16.
The inclusion of observational medical studies in meta-analyses led to considerable debate over the validity of meta-analytical approaches, as there was necessarily a concern that observational studies were likely to be subject to unidentified sources of confounding and risk modification17. Pooling such findings may not lead to more certain outcomes. Moreover, an empirical study showed that, in meta-analyses in which both randomized and nonrandomized studies were included, nonrandomized studies tended to show larger treatment effects18.
Meta-analyses are conducted to assess the strength of the evidence on a disease and its treatment. One aim is to determine whether an effect exists; another is to determine whether the effect is positive or negative and, ideally, to obtain a single summary estimate of the effect. The results of a meta-analysis can improve the precision of estimates of effect, answer questions not posed by the individual studies, settle controversies arising from apparently conflicting studies, and generate new hypotheses. In particular, the examination of heterogeneity is vital to the development of new hypotheses.
Individual or Aggregated Data
The majority of meta-analyses are based on aggregated data from a series of studies, which are used to produce a point estimate of an effect and measures of the precision of that estimate. However, methods have also been developed for conducting meta-analyses on individual-level data obtained from the original trials19,20. This approach may be considered the “gold standard” in meta-analysis because it offers advantages over analyses using aggregated data, including a greater ability to validate the quality of the data and to conduct appropriate statistical analysis. Further, it is easier to explore differences in effect across subgroups within the study population than with aggregated data. The use of standardized individual-level information may help to avoid the problems encountered in meta-analyses of prognostic factors21,22. It is the best way to obtain a more global picture of the natural history and predictors of risk for major outcomes, such as in scleroderma23-26. This approach relies on cooperation between the researchers who conducted the relevant studies. Researchers who are aware of the potential to contribute to or conduct such studies will provide and obtain additional benefits by carefully maintaining their original databases and making them available for future studies.
Literature Search
A sound meta-analysis is characterized by a thorough and disciplined literature search. A clear definition of the hypotheses to be investigated provides the framework for such an investigation. According to the PRISMA statement, an explicit statement of the questions being addressed, with reference to participants, interventions, comparisons, outcomes and study design (PICOS), should be provided11,12. It is important to obtain all relevant studies, because the loss of studies can bias the meta-analysis. Typically, published papers and abstracts are identified by a computerized literature search of electronic databases, which can include PubMed (www.ncbi.nlm.nih.gov./entrez/query.fcgi), ScienceDirect (www.sciencedirect.com), Scirus (www.scirus.com/srsapp), ISI Web of Knowledge (http://www.isiwebofknowledge.com), Google Scholar (http://scholar.google.com) and CENTRAL (Cochrane Central Register of Controlled Trials, http://www.mrw.interscience.wiley.com/cochrane/cochrane_clcentral_articles_fs.htm). The PRISMA statement recommends that the full electronic search strategy for at least one major database be presented12. Database searches should be augmented with hand searches of library resources for relevant papers, books, abstracts, and conference proceedings. Cross-checking of references, citations in review papers, and communication with scientists who have been working in the relevant field are important methods used to provide a comprehensive search. Communication with pharmaceutical companies manufacturing and distributing test products can be appropriate for studies examining the use of pharmaceutical interventions.
Figure 2: Cumulative number of publications about meta-analysis over time, until 17 December 2009 (results from Medline search using text “meta-analysis”).
It is not feasible to find absolutely every relevant study on a subject. Some or even many studies may not be published, and those that are might not be indexed in computer-searchable databases. Useful sources for unpublished trials are the clinical trials registers, such as the National Library of Medicine’s ClinicalTrials.gov website. Reviews should attempt to be sensitive (that is, to find as many studies as possible, in order to minimize bias) and also efficient. It may be appropriate to frame a hypothesis that considers the time over which a study is conducted or to target a particular subpopulation. The decision whether to include unpublished studies is difficult. Although the language of publication can pose a difficulty, it is important to overcome it, provided that the populations studied are relevant to the hypothesis being tested.
Inclusion or Exclusion Criteria and Potential for Bias
Studies are chosen for meta-analysis based on inclusion criteria. If there is more than one hypothesis to be tested, separate selection criteria should be defined for each hypothesis. Inclusion criteria are ideally defined at the stage of initial development of the study protocol. The rationale for the study selection criteria used should be clearly stated.
One important potential source of bias in meta-analysis is the loss of trials and subjects. Ideally, all randomized subjects in all studies satisfy all of the trial selection criteria, comply with all the trial procedures, and provide complete data. Under these conditions, an “intention-to-treat” analysis is straightforward to implement; that is, the statistical analysis is conducted on all subjects enrolled in a study rather than only on those who complete all the desired stages of the study. Some empirical studies have shown that certain methodological characteristics, such as poor concealment of treatment allocation or lack of blinding, exaggerate treatment effects27. Therefore, it is important to critically appraise the quality of studies in order to assess the risk of bias.
The study design, including details of the method of randomization of subjects to treatment groups, the criteria for eligibility in the study, blinding, the method of assessing the outcome, and the handling of protocol deviations, are important features defining study quality. When studies are excluded from a meta-analysis, the reasons for exclusion should be provided for each excluded study. Usually, more than one assessor decides independently which studies to include or exclude, using a well-defined checklist and a procedure that is followed when the assessors disagree. Two people familiar with the study topic perform the quality assessment for each study independently. This is followed by a consensus meeting to discuss the studies excluded or included. In practice, blinding reviewers to details of a study such as authorship and journal source is difficult.
Before assessing study quality, a quality assessment protocol and data forms should be developed. The goal of this process is to reduce the risk of bias in the estimate of effect. Quality scores that summarize multiple components into a single number exist but are misleading and unhelpful28. Rather, investigators should use the individual components of quality assessment, describe trials that do not meet the specified quality standards, and possibly assess the effect of excluding them on the overall results as part of the sensitivity analyses.
Further, not all studies are completed, because of protocol failure, treatment failure, or other factors. Nonetheless, missing subjects and studies can provide important evidence. It is desirable to obtain data from all relevant randomized trials, so that the most appropriate analysis can be undertaken. Previous studies have discussed the significance of missing trials to the interpretation of intervention studies in medicine29,30. Journal editors and reviewers need to be aware of the existing bias toward publishing positive findings and ensure that papers reporting negative or even failed trials are published, as long as they meet the quality guidelines for publication.
There are occasions when the authors of the selected papers have chosen different outcome criteria for their main analysis. In practice, it may be necessary to revise the inclusion criteria for a meta-analysis after reviewing all of the studies found through the search strategy. Variation among studies reflects the type of study design used, the type and application of experimental and control therapies, whether or not the study was published and, if published, subjected to peer review, and the definition used for the outcome of interest. There are no standardized criteria for the inclusion of studies in meta-analysis; universal criteria would not be appropriate, because meta-analysis can be applied to a broad spectrum of topics. Published data in journal papers should also be cross-checked with conference papers to avoid including the same data twice.
Clearly, unpublished studies are not found by searching the literature. It is possible that published studies are systematically different from unpublished studies; for example, positive trial findings may be more likely to be published. Therefore, a meta-analysis based on literature search results alone may be subject to publication bias.
Efforts to minimize this potential bias include working from the references in published studies, searching computerized databases of unpublished material, and investigating other sources of information, including conference proceedings, graduate dissertations and clinical trial registers.
Statistical analysis
The most common measures of effect used for dichotomous data are the risk ratio (also called relative risk) and the odds ratio. The dominant method used for continuous data is standardized mean difference (SMD) estimation. Methods used in meta-analysis for post hoc analysis of findings are relatively specific to meta-analysis and include heterogeneity analysis, sensitivity analysis, and evaluation of publication bias.
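As a minimal sketch (Python, standard library only), the effect measures named above can be computed from per-study summary data as follows. The function names are hypothetical, and the 2×2 layout (a, b = events and non-events in the treatment arm; c, d = the same in the control arm) is our assumption. The SMD uses Hedges' small-sample correction with one common variance approximation; other variants exist. Ratio measures are returned on the log scale, where they are usually pooled.

```python
import math


def log_risk_ratio(a, b, c, d):
    """Log risk ratio and its standard error for a 2x2 table.

    a, b: events and non-events in the treatment arm
    c, d: events and non-events in the control arm
    """
    n1, n2 = a + b, c + d
    log_rr = math.log((a / n1) / (c / n2))
    se = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    return log_rr, se


def log_odds_ratio(a, b, c, d):
    """Log odds ratio and its (Woolf) standard error, same 2x2 layout."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference (Hedges' g) and its standard error."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (mean1 - mean2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, math.sqrt(j ** 2 * var_d)
```

For example, 10 events among 100 treated subjects versus 20 among 100 controls is `log_risk_ratio(10, 90, 20, 80)`, which corresponds to a risk ratio of 0.5. Zero cells would need a continuity correction, which this sketch does not apply.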
All methods used should allow for the weighting of studies. The concept of weighting reflects the value of the evidence of any particular study. Usually, studies are weighted according to the inverse of their variance31. It is important to recognize that smaller studies, which usually have larger variances, therefore usually contribute less to the estimates of overall effect. However, well-conducted studies with tight control of measurement variation and sources of confounding contribute more to estimates of overall effect than a less well-conducted study of identical size.
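Inverse-variance weighting can be sketched in a few lines. This is an illustrative fixed-effect pooling under the assumption that each study supplies an estimate on an additive scale (for example, a log odds ratio) and its standard error. The function name and the returned tuple are our own choices.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect (inverse-variance) pooled estimate.

    estimates: per-study effects on an additive scale (e.g. log odds ratios)
    std_errors: their standard errors
    Returns the pooled estimate, its standard error, the 95% CI and the
    relative weight of each study.
    """
    weights = [1 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1 / total)
    ci = (pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled)
    relative_weights = [w / total for w in weights]
    return pooled, se_pooled, ci, relative_weights
```

Because the weight is 1/SE², a study whose standard error is half that of another receives four times its weight. With estimates 0.0 and 1.0 and standard errors 1.0 and 0.5, the relative weights are 0.2 and 0.8 and the pooled estimate is 0.8.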
One of the foremost decisions to be made when conducting a meta-analysis is whether to use a fixed-effects or a random-effects model. A fixed-effects model is based on the assumption that the sole source of variation in observed outcomes is that occurring within the study; that is, the effect expected from each study is the same. Consequently, it is assumed that the studies are homogeneous: there are no differences in the underlying study population, no differences in subject selection criteria, and treatments are applied the same way32. The fixed-effects methods most often used for dichotomous data are the Mantel-Haenszel method33 and the Peto method34 (only for odds ratios).
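A minimal sketch of the Mantel-Haenszel pooled odds ratio follows, paired with the Robins-Breslow-Greenland variance for its logarithm (our choice of variance estimator; the text does not name one). Each study is assumed to be given as a 2×2 table (a, b, c, d) with the same layout as for the single-study odds ratio. The Peto method is not shown, and zero-cell corrections are not handled.

```python
import math


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio with a Robins-Breslow-Greenland SE.

    tables: iterable of (a, b, c, d) per study, where a, b are events and
    non-events in the treatment arm and c, d those in the control arm.
    Returns (pooled odds ratio, standard error of its logarithm).
    """
    sum_r = sum_s = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n  # per-study numerator and denominator terms
        p, q = (a + d) / n, (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    return or_mh, math.sqrt(var_log)
```

For a single table, the pooled odds ratio reduces to ad/bc and the variance reduces to the familiar 1/a + 1/b + 1/c + 1/d, which is a convenient sanity check.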
Random-effects models have an underlying assumption that a distribution of true effects exists, resulting in heterogeneity among study results, whose between-study variance is known as τ². As software has improved, random-effects models, which require greater computing power, have been used more frequently. This is desirable because the strong assumption that the effect of interest is the same in all studies is frequently untenable. Moreover, the fixed-effects model is not appropriate when statistical heterogeneity (τ²) is present in the results of the studies in the meta-analysis. In the random-effects model, studies are weighted by the inverse of the sum of their within-study variance and the heterogeneity parameter τ². Therefore, it is usually a more conservative approach, with wider confidence intervals than the fixed-effects model, in which studies are weighted only by the inverse of their variance. The most commonly used random-effects method is the DerSimonian and Laird method35. Furthermore, it has been suggested that comparing the fixed-effects and random-effects models can yield insights into the data36.
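The DerSimonian and Laird approach can be sketched as follows, assuming per-study estimates on an additive scale with known standard errors and at least two studies. τ² is the method-of-moments estimate derived from Cochran's Q and truncated at zero. Each study is then reweighted by 1/(SE² + τ²), which is why the interval widens relative to the fixed-effects result.

```python
import math

Z_95 = 1.959963984540054


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooled estimate using the DerSimonian-Laird tau^2.

    Returns (pooled estimate, its standard error, tau^2, 95% CI).
    Needs at least two studies.
    """
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # method-of-moments estimate, truncated at 0
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    se_pooled = math.sqrt(1 / sum_w_re)
    ci = (pooled - Z_95 * se_pooled, pooled + Z_95 * se_pooled)
    return pooled, se_pooled, tau2, ci
```

With two studies whose estimates are 0 and 2 (both with SE 1), τ² = 1 and the pooled standard error rises from about 0.71 under the fixed-effects model to 1.0. When Q does not exceed its degrees of freedom, τ² = 0 and the result coincides with the fixed-effects estimate.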
Heterogeneity
Arguably, the greatest benefit of conducting a meta-analysis is to examine sources of heterogeneity, if present, among studies. If heterogeneity is present, the summary measure must be interpreted with caution37. When heterogeneity is present, one should question whether and how to generalize the results. Understanding sources of heterogeneity will lead to more effective targeting of prevention and treatment strategies and will result in new research topics being identified. Part of the strategy in conducting a meta-analysis is to identify factors that may be significant determinants of subpopulation analysis, or covariates that may be appropriate to explore in all studies.
To understand the nature of variability in studies, it is important to distinguish between different sources of heterogeneity. Variability in the participants, interventions, and outcomes studied has been described as clinical diversity, and variability in study design and risk of bias has been described as methodological diversity10. Variability in the intervention effects being evaluated among the different studies is known as statistical heterogeneity and is a consequence of clinical or methodological diversity, or both, among the studies. Statistical heterogeneity manifests itself in the observed intervention effects varying by more than would be expected from random error alone. Usually, in the literature, statistical heterogeneity is simply referred to as heterogeneity.
Clinical variation will cause heterogeneity if the intervention effect is modified by the factors that vary across studies; most obviously, the specific interventions or participant characteristics, which are often reflected in different levels of risk in the control group when the outcome is dichotomous. In other words, the true intervention effect will differ among studies. Differences between studies in the methods used, such as the use of blinding, or in the definition or measurement of outcomes may lead to differences in observed effects. Significant statistical heterogeneity arising from differences in the methods used or in outcome assessments suggests that the studies are not all estimating the same effect, but does not necessarily suggest that the true intervention effect varies. In particular, heterogeneity associated solely with methodological diversity indicates that studies suffer from different degrees of bias. Empirical evidence suggests that some aspects of design can affect the results of clinical trials, although this may not always be the case.
The scope of a meta-analysis will largely determine the extent to which the studies included in a review are diverse. Meta-analysis should be conducted when a group of studies is sufficiently homogeneous in terms of the subjects involved, interventions, and outcomes to provide a meaningful summary. However, it is often appropriate to take a broader perspective in a meta-analysis than in a single clinical trial. Combining studies that differ substantially in design and other factors can yield a meaningless summary result, but evaluating the reasons for the heterogeneity among studies can be very insightful. It may be argued that such studies are of intrinsic interest in their own right, even though it is not appropriate to produce a single summary estimate of effect.
Variation among k trials is usually assessed using Cochran’s Q statistic, a chi-squared (χ²) test of heterogeneity with k-1 degrees of freedom. This test has relatively poor power to detect heterogeneity among small numbers of trials; consequently, an α-level of 0.10 is often used to test the hypothesis of homogeneity38,39.
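To make the test concrete, here is a standard-library sketch of Cochran's Q with its chi-squared p-value. Because Python's standard library has no chi-squared distribution, the helper `chi2_sf` (a name we chose) uses the classical closed-form series for integer degrees of freedom. `statistics.NormalDist` supplies the normal tail needed for odd degrees of freedom. The α = 0.10 default mirrors the convention described above.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with
    integer degrees of freedom, via the classical closed-form series."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    chi = math.sqrt(x)
    tail = 2 * (1 - NormalDist().cdf(chi))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return tail + 2 * density * total


def cochran_q(estimates, std_errors, alpha=0.10):
    """Cochran's Q test of homogeneity with k - 1 degrees of freedom.

    Returns (Q, df, p-value, whether heterogeneity is flagged at alpha).
    """
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    p = chi2_sf(q, df)
    return q, df, p, p < alpha
```

With two studies whose estimates are 0 and 2 (SE 1 each), Q = 2 on 1 degree of freedom and p is about 0.157. This is not flagged even at the lenient 0.10 level, which illustrates the low power noted above.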
Heterogeneity of results among trials is better quantified using the inconsistency index I², which describes the percentage of total variation across studies that is due to heterogeneity rather than chance40; it is computed as I² = 100% × (Q - df)/Q, where df = k - 1. Uncertainty intervals for I² (dependent on Q and k) are calculated using the method described by Higgins and Thompson41. Negative values of I² are set to zero, so I² lies between 0 and 100%. A value >75% may be considered substantial heterogeneity41. This statistic is less influenced by the number of trials than other methods used to estimate heterogeneity and provides a logical and readily interpretable metric, but it can still be unstable when only a few studies are combined42.
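The following sketch computes I² from Q and k, together with an approximate 95% interval. For the interval we assume the test-based approach on ln H (with H = √(Q/df)), as we understand it from Higgins and Thompson. The two standard-error formulas below are our reading of that method, and the bounds are converted back to the I² scale via I² = (H² - 1)/H². Results are proportions (0 to 1) rather than percentages. This sketch requires k ≥ 3 and Q > 0.

```python
import math

Z_95 = 1.959963984540054


def i_squared(q, k):
    """I^2 with an approximate 95% interval (test-based method on ln H).

    q: Cochran's Q; k: number of studies (k >= 3, q > 0 assumed).
    Returns (I^2, (lower, upper)) as proportions between 0 and 1.
    """
    if k < 3 or q <= 0:
        raise ValueError("this sketch needs k >= 3 studies and Q > 0")
    df = k - 1
    i2 = max(0.0, (q - df) / q)  # negative values are set to zero
    ln_h = 0.5 * math.log(q / df)  # H = sqrt(Q / df)
    if q > k:
        se_ln_h = 0.5 * (math.log(q) - math.log(df)) / (
            math.sqrt(2 * q) - math.sqrt(2 * k - 3))
    else:
        se_ln_h = math.sqrt((1 / (2 * (k - 2))) * (1 - 1 / (3 * (k - 2) ** 2)))

    def to_i2(h):
        # I^2 = (H^2 - 1) / H^2, truncated at zero when H < 1
        return max(0.0, (h ** 2 - 1) / h ** 2)

    lower = to_i2(math.exp(ln_h - Z_95 * se_ln_h))
    upper = to_i2(math.exp(ln_h + Z_95 * se_ln_h))
    return i2, (lower, upper)
```

For example, Q = 20 from 11 studies gives I² = 50%, but the interval runs from roughly 0% to about 75%. This echoes the point above that I² can be imprecise when few studies are combined.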
Given that there are several potential sources of heterogeneity in the data, several steps should be considered in the investigation of their causes. Even when random-effects models are appropriate, it may still be very desirable to examine the data to identify sources of heterogeneity and, if appropriate, to take steps to produce models that have a lower level of heterogeneity. Further, if the studies examined are highly heterogeneous, it may not be appropriate to present an overall summary estimate, even when random-effects models are used. As Petitti notes43, statistical analysis alone will not make contradictory studies agree; critically, however, one should use common sense in decision-making. Despite heterogeneity in responses, if all studies had point estimates in the positive direction and the pooled confidence interval did not include zero, it would not be logical to conclude that there was no positive effect, provided that sufficient studies and subject numbers were present. The appropriateness of the point estimate of the effect is much more in question.
Among the ways to investigate the reasons for heterogeneity are subgroup analysis and meta-regression. The subgroup analysis approach, a variation on those described above, groups categories of subjects (e.g., by age or sex) to compare effect sizes. The meta-regression approach uses regression analysis to determine the influence of selected variables (the independent variables) on the effect size (the dependent variable). In a meta-regression, studies are regarded as if they were individual patients, but their effects are properly weighted to account for their different variances44.
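As an illustration of the weighting described above, here is a fixed-effect meta-regression on a single study-level covariate, fitted by weighted least squares with weights 1/SE². The covariate (for instance, mean age per study) and the function name are hypothetical. A random-effects meta-regression would instead use weights 1/(SE² + τ²), with τ² estimated from the residual heterogeneity; that variant is not shown here.

```python
import math


def meta_regression(estimates, std_errors, covariate):
    """Fixed-effect meta-regression of effect size on one study-level covariate.

    Each study is weighted by 1 / se^2 (weighted least squares).
    Returns (intercept, slope, standard error of the slope).
    """
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar)
              for wi, xi, yi in zip(w, covariate, estimates))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, math.sqrt(1 / sxx)
```

The slope estimates how much the effect size changes per unit of the covariate. With known within-study variances, its standard error is simply 1/√(Σ wᵢ(xᵢ - x̄)²).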
Sensitivity analyses have also been used to examine the effects of studies identified as aberrant in conduct or result, or as highly influential in the analysis. Recently, another method has been proposed that reduces the weight of studies that are outliers in meta-analyses45. All of these methods for examining heterogeneity have merit, and the variety of methods available reflects the importance of this activity.
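One simple form of sensitivity analysis for influential studies is a leave-one-out analysis: the meta-analysis is repeated with each study omitted in turn. The sketch below uses fixed-effect inverse-variance pooling for brevity and assumes the inputs are Python lists. The function names are our own.

```python
def pool_fixed(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)


def leave_one_out(estimates, std_errors):
    """Re-pool the data omitting each study in turn.

    Returns a list of (index of omitted study, pooled estimate without it).
    """
    results = []
    for i in range(len(estimates)):
        ys = estimates[:i] + estimates[i + 1:]
        ses = std_errors[:i] + std_errors[i + 1:]
        results.append((i, pool_fixed(ys, ses)))
    return results
```

With estimates [0, 0, 0, 6] and equal standard errors, the overall pooled value is 1.5. Dropping the fourth study moves it to 0, whereas dropping any other study moves it to 2. The fourth study is therefore clearly the influential one.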
Presentation of results
A useful graph, presented in the PRISMA statement11, is the four-phase flow diagram (Figure 3).
This flow diagram depicts the flow of information through the different phases of a systematic review or meta-analysis. It maps out the number of records identified, included and excluded, and the reasons for exclusions. The results of meta-analyses are often presented in a forest plot, where each study is shown with its effect size and the corresponding 95% confidence interval (Figure 4).
The pooled effect and its 95% confidence interval are shown at the bottom, on the line labelled “Overall”. In the right panel of Figure 4, the cumulative meta-analysis is displayed graphically: data are entered successively, typically in the order of their chronological appearance46,47. Such a cumulative meta-analysis can retrospectively identify the point in time when a treatment effect first reached conventional levels of significance. Cumulative meta-analysis is a compelling way to examine trends in the evolution of the summary effect size and to assess the impact of a specific study on the overall conclusions46. Figure 4 shows that many studies were performed long after cumulative meta-analysis would have shown a significant beneficial effect of antibiotic prophylaxis in colon surgery.
Figure 3: PRISMA 2009 Flow Diagram (From Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. J Clin Epidemiol 2009;62:1006-12. For more information, visit www.prisma-statement.org).
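A cumulative meta-analysis can be sketched by sorting studies by year and re-pooling after each addition. For simplicity, this assumes fixed-effect inverse-variance pooling and estimates on an additive scale where the null value is 0 (for example, log odds ratios). A step is marked "significant" when the 95% confidence interval excludes 0, with no adjustment for the repeated looks at the data. A random-effects version would re-estimate τ² at every step.

```python
import math

Z_95 = 1.959963984540054


def cumulative_meta_analysis(studies):
    """Cumulative fixed-effect meta-analysis in chronological order.

    studies: iterable of (year, estimate, standard_error), estimates on an
    additive scale such as log odds ratios (null value 0).
    Returns one row per step: (year, pooled, lower, upper, significant).
    """
    rows = []
    sum_w = sum_wy = 0.0
    for year, y, se in sorted(studies, key=lambda s: s[0]):
        w = 1 / se ** 2
        sum_w += w
        sum_wy += w * y
        pooled = sum_wy / sum_w
        half_width = Z_95 * math.sqrt(1 / sum_w)
        lower, upper = pooled - half_width, pooled + half_width
        rows.append((year, pooled, lower, upper, upper < 0 or lower > 0))
    return rows
```

Scanning the last column for the first `True` identifies the year in which the accumulated evidence first excluded the null. This is the retrospective question that the right panel of Figure 4 answers graphically.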
Biases in meta-analysis
Although the intent of a meta-analysis is to find and assess all studies meeting the inclusion criteria, it is not always possible to obtain them. A critical concern is the papers that may have been missed. There is good reason to be concerned about this potential loss, because studies with significant, positive results (positive studies) are more likely to be published and, in the case of interventions with a commercial value, to be promoted than studies with non-significant or “negative” results (negative studies). Studies that produce a positive result, especially large studies, are more likely to have been published and, conversely, there has been a reluctance to publish small studies that have non-significant results. Further, publication bias is not solely the responsibility of editorial policy, as there is reluctance among researchers to publish results that were either uninteresting or not randomized48. There are, however, problems with simply including all studies that have failed to meet peer-review standards. All methods of retrospectively dealing with bias in studies are imperfect.