6. Systematic reviews and meta-analyses in CAM: contribution and challenges
Klaus Linde and Ian D. Coulter
Chapter contents
Introduction
The protocol
The question
Searching the literature
Selecting the relevant studies
Extracting information
Assessing quality
Summarizing the results
Checking the robustness of results
Who should do systematic reviews?
How to identify existing systematic reviews
Limitations of systematic reviews
How should we use systematic reviews in CAM?
Introduction
Every year more than two million articles are published in over 20 000 biomedical journals (Mulrow 1995). Even in speciality areas it is impossible to keep up to date with all relevant new information. In this situation systematic reviews hold a key position in summarizing the current state of knowledge. A review is called systematic if it uses predefined and explicit methods for identifying, selecting and assessing the information (typically research studies) deemed relevant to answer the particular question posed. A systematic review is called a meta-analysis if it includes an integrative statistical analysis (pooling) of the included studies.
Within complementary medicine systematic reviews are of major relevance. This chapter aims to give an introduction to how to read and how to do a systematic review or a meta-analysis, and discusses the advantages and limitations of this method.
Most available systematic reviews in complementary medicine have focused on treatment effects, and most of this chapter therefore refers to such reviews. It is, of course, possible to do systematic reviews on other topics: the validity and reliability of diagnostic methods (for example, iridology: Ernst 1999), side-effects (for example, side-effects and interactions of St John's wort: Knüppel & Linde 2004), surveys (for example, reasons for and characteristics associated with complementary and alternative medicine (CAM) use among adult cancer patients: Verhoef et al. 2005), observational studies of the association of risk or protective factors with major diseases (for example, whether the consumption of green tea is associated with a lower risk of breast cancer: Seely et al. 2005), or even other systematic reviews (for example, a review of reviews of acupuncture: Linde et al. 2001a). The principles are the same as for reviews of studies on treatment effects, but details of the methodological steps can differ.
The protocol
A systematic review is, in principle, a retrospective study. The unit of investigation is the original research study (primary study), for example, a randomized controlled trial (RCT). The retrospective nature limits the conclusiveness of systematic reviews. Nevertheless, the methods of retrospective studies should be defined as far as possible in advance.
The protocol of a systematic review should have subheadings similar to those in this chapter. To write a feasible and useful protocol it is necessary to have at least some basic idea of what and how much primary research is available. In practice, therefore, the protocol of a systematic review often has to be developed in a stepwise manner, with the methods defined rather loosely in the early screening phase and becoming increasingly specific.
Examples of detailed protocols of systematic reviews can be found in the Cochrane Library (www.cochrane.org), the electronic publication of a worldwide network for systematic reviews of health care interventions.
The question
As in any research a clear and straightforward question is a precondition for a conclusive answer. A clearly defined question for a systematic review on a treatment is, for example, whether extracts of St John’s wort (Hypericum perforatum) are more effective than placebo in reducing the clinical symptoms of patients with major depression (Linde et al. 2008). A question of this type already predefines the group of patients (those with major depression), the type of experimental (extracts of St John’s wort) and control intervention (placebo) as well as the outcome of interest (clinical symptoms of depression). Systematic reviews on such narrowly defined questions make sense, particularly when a number of similar studies are available but their results are either not widely known or contradictory, or if the number of patients in the primary studies is too small to detect relevant differences with sufficient likelihood.
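This question structure corresponds to what the evidence-based-medicine literature often calls PICO (population, intervention, comparison, outcome), although the chapter itself does not use that label. A minimal sketch of recording a review question in this structured form follows; the class and field names are illustrative, and the values are taken from the St John's wort example above:

```python
# A sketch of recording a review question in PICO form. The field
# values below are illustrative, from the St John's wort example.
from dataclasses import dataclass

@dataclass
class ReviewQuestion:
    population: str    # type of patients/participants
    intervention: str  # experimental intervention
    comparison: str    # control intervention
    outcome: str       # outcome of interest

question = ReviewQuestion(
    population="patients with major depression",
    intervention="extracts of St John's wort (Hypericum perforatum)",
    comparison="placebo",
    outcome="clinical symptoms of depression",
)
print(question)
```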
In complementary medicine the number of studies on a particular topic is often small. If the question is very narrow it may be that only a few or even no relevant trials can be identified. For example, a systematic review has been performed to assess whether there is evidence from randomized trials that acupuncture is effective for treating swallowing difficulties in patients with stroke (Xie et al. 2008). Only one trial met the inclusion criteria. While it can be easily concluded from such a review that there is very little evidence, this is not very satisfying for reviewers or readers.
Sometimes it may be useful to ask broader questions, for example: ‘Is there evidence that acupuncture can reduce symptoms associated with acute stroke?’ Such a review will give a more descriptive overview of inhomogeneous studies (regarding study design, outcomes, prevention or treatment). It will be more hypothesis-generating than hypothesis-testing.
Searching the literature
An obvious precondition for a good systematic review is that the relevant literature is covered comprehensively. Until the late 1990s the majority of studies in the area of complementary medicine were published in journals not covered by the most important electronic databases, such as PubMed/Medline or Embase. This has changed over the last 15 years: several journals focusing on CAM research are now included in these databases, and many studies are published in 'conventional' medical journals. Studies published in journals not listed in these major databases tend to be of lower quality (Linde et al. 2001b). However, literature searches restricted to the major databases might still miss relevant high-quality studies. Therefore, depending on the subject, reviewers may need to search more specialized databases, and in any case they should use additional search methods.
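To illustrate the database-searching step, below is a minimal sketch of querying PubMed through the NCBI E-utilities interface from Python. The query string is a simplified, hypothetical example; a real review would use a far more elaborate search strategy and search several databases:

```python
# A sketch of a PubMed search via the NCBI E-utilities API using the
# `requests` library. The query is a simplified, hypothetical example.
import requests

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

query = (
    '(hypericum OR "st john\'s wort") AND depression '
    'AND randomized controlled trial[pt]'
)
params = {"db": "pubmed", "term": query, "retmax": 200, "retmode": "json"}

response = requests.get(ESEARCH_URL, params=params, timeout=30)
response.raise_for_status()
result = response.json()["esearchresult"]

print("Number of hits:", result["count"])
print("First PubMed IDs:", result["idlist"][:10])
```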
A very effective and simple method is checking the references of identified studies and reviews relevant to the topic. This often also identifies studies published only as abstracts, in conference proceedings or in books. A problem with this method is that studies with 'undesired' results may be systematically undercited. Contacting persons, institutions or industry relevant to the field can help both in identifying new sources to search (for example, a specialized database previously unknown to you) and in obtaining articles directly. Finally, handsearching journals and conference proceedings is a possibility, although this often exceeds the time resources of reviewers.
A further problem to keep in mind is that specific complementary therapies and research activities might be concentrated in certain countries. For example, thousands of randomized trials on traditional Chinese medicine have been performed in China, and these have been published almost exclusively in Chinese-language journals. Western researchers are often reluctant to search for and include studies from China because of their generally low methodological quality (Wang et al. 2007), their almost exclusively positive results, which raise fundamental doubts about their reliability (Vickers et al. 1998), and the resources required for searching and translation. Nevertheless, this practice will have to change in the future.
In practice, the thoroughness of the literature search will depend strongly on the resources available. Reviews based on literature searches in only one or two of the sources mentioned should, however, be interpreted with caution, keeping in mind that a number of relevant studies might have been missed.
A major problem pertinent to systematic reviews is publication bias (Dickersin et al. 1987; Kleijnen & Knipschild 1992). Publication bias occurs when studies with undesired results (mostly negative results) are published less often than those with desired results (mostly positive results). Small negative or inconclusive studies, in particular, are less likely to be published. Sometimes authors do not submit such studies; sometimes journal editors reject them. Publication bias typically leads to overly optimistic results and conclusions in systematic reviews.
Reviewers should try to find out whether unpublished studies exist. Informal contacts with researchers in the field are an effective way of achieving this, although it is often difficult to obtain written reports of such studies. Handsearching the abstract books and proceedings of research meetings is another potential method of identifying otherwise unpublished studies. In recent years it has increasingly become the norm for randomized clinical trials to be registered prospectively (http://clinicaltrials.gov/ or http://www.controlled-trials.com/). Such registers will probably become the best way to check for unpublished or ongoing studies, but even today many small trials are never registered.
There are no foolproof ways to detect and quantify publication bias (see the section on checking the robustness of results, below, for a method of estimating its influence). Every reviewer and every reader should always be aware of this risk.
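One commonly used screening approach, beyond what the later section describes, is to test for funnel-plot asymmetry with Egger's regression. The sketch below assumes that an effect estimate and its standard error have already been extracted for each trial; all numbers are invented for illustration, and asymmetry can have causes other than publication bias:

```python
# A sketch of Egger's regression test for funnel-plot asymmetry.
# Effect sizes and standard errors are invented for illustration.
import numpy as np
from scipy import stats

effects = np.array([-0.45, -0.30, -0.60, -0.10, -0.25])  # per-trial effects (e.g. SMDs)
se = np.array([0.10, 0.15, 0.30, 0.12, 0.20])            # their standard errors

# Regress the standardized effect (effect/se) on precision (1/se);
# an intercept that differs clearly from zero suggests asymmetry.
res = stats.linregress(1.0 / se, effects / se)

# Two-sided p-value for the intercept (requires scipy >= 1.6 for
# the intercept_stderr attribute).
t_stat = res.intercept / res.intercept_stderr
p_value = 2 * stats.t.sf(abs(t_stat), df=len(effects) - 2)
print(f"Egger intercept: {res.intercept:.2f} (p = {p_value:.3f})")
```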
Selecting the relevant studies
Selecting the studies for detailed review from the often large number of references identified from the search is the next crucial step in the review process. Readers should check carefully whether this process was transparent and unbiased. Readers should be aware that minor changes in inclusion criteria can result in dramatic differences in the number of included studies in reviews addressing the same topic (Linde & Willich 2003).
A good systematic review of studies on diagnosis or therapy should explicitly define inclusion and exclusion criteria (Box 6.1). It has been outlined above that 'the question' of the review already crudely predefines these criteria. In the methods section of a systematic review these criteria should be described in more detail. For example, in a review on garlic for treating hypercholesterolaemia (such as Stevinson et al. 2000), it should be stated at which cholesterol level patients were considered hypercholesterolaemic. Garlic can be administered in quite different forms (fresh, dried preparations, oil) and we need to know whether all of them were considered (this was not reported explicitly in this specific review).
BOX 6.1
▪ Type of patients/participants (for example, patients with serum cholesterol > 200 mg/dl)
▪ Type of studies (for example, randomized double-blind studies)
▪ Type of intervention (for example, garlic mono-preparations in the experimental group and placebo in the control group)
▪ Type of outcomes (for example, trials reporting total cholesterol levels as an endpoint)
▪ Possibly other restrictions (for example, only trials published in English, published in a peer-reviewed journal, no abstracts)
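Criteria of the kind listed in Box 6.1 can also be written down as an explicit screening filter, which makes the selection rules transparent and repeatable. The sketch below is purely illustrative: the field names, the threshold and the decision to encode every criterion as a strict condition are assumptions for demonstration, not part of the Stevinson et al. review:

```python
# A minimal sketch of turning Box 6.1-style criteria into an explicit
# screening filter. Field names and thresholds are illustrative only.
from dataclasses import dataclass

@dataclass
class StudyRecord:
    serum_cholesterol_mg_dl: float   # participants' baseline value
    randomized: bool
    double_blind: bool
    intervention: str                # e.g. "garlic mono-preparation"
    control: str                     # e.g. "placebo"
    reports_total_cholesterol: bool
    language: str

def meets_inclusion_criteria(study: StudyRecord) -> bool:
    """Apply the criteria; every condition must hold for inclusion."""
    return (
        study.serum_cholesterol_mg_dl > 200
        and study.randomized
        and study.double_blind
        and study.intervention == "garlic mono-preparation"
        and study.control == "placebo"
        and study.reports_total_cholesterol
        and study.language == "English"
    )
```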
Another selection criterion typically concerns the type of studies considered; for example, many reviews are limited to randomized trials. The measurement and reporting of predefined outcomes as inclusion criteria are of particular relevance when a meta-analysis is planned. This often leads to the exclusion of a considerable proportion of otherwise relevant studies, and the reader has to consider whether this might have influenced the findings of the review. Some reviews have language restrictions, which might not only exclude a relevant proportion of trials but also change the results (Pham et al. 2005).
In practice the selection process is mostly performed in two steps. In the first step all the obviously irrelevant material is discarded, for example, all articles on garlic which are clearly not clinical trials in patients with cardiovascular problems. To save time and money this step is normally done by only one reviewer in a rather informal way. The remaining articles should then be checked carefully for eligibility by at least two independent reviewers (so that one does not know the decision of the other). In the publication of a systematic review it is advisable to list the potentially relevant studies which were excluded and give the reasons for exclusion. A consensus paper on how meta-analyses should be reported recommends the use of flow charts displaying the number of papers identified, excluded and selected at the different levels of the review and assessment process (Moher et al. 2009). This makes the selection process transparent for the reader.
Extracting information
Once a number of eligible studies have been identified, obtained and passed the selection process, the relevant information has to be extracted. If possible the extraction should be standardized, for example by using a pretested form. The format should allow the data to be entered into a database and basic statistical analyses to be performed. Another efficient method is to enter the data directly into a prestructured table. Regardless of the method used, reviewers must have a clear idea of what information they need for their analysis and what readers will need to form their own picture. Extraction and all assessments should be done by at least two independent reviewers: coding errors are inevitable and personal biases can influence decisions in the extraction process. The coded information of the reviewers must then be compared and disagreements discussed.
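Agreement between the two independent reviewers can be quantified before disagreements are resolved by discussion. A common statistic for this is Cohen's kappa; the chapter does not prescribe a particular statistic, so the following is merely an illustrative sketch with invented decisions:

```python
# A sketch of quantifying agreement between two independent reviewers'
# include/exclude decisions with Cohen's kappa. Decisions are invented.
from sklearn.metrics import cohen_kappa_score

reviewer_a = ["include", "exclude", "include", "include", "exclude", "exclude"]
reviewer_b = ["include", "exclude", "exclude", "include", "exclude", "include"]

kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level

# Disagreements are then resolved by discussion or a third reviewer.
disagreements = [i for i, (a, b) in enumerate(zip(reviewer_a, reviewer_b)) if a != b]
print("Studies needing discussion:", disagreements)
```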
Assessing quality
A major criticism levelled by sceptics at meta-analysis (and, to a lesser extent, at non-meta-analytic systematic reviews) is the garbage-in, garbage-out problem: the results of unreliable studies do not become more reliable by lumping them together. It has been shown in both conventional and complementary medicine that less rigorous trials tend to yield more positive results (Schulz et al. 1995; Lijmer et al. 1999; Linde et al. 1999). Sometimes it may be better to base the conclusions on a few rigorous trials and discard the findings of the bulk of unreliable studies. This approach has been called 'best-evidence synthesis' (Slavin 1986). While it is sometimes considered an alternative, it is in principle a subtype of systematic review in which defined quality aspects are used as additional inclusion criteria (see White et al. 1997 for an example from complementary medicine).
However, the assessment of quality is difficult. The first problem is that quality is a complex concept. Methodologists tend to define quality as the likelihood that the results of a study are unbiased. This dimension of quality is sometimes referred to as internal validity or methodological quality. But a perfectly internally valid study may have fundamental flaws from a clinician’s point of view if, for example, the outcomes measured are irrelevant for the patients or patients are not representative of those commonly receiving the treatment.
A second problem is that quality is difficult to operationalize in a valid manner. An experienced reviewer will find, 'between the lines', plenty of subtle indications of a study's quality in omissions and small details. However, such subjective, global assessments of quality are not transparent and are prone to personal bias.
Many systematic reviews of treatment interventions include some standardized assessment of internal validity. There is agreement that key criteria for the internal validity of treatment studies are random allocation, blinding and adequate handling of dropouts and withdrawals. In the past these and other criteria were typically combined in scores (see Moher et al. 1995, 1996 for overviews), but the validity of such scores is doubtful (Jüni et al. 1999). Today it is clearly preferred to assess single validity items without summarizing them in a score, and to investigate whether quality has an impact on findings. Currently, the most important tool is the Cochrane Collaboration's 'risk of bias' assessment (Higgins & Altman 2008). Whether the criteria are combined in scores or applied separately, the problems remain that the formalized assessment is often crude and that reviewers have to rely on the information reported.
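In practice, such a domain-based assessment amounts to recording a judgement per validity item per study, rather than a single score. A minimal sketch of such a record is shown below; the domain names follow the Cochrane 'risk of bias' tool (Higgins & Altman 2008), while the trial name and the judgements are invented for illustration:

```python
# A sketch of recording the Cochrane 'risk of bias' domains as single
# items per study, without a summary score. The trial name and the
# judgements are invented; domain names follow Higgins & Altman (2008).
risk_of_bias = {
    "Example trial 2005": {
        "sequence generation": "low",
        "allocation concealment": "unclear",
        "blinding": "low",
        "incomplete outcome data": "high",
        "selective outcome reporting": "unclear",
        "other sources of bias": "low",
    },
}

# Report, per trial, which domains are not judged to be at low risk,
# so the impact of quality on findings can be explored later.
for trial, domains in risk_of_bias.items():
    flagged = [d for d, rating in domains.items() if rating != "low"]
    print(trial, "- domains not at low risk of bias:", flagged)
```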
In conclusion, assessments of methodological quality are necessary but need to be interpreted with caution. The assessment of other dimensions of quality is desirable, but the problems in developing methods for this purpose (which have to take into account the specific characteristics of the interventions and conditions investigated) are even greater than for internal validity.
Summarizing the results
The clinical reader of a systematic review is mainly interested in its results. While the majority of readers will only look at the abstract and the meta-analytic summary, the review also has to provide sufficient information for those who want to form their own view of the available studies and their results. For example, in a review of acupuncture for headache it will be relevant for the specialist to know what type of headache was studied in each primary study, and to have information on the sex and age of the patients and where they were recruited. Regarding the methods, readers should know what the design was, whether there was some form of blinding, how long patients were followed up and whether follow-up was complete. They need details of the experimental intervention (which acupuncture points, how many treatments) and the control intervention (type of sham acupuncture). And, of course, readers want to know which outcomes were measured and what the results were. This detailed information is typically summarized in a table.
If the primary studies provide sufficient data, the results are summarized in effect size estimates. Table 6.1 lists some of the most common measures.

Table 6.1 Common effect size estimates

| Estimate | Calculation | Advantages/disadvantages |
|---|---|---|
| For dichotomous data (e.g. response, death) | | |
| Odds ratio | (a/b)/(c/d) = ad/bc | Most widespread estimate in epidemiology / intuitively difficult to understand |
| Relative risk (rate ratio) | [a/(a + b)]/[c/(c + d)] | Easy to understand / problematic in case of very low or high control group event rates |
| For continuous data (e.g. blood pressure, enzyme activity) | | |
| Weighted mean difference | xe − xc, weighted by 1/variance | Easy to interpret / only applicable if all trials measure the outcome with the same scale |
| Standardized mean difference | (xe − xc)/sd | Applicable over different scales / clinically difficult to interpret |

a = number of patients with an event in the experimental group; b = number of patients without an event in the experimental group; c = number of patients with an event in the control group; d = number of patients without an event in the control group; xe = mean of the experimental group; xc = mean of the control group; sd = standard deviation (either of the control group or pooled for both groups).
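The estimates in Table 6.1 are straightforward to compute once the 2 × 2 counts and group means have been extracted. A minimal sketch follows, with invented numbers purely for illustration:

```python
# A sketch computing the Table 6.1 estimates from the quantities
# defined in the table footnote; all numbers are invented.
a, b = 30, 20   # experimental group: patients with / without event
c, d = 18, 32   # control group: patients with / without event

odds_ratio = (a / b) / (c / d)                    # equals ad/bc
relative_risk = (a / (a + b)) / (c / (c + d))

xe, xc = 12.4, 16.1   # group means on a continuous outcome
sd = 5.3              # pooled standard deviation

mean_difference = xe - xc      # weighting by 1/variance happens when pooling
smd = (xe - xc) / sd           # standardized mean difference

print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}")
print(f"MD = {mean_difference:.2f}, SMD = {smd:.2f}")
```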
Tables including a graphical display of the results are extremely helpful. Table 6.2 shows the standard display from a meta-analysis in the Cochrane Library, in this case of randomized trials comparing hypericum extracts and standard antidepressants (separated into the subgroups of older antidepressants and selective serotonin reuptake inhibitors (SSRIs)) in patients with major depression (Linde et al. 2008).
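The pooled estimate shown at the bottom of such a display is typically obtained by inverse-variance weighting: each trial contributes in proportion to 1/variance, as noted for the weighted mean difference in Table 6.1. A minimal sketch of fixed-effect pooling, again with invented numbers:

```python
# A sketch of fixed-effect, inverse-variance pooling across trials;
# effect estimates and standard errors are invented for illustration.
import numpy as np

effects = np.array([-0.40, -0.25, -0.55, -0.15])  # per-trial mean differences
se = np.array([0.12, 0.18, 0.25, 0.10])           # their standard errors

weights = 1.0 / se**2                              # inverse-variance weights
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"Pooled effect: {pooled:.2f} (95% CI {low:.2f} to {high:.2f})")
```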