5. Randomized controlled trials
Claudia M. Witt and George T. Lewith
Chapter contents
Confirmatory and exploratory studies
Study hypothesis and hypothesis testing
Determining the sample size
The essential elements of a randomized controlled trial
Randomization
Blinding
Controls
The influence of expectation
Pragmatic versus experimental studies
Analyzing randomized controlled trials
Conclusion
Introduction
We plan to discuss the principles and concepts that underpin randomized controlled trials (RCTs). This research methodology can be employed in a number of different contexts but RCTs are usually used in clinical settings. A trial, holding other factors constant, can be run to compare performance on a number of criteria, such as: ‘which treatment works better than a placebo?’ or ‘which device is most energy-efficient?’ The fundamental principles of RCTs remain the same in whichever context they are applied, and this chapter aims to outline the steps that are important in setting up an RCT and meeting the requirements of a research protocol. RCTs allow us to answer very specific questions. They set out to evaluate the effect of a particular treatment or strategy in a population where an intervention is introduced, often by comparing the outcome with a control group in which no intervention, or a standard intervention, may have been used. The population must be well defined and carefully selected.
A number of implicit assumptions underpin the RCT as follows (adapted from Vickers et al. 1997):
• We have an incomplete understanding of the world and knowledge evolves and develops. It is contingent and never definitive.
• Research methods evolve and change as we continue to learn about the world. Thus any trial will be as good as we can make it at the time and the knowledge gained is likely to be modest and incremental.
• Logically, cause precedes effect, or put another way, A leads to B.
• Beliefs cannot influence random events.
• In a well-designed study, the researcher’s beliefs cannot influence the outcome.
• Good research aims to minimize the effects of bias, chance variation and confounding.
• Research that investigates whether treatments do more good than harm must be a priority.
These tenets lead to the claim that the RCT provides the ‘gold standard’ for research. It is the best means of attributing real cause and effect and therefore adds to our stock of knowledge. Not all researchers necessarily believe all of these assumptions, but they provide a sound framework for conducting an RCT.
The research question
An RCT is challenging and researchers contemplating a trial must ask themselves:
• What is a good question?
• How can questions be matched to the research design?
• What is the best strategic approach to the research?
• How can we interpret the results appropriately?
The first prerequisite for refining a research question is to carry out a thorough literature search. A literature search is an iterative and developmental process that will contribute directly and indirectly to protocol development. It will help to identify whether the question one wants to ask has already been asked and also point out the strengths and weaknesses of previous research in addressing and answering the question.
A question is likely to be answerable if it is explicit, focused and feasible. In other words, it should be possible to link the effect of an intervention explicitly to a specific outcome. The research should be focused: there should be a very clear, simple primary question and a research method that will provide an answer – the trick is not to ask too many primary questions simultaneously, even in a complex study. If there are multiple questions, then the primary research question must be given priority, and it must be framed so that it is both possible and practical to answer. Some examples of questions are given in Table 5.1.
Table 5.1
| Category of question | Examples | Suitable for RCT |
|---|---|---|
| Attributing cause and clinical effect | Does homeopathically prepared grass pollen reduce symptoms of hayfever more than a non-active (placebo) treatment? | Y |
| | Is polypharmacy more effective than a single-remedy approach in the homeopathic treatment of chronic hayfever? | Y |
| What happens in clinical practice? | What is the cost-effectiveness of adding homeopathic treatment to a standard care package in hayfever? | Y |
| | What are the patterns of cross-referral between conventional and CAM practitioners in a multidisciplinary pain clinic? | N |
| | How common are serious neurological complications following chiropractic cervical manipulation? | N |
| What do people do? | How many people visit practitioners of CAM each year? | N |
| | What do patients tell their primary care physician about usage of CAM? | N |
| | How many nurses practice complementary medicine? | N |
| What do people believe and how do they explain it? | What do nurses believe about therapeutic touch? | N |
| | What is the patient’s experience of the acupuncture consultation? | Y |
| By what mechanisms does a therapy work? | What are the effects of needling the Hoku point on the production of endogenous opiates? | Y |
| Does something proposed in a therapy actually exist? | Does peppermint oil reduce histamine-induced contractions of tracheal smooth muscle? | Y |
| | Does homeopathically prepared copper ameliorate the effects of copper poisoning in a plant model? | Y |
| Is a diagnostic or prognostic test accurate? | How sensitive and specific is detection of gallbladder disease by examining photos of the iris? | N |
| | Is tongue diagnosis reliable? | N |
The findings must be achievable within a reasonable period of time and within the bounds of the scientific and financial resources available. Randomization is designed to even out all the things we ‘don’t know’ about the groups we are comparing. These factors may allow for misinterpretation of the study’s findings and are usually referred to as ‘confounding factors’. Randomization also helps to minimize ‘bias’: if particular confounding factors or biases are known to the researcher, they should be addressed in the study protocol at an early stage, which may mean modifying the research question or trial methodology, thus allowing an appropriate trial design to emerge.
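To make the allocation step concrete, the sketch below shows one simple way a computer-generated 1:1 randomization list might be produced for a two-arm trial. It is only an illustration: the function name, arm labels and seed are our own assumptions, not part of any specific trial protocol, and a real trial would normally also conceal the sequence from recruiting clinicians.

```python
# Minimal sketch: computer-generated 1:1 random allocation for a two-arm trial.
# Names (make_allocation_list, the arm labels, the seed) are illustrative only.
import random

def make_allocation_list(n_participants: int, arms=("treatment", "control"), seed: int = 2024):
    """Return a pre-generated allocation sequence with equal numbers per arm.

    Shuffling lets chance alone decide the order, which is what 'evens out'
    unknown confounders between groups on average.
    """
    if n_participants % len(arms):
        raise ValueError("n_participants must divide evenly across arms")
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    sequence = list(arms) * (n_participants // len(arms))
    rng.shuffle(sequence)
    return sequence

if __name__ == "__main__":
    allocation = make_allocation_list(8)
    for participant, arm in enumerate(allocation, start=1):
        print(f"Participant {participant:02d} -> {arm}")
```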
Confirmatory and exploratory studies
In effect RCTs are mainly used to test a predefined hypothesis. When planning a confirmatory trial it is helpful to do a pilot trial with the aim of generating a hypothesis and testing the planned outcome measures and the study protocol for feasibility. In addition, pilot studies can give an idea of the effect size of the intervention, which in turn informs the sample size for any larger, more definitive study. In these pilot trials statistics are used on an exploratory basis. The published literature suggests that in CAM research confirmatory studies are often done without pilot trials, and quite a large proportion of the confirmatory (definitive) studies resemble pilot studies: they are too small to have enough power to detect a significant difference between groups. The reasons for this might be limited financial resources for CAM research and the fact that CAM has fewer qualified and experienced researchers than conventional medical research environments.
When planning a study it is important to clarify the study in more detail by considering the following aspects (Chow & Liu 2004):
• What aspects of the intervention are being studied?
• Is it important to investigate other issues that may have an impact on the intervention?
• Which control(s) or placebos might be used or considered?
Developing a hypothesis is an important step when planning these studies. A hypothesis always consists of a null hypothesis (H0), which assumes no effect, and an alternative hypothesis (HA), which holds that the null hypothesis is not true and consequently assumes an effect. The alternative hypothesis is more directly connected to the specific research question.
The aim of confirmatory studies is to answer research questions based on the hypothesis proposed by the researcher. Posing an adequate and answerable research question is therefore essential: a clear, well-researched, thoughtful and specific question, addressed with an appropriate study design, has a reasonable chance of providing a valid answer. One of the main reasons (other than failure to recruit) why research proposals are unclear or fail to provide a useful answer is that the initial research question itself lacks clarity. The core elements of a precise research question for a confirmatory RCT can be summarized by PICO (patients, intervention, control intervention, outcome = primary endpoint; see example in Figure 5.1). Using PICO can be very helpful in developing your hypothesis.
FIGURE 5.1
Study hypothesis and hypothesis testing
The research question and the subsequent hypothesis build the basis for hypothesis testing. If the null hypothesis can be rejected based on a predefined significance level (e.g. 5% or 1%), the alternative hypothesis will be accepted. This means that, from the example shown in Figure 5.1, if there is a significant difference on the visual analogue scale (P < 0.05) between the acupuncture and the diclofenac group, the null hypothesis (H0: acupuncture = diclofenac) can be rejected and the alternative hypothesis (HA: acupuncture ≠ diclofenac) is applicable.
If there is no significant difference between the groups this does not necessarily mean that the null hypothesis is true; there may simply not be enough evidence to reject it. For example, if the sample size is too small the study may be underpowered. Statistical significance is conventionally defined by assuming that a 5% (or smaller) probability of the observed difference arising by chance alone is low enough to reject the null hypothesis. This could mean that, with a P-value of 0.04 (a 4% chance of this happening randomly), the null hypothesis would be rejected, whereas with a P-value of 0.06 (a 6% chance of this happening randomly) there would not be enough evidence to reject the null hypothesis. The choice of a 5% significance level is arbitrary, though applied throughout biological science, and therefore the outcome of the study needs to be interpreted with caution, particularly with respect to the number of people entered. The greater the sample size, the more statistical ‘power’ the study has and the more its statistical conclusions can be ‘trusted’.
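The decision rule just described can be illustrated with a short sketch comparing two groups on a visual analogue scale. The data are simulated and the group names merely echo the example in Figure 5.1; none of the numbers come from an actual trial.

```python
# Minimal sketch of the hypothesis-testing decision rule: compare the P-value
# from a two-sample t-test with a predefined 5% significance level.
# All data below are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
acupuncture = rng.normal(loc=35.0, scale=15.0, size=60)   # simulated VAS pain scores
diclofenac = rng.normal(loc=42.0, scale=15.0, size=60)    # simulated VAS pain scores

alpha = 0.05                                  # predefined significance level
t_stat, p_value = stats.ttest_ind(acupuncture, diclofenac)

if p_value < alpha:
    print(f"P = {p_value:.3f} < {alpha}: reject H0 (acupuncture = diclofenac)")
else:
    print(f"P = {p_value:.3f} >= {alpha}: insufficient evidence to reject H0")
```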
However, a significant difference between groups does not in itself indicate the clinical importance of the finding, as statistical significance depends to a large extent on the sample size (number of people) and the variability of the condition within the study. Accordingly a large study might find a small but clinically unimportant difference between treatments that is nevertheless highly significant. Equally, a small study with few people might find a large difference that could be clinically important but still fail to reach statistical significance. Clinical importance describes a difference between two treatments that has a relevant effect size which is noticeable for, and valuable to, a patient with that condition.
Most randomized controlled studies evaluate whether one intervention is superior to another. This is generally the case for treatment comparisons with a waiting list or a placebo intervention as controls. However, for some comparisons we test for similarity between two treatments (equivalence) or non-inferiority, but these equivalence studies, which include both non-inferiority and non-superiority trials, are rare in CAM. A non-inferiority hypothesis could, however, be very useful when comparing a CAM treatment with a conventional treatment. Non-inferiority and equivalence trials have methodological features that differ from superiority trials. In an equivalence trial, the null and alternative hypotheses are reversed: the null hypothesis is that treatment A is different from treatment B, whereas the alternative hypothesis is that there is no difference between the treatments.
In addition, the effect size of the reference treatment has to be known, and the margin of non-inferiority has to be predefined for the power calculation and the subsequent sample size. This margin represents the smallest value for a clinically relevant effect, and outcomes within this margin are defined as non-inferior (Piaggio et al. 2006). The interpretation of the results depends on where the confidence interval lies relative to both the margin of non-inferiority and the null effect. For two-sided evidence (equivalence trials) two margins are needed, one above zero and one below (Figure 5.2).
FIGURE 5.2
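As a rough illustration of this logic, the sketch below computes a 95% confidence interval for the difference between a new treatment and a reference treatment and checks whether it lies entirely above a prespecified non-inferiority margin. The margin, the assumed direction of benefit (higher score = better) and all data are illustrative assumptions, not values from any published trial.

```python
# Minimal sketch of a non-inferiority check: the new treatment is declared
# non-inferior if the CI for (new - reference) stays above minus the margin.
# Data, margin and outcome scale are invented for illustration only.
import numpy as np
from scipy import stats

def noninferiority_check(new, reference, margin, alpha=0.05):
    """Two-sided (1 - alpha) CI for the mean difference, compared with -margin."""
    new = np.asarray(new, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n1, n2 = new.size, reference.size
    diff = new.mean() - reference.mean()
    pooled_var = ((n1 - 1) * new.var(ddof=1) + (n2 - 1) * reference.var(ddof=1)) / (n1 + n2 - 2)
    se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    half_width = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2) * se
    lower, upper = diff - half_width, diff + half_width
    return (lower, upper), lower > -margin   # True -> non-inferior

rng = np.random.default_rng(7)
new_treatment = rng.normal(49.0, 10.0, 100)   # higher score assumed to mean a better outcome
reference_trt = rng.normal(50.0, 10.0, 100)
(ci_low, ci_high), non_inferior = noninferiority_check(new_treatment, reference_trt, margin=5.0)
print(f"95% CI for difference: ({ci_low:.2f}, {ci_high:.2f}); non-inferior: {non_inferior}")
```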
Determining the sample size
When planning a study, probably the most important question is how many patients are needed to give enough statistical power to detect a clinically meaningful difference between the treatment groups. If the budget is limited, or the medical condition allows only a small number of patients to be included, the study may fail to reach adequate power and could be of very limited value. An appropriate sample size is essential for a worthwhile study. The sample size calculation requires that the hypothesis is clearly stated and that a valid design and appropriate test statistics are used. In hypothesis testing two errors can occur: a type I error (α) occurs when the null hypothesis is rejected although it is true, and a type II error (β) occurs when the null hypothesis is not rejected although it is false. The probability of making a type I error is called the significance level, whereas the probability of avoiding a type II error (1 − β) is called the power. Of the two, the type I error is usually considered the more important and is the one this approach primarily tries to control. To determine the sample size the investigator should provide the following information:
• the significance level
• the power
• a clinically meaningful difference
• information about the standard deviation.
In clinical studies a significance level of 5% is usually chosen, reflecting 95% confidence in the conclusions, and a power of 80% or 90% is usually used in conjunction with this. Estimating a clinically meaningful difference and the standard deviation for the planned trial is often difficult, especially for CAM treatments, where there may be no previous data on the outcome for the intervention being evaluated. If the expected difference is large, fewer patients will be needed than if the expected difference is small. Another important influence on the sample size is the variability of the primary outcome across patients: if a high standard deviation is expected (lots of variability), more patients are needed to detect a significant difference between treatments than if the patient outcomes are more homogeneous.
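The sketch below shows how these four inputs combine in a standard normal-approximation formula for comparing two means; the numerical values are illustrative assumptions, not recommendations for any particular trial.

```python
# Minimal sketch of a two-group sample-size calculation from the four inputs
# listed above: significance level, power, clinically meaningful difference
# and standard deviation (normal approximation for comparing two means).
import math
from scipy.stats import norm

def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Patients per group for a two-sided comparison of two means."""
    z_alpha = norm.ppf(1 - alpha / 2)   # e.g. 1.96 for a 5% significance level
    z_beta = norm.ppf(power)            # e.g. 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return math.ceil(n)                 # round up to whole patients

# A larger expected difference, or a smaller standard deviation, needs fewer patients.
print(sample_size_per_group(difference=10, sd=20))   # about 63 per group
print(sample_size_per_group(difference=5, sd=20))    # about 252 per group
```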
Many CAM studies are conducted with small research budgets and consequently small sample sizes; therefore the sample size which would detect a significant difference between treatment groups may not be achieved. From a methodological perspective it is bad science to perform an underpowered study. If the budget is not sufficient to perform a trial of adequate size it is better to do no study, unless it is a pilot.
The essential elements of a randomized controlled trial
The RCT is the most reliable study design for detecting whether a difference in the outcomes of two or more interventions is caused by a specific treatment. In effect it is the ‘gold standard’ for clinical research. With the aim of controlling for baseline differences, patients are allocated at random to receive one of several interventions. In addition, the treatment group is compared with at least one control group. Different control groups exist and the choice of control depends on the research question (Table 5.2). Placebo-controlled trials only tell us if the treatment is better than a placebo, and we always need more information than that to inform clinical practice, where many treatments may be available for one condition.
Table 5.2
| Research question | Adequate control group |
|---|---|
| Is the treatment effect specific? | Placebo |
| Is the treatment superior to no treatment? | Waiting list control |
| Is the treatment superior (or non-inferior) to standard therapy? | Standard treatment |
| Is the treatment superior (or non-inferior) to another treatment? | Other treatment |
| Is the treatment in addition to usual care superior to usual care alone? | Usual care |