26 Understanding the strengths and weaknesses of clinical research in cancer
What are the important elements when assessing reports of trials?
By far the most important elements are the design and conduct of the trial. The methods of analysis, while still important, are generally less likely to lead to inappropriate conclusions than flawed design and management of a trial. This chapter is based on the methods recommended by various EBM groups (Box 26.1).
Because such a huge volume of medical literature is published each year, it is important to take a systematic approach to what you appraise thoroughly (Box 26.2). The first step in appraisal is to screen the paper to see whether it is worth careful reading. It may be possible to answer these screening questions from the title and abstract alone.
Box 26.2
Systematic method of research appraisal
Step 1 – Screening questions
Step 2 – Appraising a paper reporting a trial
There are a number of crucial questions that need to be answered (Box 26.2):
Concurrent or historical controls?
1 Did the authors use randomization?
The purpose of randomization is to ensure, as far as possible, that all factors (known and unknown) that may influence the treatment outcome are balanced between the treatment groups, thereby reducing the risk of bias. Randomization requires the use of a random device; this is normally a table of random numbers. Systematic allocation (e.g., alternating between treatments, or allocating by odd or even birth date or hospital number) is not an acceptable method, though it is sometimes referred to as pseudo-randomization. With such schemes the researchers can know which treatment each person would be allocated to before they consent, so selective allocation can occur and skew the results. Hence the process used for randomization must ensure that neither the trial subjects nor the investigator can influence the treatment arm each person ends up in (‘allocation concealment’).
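As an illustration of these principles, the sketch below shows one common approach, permuted-block randomization, in Python; the two-arm design, block size and seed are illustrative assumptions rather than the method of any particular trial.

```python
# A minimal sketch of permuted-block randomization for a two-arm trial.
# Block size and seed are illustrative assumptions.
import random

def randomization_list(n_blocks: int, block_size: int = 4, seed: int = 2024):
    """Generate a permuted-block allocation sequence for two arms (A/B)."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_blocks):
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)  # each block is balanced but unpredictable
        sequence.extend(block)
    return sequence

# In practice the list is held centrally (e.g., by a trials unit), and the
# recruiting clinician learns the allocation only after consent is taken,
# so neither subject nor investigator can influence the assignment.
if __name__ == "__main__":
    print(randomization_list(n_blocks=3))
```

Holding the sequence centrally, rather than at the recruiting site, is what turns a random list into genuine allocation concealment.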
2 Was the study based on a pre-specified protocol?
When reading a paper, bear in mind the following elements of trial design:
Subgroup analysis
Where a subgroup is found to behave differently, you should consider whether there is a plausible biological mechanism for this and whether other trials have reported a similar finding. Any statistical analysis should be done as a formal test of interaction, rather than by comparing separate p-values within each subgroup. Strong empirical evidence suggests that post hoc subgroup analyses often lead to false positive results (Example Box 26.2).
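As a minimal sketch of what a formal test of interaction looks like in practice, the Python fragment below fits a logistic model with a treatment-by-subgroup interaction term; the simulated data and column names are invented for illustration.

```python
# A minimal sketch of a formal interaction test, assuming a binary outcome
# and 0/1 indicators for treatment arm and subgroup membership.
# The data are simulated purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),  # randomized arm
    "subgroup": rng.integers(0, 2, n),   # e.g., pre-specified biomarker status
})
# Simulate responses with no true treatment-by-subgroup interaction.
p = 0.3 + 0.1 * df["treatment"]
df["response"] = (rng.random(n) < p).astype(int)

# The interaction term asks whether the treatment effect differs by subgroup;
# its p-value, not the separate subgroup p-values, is the appropriate test.
model = smf.logit("response ~ treatment * subgroup", data=df).fit(disp=False)
print(model.pvalues["treatment:subgroup"])
```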
3 Did the investigators stick to the protocol?
Changes to a protocol that may cause major problems include introducing new exclusion criteria during or after the trial, and introducing new end points. New exclusion criteria may, for instance, improve the results by leaving in only those patients who get apparent benefit. The addition of unplanned end points may turn a negative study around by concentrating the focus on positive results for the new end points, when the original end points were negative.
Fraudulent conduct of clinical research
Although we all hope that such fraud is very rare, there have been too many instances of clear fraud (Example Box 26.3). Detecting such fraud is very difficult. Peer review may help, but in many instances journals do not have the original protocol, and access to the raw data is extremely unusual. In the example above, it was an independent outside review that revealed the fraud; the review was carried out because the trial’s results were at variance with the rest of the literature and the subject was of great importance.
4 Who knew what and when?
5 Which patients were included or excluded, and what happened to those who were lost to follow-up, went off protocol, or were left out?
Who was lost to follow-up or dropped out during the study?
All patients should be analysed in the groups to which they were randomized, regardless of whether they crossed over or changed treatment, were lost to follow-up or dropped out. This is known as ‘intention to treat’ analysis and should always be reported, even if the authors also choose to present analyses excluding selected patients (Box 26.3). Intention to treat analysis is the only way to ensure that the benefit of the original randomization is retained and, thereby, that the groups remain comparable.
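The following Python sketch contrasts an intention-to-treat analysis with a per-protocol analysis on a toy data set; the column names and response values are invented purely for illustration.

```python
# A minimal sketch contrasting intention-to-treat with per-protocol analysis,
# assuming a DataFrame with the randomized arm, the treatment actually
# received, and a binary outcome. The numbers are illustrative only.
import pandas as pd

df = pd.DataFrame({
    "randomized_arm": ["A", "A", "A", "B", "B", "B"],
    "received":       ["A", "A", "B", "B", "B", "A"],  # two patients crossed over
    "responded":      [1, 0, 1, 0, 1, 0],
})

# Intention to treat: analyse everyone in the arm they were randomized to.
itt = df.groupby("randomized_arm")["responded"].mean()

# Per protocol: only patients who received their allocated treatment,
# which breaks the comparability that randomization created.
on_protocol = df[df["randomized_arm"] == df["received"]]
pp = on_protocol.groupby("randomized_arm")["responded"].mean()

print("ITT response rates:\n", itt)
print("Per-protocol response rates:\n", pp)
```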
Box 26.3
Sample size slippages in randomized trials
6 How much did the outcomes change and were they measured and reported appropriately?
It is important to ensure that the end points used are appropriate. There is a tendency to measure surrogate end points, which are easier and quicker to assess, rather than those that matter to patients and their clinicians. Patients want information on what effect an intervention will have on their chance of survival, what the side effects will be, and whether it will change their quality of life. Surrogate measures, such as a fall in a tumour marker, fail to answer these questions, even though they may provide evidence supporting potential benefit.
Were measurements appropriate and well done?
Although this is probably less of a problem in cancer studies than in some other areas of medicine, be wary of end points that measure only short-term outcomes. For instance, in studies concerned with the management of complications of cancer or the side effects of its treatment, trials may report the effectiveness of an intervention after, say, six weeks of therapy. Since it is easier to show a short-term change, it is important to consider whether the investigators should have been measuring long-term effectiveness. For instance, several treatments in the review ‘Non-surgical interventions for late radiation proctitis in patients who have received radical radiotherapy to the pelvis’ (Denton et al., 2008) were given for only one month, with response assessed at a single time point.
Which is the better treatment, and by how much?
It is not sufficient to know that one treatment is better than another; it is also important to quantify how much better it is. While a p-value may tell you that one treatment is better, the confidence interval is a better measure: it not only indicates which is the better treatment, but quantifies the difference (Box 26.4). This is particularly important where any gain has to be balanced against the side effects of the treatment.
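As a minimal sketch of this point, the Python fragment below computes a risk difference with a Wald 95% confidence interval; the response counts are illustrative assumptions, not data from any trial.

```python
# A minimal sketch of quantifying a treatment difference with a confidence
# interval rather than a p-value alone: a Wald interval for the difference
# in response proportions. The counts below are illustrative.
from math import sqrt

def risk_difference_ci(r1, n1, r2, n2, z=1.96):
    """95% Wald CI for the difference in proportions (arm 1 minus arm 2)."""
    p1, p2 = r1 / n1, r2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

# e.g., 60/200 responders on the new treatment vs 45/200 on control
diff, lo, hi = risk_difference_ci(60, 200, 45, 200)
print(f"Risk difference {diff:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

A clinician weighing benefit against toxicity can act on the interval (here, anything from a small harm to a sizeable benefit) in a way a bare p-value does not allow.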
Box 26.4
Is the result important?
Were there sufficient subjects?
There has been a tendency for cancer RCTs to have too few subjects or, more accurately, too few patients reaching the end point of the study. Investigators should provide an a priori justification of the sample size in the paper itself. If there are no power calculations, it is useful to look at the width of the confidence intervals: if the sample size is inadequate, the confidence intervals will be very wide.
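As an example of an a priori sample size justification, the sketch below applies the standard normal-approximation formula for comparing two proportions; the assumed response rates, significance level and power are illustrative.

```python
# A minimal sketch of an a priori sample size calculation for comparing two
# proportions, using the standard normal-approximation formula.
# The response rates, alpha and power below are illustrative assumptions.
from math import ceil
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)  # two-sided significance level
    z_b = norm.ppf(power)          # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# To detect an improvement in response rate from 30% to 40%:
print(n_per_arm(0.30, 0.40))  # about 354 patients per arm
```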
Systematic reviews and meta-analyses
Introduction
Did the reviewers find all of the trials?
One of the greatest risks of bias in a meta-analysis is omitting relevant studies. Many studies are never published, and the evidence suggests that such studies are more likely to be negative. This is known as publication bias (Example Box 26.4). Prospective registration of trials, before they start, is being introduced as a way of avoiding publication bias.
Heterogeneity
Even when trials have no obvious clinical heterogeneity, their results may turn out to be very different. An example is the RCTs testing the efficacy of adding paclitaxel to platinum therapy as primary chemotherapy for ovarian cancer. The initial two studies, GOG 111 and OV10, showed an advantage for the addition of paclitaxel. Two subsequent trials, GOG 132 and ICON3 (by far the biggest trial), failed to show an advantage. Examining the results, there appears to be such extreme heterogeneity that pooling the data in a meta-analysis should be avoided. Despite this, NICE did include a meta-analysis in its appraisal (Box 26.5).
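As a sketch of how such heterogeneity can be quantified, the Python fragment below computes Cochran’s Q and the I² statistic from per-trial effect estimates; the four log hazard ratios and standard errors are invented for illustration and are not the actual results of the ovarian cancer trials.

```python
# A minimal sketch of quantifying statistical heterogeneity with Cochran's Q
# and I-squared, given per-trial effect estimates (e.g., log hazard ratios)
# and their standard errors. The four values are invented for illustration
# and are NOT the actual results of GOG 111, OV10, GOG 132 or ICON3.
import numpy as np

def heterogeneity(estimates, std_errors):
    est = np.asarray(estimates, dtype=float)
    w = 1.0 / np.asarray(std_errors, dtype=float) ** 2  # inverse-variance weights
    pooled = np.sum(w * est) / np.sum(w)                # fixed-effect pooled estimate
    q = np.sum(w * (est - pooled) ** 2)                 # Cochran's Q
    dof = len(est) - 1
    i2 = max(0.0, (q - dof) / q) * 100 if q > 0 else 0.0
    return q, i2

# Two trials favouring the new regimen, two showing no benefit (illustrative):
q, i2 = heterogeneity([-0.35, -0.30, 0.02, 0.05], [0.10, 0.12, 0.09, 0.07])
print(f"Q = {q:.1f} on 3 df, I^2 = {i2:.0f}%")  # a high I^2 argues against pooling
```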
The main areas where heterogeneity occurs include:
Managing heterogeneity
Subgroup analysis
When there is substantial heterogeneity, you can examine and compare subgroups of the studies. As the case in Box 26.5 (the use of paclitaxel chemotherapy in ovarian cancer) suggests, rather than pooling all the data, the review could have pooled the two groups of positive and negative trials separately and then discussed the reasons for the different outcomes.
Taking the easy option – a narrative review
Intentional exclusion of studies
In any meta-analysis, you have to draw a line somewhere: studies that fail to meet your criteria will not be included in the results. Where the cut-off point is based on judgement, such as trial quality, this can sometimes cause serious controversy. A Cochrane review of mammographic screening for breast cancer found seven eligible studies, but only two were deemed to be of sufficient quality to include; meta-analysis of these two studies found no benefit from screening, whereas inclusion of all seven would have given a positive result. This Cochrane review provoked a furious response and intense debate over the rights and wrongs of excluding most of the available trials (Example Box 26.5).
Evidence-based medicine (EBM)
Why do we need EBM?
Clinicians are hungry for edible-sized bites of reliable information:
Moher D, Schulz KF, Altman DG. The CONSORT statement: revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet. 2001;357(9263):1191-1194.
Denton AS, Clarke N, Maher J. Non-surgical interventions for late radiation proctitis in patients who have received radical radiotherapy to the pelvis. Cochrane Database of Systematic Reviews. 2008;(4).
Straus SE, Sackett DL. Using research findings in clinical practice. BMJ. 1998;317(7154):339-342.
Montori VM, Guyatt GH. Intention-to-treat principle. CMAJ. 2001;165(10):1339-1341.
Straus SE, Sackett DL. Applying evidence to the individual patient. Ann Oncol. 1999;10(1):29-32.
Straus SE, McAlister FA. Evidence-based medicine: a commentary on common criticisms. CMAJ. 2000;163(7):837-841.
Jackson R, Ameratunga S, Broad J, et al. The GATE frame: critical appraisal with pictures. Evid Based Med. 2006;11(2):35-38.
Sackett DL, Straus SE. Finding and applying evidence during clinical rounds: the ‘evidence cart’. JAMA. 1998;280(15):1336-1338.
Williams CJ. The pitfalls of narrative reviews in clinical medicine. Ann Oncol. 1998;9:601-605.