
26 Understanding the strengths and weaknesses of clinical research in cancer

What are the important elements when assessing reports of trials?

By far the most important elements are the design and conduct of the trial. The methods of analysis, while still important, are generally less likely to lead to inappropriate conclusions than flaws in the design and management of a trial. This chapter is based on the methods recommended by various EBM groups (Box 26.1).

Because there is such a huge volume of medical literature published each year, it is important to take a systematic approach to what you appraise thoroughly (Box 26.2). The first step in appraisal is to screen the paper to see if it is worthy of careful reading. It may be possible to answer these screening questions on reading the title and abstract.

Step 1 – Screening questions

Step 2 – Appraising a paper reporting a trial

There are a number of crucial questions that need to be answered (Box 26.2):

Concurrent or historical controls?

When a new therapy is being tested, some investigators will give the new treatment to all the patients and will then compare the outcomes with those obtained from the records of similar patients treated in the past – so-called historical controls. Any such comparison is fraught with danger, as factors other than the new treatment may have changed over time.

2 Was the study based on a pre-specified protocol?

A protocol written before the trial starts is a prerequisite for good research, and some journals now recommend that the original protocol be submitted with the final paper, both to confirm that such a protocol existed and to identify any deviations from the original design. Any deviation from the original study design can result in a skewed estimate of clinical benefit.

When reading a paper bear in mind the following elements of trial design:

Subgroup analysis

Subgroup analyses are fairly common in trials; they use the data from a study to compare one endpoint across different subgroups. There are three major problems with this approach:

Where a subgroup is found to behave differently, you should consider whether there is a plausible biological mechanism for this and whether other trials have reported a similar finding. Where a statistical analysis is performed, it should be done as a formal test of interaction. Strong empirical evidence suggests that post hoc subgroup analyses often lead to false positive results (Example Box 26.2).

Example Box 26.2
International study of infarct survival

Sleight P. Subgroup analyses in clinical trials: fun to look at – but don’t believe them! Curr Control Trials Cardiovasc Med 2000;1(1):25–27.

Analysis of subgroup results in a clinical trial is surprisingly unreliable, even in a large trial. This is due to a combination of reduced statistical power, increased variance and the play of chance. Reliance on such analyses is likely to be erroneous. Plausible explanations can usually be found for effects that are, in reality, simply due to the play of chance. When clinicians believe such subgroup analyses, there is a real danger of harm to the individual patient.

In order to study the effect of examining subgroups, the investigators of the ISIS trial, testing the value of aspirin and streptokinase after MI, analysed the results by astrological star sign. All of the patients had their date of birth entered as an important ‘identifier’. They divided the population into 12 subgroups by astrological star sign. Even in a highly positive trial such as ISIS-2, in which the overall statistical benefit for aspirin over placebo was extreme (p < 0.00001), division into only 12 subgroups threw up two (Gemini and Libra) for which aspirin had a non-significantly adverse effect (9% ± 13%).

ISIS-2 was carried out in 16 countries. For the streptokinase randomization, two countries had non-significantly negative results, and a single (different) country was non-significantly negative for aspirin.

There is no plausible explanation for such findings except for the entirely expected operation of the statistical play of chance. It is very important to realize that lack of a statistically significant effect is not evidence of lack of a real effect.
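
The play of chance described above is easy to demonstrate by simulation. The sketch below (in Python, with purely illustrative numbers that are not the actual ISIS-2 data) gives a genuinely effective treatment to half of a large trial population, divides the patients into 12 arbitrary ‘star-sign’ subgroups, and counts how often a subgroup appears, purely by chance, to show harm.

```python
# Minimal simulation (illustrative numbers only, not the actual ISIS-2 data)
# of how dividing a clearly positive trial into 12 arbitrary subgroups can,
# purely by chance, produce subgroups in which the treatment looks harmful.
import numpy as np

rng = np.random.default_rng(0)

N_PER_ARM = 8500          # assumed trial size per arm
P_CONTROL = 0.12          # assumed event (death) rate on placebo
P_TREATED = 0.095         # assumed event rate on treatment: a real benefit
N_SUBGROUPS = 12          # e.g. astrological star signs
N_SIMULATIONS = 2000

reversed_counts = []
for _ in range(N_SIMULATIONS):
    # Simulate outcomes and assign every patient a random 'star sign'.
    control_events = rng.random(N_PER_ARM) < P_CONTROL
    treated_events = rng.random(N_PER_ARM) < P_TREATED
    control_sign = rng.integers(0, N_SUBGROUPS, N_PER_ARM)
    treated_sign = rng.integers(0, N_SUBGROUPS, N_PER_ARM)

    # Count subgroups in which the point estimate favours placebo,
    # even though the true effect is identical in every subgroup.
    reversed_in_this_trial = 0
    for s in range(N_SUBGROUPS):
        rate_c = control_events[control_sign == s].mean()
        rate_t = treated_events[treated_sign == s].mean()
        if rate_t > rate_c:
            reversed_in_this_trial += 1
    reversed_counts.append(reversed_in_this_trial)

print("Average number of 'harmful-looking' subgroups per trial:",
      np.mean(reversed_counts))
```

With these assumed rates, most simulated trials contain at least one subgroup whose point estimate favours placebo, despite a uniform true benefit.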

3 Did the investigators stick to the protocol?

Clinical research is rarely predictable, so changes or alterations to a protocol are often needed. However, they should be clearly reported, as major deviations may reduce the reliability of the data.

Changes in a protocol that may cause major problems include introducing new exclusion criteria during or after the trial and introducing new end points. New exclusion criteria may, for instance, improve the results by leaving only those patients who appear to benefit. The addition of unplanned end points may turn a negative study around by shifting the focus to positive results for the new end points when the original end points were negative.

Without access to the original protocol, it may be difficult for the reader to know if there were major protocol deviations. Where a deviation is known, the acid test is to ask whether the change would have made sense if it had been considered when the protocol was being designed.

There is also an increasing trend towards stopping trials too early after an interim analysis. The ethical reason for this decision is to minimize the number of patients receiving an unsafe, ineffective or inferior treatment. However, stopping trials for apparent benefit will systematically overestimate treatment effects, especially when the sample size is small. The best strategy to minimize the problems associated with early stopping is not to stop early. Alternative strategies include using a low p-value as the threshold for stopping at an interim analysis, not analysing before a sufficiently large number of events has accrued, and continuing enrolment and follow-up for a further period.
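
A simple simulation shows why naive early stopping inflates effect estimates. The sketch below uses assumed, illustrative parameters (not any real trial): an unadjusted p < 0.05 stopping rule is applied at a small interim analysis, and the effect estimated in trials that stop early is compared with the estimate obtained by running to completion.

```python
# Minimal sketch (assumed parameters) of why stopping "for benefit" at an
# interim analysis systematically overestimates the treatment effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

TRUE_DIFF = 0.05            # assumed true improvement in response rate
P_CONTROL = 0.30
P_TREATED = P_CONTROL + TRUE_DIFF
FINAL_N = 400               # planned final size per arm
INTERIM_N = 100             # size per arm at the interim look
ALPHA = 0.05                # naive (unadjusted) stopping threshold

stopped_estimates, final_estimates = [], []
for _ in range(5000):
    control = rng.random(FINAL_N) < P_CONTROL
    treated = rng.random(FINAL_N) < P_TREATED

    # Naive interim analysis on the first INTERIM_N patients per arm.
    pt, pc = treated[:INTERIM_N].mean(), control[:INTERIM_N].mean()
    diff = pt - pc
    se = np.sqrt(pt * (1 - pt) / INTERIM_N + pc * (1 - pc) / INTERIM_N)
    p = 2 * stats.norm.sf(abs(diff) / se) if se > 0 else 1.0

    if p < ALPHA and diff > 0:
        stopped_estimates.append(diff)      # trial stopped early "for benefit"
    else:
        final_estimates.append(treated.mean() - control.mean())

print("True difference:                  ", TRUE_DIFF)
print("Mean estimate when stopped early: ", np.mean(stopped_estimates))
print("Mean estimate when run to the end:", np.mean(final_estimates))
```

Trials that stop early under this rule report, on average, a much larger difference than the true one, because only the chance highs cross the threshold.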

Fraudulent conduct of clinical research

Although we all hope that such fraud is very rare, there are too many instances where there has been clear fraud (Example Box 26.3). Detecting such fraud is very difficult. Peer review may help, but in many instances journals do not have the original protocol, and access to the raw data is extremely unusual. In the example in Box 26.3, it was an independent outside review that revealed the fraud. The review was carried out because the trial was at variance with the rest of the literature and the subject was of great importance.

Example Box 26.3
On-site audit for high-dose chemotherapy

Weiss RB et al. An on-site audit of the South African Trial of high-dose chemotherapy for metastatic breast cancer and associated publications. J Clin Oncol 2001;19:2771–2777.

This article reported the results of an on-site audit to verify the results of a randomized study reported by Bezwoda et al. on high-dose chemotherapy (HDC) for treatment of metastatic breast cancer. In the original study, 90 patients were reported to have been randomized and treated. However, even after searching more than 15,000 sets of medical records only 61 of the 90 patients could be found. Of these 61, only 27 had sufficient records to verify eligibility for the trial by the published criteria. Of these 27, 18 did not meet one or more eligibility criteria. Only 25 patients appeared to have received their assigned therapy temporally associated with their enrolment date, and all but three of these 25 received HDC. The treatment details of individual patients were at great variance from the published data.

In the accompanying editorial (High-dose chemotherapy for breast cancer: ‘how do you know?’ J Clin Oncol 2001;19(11):2769–2770) Larry Norton wrote ‘In this regard, Dr Weiss et al. … have done a great service. They offer us unequivocal evidence that Dr Werner Bezwoda’s critical study in high-dose chemotherapy for advanced breast cancer is fake and completely inadmissible as information regarding the safety and efficacy of such treatment. This work was previously and wrongly reported as positive at ASCO’s 1992 Annual Meeting, in the Journal in 1995, and in multiple subsequent publications [six] … That the original publication, now being retracted by the Journal, has influenced major thinkers in this field and may have put patients in danger raises the stakes as we consider how we can improve the process to make sure that this never happens again.’

4 Who knew what and when?

5 Which patients were included or excluded and what happened to those lost to follow up or who went off protocol or were left out?

There is a major problem when studies, particularly RCTs, become unrepresentative of the population or setting for which the treatment is potentially suitable. Frequently, patients in trials are fitter and younger than those encountered in routine practice.

Who was lost to follow-up or dropped out during the study?

Loss of some patients in a study is inevitable. These patients may have a different prognosis from those remaining in the trial. Excluding those lost to follow-up or who dropped out will often result in an overestimate of treatment benefit.

All patients should be analysed in the groups to which they were randomized, regardless of whether they crossed over or changed treatment, were lost to follow-up or dropped out. This is known as ‘intention to treat’ analysis and should always be reported, even if the authors also choose to present analyses excluding selected patients (Box 26.3). Intention-to-treat analysis is the only way to ensure that the original randomization has been retained and that the groups therefore remain comparable.

Box 26.3
Sample size slippages in randomized trials

Schulz KF and Grimes DA. Sample size slippages in randomized trials: exclusions and the lost and wayward. Lancet 2002;359:781–785.
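
To make the intention-to-treat point concrete, the following sketch (with assumed numbers, not real trial data) simulates a treatment that is in truth no better than control, but whose poor-prognosis patients tend to go off protocol; excluding them makes the treatment look beneficial, whereas analysing as randomized does not.

```python
# Minimal sketch (assumed numbers): intention-to-treat vs excluding
# off-protocol patients when drop-outs have a worse prognosis.
import numpy as np

rng = np.random.default_rng(2)
N = 2000  # patients per arm

# Assume the new treatment is, in truth, no better than control.
frail_control = rng.random(N) < 0.30     # 30% of patients have a poor prognosis
frail_treated = rng.random(N) < 0.30
control_event = rng.random(N) < np.where(frail_control, 0.50, 0.25)
treated_event = rng.random(N) < np.where(frail_treated, 0.50, 0.25)

# Poor-prognosis patients on the new treatment are more likely to go off protocol.
off_protocol = rng.random(N) < np.where(frail_treated, 0.40, 0.05)

print(f"Control event rate:              {control_event.mean():.3f}")
print(f"Treated, intention-to-treat:     {treated_event.mean():.3f}")   # ~ same as control
print(f"Treated, excluding off-protocol: {treated_event[~off_protocol].mean():.3f}")  # looks better
```

The intention-to-treat estimate correctly shows no benefit; the analysis that excludes off-protocol patients shows a spurious advantage simply because the frailer patients have been removed from one arm.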

6 How much did the outcomes change and were they measured and reported appropriately?

It is important to ensure that the end points used are appropriate. There is a tendency to measure surrogate end points, which are easier and quicker, rather than those that are important to the patient or their clinicians. Patients want information on what effect an intervention will have on their chance of survival, what the side effects will be, and whether it will change their quality of life. Use of surrogate measures, such as a fall in a tumour marker, will fail to answer these questions, even though they may provide evidence of potential benefit.

Which is the better treatment, and by how much?

It is not sufficient to know that one treatment is better than another; it is important to quantify how much better it is. While a p-value may tell you which treatment is better, the confidence interval is a better measure: it not only tells you which is the better treatment, but also quantifies the difference (Box 26.4). This is particularly important where any gain has to be balanced against the side effects of that treatment.

Box 26.4
Is the result important?

In any study, the beneficial effect/risk seen in the experimental arm can be due to three possible reasons: the play of chance, bias, or a true effect of the intervention.

The p-value measures the probability of the observed benefit/risk occurring by chance; for example, a p-value of 0.01 means that there is a 1 in 100 (1%) probability of the result occurring by chance. Conventionally, a p-value of <0.05 (<1 in 20 probability) is taken as statistically significant.

The confidence interval (CI) is used to measure sampling error. Since any study can only examine a sample of the population, we would expect the sample estimate to differ from the true population value (sampling error). Conventionally a 95% CI is used, which specifies that there is a 95% chance that the population’s true value lies between the two limits. If the 95% CI crosses the ‘line of no difference’ between interventions, the result is not statistically significant.

Once bias and chance have been ruled out, the benefit/risk of the intervention can be quantified using the following measures:

Just because a statistical test shows an intervention to be superior does not mean the difference is clinically important. This requires, in addition, an estimate of the number of patients seen in routine practice who might benefit from the ‘better’ treatment, the toxicity of the treatment, its ease of administration, and its cost. Assessing clinical significance requires sound judgment. The decision whether to use that treatment will also ultimately depend on the personal preferences of patients.
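
As a worked illustration of the quantities discussed in Box 26.4, the sketch below computes the p-value, 95% confidence interval and some common effect measures for a hypothetical two-arm trial; all numbers are assumptions for illustration only.

```python
# Worked hypothetical example of the p-value, 95% CI and effect measures
# for the difference in response rates between two arms (assumed data).
import math
from scipy import stats

# Hypothetical two-arm trial: responses / patients.
resp_new, n_new = 120, 300      # new treatment
resp_std, n_std = 90, 300       # standard treatment

p_new, p_std = resp_new / n_new, resp_std / n_std
diff = p_new - p_std                                   # absolute benefit
se_diff = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)

z = diff / se_diff
p_value = 2 * stats.norm.sf(abs(z))                    # two-sided p-value
ci_low, ci_high = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"Response rates: {p_new:.2f} vs {p_std:.2f}")
print(f"Absolute difference: {diff:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
print(f"Relative risk: {p_new / p_std:.2f}")
print(f"Number needed to treat: {1 / diff:.1f}")
print(f"Two-sided p-value: {p_value:.4f}")
```

Here the 95% CI (roughly 0.02 to 0.18) does not cross the line of no difference, so the result is statistically significant; whether a 10% absolute gain is clinically important still depends on toxicity, cost and patient preference.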

Systematic reviews and meta-analyses

Introduction

A systematic review is one carried out in accordance with a written protocol and using methodology designed to reduce the risk of bias. Meta-analysis is the quantitative pooling of data from two or more studies. When you are examining the results of a meta-analysis, you should ask the following questions:

Did the reviewers find all of the trials?

One of the greatest risks of bias in a meta-analysis is omitting relevant studies. Some studies are never published, and the evidence suggests that such unpublished studies are more likely to be negative. This is known as publication bias (Example Box 26.4). Registration of trials before they start is being introduced as a way of avoiding publication bias.

Heterogeneity

Even when the trials do not have obvious clinical heterogeneity, the results may turn out to be very different. An example is the RCTs testing the efficacy of adding paclitaxel to platinum therapy for the primary chemotherapy of ovarian cancer. The initial two studies, GOG 111 and OV10, showed an advantage for the addition of paclitaxel. Two subsequent trials, GOG 132 and ICON3 (by far the biggest trial), failed to show an advantage for the addition of paclitaxel. Examining the results, there appears to be such extreme heterogeneity that pooling of the data in a meta-analysis should be avoided. Despite this, NICE did include a meta-analysis in its appraisal (Box 26.5).

Box 26.5
NICE appraisal on addition of paclitaxel to platinum in ovarian cancer

‘While design differences between the four trials, in terms of severity of disease of included patients, differences in treatment and control drugs and doses, length of follow-up, and the extent of cross-over (before and after disease progression), may hamper statistical pooling of results, meta-analyses have been undertaken … These take account of statistical heterogeneity as far as possible, and their results appear consistent, reporting that the findings for progression-free survival (hazard ratios = 0.84, 95% CI = 0.70 to 1.02 [MRC] and 0.87, 95% CI 0.72 to 1.05 [BMS]) and overall survival (hazard ratios = 0.82, 95% CI 0.66 to 1.01 [MRC] and 0.82, 95% CI 0.68 to 1.00 [BMS]) across the trials do not show statistically significant differences between paclitaxel/platinum and the alternatives.’

‘The four trials showed consistently that treatment with paclitaxel in combination with platinum leads to more side effects …’

‘The Committee took account of this range of trial evidence as well as other factors that would differentiate between the two regimens including the side-effect profiles of the treatments … On this basis the Committee considered that paclitaxel/platinum combination treatment should no longer be recommended exclusively as standard therapy for women receiving first-line chemotherapy for ovarian cancer. [The original recommendation was made when ICON3 had not been published.] … both platinum therapy alone and a combination of paclitaxel and a platinum compound were appropriate first-line treatments for women with ovarian cancer.’

This guidance allows clinicians to exercise their prejudice: either to believe the two positive trials or to believe the two negative trials, which contain many more patients. The meta-analysis, although showing no statistical benefit for the addition of paclitaxel, has not added to the guidance, since its conclusion has been ignored. The statistical heterogeneity was so great that it would have been better to discuss the two pairs of trials in narrative fashion, rather than to produce a meta-analysis of the four trials.
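
For readers who want to see the mechanics, the sketch below shows fixed-effect inverse-variance pooling of hazard ratios together with Cochran’s Q and the I² statistic used to quantify statistical heterogeneity. The four trial results are purely hypothetical (two ‘positive’ and two ‘negative’), not the actual GOG, OV10 or ICON3 data.

```python
# Minimal sketch of inverse-variance pooling with Cochran's Q and I².
# The hazard ratios and CIs below are hypothetical, for illustration only.
import math

trials = {
    # name: (hazard ratio, 95% CI lower, 95% CI upper) - assumed values
    "Trial A": (0.70, 0.58, 0.85),
    "Trial B": (0.73, 0.60, 0.89),
    "Trial C": (1.00, 0.85, 1.18),
    "Trial D": (0.98, 0.87, 1.10),
}

log_hrs, weights = [], []
for hr, lo, hi in trials.values():
    log_hr = math.log(hr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE recovered from the CI
    log_hrs.append(log_hr)
    weights.append(1 / se**2)                         # inverse-variance weight

pooled_log_hr = sum(w * x for w, x in zip(weights, log_hrs)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

q = sum(w * (x - pooled_log_hr) ** 2 for w, x in zip(weights, log_hrs))  # Cochran's Q
df = len(trials) - 1
i_squared = max(0.0, (q - df) / q) * 100

print(f"Pooled HR (fixed effect): {math.exp(pooled_log_hr):.2f} "
      f"(95% CI {math.exp(pooled_log_hr - 1.96 * pooled_se):.2f} "
      f"to {math.exp(pooled_log_hr + 1.96 * pooled_se):.2f})")
print(f"Cochran's Q = {q:.1f} on {df} df, I² = {i_squared:.0f}%")
```

With two clearly positive and two clearly null trials, Q is large and I² is high, which is exactly the warning sign that a single pooled estimate may be misleading.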

The main areas where heterogeneity occurs include:

Taking the easy option – a narrative review

A classical narrative review frequently suffers from a lack of systematic ways of avoiding bias. There is also a worrying tendency to choose papers that support the reviewer’s own point of view. This is fine in an opinion piece designed to make people think, but is very dangerous if the reviewer purports to give a balanced assessment of the literature.

Traditional narrative reviews generally do not use meta-analysis, but will often undertake an exercise often referred to as vote counting. In such a process, the numbers of positive and negative studies are summed, and the overall interpretation depends on whether the positive studies outnumber the negative ones or vice versa. However, this approach has a major flaw: it ignores the possibility that some studies are negative simply because they are too small.
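
The flaw in vote counting is easy to demonstrate. In the sketch below (hypothetical numbers), a genuinely effective treatment is tested in five small trials: none reaches p < 0.05 on its own, so a vote count reads ‘five negative studies’, yet a crude pooled analysis of all the patients is clearly positive.

```python
# Minimal sketch (hypothetical data): vote counting vs pooling small trials.
import math
from scipy import stats

# (events_treated, n_treated, events_control, n_control) - hypothetical small trials
trials = [(14, 60, 22, 60), (16, 50, 21, 50), (10, 40, 15, 40),
          (18, 70, 24, 70), (12, 55, 19, 55)]

votes_positive = 0
totals = [0, 0, 0, 0]
for et, nt, ec, nc in trials:
    pt, pc = et / nt, ec / nc
    se = math.sqrt(pt * (1 - pt) / nt + pc * (1 - pc) / nc)
    p = 2 * stats.norm.sf(abs(pt - pc) / se)
    if p < 0.05 and pt < pc:
        votes_positive += 1          # this trial "votes" for the treatment
    for i, v in enumerate((et, nt, ec, nc)):
        totals[i] += v

# Crude pooled analysis of all patients (a proper meta-analysis would pool
# within-trial effects, but the point about statistical power is the same).
ET, NT, EC, NC = totals
PT, PC = ET / NT, EC / NC
SE = math.sqrt(PT * (1 - PT) / NT + PC * (1 - PC) / NC)
P = 2 * stats.norm.sf(abs(PT - PC) / SE)

print(f"Individually 'positive' trials: {votes_positive} of {len(trials)}")
print(f"Pooled event rates: {PT:.2f} vs {PC:.2f}, pooled p = {P:.4f}")
```

None of the five trials is individually significant, but the pooled comparison gives p of about 0.004; counting votes would have discarded a real effect.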

Intentional exclusion of studies

In any meta-analysis, you have to draw a line somewhere. Studies that fail to meet your criteria will not be included in the results. Where the cut-off point is based on judgment, such as trial quality, this can sometimes cause serious controversy. A Cochrane review of mammographic screening for breast cancer found seven eligible studies, but only two were deemed to be of sufficient quality to include in the review; meta-analysis of the two studies found no benefit from screening. If all seven studies had been included, the outcome would have been positive. This Cochrane review provoked a furious response and intense debate over the rights and wrongs of excluding most of the available trials (Example Box 26.5).

Evidence-based medicine (EBM)

Evidence-based medicine (EBM) brings together the best research evidence with clinical expertise and patient expectations. Best research evidence means clinically relevant research, from the basic sciences and from clinical research into the accuracy of diagnostic tests (including clinical examination), the power of prognostic and predictive markers, and the efficacy and safety of all types of interventions. Clinical expertise includes the ability to use our clinical skills and past experience to recognize a patient’s individual diagnosis and general health, and the potential risks and benefits of interventions, and to integrate these with the patient’s personal expectations.