Modern medicine is sometimes accused of callous application of science to human problems and of subordinating the interests of the individual to those of the group (society).¹ Official regulatory bodies rightly require scientific evaluation of drugs. Drug developers need to satisfy the official regulators and they also seek to persuade the medical profession to prescribe their products. Patients, too, are more aware of the comparative advantages and limitations of their medicines than they used to be. To some extent, this helps encourage patients to participate in trials so that future patients can benefit, as they do now, from the knowledge gained from such trials. An ethical framework is required to ensure that the interests of the individual participant take precedence over those of society (and, more obviously, those of an individual or corporate investigator).

Research involving human subjects

The definition of research continues to present difficulties. The distinction between medical research and innovative medical practice derives from the intent. In medical practice the sole intention is to benefit the individual patient consulting the clinician, not to gain knowledge of general benefit, though such knowledge may incidentally emerge from the clinical experience gained. In medical research the primary intention is to advance knowledge so that patients in general may benefit; the individual patient may or may not benefit directly.²

Consider also the process of audit, which is used extensively to assess performance, e.g. by individual health-care workers, by departments within hospitals or between hospitals. Audit is a systematic examination designed to determine the degree to which an action or set of actions achieves predetermined standards. Whereas research seeks to address ‘known unknowns’ and often discovers ‘unknown unknowns’,³ audit is limited to the monitoring of ‘known knowns’: maybe important, but clearly limited.

Ethics of research in humans ⁴

Some dislike the word ‘experiment’ in relation to humans, thinking that its mere use implies a degree of impropriety in what is done. It is better that all should recognise from the true meaning of the word, ‘to ascertain or establish by trial’,⁵ that the benefits of modern medicine derive almost wholly from experimentation and that some risk is inseparable from much medical advance.

The issue of (adequately informed) consent is a principal concern for Research Ethics Committees (also called Institutional Review Boards). People have the right to choose for themselves whether or not they will participate in research, i.e. they have the right to self-determination (the ethical principle of autonomy). They should be given whatever information is necessary for making an adequately informed choice (consent) with the right to withdraw at any stage. Consent procedures, especially information on risks, loom larger in research than they do in medical practice. This is appropriate given that in research, patients may be submitting themselves to extra risks, or simply to extra inconvenience (e.g. more or longer visits). It is a moot point whether more consent in routine practice might not go amiss. It is also likely that patients participating in well-conducted trials receive more, and sometimes better, care and attention than might otherwise be available. Sometimes the unintended consequences of ethical procedures include causing unnecessary apprehension to patients with long, legalistic documents, and creating a false impression of clinical researchers as people from whom patients need protection.

The moral obligation of all doctors lies in ensuring that in their desire to help patients (the ethical principle of beneficence) they should never allow themselves to put the individual who has sought their aid at any disadvantage (the ethical principle of non-maleficence) for ‘the scientist or physician has no right to choose martyrs for society’.⁶

In principle, it may be thought proper to perform a therapeutic trial only when doctors (and patients) have genuine uncertainty as to which treatment is best.⁷ Not all trials are comparisons of different treatments. Some, especially early phase trials of new drugs, are comparisons of different doses. Comparisons of new with old should usually offer patients the chance of receiving either current best treatment or one which might be better. Since this is often rather more than is offered in resource-constrained routine care, the obligatory patient information sheet mantra that ‘the decision whether to take part has no bearing on your usual care’ may be economical with the truth. But it is also simplistic to view the main purpose of all trials with medicines as comparative.

The past decade has seen the pharmaceutical industry struggle to match the pace of new understanding about disease pathogenesis, and models of research are being adapted to the complexity of common disease that is now apparent. In diseases where many good medicines already exist, the industry spent much time developing minor modifications which were broadly equivalent to current therapy with possible advantages for some patients. With many of the standard blockbusters now off patent, new drugs for such diseases are unattractive, and the industry is concentrating more on harder therapeutic targets where no satisfactory treatment yet exists. Just as in basic science, non-hypothesis-led ‘fishing expedition’ research – genome scans, microarrays – is no longer frowned upon, so the imaginative clinical investigator must throw his stone – a new medicine – into the pond, and be able to make sense of the ripples. One such approach is to move away from trial design in which it is the average response of the group which is of interest towards the design in which the investigator attempts to match differences in response to differences – ethnic, gender, genetic – among the patients. Matches at a molecular level give clues both to how the drug may best be used, and who will benefit most.

The ethics of the randomised and placebo-controlled trial

Providing that ethical surveillance is rooted in the ethical principles of justice,⁸ there should be no difficulty in clinical research adapting to current needs. And even if the nature of early phase research is changing, the randomised controlled trial will remain the cornerstone of how cause and effect is proven in clinical practice, and how drugs demonstrate the required degree of efficacy and safety to obtain a licence for their prescription.

The use of a placebo (or dummy) raises both ethical and scientific issues (see placebo medicines and the placebo effect, Ch. 2). There are clear-cut cases when placebo use would be ethically unacceptable and scientifically unnecessary, e.g. drug trials in epilepsy and tuberculosis, when the control groups comprise patients receiving the best available therapy.

The pharmacologically inert (placebo) treatment arm of a trial is useful:

• To distinguish the pharmacodynamic effects of a drug from the psychological effects of the act of medication and the circumstances surrounding it, e.g. increased interest by the doctor, more frequent visits, for these latter may have their placebo effect. Placebo responses have been reported in 30–50% of patients with depression and in 30–80% with chronic stable angina pectoris.

• To distinguish drug effects from natural fluctuations in disease that occur with time, e.g. with asthma or hay fever, and other external factors, provided active treatment, if any, can be ethically withheld. This is also called the ‘assay sensitivity’ of the trial.

• To avoid false conclusions. The use of placebos is valuable in Phase 1 healthy volunteer studies of novel drugs to help determine whether minor but frequently reported adverse events are drug related or not. Although a placebo treatment can pose ethical problems, it is often preferable to the continued use of treatments of unproven efficacy or safety. The ethical dilemma of subjects suffering as a result of receiving a placebo (or ineffective drug) can be overcome by designing clinical trials that provide mechanisms to allow them to be withdrawn (‘escape’) when defined criteria are reached, e.g. blood pressure above levels that represent treatment failure. Similarly, placebo (or new drug) can be added against a background of established therapy; this is called the ‘add on’ design.

• To provide a result using fewer research subjects. The difference in response when a test drug is compared with a placebo is likely to be greater than that when a test drug is compared with the best current, i.e. active, therapy (see later).

Investigators who propose to use a placebo, or otherwise withhold effective treatment, should justify their intention. The variables to consider are:

• The severity of the disease.

• The effectiveness of standard therapy.

• Whether the novel drug under test aims to give only symptomatic relief, or has the potential to prevent or slow up an irreversible event, e.g. stroke or myocardial infarction.

• The length of treatment.

• The objective of the trial (equivalence, superiority or non-inferiority; see p. 45).

Thus it may be quite ethical to compare a novel analgesic against placebo for 2 weeks in the treatment of osteoarthritis of the hip (with escape analgesics available). It would not be ethical to use a placebo alone as comparator in a 6-month trial of a novel drug in active rheumatoid arthritis, even with escape analgesia.

The precise use of the placebo will depend on the study design, e.g. whether crossover, when all patients receive placebo at some point in the trial, or parallel group, when only one cohort receives placebo. Generally, patients easily understand the concept of distinguishing between the imagined effects of treatment and those due to a direct action on the body. Provided research subjects are properly informed and give consent freely, they are not the subject of deception in any ethical sense; but a patient given a placebo in the absence of consent is deceived and research ethics committees will, rightly, decline to agree to this. (See also: Lewis et al (2002) in Guide to further reading, at the end of this chapter.)

Injury to research subjects ⁹

The question of compensation for accidental (physical) injury due to participation in research is a vexed one. Plainly there are substantial differences between the position of healthy volunteers (whether or not they are paid) and that of patients who may benefit and, in some cases, who may be prepared to accept even serious risk for the chance of gain. There is no simple answer. But the topic must always be addressed in any research carrying risk, including the risk of withholding known effective treatment. The CIOMS/WHO Guidelines ⁴ state:

Research subjects who suffer physical injury as a result of their participation are entitled to such financial or other assistance as would compensate them equitably for any temporary or permanent impairment or disability. In the case of death, their dependants are entitled to material compensation. The right to compensation may not be waived.

Therefore, when giving their informed consent to participate, research subjects should be told whether there is provision for compensation in case of physical injury, and the circumstances in which they or their dependants would receive it.

Payment of subjects in clinical trials

Healthy volunteers are usually paid to take part in a clinical trial. The rationale is that they will not benefit from treatment received and should be compensated for discomfort and inconvenience. There is a fine dividing line between this and a financial inducement, but it is unlikely that more than a small minority of healthy volunteer studies would now take place without a ‘fee for service’ provision, including ‘out of pocket’ expenses. It is all the more important that the sums involved are commensurate with the invasiveness of the investigations and the length of the studies. The monies should be declared and agreed by the ethics committee.

Physicians have an intuitive aversion to paying patients (as opposed to healthy volunteers), because they feel the accusation of inducement or persuasion could be levelled at them, and because they assuage any feeling of taking advantage of the doctor–patient relationship by the hope that the medicines under test may be of benefit to the individual. This is not an entirely comfortable position.¹⁰

Rational introduction of a new drug to humans

When studies in animals predict that a new molecule may be a useful medicine, i.e. effective and safe in relation to its benefits, then the time has come to put it to the test in humans. Most doctors will be involved in clinical trials at some stage of their career and need to understand the principles of drug development. When a new chemical entity offers a possibility of doing something that has not been done before or of doing something familiar in a different or better way, it can be seen to be worth testing. But where it is a new member of a familiar class of drug, potential advantage may be harder to detect. Yet these ‘me too’ drugs are often worth testing. Prediction from animal studies of modest but useful clinical advantage is particularly uncertain and, therefore, if the new drug seems reasonably effective and safe in animals it is rational to test it in humans. From the commercial standpoint, the investment in the development of a new drug can be over £500 million, but will be substantially less for a ‘me too’ drug entering an already developed and profitable market.

Phases of clinical development

Human experiments progress in a commonsense manner that is conventionally divided into four phases (Fig. 4.1). These phases are divisions of convenience in what is a continuous expanding process. It begins with a small number of subjects (healthy subjects and volunteer patients) closely observed in laboratory settings, and proceeds through hundreds of patients, to thousands before the drug is agreed to be a medicine by a national or international regulatory authority. It is then licensed for general prescribing (though this is by no means the end of the evaluation). The process may be abandoned at any stage for a variety of reasons, including poor tolerability or safety, inadequate efficacy and commercial pressures. The phases are:

• Phase 1. Human pharmacology (20–50 subjects):

healthy volunteers or volunteer patients, according to the class of drug and its safety

pharmacokinetics (absorption, distribution, metabolism, excretion)

pharmacodynamics (biological effects) where practicable, tolerability, safety, efficacy.

• Phase 2. Therapeutic exploration (50–300 subjects):

patients

pharmacokinetics and pharmacodynamic dose ranging, in carefully controlled studies for efficacy and safety,¹¹ which may involve comparison with placebo.

• Phase 3. Therapeutic confirmation (randomised controlled trials; 250–1000 + subjects):

patients

efficacy on a substantial scale; safety; comparison with existing drugs.

• Phase 4. Therapeutic use (pharmacovigilance, post-licensing studies) (2000–10 000 + subjects):

surveillance for safety and efficacy: further formal therapeutic trials, especially comparisons with other drugs, marketing studies and pharmacoeconomic studies.

Fig. 4.1 The phases of drug discovery and development.

(With permission of Pharmaceutical Research and Manufacturers of America.)

Official regulatory guidelines and requirements ¹²

For studies in humans (see also Ch. 6) these ordinarily include:

• Studies of pharmacokinetics and bioavailability and, in the case of generics, bioequivalence (equal bioavailability) with respect to the reference product.

• Therapeutic trials (reported in detail) that substantiate the safety and efficacy of the drug under likely conditions of use, e.g. a drug for long-term use in a common condition will require a total of at least 1000 patients (preferably more), depending on the therapeutic class, of which (for chronic diseases) at least 100 have been treated continuously for about 1 year.

• Special groups. If the drug will be used in, for example, the elderly or children, then these populations should be studied. New drugs are not normally studied in pregnant women. Studies in patients having disease that affects drug metabolism and elimination may be needed, such as patients with impaired liver or kidney function.

• Fixed-dose combination products will require explicit justification for each component.

• Interaction studies with other drugs likely to be taken simultaneously. Plainly, all possible combinations cannot be evaluated; a rational choice, based on knowledge of pharmacodynamics and pharmacokinetics, is made.

• The application for a licence for general use (marketing application) should include a draft Summary of Product Characteristics for prescribers. A Patient Information Leaflet must be submitted. These should include information on the form of the product (e.g. tablet, capsule, sustained-release, liquid), its uses, dosage (adults, children, elderly where appropriate), contraindications, warnings and precautions (less strong), side-effects/adverse reactions, overdose and how to treat it.

The emerging discipline of pharmacogenomics seeks to identify patients who will respond beneficially or adversely to a new drug by defining certain genotypic profiles. Individualised dosing regimens may be evolved as a result (see p. 101). This tailoring of drugs to individuals is consuming huge resources from drug developers but has yet to establish a place in routine drug development.

Therapeutic investigations

There are three key questions to be answered during drug development:

• Does it work?

• Is it safe?

• What is the dose?

With few exceptions, none of these is easy to answer definitively within the confines of a pre-registration clinical trials programme. Effectiveness and safety have to be balanced against each other. What may be regarded as ‘safe’ for a new oncology drug in advanced lung cancer would not be so regarded in the treatment of childhood eczema. The use of the term ‘dose’, without explanation, is irrational as it implies a single dose for all patients. Pharmaceutical companies cannot be expected to produce a large array of different doses for each medicine, but the maxim to use the smallest effective dose that results in the desired effect holds true. Some drugs require titration, others have a wide safety margin so that one ‘high’ dose may achieve optimal effectiveness with acceptable safety. There are two classes of endpoint or outcome of a therapeutic investigation:

• The therapeutic effect itself (sleep, eradication of infection), i.e. the outcome.

• A surrogate effect, a short-term effect that can be reliably correlated with long-term therapeutic benefit, e.g. blood lipids or glucose or blood pressure. A surrogate endpoint might also be a pharmacokinetic parameter, if it is indicative of the therapeutic effect, e.g. plasma concentration of an antiepileptic drug.

Use of surrogate effects presupposes that the disease process is fully understood. They are best justified in diseases for which the true therapeutic effect can be measured only by studying large numbers of patients over many years. Such long-term outcome studies are indeed always preferable but may be impracticable on organisational, financial and sometimes ethical grounds prior to releasing new drugs for general prescription. It is in areas such as these that the techniques of large-scale surveillance for efficacy, as well as for safety, under conditions of ordinary use (below), would be needed to supplement the necessarily smaller and shorter formal therapeutic trials employing surrogate effects. Surrogate endpoints are of particular value in early drug development to select candidate drugs from a range of agents.

Therapeutic evaluation

The aims of therapeutic evaluation are three-fold:

1. To assess the efficacy, safety and quality of new drugs to meet unmet clinical needs.

2. To expand the indications for the use of current drugs (or generic drugs ¹³) in clinical and marketing terms.

3. To protect public health over the lifetime of a given drug.

The process of therapeutic evaluation may be divided into pre- and post-registration phases (Table 4.1), the purposes of which are set out below.

Table 4.1 Process of therapeutic evaluation

When a new drug is being developed, the first therapeutic trials are devised to find out the best that the drug can do under conditions ideal for showing efficacy, e.g. uncomplicated disease of mild to moderate severity in patients taking no other drugs, with carefully supervised administration by specialist doctors. Interest lies particularly in patients who complete a full course of treatment. If the drug is ineffective in these circumstances there is no point in proceeding with an expensive development programme. Such studies are sometimes called explanatory trials as they attempt to ‘explain’ why a drug works (or fails to work) in ideal conditions.

If the drug is found useful in these trials, it becomes desirable next to find out how closely the ideal may be approached in the rough and tumble of routine medical practice: in patients of all ages, at all stages of disease, with complications, taking other drugs and relatively unsupervised. Interest continues in all patients from the moment they are entered into the trial and it is maintained if they fail to complete, or even to start, the treatment; the need is to know the outcome in all patients deemed suitable for therapy, not only in those who successfully complete therapy.¹⁴

The reason some drop out may be related to aspects of the treatment and it is usual to analyse these according to the clinicians’ initial intention (intention-to-treat analysis), i.e. investigators are not allowed to risk introducing bias by exercising their own judgement as to who should or should not be excluded from the analysis. In these real-life, or ‘naturalistic’, conditions the drug may not perform so well, e.g. minor adverse effects may now cause patient non-compliance, which had been avoided by supervision and enthusiasm in the early trials. These naturalistic studies are sometimes called ‘pragmatic’ trials.
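The difference between analysing all randomised patients and analysing only those who complete treatment can be sketched in a few lines of Python. This is an illustration only: the numbers are invented, the names Subject, make_arm and response_rate are hypothetical, and counting every drop-out as a non-responder is one assumed convention among several that a real protocol might specify.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    arm: str          # treatment assigned at randomisation
    completed: bool   # finished the full course of treatment
    responded: bool   # met the (hypothetical) response criterion


def make_arm(arm, responders, non_responders, dropouts):
    """Build one invented trial arm; drop-outs are recorded as non-responders."""
    return (
        [Subject(arm, True, True) for _ in range(responders)]
        + [Subject(arm, True, False) for _ in range(non_responders)]
        + [Subject(arm, False, False) for _ in range(dropouts)]
    )


def response_rate(subjects, arm, completers_only=False):
    """Response rate in one arm, over everyone randomised or completers only."""
    group = [
        s for s in subjects
        if s.arm == arm and (s.completed or not completers_only)
    ]
    return sum(s.responded for s in group) / len(group)


# Invented numbers for illustration only, not data from any real trial.
trial = make_arm("drug", 49, 21, 30) + make_arm("placebo", 36, 54, 10)

itt_drug = response_rate(trial, "drug")
itt_placebo = response_rate(trial, "placebo")
pp_drug = response_rate(trial, "drug", completers_only=True)
pp_placebo = response_rate(trial, "placebo", completers_only=True)

print(f"Intention-to-treat difference: {itt_drug - itt_placebo:.2f}")
print(f"Completers-only difference:    {pp_drug - pp_placebo:.2f}")
```

In this invented example more patients drop out of the drug arm, so the completers-only comparison flatters the drug; the intention-to-treat comparison keeps every randomised patient in the arm to which he or she was assigned.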

The methods used to test the therapeutic value depend on the stage of development, who is conducting the study (a pharmaceutical company, or an academic body or health service at the behest of a regulatory authority), and the primary endpoint or outcome of the trial. The methods include:

• Formal therapeutic trials.

• Equivalence and non-inferiority trials.

• Safety surveillance methods.

Formal therapeutic trials are conducted during Phase 2 and Phase 3 of pre-registration development, and in the post-registration phase to test the drug in new indications. Equivalence trials aim to show the therapeutic equivalence of two treatments, usually the new drug under development and an existing drug used as a standard active comparator. Equivalence trials may be conducted before or after registration for the first therapeutic indication of the new drug (see p. 46 below for further discussion). Safety surveillance methods use the principles of pharmacoepidemiology (see p. 51) and are concerned mainly with evaluating adverse events and especially rare events, which formal therapeutic trials are unlikely to detect.

Need for statistics

In order truly to know whether patients treated in one way are benefited more than those treated in another, it is essential to use numbers. Statistics has been defined as ‘a body of methods for making wise decisions in the face of uncertainty’.¹⁵ Used properly, they are tools of great value for promoting efficient therapy. More than 100 years ago Francis Galton saw this clearly:

The human mind is … a most imperfect apparatus for the elaboration of general ideas … In our general impressions far too great weight is attached to what is marvellous … Experience warns us against it, and the scientific man takes care to base his conclusions upon actual numbers … to devise tests by which the value of beliefs may be ascertained.¹⁶

Concepts and terms

(With permission from Baber N, Smith R N, Griffin J P, O’Grady J, D’Arcy P F (eds) 1998 Textbook of Pharmaceutical Medicine, 3rd edn. Queen’s University of Belfast Press, Belfast.)

A confidence interval expresses a range of values that contains the true value with 95% (or other chosen percentage) certainty. The range may be broad, indicating uncertainty, or narrow, indicating (relative) certainty. A wide confidence interval occurs when numbers are small or differences observed are variable and points to a lack of information, whether the difference is statistically significant or not; it is a warning against placing much weight on (or confidence in) the results of small or variable studies. Confidence intervals are extremely helpful in interpretation, particularly of small studies, as they show the degree of uncertainty related to a result. Their use in conjunction with non-significant results may be especially enlightening.¹⁹

A finding of ‘not statistically significant’ can be interpreted as meaning there is no clinically useful difference only if the confidence intervals for the results are also stated in the report and are narrow. If the confidence intervals are wide, a real difference may be missed in a trial with a small number of subjects, i.e. the absence of evidence that there is a difference is not the same as showing that there is no difference. Small numbers of patients inevitably give low precision and low power to detect differences.
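The effect of study size on the width of a confidence interval can be illustrated with a short Python sketch. It assumes a simple normal-approximation interval for the difference between two response rates, which is only one of several methods and is crude for small groups; the function name diff_ci and the response counts are invented for illustration.

```python
import math
from statistics import NormalDist


def diff_ci(successes_a, n_a, successes_b, n_b, level=0.95):
    """Normal-approximation confidence interval for the difference p_a - p_b."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_a - p_b
    return diff - z * se, diff + z * se


# Invented counts: the same response rates (60% v. 45%) in a small and a large study.
small = diff_ci(12, 20, 9, 20)
large = diff_ci(600, 1000, 450, 1000)

print("20 per group:   %.3f to %.3f" % small)
print("1000 per group: %.3f to %.3f" % large)
```

With 20 patients per group the interval is wide and includes zero, so the result is not statistically significant, yet it cannot exclude a large real difference. With 1000 per group the same observed rates give a narrow interval that excludes zero.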

Types of error

The above discussion provides us with information on the likelihood of falling into one of the two principal kinds of error in therapeutic experiments, for the hypothesis that there is no difference between treatments may either be accepted incorrectly or rejected incorrectly.

Type I error

(α) is the finding of a difference between treatments when in reality they do not differ, i.e. rejecting the null hypothesis incorrectly. Investigators decide the degree of this error which they are prepared to tolerate on a scale in which 0 indicates complete rejection of the null hypothesis and 1 indicates its complete acceptance; clearly the level for α must be set near to 0. This is the same as the significance level of the statistical test used to detect a difference between treatments. Thus α (or P) = 0.05 indicates that the investigators will accept a 5% chance that an observed difference is not a real difference.

Type II error

(β) is the finding of no difference between treatments when in reality they do differ, i.e. accepting the null hypothesis incorrectly. The tolerated probability of this error is often given wider limits, e.g. β = 0.1–0.2, which indicates that the investigators are willing to accept a 10–20% chance of missing a real effect. Conversely, the power of the study (1 − β) is the probability of avoiding this error and detecting a real difference, in this case 80–90%.

It is up to the investigators to decide the target difference ²⁰ and what probability level (for either type of error) they will accept if they are to use the result as a guide to action.

Plainly, trials should be devised to have adequate precision and power, both of which are consequences of the size of study. It is also necessary to make an estimate of the likely size of the difference between treatments, i.e. the target difference. Adequate power is often defined as giving an 80–90% chance of detecting (at 1–5% statistical significance, P = 0.01–0.05) the defined useful target difference (say 15%). It is rarely worth starting a trial that has less than a 50% chance of achieving the set objective, because the power of the trial is too low.
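How α, power and the target difference together determine the size of a trial can be sketched with a standard textbook approximation for comparing two response rates. The Python below is an illustration under stated assumptions, not a substitute for a statistician's calculation: the function name subjects_per_group and the response rates are invented, the test is taken to be two-sided, and the 15% target difference is read as 15 percentage points.

```python
import math
from statistics import NormalDist


def subjects_per_group(p_control, p_test, alpha=0.05, power=0.80):
    """Approximate number of subjects per group to detect p_test v. p_control."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    n = (z_alpha + z_beta) ** 2 * variance / (p_test - p_control) ** 2
    return math.ceil(n)


# Invented response rates: an active comparator at 40% and a test drug at 55%.
n_active_80 = subjects_per_group(0.40, 0.55, power=0.80)
n_active_90 = subjects_per_group(0.40, 0.55, power=0.90)

# Against a placebo with an assumed 25% response the difference is larger.
n_placebo_80 = subjects_per_group(0.25, 0.55, power=0.80)

print(n_active_80, n_active_90, n_placebo_80)
```

Raising the power from 80% to 90% increases the number required, while a larger expected difference, as in a comparison with placebo, reduces it sharply.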

Types of therapeutic trial

A therapeutic trial is:

a carefully, and ethically, designed experiment with the aim of answering some precisely framed question. In its most rigorous form it demands equivalent groups of patients concurrently treated in different ways or in randomised sequential order in crossover designs. These groups are constructed by the random allocation of patients to one or other treatment … In principle the method has application with any disease and any treatment. It may also be applied on any scale; it does not necessarily demand large numbers of patients.²¹

This is the classical randomised controlled trial (RCT), the most secure method for drawing a causal inference about the effects of treatments. Randomisation attempts to control biases of various kinds when assessing the effects of treatments. RCTs are employed at all phases of drug development and in the various types and designs of trials discussed below. Fundamental to any trial are:

• A hypothesis.

• The definition of the primary endpoint.

• The method of analysis.

• A protocol.

Other factors to consider when designing or critically appraising a trial are:

• The characteristics of the patients.

• The general applicability of the results.

• The size of the trial.

• The method of monitoring.

• The use of interim analyses.²²

• The interpretation of subgroup comparisons.

The aims of a therapeutic trial, not all of which can be attempted at any one occasion, are to decide:

• Whether a treatment is effective.

• The magnitude of that effect (compared with other remedies – or doses, or placebo).

• The types of patients in whom it is effective.

• The best method of applying the treatment (how often, and in what dosage if it is a drug).

• The disadvantages and dangers of the treatment.

Dose–response trials

Response in relation to the dose of a new investigational drug may be explored in all phases of drug development. Dose–response trials serve a number of objectives, of which the following are of particular importance:

• Confirmation of efficacy (hence a therapeutic trial).

• Investigation of the shape and location of the dose–response curve.

• The estimation of an appropriate starting dose.

• The identification of optimal strategies for individual dose adjustments.

• The determination of a maximal dose beyond which additional benefit is unlikely to occur.

Superiority, equivalence and non-inferiority in clinical trials

The therapeutic efficacy of a novel drug is most convincingly established by demonstrating superiority to placebo, or to an active control treatment, or by demonstrating a dose–response relationship (as above).

In some cases the purpose of a comparison is to show not necessarily superiority, but either equivalence or non-inferiority. Such trials avoid the use of placebo, explore possible advantages of safety, dosing convenience and cost, and present an alternative or ‘second-line’ therapy. Examples of a possible outcome in a ‘head to head’ comparison of two active treatments appear in Figure 4.2.

There are, in general, two types of equivalence trials in clinical development: bioequivalence and clinical equivalence. In the former, certain pharmacokinetic variables of a new formulation have to fall within specified (and regulated) margins of the standard formulation of the same active entity. The advantage of this type of trial is that, if bioequivalence is ‘proven’, then proof of clinical equivalence is not required.

Design of trials

Techniques to avoid bias

The two most important techniques are:

• Randomisation.

• Blinding.

Randomisation

Introduces a deliberate element of chance into the assignment of treatments to the subjects in a clinical trial. It provides a sound statistical basis for the evaluation of the evidence relating to treatment effects, and tends to produce treatment groups that have a balanced distribution of prognostic factors, both known and unknown. Together with blinding, it helps to avoid possible bias in the selection and allocation of subjects.

Randomisation may be accomplished in simple or more complex ways, such as:

• Sequential assignments of treatments (or sequences in crossover trials).

• Randomising subjects in blocks. This helps to increase comparability of the treatment groups when subject characteristics change over time or there is a change in recruitment policy. It also gives a better guarantee that the treatment groups will be of nearly equal size.

• Dynamic allocation, in which treatment allocation is influenced by the current balance of allocated treatments.²³
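Randomising in blocks can be illustrated with a short sketch. The function name, the two-arm labels and the block size of four are illustrative assumptions, not a prescribed procedure; a real trial would use a validated, concealed allocation system.

```python
import random

def block_randomise(n_subjects, treatments=("A", "B"), block_size=4, seed=None):
    """Build an allocation list from randomly permuted blocks (hypothetical helper)."""
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of treatments")
    rng = random.Random(seed)          # seed only so that the example is repeatable
    per_block = block_size // len(treatments)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block contains every treatment equally often, in random order.
        block = list(treatments) * per_block
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]

schedule = block_randomise(12, block_size=4, seed=2024)
print(schedule)
print({t: schedule.count(t) for t in ("A", "B")})
```

Because every complete block is balanced, the group sizes can never differ by more than half a block, which is the 'nearly equal size' guarantee described above.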

Blinding

The fact that both doctors and patients are subject to bias due to their beliefs and feelings has led to the invention of the double-blind technique, which is a control device to prevent bias from influencing results. On the one hand, it rules out the effects of hopes and anxieties of the patient by giving both the drug under investigation and a placebo (dummy) of identical appearance in such a way that the subject (the first ‘blind’ person) does not know which he or she is receiving. On the other hand, it also rules out the influence of preconceived hopes of, and unconscious communication by, the investigator or observer by keeping him or her (the second ‘blind’ person) ignorant of whether he or she is prescribing a placebo or an active drug. At the same time, the technique provides another control, a means of comparison with the magnitude of placebo effects. The device is both philosophically and practically sound.²⁴

A non-blind trial is called an open trial.

The double-blind technique should be used wherever possible, and especially for occasions when it might at first sight seem that criteria of clinical improvement are objective when in fact they are not. For example, the range of voluntary joint movement in rheumatoid arthritis has been shown to be influenced greatly by psychological factors, and a moment’s thought shows why, for the amount of pain patients will put up with is influenced by their mental state.

Blinding should go beyond the observer and the observed. None of the investigators should be aware of treatment allocation, including those who evaluate endpoints, assess compliance with the protocol and monitor adverse events. Breaking the blind (for a single subject) should be considered only when the subject’s physician deems knowledge of the treatment assignment essential in the subject’s best interests.

Sometimes the double-blind technique is not possible, because, for example, side-effects of an active drug reveal which patients are taking it or tablets look or taste different; but it never carries a disadvantage (‘only protection against biased data’). It is not, of course, used with new chemical entities fresh from the animal laboratory, whose dose and effects in humans are unknown, although the subject may legitimately be kept in ignorance (single blind) of the time of administration. Single-blind techniques have a place in therapeutics research, but only when the double-blind procedure is impracticable or unethical.

Ophthalmologists are understandably disinclined to refer to the ‘double-blind’ technique; they call it double-masked.

Some common design configurations

Parallel group design

This is the most common clinical trial design for confirmatory therapeutic (Phase 3) trials. Subjects are randomised to one of two or more treatment ‘arms’. These treatments will include the investigational drug at one or more doses, and one or more control treatments such as placebo and/or an active comparator. Parallel group designs are particularly useful in conditions that fluctuate over a short term, e.g. migraine or irritable bowel syndrome, but are also used for chronic stable diseases such as Parkinson’s disease and some types of cancer. The particular advantages of the parallel group design are simplicity, the ability to approximate more closely the likely conditions of use, and the avoidance of ‘carry-over effects’ (see below).

Crossover design

In this design, each subject is randomised to a sequence of two or more treatments, and hence acts as his or her own control for treatment comparisons. The advantage of this design is that subject-to-subject variation is eliminated from treatment comparison so that the number of subjects is reduced.

In the basic crossover design each subject receives each of the two treatments in a randomised order. There are variations to this in which each subject receives a subset of treatments or ones in which treatments are repeated within the same subject (to explore the reproducibility of effects).

The potential disadvantage of the crossover design is carry-over, i.e. the residual influence of treatments on subsequent treatment periods. This can often be avoided either by separating treatments with a ‘wash-out’ period or by selecting treatment lengths based on a knowledge of the disease and the new medication. The crossover design is best suited for chronic stable diseases, e.g. hypertension, chronic stable angina pectoris, where the baseline conditions are attained at the start of each treatment arm. The pharmacokinetic characteristics of the new medication are also important, the principle being that the plasma concentration at the start of the next dosing period is zero and no dynamic effect can be detected.
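The pharmacokinetic principle can be sketched numerically. The example assumes simple first-order (exponential) elimination, which is an assumption added here and will not hold for every drug; the half-life and wash-out length are hypothetical values.

```python
def fraction_remaining(washout_hours, half_life_hours):
    """Fraction of the plasma concentration left after a wash-out period.

    Assumes first-order elimination: the concentration halves every half-life.
    """
    return 0.5 ** (washout_hours / half_life_hours)

# Hypothetical drug with a 12-hour half-life and a 60-hour wash-out (five half-lives).
remaining = fraction_remaining(washout_hours=60, half_life_hours=12)
print(f"{remaining:.1%} of the starting concentration remains")
```

Under this assumption the concentration approaches, but never strictly reaches, zero, so in practice the wash-out is chosen to bring it below the level at which any dynamic effect can be detected.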

Factorial designs

In the factorial design, two or more treatments are evaluated simultaneously through the use of varying combinations of the treatments. The simplest example is the 2 × 2 factorial design in which subjects are randomly allocated to one of four possible combinations of two treatments A and B. These are: A alone, B alone, A + B, neither A nor B (placebo). The main uses of the factorial design are to:

• Make efficient use of clinical trial subjects by evaluating two treatments with the same number of individuals.

• Examine the interaction of A with B.

• Establish dose–response characteristics of the combination of A and B when the efficacy of each has been previously established.
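The arithmetic of the 2 × 2 design can be sketched as follows. The mean responses are invented for illustration, and the function name is hypothetical; the point is only that the same four groups yield an estimate for A, an estimate for B and an estimate of their interaction.

```python
from itertools import product

# The four arms of a 2 x 2 factorial design, keyed as (receives_A, receives_B).
ARMS = list(product((False, True), repeat=2))

def main_effects_and_interaction(mean_response):
    """Return (effect of A, effect of B, A-by-B interaction) from the four arm means."""
    placebo = mean_response[(False, False)]
    a_alone = mean_response[(True, False)]
    b_alone = mean_response[(False, True)]
    a_plus_b = mean_response[(True, True)]
    # Each main effect averages the effect seen with and without the other drug.
    effect_a = ((a_alone - placebo) + (a_plus_b - b_alone)) / 2
    effect_b = ((b_alone - placebo) + (a_plus_b - a_alone)) / 2
    # Interaction: does the effect of A differ when B is also given?
    interaction = (a_plus_b - b_alone) - (a_alone - placebo)
    return effect_a, effect_b, interaction

# Hypothetical mean responses for each arm (arbitrary units).
means = {(False, False): 2.0, (True, False): 5.0, (False, True): 4.0, (True, True): 7.0}
print(main_effects_and_interaction(means))
```

An interaction of zero, as in these invented figures, means the two treatments act additively; the efficiency argument for the factorial design is strongest in that situation.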

Multicentre trials

Multicentre trials are carried out for two main reasons. First, they are an efficient way of evaluating a new medication, by accruing sufficient subjects in a reasonable time to satisfy trial objectives. Second, multicentre trials may be designed to provide a better basis for the subsequent generalisation of their findings. Thus they provide the possibility of recruiting subjects from a wide population and of administering the medication in a broad range of clinical settings. Multicentre trials can be used at any phase in clinical development, but are especially valuable when used to confirm therapeutic value in Phase 3. Large-scale multicentre trials using minimised data collection techniques and simple endpoints have been of immense value in establishing modest but real treatment effects that apply to a large number of patients, e.g. drugs that improve survival after myocardial infarction.

N-of-1 trials

Patients give varied treatment responses and the average effect derived from a population sample may not be helpful in expressing the size of benefit or harm for an individual. In the future, pharmacogenomics may provide an answer, but in the meantime the best way to settle doubt as to whether a test drug is effective for an individual patient is the N-of-1 trial. This is a crossover design in which each patient receives two or more administrations of drug or placebo in random order; the results from individuals can then be displayed. Two conditions apply. First, the disease in which the drug is being tested must be chronic and stable. Second, the treatment effect must wear off rapidly. N-of-1 trials are not used routinely in drug development and, when they are, only at the Phase 3 stage.²⁵,²⁶

Historical controls

Any temptation simply to give a new treatment to all patients and to compare the results with the past (historical controls) is almost always unacceptable, even with a disease such as leukaemia. The reasons are that standards of diagnosis and treatment change with time, and the severity of some diseases (infections) fluctuates. The general provision stands that controls must be concurrent and concomitant. An exception to this rule is the case–control study (see p. 52).

Size of trials

Before the start of any controlled trial it is necessary to decide the number of patients that will be needed to deliver an answer, for ethical as well as practical reasons. This is determined by four factors:

1. The magnitude of the difference sought or expected on the primary efficacy endpoint (the target difference). For between-group studies, the focus of interest is the mean difference that constitutes a clinically significant effect.

2. The variability of the measurement of the primary endpoint as reflected by the standard deviation of this primary outcome measure. The magnitude of the expected difference (above) divided by the standard deviation of the difference gives the standardised difference (Fig. 4.3).

3. The defined significance level, i.e. the level of chance for accepting a Type I (α) error. Levels of 0.05 (5%) and 0.01 (1%) are common targets.

4. The power or desired probability of detecting the required mean treatment difference, i.e. 1 − β, where β is the accepted chance of a Type II error (missing a real difference). For most controlled trials, a power of 80–90% (0.8–0.9) is frequently chosen as adequate, although higher power is chosen for some studies.

It will be intuitively obvious that a small difference in the effect that can be detected between two treatment groups, or a large variability in the measurement of the primary endpoint, or a high significance level (low P value) or a large power requirement, all act to increase the required sample size. Figure 4.3 gives a graphical representation of how the power of a clinical trial relates to values of clinically relevant standardised difference for varying numbers of trial subjects (shown by the individual curves). It is clear that the larger the number of subjects in a trial, the smaller is the difference that can be detected for any given power value.
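The way the four factors combine can be sketched with the usual normal-approximation formula for comparing two means, n per group ≈ 2(z₁₋α/₂ + z_power)² / d², where d is the standardised difference. This formula and the function name are assumptions introduced for illustration; a trial statistician would normally refine the figure (for example with a t-distribution correction and an allowance for drop-outs).

```python
import math
from statistics import NormalDist

def n_per_group_for_means(standardised_difference, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance level (Type I error)
    z_power = NormalDist().inv_cdf(power)           # power = 1 - Type II error
    n = 2 * (z_alpha + z_power) ** 2 / standardised_difference ** 2
    return math.ceil(n)

for d in (1.0, 0.5, 0.25):
    print(d, n_per_group_for_means(d), n_per_group_for_means(d, power=0.90))
```

Halving the standardised difference roughly quadruples the number of subjects needed, and raising the power from 80% to 90% increases it by about a third.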

The aim of any clinical trial is to have small Type I and II errors, and consequently sufficient power to detect a difference between treatments, if it exists. Of the four factors that determine sample size, the power and significance level are chosen to suit the level of risk felt to be appropriate. The magnitude of the effect can be estimated from previous experience with drugs of the same or similar action; the variability of the measurements is often known from published experiments on the primary endpoint, with or without drug. These data will not be available for novel substances in a new class, and frequently the sample size in the early phase of development is chosen on a more arbitrary basis. Numbers required to detect the difference in frequency of a categorical outcome, e.g. fractures in a trial of osteoporosis or remissions in a cancer trial, are generally larger than numbers required to detect differences in a continuous quantitative variable. As an example, a trial that would detect, at the 5% level of statistical significance, a treatment that raised a cure rate from 75% to 85% would require 500 patients for 80% power.
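The cure-rate example can be checked with a standard normal-approximation formula for comparing two proportions. The formula and function name are assumptions added here (the text does not state which method produced its figure), and a two-sided test is assumed.

```python
import math
from statistics import NormalDist

def n_per_group_for_proportions(p_control, p_treated, alpha=0.05, power=0.80):
    """Approximate subjects per group to detect a difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Sum of the binomial variances of the two groups.
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    n = (z_alpha + z_power) ** 2 * variance / (p_treated - p_control) ** 2
    return math.ceil(n)

per_group = n_per_group_for_proportions(0.75, 0.85)
print(per_group, "per group,", 2 * per_group, "in total")
```

Under these assumptions the calculation gives roughly 250 patients per group, in line with the figure of about 500 patients quoted above.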

Fixed sample size and sequential designs

Defining when a clinical trial should end is not as simple as it first appears. In the standard clinical trial the end is defined by the passage of all of the recruited subjects through the complete design. However, it is results and decisions based on the results that matter, not the number of subjects. The result of the trial may be that one treatment is superior to another or that there is no difference. These trials are of fixed sample size. In fact, patients are recruited sequentially, but the results are analysed at a fixed time-point.

The results of this type of trial may be disappointing if they miss the agreed and accepted level of significance.

It is not legitimate, having just failed to reach the agreed level (say, P = 0.05), to take in a few more patients in the hope that they will bring the P value down to 0.05 or less, for this is deliberately not allowing chance and the treatment to be the sole factors involved in the outcome, as they should be.

An alternative (or addition) to repeating the fixed sample size trial is to use a sequential design in which the trial is run until a useful result is reached.²⁷ These adaptive designs, in which decisions are taken on the basis of results to date, can assess results on a continuous basis as data for each subject become available or, more commonly, on groups of subjects (group sequential design). The essential feature of these designs is that the trial is terminated when a predetermined result is attained and not when the investigator looking at the results thinks it appropriate. Reviewing results on a continuous or interim basis requires formal interim analysis and there are specific statistical methods for handling the data, which need to be agreed in advance. Group sequential designs are especially successful in large long-term trials of mortality or major non-fatal endpoints when safety must be monitored closely.

Such sequential designs recognise the reality of medical practice and provide a reasonable balance between statistical, medical and ethical needs. Interim analyses, however, reduce the power of statistical significance tests each time that they are performed, and carry a risk of a false-positive result if chance differences between groups are encountered before the scheduled end of a trial.
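The false-positive risk of unplanned repeated looks can be demonstrated by simulation. The sketch below is an illustration under stated assumptions: both groups are drawn from the same normal distribution (so there is no true treatment effect), the standard deviation is taken as known, and five equally spaced looks are each tested naively at the 5% level. All names and settings are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_type_i_error(n_trials=2000, looks=(20, 40, 60, 80, 100), alpha=0.05, seed=1):
    """Compare false-positive rates: 'significant at any look' versus 'final look only'."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    any_look = final_only = 0
    for _ in range(n_trials):
        sum_a = sum_b = 0.0
        n = 0
        crossed = False
        z = 0.0
        for target in looks:
            while n < target:              # recruit up to the next look
                sum_a += rng.gauss(0, 1)   # treatment group, no real effect
                sum_b += rng.gauss(0, 1)   # control group
                n += 1
            # z statistic for the difference in means (known SD of 1 per group).
            z = (sum_a / n - sum_b / n) / (2 / n) ** 0.5
            if abs(z) > z_crit:
                crossed = True
        any_look += crossed
        final_only += abs(z) > z_crit
    return any_look / n_trials, final_only / n_trials

naive, fixed = simulate_type_i_error()
print(f"false-positive rate, any of 5 looks: {naive:.3f}; final analysis only: {fixed:.3f}")
```

The single final analysis keeps the error rate near the nominal 5%, whereas stopping at the first 'significant' look inflates it substantially. Formal group sequential methods avoid this by using stricter thresholds at each interim analysis, agreed in advance.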

Sensitivity of trials

Definitive therapeutic trials are expensive and on occasion may be so prolonged that aspects of treatment have been superseded by the time a result is obtained. A single trial, however well designed, executed and analysed, can answer only the question addressed. The regulatory authorities give guidance as to the number and design of trials that, if successful, would lead to a therapeutic claim. But changing clinical practice in the longer term depends on many other factors, of which confirmatory trials in other centres by different investigators under different conditions are an important part.

Meta-analysis

The two main outcomes for therapeutic trials are to influence clinical practice and, where appropriate, to make a successful claim for a drug with the regulatory authorities. Investigators are eternally optimistic and frequently plan their trials to look for large effects. Reality is different. The results of a planned (or unplanned) series of clinical trials may vary considerably for several reasons, but most significantly because the studies are too small to detect a treatment effect. In common but serious diseases such as cancer or heart disease, however, even small treatment effects can be important in terms of their total impact on public health. It may be unreasonable to expect dramatic advances in these diseases; we should be looking for small effects. Drug developers, too, should be interested not only in whether a treatment works, but also how well, and for whom.

The collecting together of a number of trials with the same objective in a systematic review²⁸ and analysing the accumulated results using appropriate statistical methods is termed meta-analysis. The principles of a meta-analysis are that:

• It should be comprehensive, i.e. include data from all trials, published and unpublished.

• Only randomised controlled trials should be analysed, with patients entered on the basis of ‘intention to treat’.²⁹

• The results should be determined using clearly defined, disease-specific endpoints (this may involve a re-analysis of original trials).

There are strong advocates and critics of the concept, its execution and interpretation. Arguments that have been advanced against meta-analysis are:

• An effect of reasonable size ought to be demonstrable in a single trial.

• Different study designs cannot be pooled.

• Lack of accessibility of all relevant studies.

• Publication bias (‘positive’ trials are more likely to be published).

In practice, the analysis involves calculating an odds ratio for each trial included in the meta-analysis. This is the ratio of the number of patients in the treatment group who experience a particular endpoint, e.g. death, to the number who do not, divided by the equivalent ratio for the control group. The number of deaths observed in the treatment group is then compared with the number to be expected if it is assumed that the treatment is ineffective, to give the ‘observed minus expected’ statistic. The overall treatment effect is then obtained by summing the ‘observed minus expected’ values of the individual trials and converting the total into an overall odds ratio. An odds ratio of 1.0 indicates that the treatment has no effect, an odds ratio of 0.5 indicates a halving and an odds ratio of 2.0 indicates a doubling of the odds that patients will experience the chosen endpoint.
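One widely used way of turning summed 'observed minus expected' values into a pooled odds ratio is the Peto one-step method, sketched below. That this particular method matches the analysis described here is an assumption; the variance term and the 95% confidence interval are additions for illustration, and the trial counts are invented (they are not the data of Fig. 4.4).

```python
import math

def peto_pooled_odds_ratio(trials):
    """Pool trials given as (events_treated, n_treated, events_control, n_control).

    Returns (odds ratio, lower 95% limit, upper 95% limit) by the Peto one-step method.
    """
    sum_o_minus_e = 0.0
    sum_variance = 0.0
    for events_t, n_t, events_c, n_c in trials:
        n_total = n_t + n_c
        events_total = events_t + events_c
        # Events expected in the treated group if treatment were ineffective.
        expected_t = n_t * events_total / n_total
        # Hypergeometric variance of the observed treated-group events.
        variance = (n_t * n_c * events_total * (n_total - events_total)
                    / (n_total ** 2 * (n_total - 1)))
        sum_o_minus_e += events_t - expected_t
        sum_variance += variance
    log_or = sum_o_minus_e / sum_variance
    se = 1 / math.sqrt(sum_variance)
    return math.exp(log_or), math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)

# Hypothetical trials: (events on treatment, n treated, events on control, n control).
example_trials = [(20, 200, 30, 200), (15, 150, 22, 150), (40, 500, 55, 500)]
odds_ratio, lower, upper = peto_pooled_odds_ratio(example_trials)
print(f"pooled odds ratio {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Pooling narrows the confidence interval relative to any single trial, which is how a meta-analysis can show a benefit that the individual trials were too small to establish.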

From the position of drug development, the general requirement that scientific results have to be repeatable has been interpreted in the past by the Food and Drug Administration (the regulatory agency in the USA) to mean that two well controlled studies are required to support a claim. But this requirement is itself controversial and its relation to a meta-analysis in the context of drug development is unclear.

In clinical practice, and in the era of cost-effectiveness, the use of meta-analysis as a tool to aid medical decision-making and underpinning ‘evidence-based medicine’ is here to stay.

Figure 4.4 shows detailed results from 11 trials in which antiplatelet therapy after myocardial infarction was compared with a control group. The number of vascular events per treatment group is shown in the second and third columns, and the odds ratios with the point estimates (the value most likely to have resulted from the study) are represented by black squares and their 95% confidence intervals (CI) in the fourth column.

Fig. 4.4 A clear demonstration of benefits from meta-analysis of available trial data, when individual trials failed to provide convincing evidence (see text).

(Reproduced with permission of Collins R 2001 Lancet 357:373–380.)

The size of the square is proportional to the number of events. The diamond gives the point estimate and CI for overall effect.

Results: implementation

The way in which data from therapeutic trials are presented can influence doctors’ perceptions of the advisability of adopting a treatment in their routine practice.

Relative and absolute risk

The results of therapeutic trials are commonly expressed as the percentage reduction of an unfavourable (or percentage increase in a favourable) outcome, i.e. as the relative risk, and this can be very impressive indeed until the figures are presented as the number of individuals actually affected per 100 people treated, i.e. as the absolute risk.

Where a baseline risk is low, a statement of relative risk alone is particularly misleading as it implies large benefit where the actual benefit is small. Thus a reduction of risk from 2% to 1% is a 50% relative risk reduction, but it saves only one patient for every 100 patients treated. But where the baseline is high, say 40%, a 50% reduction in relative risk saves 20 patients for every 100 treated.

To make clinical decisions, readers of therapeutic studies need to know how many patients must be treated³⁰ (and for how long) to obtain one desired result (number needed to treat). This is the inverse (or reciprocal) of the absolute risk reduction.
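The relationship between these measures can be sketched in a few lines, using the two baseline risks given above (2% and 40%, each halved by treatment). The function name and the dictionary keys are illustrative choices.

```python
def risk_summary(control_risk, treated_risk):
    """Relative risk reduction, absolute risk reduction and number needed to treat."""
    arr = control_risk - treated_risk          # absolute risk reduction
    return {
        "relative_risk_reduction": arr / control_risk,
        "absolute_risk_reduction": arr,
        "events_avoided_per_100": arr * 100,
        "number_needed_to_treat": 1 / arr,     # reciprocal of the absolute risk reduction
    }

for control, treated in ((0.02, 0.01), (0.40, 0.20)):
    s = risk_summary(control, treated)
    print(f"baseline {control:.0%}: RRR {s['relative_risk_reduction']:.0%}, "
          f"ARR {s['absolute_risk_reduction']:.0%}, "
          f"NNT {s['number_needed_to_treat']:.0f}")
```

Both scenarios share the same 50% relative risk reduction, yet the number needed to treat is 100 at the low baseline and 5 at the high one.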

Relative risk reductions can remain high (and thus make treatments seem attractive) even when susceptibility to the events being prevented is low (and the corresponding numbers needed to be treated are large). As a result, restricting the reporting of efficacy to just relative risk reductions can lead to great – and at times excessive – zeal in decisions about treatment for patients with low susceptibilities.³¹

A real-life example follows:

Antiplatelet drugs reduce the risk of future non-fatal myocardial infarction by 30% [relative risk] in trials of both primary and secondary prevention. But when the results are presented as the number of patients who need to be treated for one nonfatal myocardial infarction to be avoided [absolute risk] they look very different.

In secondary prevention of myocardial infarction, 50 patients need to be treated for 2 years, while in primary prevention 200 patients need to be treated for 5 years, for one non-fatal myocardial infarction to be prevented. In other words, it takes 100 patient-years of treatment in secondary prevention, but 1000 patient-years in primary prevention, to produce the same beneficial outcome of one fewer non-fatal myocardial infarction.³²

Whether a low incidence of adverse drug effects is acceptable becomes a serious issue in the context of absolute risk. Non-specialist doctors, particularly those in primary care, need and deserve clear and informative presentation of therapeutic trial results that measure the overall impact of a treatment on the patient’s life, i.e. on clinically important outcomes such as morbidity, mortality, quality of life, working capacity, fewer days in hospital. Without it, they cannot adequately advise patients, who may themselves be misled by inappropriate use of statistical data in advertisements or on internet sites.

Important aspects of therapeutic trial reports

• Statistical significance and its clinical importance.

• Confidence intervals.

• Number needed to treat, or absolute risk.

Pharmacoepidemiology

Pharmacoepidemiology is the study of the use and effects of drugs in large numbers of people. Some of the principles of pharmacoepidemiology are used to gain further insight into the efficacy, and especially the safety, of new drugs once they have passed from limited exposure in controlled therapeutic pre-registration trials to the looser conditions of their use in the community. Trials in this setting are described as observational because the groups to be compared are assembled from subjects who are, or who are not (the controls), taking the treatment in the ordinary way of medical care. These (Phase 4) trials are subject to greater risk of selection bias³³ and confounding³⁴ than experimental studies (randomised controlled trials) where entry and allocation of treatment are strictly controlled (increasing internal validity). Observational studies, nevertheless, come into their own when sufficiently large randomised trials are logistically and financially impracticable. The following approaches are used.

Observational cohort³⁵ studies

Patients receiving a drug are followed up to determine the outcomes (therapeutic or adverse). This is usually forward-looking (prospective) research. A cohort study does not require a suspicion of causality; subjects can be followed ‘to see what happens’ (event recording). Prescription event monitoring (below) is an example, and there is an increasing tendency to recognise that most new drugs should be monitored in this way when prescribing becomes general. Major difficulties include the selection of an appropriate control group, and the need for large numbers of subjects and for prolonged surveillance. This sort of study is scientifically inferior to the experimental cohort study (the randomised controlled trial) and is cumbersome for research on drugs.

Investigation of the question of thromboembolism and the combined oestrogen–progestogen contraceptive pill by means of an observational cohort study required enormous numbers of subjects³⁶ (the adverse effect is, fortunately, uncommon) followed over years. An investigation into cancer and the contraceptive pill by an observational cohort would require follow-up for 10–15 years. Happily, epidemiologists have devised a partial alternative: the case–control study.

Case–control studies

This reverses the direction of scientific logic from a forward-looking, ‘what happens next’ (prospective) to a backward-looking, ‘what has happened in the past’ (retrospective)³⁷ investigation. The case–control study requires a definite hypothesis or suspicion of causality, such as an adverse reaction to a drug. The investigator assembles a group of patients who have the condition. A control group of people who have not had the reaction is then assembled (matched, e.g. for sex, age, smoking habits) from hospital admissions for other reasons, primary care records or electoral rolls. A complete drug history is taken from each group, i.e. the two groups are ‘followed up’ backwards to determine the proportion in each group that has taken the suspect agent. Case–control studies do not prove causation.³⁸ They reveal associations and it is up to investigators and critical readers to decide the most plausible explanation.
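The comparison of exposure between cases and controls is usually summarised as an odds ratio. The sketch below shows the calculation with an approximate 95% confidence interval (Woolf's logit method); the choice of method, the function name and the counts are all assumptions made for illustration.

```python
import math

def exposure_odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds ratio for exposure in cases versus controls, with an approximate 95% CI."""
    odds_ratio = (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)
    # Standard error of the log odds ratio (Woolf's method); needs all counts > 0.
    se_log_or = math.sqrt(1 / cases_exposed + 1 / cases_unexposed
                          + 1 / controls_exposed + 1 / controls_unexposed)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical study: 40 of 100 cases and 20 of 100 controls had taken the suspect drug.
odds_ratio, lower, upper = exposure_odds_ratio(40, 60, 20, 80)
print(f"odds ratio {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1.0 indicates an association unlikely to be due to chance alone; it says nothing about bias or confounding, which is why the result cannot by itself establish causation.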

A case–control study has the advantage that it requires a much smaller number of cases (hundreds) of disease and can thus be done quickly and cheaply. It has the disadvantage that it follows up subjects backwards and there is always suspicion of the intrusion of unknown and so unavoidable biases in the selection of both patients and controls. Here again, independent repetition of the studies, if the results are the same, greatly enhances confidence in the outcome.