The Art of the Clinical Trial

Published on 27/03/2015 by admin

Last modified 22/04/2025

Chapter 210 The Art of the Clinical Trial

The purpose of the clinical trial is to determine the safety, efficacy, and/or effectiveness of a medical intervention. A clinical trial is, by definition, prospective and has a control group. The number of clinical trials that address diseases and interventions of the spine is increasing (Fig. 210-1). As discussed in Chapter 211, retrospective studies play a significant role in clinical spine surgery research; however, this chapter focuses exclusively on prospective studies.

FIGURE 210-1 Increase in number of spine clinical trials between 2000 and 2009.

The randomized controlled trial (RCT) is the gold standard in evidence-based medicine. Very few RCTs have been completed to date to guide spine surgeons. Making matters more complex, of the RCTs that do reach completion, many do not provide a definitive answer to the research question and as a result, do not change practice. Lack of blinding in most surgical RCTs also introduces an informational bias that can affect outcomes data both from patients and from clinical observers.

The goal of this chapter is to review the methodology behind the clinical trial and to understand how the clinical trial may advance the practice of neurosurgery in general and spine surgery in particular.

Comparative Effectiveness

The term comparative effectiveness has become widely used since passage of the American Recovery and Reinvestment Act of 2009, which allocated $1.1 billion for new research projects aimed at understanding the differences among established medical treatments.¹ Effectiveness differs from efficacy. An intervention is efficacious if it produces the intended result under ideal, controlled conditions; it is effective if it actually results in a satisfactory outcome when utilized broadly in the community. For example, an emergency tracheotomy might be efficacious when performed by trained personnel, but it might be ineffective or even dangerous when performed by untrained first responders. In general, a clinical trial is a carefully controlled scientific study that examines the efficacy, or safety, of an intervention. A clinical trial can evaluate the comparative effectiveness of an intervention versus an alternative if, and only if, the results are truly “generalizable.” That is, the study population should represent the vast majority of patients with a particular condition, and, in the case of a surgical trial, the trial surgeons’ skill set should resemble that of an average practitioner. For example, transarticular C1-2 screws have been shown to be efficacious in treating C1-2 instability, but their effectiveness in the community has not been assessed in large practical trials involving surgeons without highly specialized spine surgery skills.²

Prospective Studies

A prospective cohort study is an investigation that includes two or more groups of similar patients who undergo different treatments and who are then assessed for a specific outcome. In spine surgery, most cohort studies are interventional in nature. The ongoing Arbeitsgemeinschaft für Osteosynthesefragen (AO) Foundation–sponsored cervical spondylotic myelopathy (CSM) trial is a typical example. In this multicenter effort, patients with CSM undergo either ventral or dorsal surgery and are assessed using validated outcomes instruments at regular intervals. The nonrandomized study has accrued over 300 unselected patients and therefore provides some comparative effectiveness data.³ In this study, the differences in baseline characteristics between treatment groups (ventral and dorsal surgery patients) are significant, and therefore differences in outcome observed following ventral and dorsal approaches cannot necessarily be attributed to the surgical approach. In fact, the degree of selection bias in many prospective cohort studies precludes any real conclusion regarding the primary research question. Selection bias might not always be related to the actual pathology. In a different prospective cohort study comparing 36 patients who underwent microsurgical resection with 46 patients who underwent stereotactic radiosurgery for treating vestibular schwannoma, the groups were compared using health-related quality of life (HR-QOL), functional, and radiographic tumor control outcome measures. Although the groups were similar with regard to tumor size preoperatively, the surgical group was significantly younger, thereby raising questions about the validity and generalizability of the results.⁴

Nonrandomized data have been reported to demonstrate significant differences between treatments 56% of the time, whereas RCTs show “significant” differences only 30% of the time.⁵ In addition to information bias, a publication bias, or the tendency of authors and editors to favor publication of studies with “positive” results, may exist. Systematic efforts to compare RCTs and nonrandomized studies on a number of medical and surgical topics have reached different conclusions.⁶ For nonrandomized trials to match the validity of an RCT, the inclusion and exclusion criteria must be clear, and known prognostic factors should be balanced with the utilization of objective outcome assessments.⁶,⁷

Randomized Controlled Trials

It is generally agreed that randomized controlled trials (RCTs) are the gold standard for determining whether one intervention is superior, equivalent, or inferior to an alternative.⁷ This is because all nonrandomized experiments might fail to balance important baseline prognostic variables, introducing bias into the results of the trial. It is estimated, however, that fewer than 1% of published papers in leading neurosurgical journals are RCTs.⁸ Some of the key completed RCTs (discussed in this chapter) that address spinal and other neurosurgical conditions are listed in Table 210-1.

TABLE 210-1 Summary of Key Recent Randomized Clinical Trials in Neurosurgery

There are a number of significant barriers to performing high-quality RCTs in spine surgery. One of these is the heterogeneity of spine diseases—the myriad symptoms caused by a single spinal anatomic abnormality and the clinical differences between patients with identical radiographic findings. The variation of the back pain population, for example, limits the ability to perform well-designed clinical trials comparing the nonoperative and operative treatments. Fairbank et al. performed an RCT (349 patients) comparing surgery to intensive rehabilitation therapy for back pain.⁹ Despite using a large sample size with validated and appropriate outcomes instruments, it was difficult for the investigators to draw conclusions about the surgical treatment of low back pain because the clinical entity was itself so heterogeneous. Making matters more complex, the most significant variables that result in patient heterogeneity are, often, unknown. A recent lumbar RCT comparing the CHARITÉ (DePuy, Raynham, MA) lumbar arthroplasty to anterior lumbar interbody fusion for the treatment of low back pain demonstrated statistically significant improvements in multiple outcome measures at 2 years.¹⁰ The study population, however, was not well defined and thus left clinicians to selectively choose ideal candidates for the study. Particularly when trials are designed to investigate a new technology and supported by corporate funding, this type of selection bias is common and can lead, in part, to a failure by a medical payer system (Centers for Medicare and Medicaid Services) to adopt and ultimately pay for the new technology despite its being supported by class I data.

One of the most important barriers to performing a surgical clinical trial is lack of equipoise. This term, popularized by Freedman in a classic 1987 paper, means “genuine uncertainty within the expert medical community” on the optimal approach for a certain medical condition.¹¹ RCTs are ethical and feasible only when there is clinical equipoise between the treatment arms. Lack of clinical equipoise affected the National Institutes of Health (NIH)–sponsored Spine Patient Outcomes Research Trial (SPORT), an RCT that compared surgery versus nonoperative management for symptomatic lumbar disc herniation.¹² The high crossover rate (30% from the nonoperative cohort to the operative cohort within 3 months) suggested that clinicians, patients, or both felt that surgery would provide a greater chance of clinical benefit after 6 weeks of failed conservative management. Conversely, almost as many patients randomized to receive surgery did not have an operation, indicating that patients had strong opinions favoring the role of conservative treatment when symptoms were mild or improving. In retrospect, the lack of clinical equipoise limited the ability of the study to detect better outcomes from surgery.¹³

Another SPORT RCT examined surgical versus nonsurgical treatment for degenerative lumbar spondylolisthesis.¹⁴ Patients were included if they had neurogenic claudication or radicular leg pain, with spinal stenosis and degenerative spondylolisthesis on imaging. These patients were randomized to either nonoperative treatment or to decompressive laminectomy, with or without bilateral single-level fusion, with or without iliac crest bone grafting, and with or without pedicle screw instrumentation. This RCT also demonstrated high rates of crossover due to a lack of clinical equipoise. However, the methodology also demonstrated that heterogeneity of treatment can limit the ability to generalize results. In this trial, the underlying assumption that instrumented fusion, noninstrumented fusion, and decompression alone are equivalent may not be true, and therefore, the trial does not provide meaningful information about which treatment is optimal for the management of grade I spondylolisthesis.

By strict statistical criteria, an RCT should be analyzed by the intent-to-treat principle. That is, the outcomes are analyzed not by which treatment the patient actually received but rather by which treatment group they were randomly assigned to. This approach preserves the integrity of randomization, which theoretically balances confounding risk factors, both known and unknown. For example, in the Asymptomatic Carotid Atherosclerosis Study (ACAS), patients randomized to receive surgery were analyzed as such even if they had an angiographic complication after randomization (not related to surgery) or even if they did not undergo surgery at all.¹⁵ When crossover rates are high, the intent-to-treat analysis is less likely to detect a difference between two treatments.¹³ In the SPORT lumbar discectomy trial, the intent-to-treat analysis did not detect any benefit from surgery (crossover rate was 30%), although the as-treated analysis showed a significant benefit from surgery.¹²
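
The contrast between the two analyses can be sketched in a few lines of Python. This is a minimal illustration, not the method used in SPORT or ACAS: the cohort, the 30% crossover in each direction, and the improvement counts are hypothetical numbers chosen only to show how crossover narrows the intent-to-treat difference, and the names Patient, improvement_rate, and make_group are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str   # arm chosen by randomization
    received: str   # treatment actually delivered
    improved: bool  # hypothetical binary outcome


def improvement_rate(patients, arm, by):
    """Fraction improved among patients whose `by` attribute equals `arm`."""
    group = [p for p in patients if getattr(p, by) == arm]
    return sum(p.improved for p in group) / len(group)


def make_group(assigned, received, n_improved, n_total):
    """Build n_total patients, the first n_improved of whom improved."""
    return [Patient(assigned, received, i < n_improved) for i in range(n_total)]


# Hypothetical cohort (assumed numbers, not trial data): 100 patients per arm,
# 30% of each arm crossing over to the other treatment.
cohort = (
    make_group("surgery", "surgery", 56, 70)
    + make_group("surgery", "nonoperative", 15, 30)
    + make_group("nonoperative", "nonoperative", 35, 70)
    + make_group("nonoperative", "surgery", 24, 30)
)

# Intent-to-treat: group by randomized assignment.
itt_diff = (improvement_rate(cohort, "surgery", "assigned")
            - improvement_rate(cohort, "nonoperative", "assigned"))

# As-treated: group by treatment actually received.
as_treated_diff = (improvement_rate(cohort, "surgery", "received")
                   - improvement_rate(cohort, "nonoperative", "received"))

print(f"intent-to-treat difference: {itt_diff:.2f}")
print(f"as-treated difference:      {as_treated_diff:.2f}")
```

With these assumed numbers the intent-to-treat difference is smaller than the as-treated difference. The as-treated comparison gives up the protection of randomization, so its larger difference may partly reflect which patients chose to cross over rather than the treatment itself.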

The validity of a study analysis is also compromised when significant clinical data are missing. Response bias can occur when a subject does not fully complete questionnaires at each time point of the study. If the reasons that subjects do not participate (e.g., anger over surgical outcome) differ between the arms of the study, then a response bias exists. In the first published study of SPORT, the degree of missing data was between 24% and 27%.¹⁶,¹⁷
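
A small Python sketch can show how this kind of response bias distorts a comparison. The scores and response patterns below are hypothetical (they are not SPORT data), and complete_case_mean is a name invented for this example; the assumption is that the two surgical patients with the worst outcomes stop returning questionnaires while every control patient responds.

```python
def complete_case_mean(scores, responded):
    """Mean score among only those subjects who returned the questionnaire."""
    observed = [s for s, r in zip(scores, responded) if r]
    return sum(observed) / len(observed)


# Hypothetical improvement scores (higher is better), assumed for illustration.
surgery_scores = [30, 25, 20, 5, 0]
surgery_responded = [True, True, True, False, False]  # poor outcomes drop out

control_scores = [20, 18, 15, 12, 10]
control_responded = [True, True, True, True, True]    # complete follow-up

true_diff = (sum(surgery_scores) / len(surgery_scores)
             - sum(control_scores) / len(control_scores))
observed_diff = (complete_case_mean(surgery_scores, surgery_responded)
                 - complete_case_mean(control_scores, control_responded))

print(f"true difference:     {true_diff:.1f}")
print(f"observed difference: {observed_diff:.1f}")
```

Under these assumptions the observed difference is far larger than the true one, because the missing responses are concentrated among poor outcomes in one arm only.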

Another difficulty in designing RCTs for spine surgery is the learning curve associated with the clinical application of a new technology. If a practitioner has not performed a procedure with a new technology, it is likely the complication rate will be higher because of the learning curve associated with this technology. There has been a constant evolution of novel spine procedures, exemplified by the interbody fusion techniques. Current techniques for interbody fixation and fusion are changing at such a rapid pace that trials designed today to test these newer technologies might be obsolete and therefore irrelevant prior to the trials’ completion. A recent RCT compared use of femoral ring allograft versus a titanium cage in circumferential lumbar spine fusion.¹⁸ Clinical outcome was measured by the Oswestry Disability Index (ODI),¹⁹ Visual Analogue Score (VAS),²⁰ and Short-Form 36 (SF-36)²¹ with 2-year follow-up. The trial found greater clinical improvements in all outcome scales with femoral ring allograft than with titanium cages. These results, and the higher cost of titanium cages, prompted the authors to state that use of cages in lumbar fusion was not justified. However, the surgical procedure performed in this study is now rarely performed. This “front-and-back” approach, using dorsal screw fixation in addition to a retroperitoneal ventral approach for placement of interbody graft, has been replaced by a single approach to achieve circumferential fusion. More recent lumbar techniques include minimally invasive transforaminal techniques (which possibly reduce muscle trauma), often supplemented with cages and recombinant bone morphogenetic protein (BMP). Although this RCT was well designed, its results cannot be applied to more recent lumbar fusion techniques.

Informational Bias

One of the potential advantages of preserving motion when treating cervical spondylotic diseases is the opportunity to limit adjacent-level cervical disc degeneration following either anterior cervical discectomy and fusion (ACDF) or implantation of cervical disc arthroplasty. In one recently published RCT comparing ACDF with replacement using a Bryan cervical disc (Medtronic Sofamor Danek, Memphis, TN), the radiographic outcomes in patients who underwent ACDF were compared with those in patients who underwent cervical disc replacement.²² Radiologic evidence of adjacent-level change included new or enlarging osteophytes, new narrowing of the disc space, and calcification of the anterior longitudinal ligament. Measured at 20 months after surgery by plain radiograph, 23% of patients with single-level ACDF developed radiologic evidence of adjacent-level disease compared with 12.8% of patients treated with the Bryan artificial disc. Because the researchers could easily discern which patients had Bryan disc placement versus those with ACDF, the radiographic measurements could not be blinded. The researchers’ review of the radiographs was subjective and might have been biased in favor of those patients receiving the artificial disc replacement.

Patients can also be affected by informational bias. Patients who received the artificial disc returned to work sooner than those who underwent ACDF. Again, both patients (many of whom entered the RCT for a chance to gain access to potentially “better” technology) and surgeons might have been biased: patients in the artificial disc group may have returned to work sooner than those in the fusion group because they or their surgeons “believed” that the artificial disc might be superior. Studies with potentially subjective outcome measures are considered less valid because the treatment effects may be overestimated by information bias.⁷

Another type of informational bias is the placebo effect, or the influence of the patient’s expectations on the treatment outcome. When comparing surgical treatment to nonoperative treatment, one way to limit the placebo effect is to perform a sham surgery in the control group. This raises difficult ethical questions in surgical RCTs because sham procedures might lead to harm in some control subjects without any potential clinical benefit. Although this is the case, some have argued that when genuine clinical equipoise exists in a surgical RCT, sham surgery is ethically justified.¹⁷ Two recent RCTs evaluated vertebroplasty for osteoporotic vertebral fractures and performed sham procedures.²³,²⁴ In both studies, subjects in the control arm were given conscious sedation and the periosteum of pedicles injected with bupivacaine. In one study, the pedicles were actually cannulated, but no polymethylmethacrylate was injected.²³ Neither study found a beneficial effect from vertebroplasty as compared with the sham procedure. However, subjects reported improvement in symptoms in both groups. Although the results of these studies have been questioned due to the potential mechanism of pain relief in vertebroplasty and the injection of periosteum with anesthetic, the powerful effect of placebo was demonstrated.

Research Question

When designing a clinical trial, the most important aspect is to clearly identify the primary question of the study. This question should test a hypothesis. To give a simple example, a primary question might be whether surgery improves back pain or not. To test this hypothesis, the researcher must define several points:

• Population with back pain being studied

• Exact indications for treatment (i.e., inclusion and exclusion criteria)

• Surgical procedure and its indications

• Nonoperative therapy

• How improvement will be measured

• What will constitute a meaningful difference

Many of these points were discussed earlier in this chapter. It is challenging to define inclusion-exclusion criteria that reduce heterogeneity in the spine population but still allow the results of the trial to be generalizable. Similarly, obstacles exist for choosing the surgical procedure, as many new technologies render old procedures obsolete. Careful selection of outcomes measurements is critical for spine surgery RCTs. Many assessment tools are available for measuring functional outcomes after spine surgery.²⁵

A functional outcomes scale must fulfill three criteria: (1) reliability: repetition should be consistent within and between observers; (2) validity: it must measure the property intended; and (3) responsiveness: it must detect differences in severity among populations and reflect these differences quantitatively. An outcome measure may be disease-specific, such as the Oswestry Disability Index (ODI),¹⁹ or a health-related quality of life (HR-QOL) measure, such as the EuroQOL-5D²⁶ or SF-36.²¹ Many RCTs use a combination of functional outcome measurements.⁹ For example, the CHARITÉ lumbar artificial disc trial measured outcomes using VAS, ODI, and SF-36 instruments.¹⁰ In some trials, a preference-based HR-QOL outcome (e.g., the EuroQOL-5D) is used, because this score (which has been scaled to equal 0 for death and 1 for perfect health) can be used to calculate quality-adjusted life-years (QALYs) for performing an economic analysis.²⁷,²⁸
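
As a rough illustration of how a preference-based score on the 0-to-1 scale becomes QALYs, the Python sketch below integrates utility over follow-up time with the trapezoidal rule, one common approach. The follow-up schedule and utility values are hypothetical, and the function name qalys is invented for this example; it is a sketch under those assumptions, not the method of any trial cited here.

```python
def qalys(times_years, utilities):
    """Area under the utility curve (trapezoidal rule), in quality-adjusted life-years."""
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need matching lists with at least two time points")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        mean_utility = (utilities[i] + utilities[i - 1]) / 2
        total += interval * mean_utility
    return total


# Hypothetical utility scores (0 = death, 1 = perfect health) at baseline,
# 6 months, 1 year, and 2 years. Assumed values, not trial data.
follow_up = [0.0, 0.5, 1.0, 2.0]
surgery_utility = [0.50, 0.70, 0.80, 0.80]
nonoperative_utility = [0.50, 0.55, 0.60, 0.65]

qaly_gain = qalys(follow_up, surgery_utility) - qalys(follow_up, nonoperative_utility)
print(f"QALYs gained over 2 years: {qaly_gain:.2f}")
```

An economic analysis would then relate a QALY difference of this kind to the difference in cost between the two treatments.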

Influence

The primary goal of an RCT is to answer a question that will change clinical practice. The carotid trials (ACAS and NASCET) are good examples.²⁹,³⁰ There was uncertainty regarding the value of carotid endarterectomy (CEA), and both trials confirmed the utility of surgery. Today, CEA is considered the gold standard for the treatment of hemodynamically significant carotid stenosis, and newer therapies are being compared with CEA in ongoing well-designed RCTs (Carotid Revascularization Endarterectomy Versus Stenting Trial [CREST]).³¹

A trial designed to confirm an intervention that is already the standard of care is not likely to alter practice. Patchell et al. performed a clinical trial of 101 patients with spinal cord compression from metastatic disease to the spinal column. Patients were randomized over a 10-year period to receive either surgical decompression with radiotherapy or radiotherapy alone.³² The study was stopped after interim analysis because the primary end point, the ability to walk, was met in significantly more patients in the surgery plus radiotherapy group. This trial did not ultimately have broad influence because it demonstrated what was already widely considered to be the standard of care by the time the results of the trial were published.

Finally, reimbursement plays an important role in the adoption of newer technology in the United States. Even though well-designed RCTs demonstrated that cervical arthroplasty was not inferior to cervical fusion, the decision by the Centers for Medicare and Medicaid Services not to reimburse for these procedures has greatly limited their utilization.³³–³⁵

Feasibility

The feasibility of a clinical trial is determined by the likelihood of completion and collection of meaningful results. The primary question and hypothesis of the trial should be of practical value to a clinician, and sufficient equipoise must exist between the interventions to be tested. Inclusion and exclusion criteria must be carefully determined so that the study population will truly represent the majority of patients treated for the condition. Data collected from multiple sites for an RCT have the advantage of increasing the generalizability of the results. However, careful preliminary research must confirm that there is adequate clinical volume at each site to meet enrollment goals. Each clinical site should have dedicated research personnel who are trained in regulatory issues and in the essentials of performing clinical research. Institutional review boards (IRBs) must approve all study protocols and ensure that patient confidentiality is preserved. Research coordinators should not only be familiar with all functional assessment scales and other measurements but also be proficient in complex data management. Often web-based data management platforms are needed. Ensuring adequate and complete follow-up is another difficult task, requiring the efforts of multiple trained clinical research staff.

Pilot Studies

The pilot study is essential to confirm feasibility. The pilot study confirms that each research site has adequate clinical volume and that the organizational structure is functioning properly (clinical study coordinators, data managers, IRBs) to collect high-quality prospective clinical data. Pilot studies should aim at enrolling patients from each site involved, with the goal of at least 80% compliance in collecting follow-up data over a specific time frame. A pilot study also permits formal biostatistical calculation of sample size, ensuring that there is adequate power to answer the primary question in the larger definitive trial.
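
A sample size calculation of this kind can be sketched in Python. The example below is a minimal sketch under stated assumptions: a binary primary outcome, the standard normal-approximation formula for comparing two proportions, a two-sided alpha of 0.05, and 80% power. The 50% and 70% success rates stand in for hypothetical pilot estimates, and the function names are invented for this example. The 80% follow-up figure is the compliance goal mentioned above.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to detect p_treatment vs. p_control.

    Normal-approximation formula for two independent proportions,
    two-sided test at significance level alpha.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


def inflate_for_follow_up(n, follow_up_rate):
    """Enrollment needed so that n patients remain after incomplete follow-up."""
    return math.ceil(n / follow_up_rate)


# Hypothetical pilot estimates: 50% success without surgery, 70% with surgery.
n_analyzed = n_per_arm(0.50, 0.70)
n_enrolled = inflate_for_follow_up(n_analyzed, 0.80)

print(f"analyzable patients per arm: {n_analyzed}")   # 91
print(f"enrolled patients per arm:   {n_enrolled}")   # 114
```

A smaller expected difference between arms increases the required enrollment sharply, because the effect size enters the formula squared.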

Cost

Well-designed clinical trials cost millions of dollars. For example, the NIH-funded SPORT trials cost $13.5 million.³⁶ Multicenter trials require significant organization and mobilization of resources (e.g., steering committees, investigator meetings, data safety monitoring boards). In addition, data collection requires dedicated research coordinators and auditing systems to ensure adequate follow-up and high-quality data for analysis. This raises the question of whether the cost of a clinical trial is worth the data produced. Indeed, the cost of conducting high-quality clinical research can be prohibitive and might limit what can realistically be studied. Large clinical trials should generally be restricted to common diseases about which there is great uncertainty regarding treatment outcomes or for which there is a large differential in cost or safety.