The Art of the Clinical Trial

Published on 27/03/2015 by admin

Filed under Neurosurgery

Last modified 22/04/2025


Chapter 210 The Art of the Clinical Trial

The purpose of the clinical trial is to determine the safety, efficacy, and/or effectiveness of a medical intervention. A clinical trial is, by definition, prospective and has a control group. The number of clinical trials that address diseases and interventions of the spine is increasing (Fig. 210-1). As discussed in Chapter 211, retrospective studies play a significant role in clinical spine surgery research; however, this chapter focuses exclusively on prospective studies.

The randomized controlled trial (RCT) is the gold standard in evidence-based medicine. Very few RCTs have been completed to date to guide spine surgeons. Making matters more complex, many of the RCTs that do reach completion do not provide a definitive answer to the research question and, as a result, do not change practice. Lack of blinding in most surgical RCTs also introduces an informational bias that can affect outcomes data both from patients and from clinical observers.

The goal of this chapter is to review the methodology behind the clinical trial and to understand how the clinical trial may advance the practice of neurosurgery in general and spine surgery in particular.

Comparative Effectiveness

The term comparative effectiveness has become widely used since passage of the American Recovery and Reinvestment Act of 2009, which allocated $1.1 billion for new research projects aimed at understanding the differences among established medical treatments.1 Effectiveness differs from efficacy. An intervention is efficacious if it produces the intended result under controlled conditions; it is effective if it actually results in a satisfactory outcome when utilized broadly in the community. For example, an emergency tracheotomy might be efficacious when performed by trained personnel, but it might be ineffective or even dangerous when performed by untrained first responders. In general, a clinical trial is a carefully controlled scientific study that examines the efficacy, or safety, of an intervention. A clinical trial can evaluate the comparative effectiveness of an intervention versus an alternative if, and only if, the results are truly “generalizable.” That is, the study population should represent the vast majority of patients with a particular condition, and, in the case of a surgical trial, the trial surgeons’ skill set should resemble that of an average practitioner. For example, transarticular C1-2 screws have been shown to be efficacious in treating C1-2 instability, but their effectiveness in the community has not been assessed in large practical trials involving surgeons without highly specialized spine surgery skills.2

Prospective Studies

A prospective cohort study is an investigation that includes two or more groups of similar patients who undergo different treatments and who are then assessed for a specific outcome. In spine surgery, most cohort studies are interventional in nature. The ongoing Arbeitsgemeinschaft für Osteosynthesefragen (AO) Foundation–sponsored cervical spondylotic myelopathy (CSM) trial is a typical example. In this multicenter effort, patients with CSM undergo either ventral or dorsal surgery and are assessed using validated outcomes instruments at regular intervals. The nonrandomized study has accrued over 300 unselected patients and therefore provides some comparative effectiveness data.3 In this study, the differences in baseline characteristics between treatment groups (ventral and dorsal surgery patients) are significant, and therefore differences in outcome observed following ventral and dorsal approaches cannot necessarily be attributed to the surgical approach. In fact, the degree of selection bias in many prospective cohort studies precludes any real conclusion regarding the primary research question. Selection bias might not always be related to the actual pathology. In a different prospective cohort study comparing 36 patients who underwent microsurgical resection with 46 patients who underwent stereotactic radiosurgery for treating vestibular schwannoma, the groups were compared using health-related quality of life (HR-QOL), functional, and radiographic tumor control outcome measures. Although the groups were similar with regard to tumor size preoperatively, the surgical group was significantly younger, thereby raising questions about the validity and generalizability of the results.4

Nonrandomized data have been reported to demonstrate significant differences between treatments 56% of the time, whereas RCTs show “significant” differences only 30% of the time.5 In addition to information bias, a publication bias, or the tendency of authors and editors to favor publication of studies with “positive” results, may exist. Systematic efforts to compare RCTs and nonrandomized studies on a number of medical and surgical topics have reached different conclusions.6 For nonrandomized trials to match the validity of an RCT, the inclusion and exclusion criteria must be clear, and known prognostic factors should be balanced with the utilization of objective outcome assessments.6,7

Randomized Controlled Trials

It is generally agreed that randomized controlled trials (RCTs) are the gold standard for determining whether one intervention is superior, equivalent, or inferior to an alternative.7 This is because all nonrandomized experiments might fail to balance important baseline prognostic variables, introducing bias into the results of the trial. It is estimated, however, that fewer than 1% of published papers in leading neurosurgical journals are RCTs.8 Some of the key completed RCTs (discussed in this chapter) that address spinal and other neurosurgical conditions are listed in Table 210-1.
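The balancing effect of randomization can be illustrated with a small simulation. The Python sketch below is illustrative only: the cohort size, the age range, the selection probabilities, and the names `age_gap`, `selected`, and `randomized` are assumptions made for this example and are not drawn from any trial discussed in this chapter. It compares the age gap between arms when clinicians preferentially offer surgery to younger patients with the gap that results when assignment is made by chance alone.

```python
import random
from statistics import mean


def age_gap(ages, in_surgery):
    """Mean age of the surgery arm minus mean age of the comparison arm."""
    surgery = [a for a, s in zip(ages, in_surgery) if s]
    other = [a for a, s in zip(ages, in_surgery) if not s]
    return mean(surgery) - mean(other)


# Hypothetical cohort: 1000 patients aged 30 to 80 (assumed, not trial data).
rng = random.Random(210)  # fixed seed so the sketch is reproducible
ages = [rng.uniform(30, 80) for _ in range(1000)]

# Nonrandomized assignment: an assumed preference for operating on
# younger patients (80% of those under 55, 20% of the rest).
selected = [rng.random() < (0.8 if age < 55 else 0.2) for age in ages]

# Randomized assignment: every patient has the same 50% chance of surgery,
# regardless of age or any other prognostic factor.
randomized = [rng.random() < 0.5 for _ in ages]

gap_selected = age_gap(ages, selected)
gap_randomized = age_gap(ages, randomized)

print(f"Age gap with clinician selection: {gap_selected:.1f} years")
print(f"Age gap with randomization:       {gap_randomized:.1f} years")
```

Under these assumptions the selected surgical arm is markedly younger than its comparison arm, whereas the randomized arms differ by chance alone. The same logic applies to prognostic factors that are unknown or unmeasured, which is the principal argument for randomization.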

There are a number of significant barriers to performing high-quality RCTs in spine surgery. One of these is the heterogeneity of spine diseases—the myriad symptoms caused by a single spinal anatomic abnormality and the clinical differences between patients with identical radiographic findings. The variation of the back pain population, for example, limits the ability to perform well-designed clinical trials comparing the nonoperative and operative treatments. Fairbank et al. performed an RCT (349 patients) comparing surgery to intensive rehabilitation therapy for back pain.9 Despite using a large sample size with validated and appropriate outcomes instruments, it was difficult for the investigators to draw conclusions about the surgical treatment of low back pain because the clinical entity was itself so heterogeneous. Making matters more complex, the most significant variables that result in patient heterogeneity are, often, unknown. A recent lumbar RCT comparing the CHARITÉ (DePuy, Raynham, MA) lumbar arthroplasty to anterior lumbar interbody fusion for the treatment of low back pain demonstrated statistically significant improvements in multiple outcome measures at 2 years.10 The study population, however, was not well defined and thus left clinicians to selectively choose ideal candidates for the study. Particularly when trials are designed to investigate a new technology and supported by corporate funding, this type of selection bias is common and can lead, in part, to a failure by a medical payer system (Centers for Medicare and Medicaid Services) to adopt and ultimately pay for the new technology despite its being supported by class I data.

One of the most important barriers to performing a surgical clinical trial is lack of equipoise. This term, popularized by Freedman in a classic 1987 paper, means “genuine uncertainty within the expert medical community” on the optimal approach for a certain medical condition.11 RCTs are ethical and feasible only when there is clinical equipoise between the treatment arms. Lack of clinical equipoise affected the National Institutes of Health (NIH)–sponsored Spine Patient Outcomes Research Trial (SPORT), an RCT that compared surgery versus nonoperative management for symptomatic lumbar disc herniation.12 The high crossover rate (30% from the nonoperative cohort to the operative cohort within 3 months) suggested that clinicians, patients, or both felt that surgery would provide a greater chance of clinical benefit after 6 weeks of failed conservative management. Conversely, almost as many patients randomized to receive surgery did not have an operation, indicating that patients had strong opinions favoring the role of conservative treatment when symptoms were mild or improving. In retrospect, the lack of clinical equipoise limited the ability of the study to detect better outcomes from surgery.13

Another SPORT RCT examined surgical versus nonsurgical treatment for degenerative lumbar spondylolisthesis.14 Patients were included if they had neurogenic claudication or radicular leg pain, with spinal stenosis and degenerative spondylolisthesis on imaging. These patients were randomized to either nonoperative treatment or to decompressive laminectomy, with or without bilateral single-level fusion, with or without iliac crest bone grafting, and with or without pedicle screw instrumentation. This RCT also demonstrated high rates of crossover due to a lack of clinical equipoise. However, the methodology also demonstrated that heterogeneity of treatment can limit the ability to generalize results. In this trial, the underlying assumption that instrumented fusion, noninstrumented fusion, and decompression alone are equivalent may not be true, and therefore, the trial does not provide meaningful information about which treatment is optimal for the management of grade I spondylolisthesis.

By strict statistical criteria, an RCT should be analyzed by the intention-to-treat principle. That is, the outcomes are analyzed not by which treatment the patient actually received but rather by the treatment group to which the patient was randomly assigned. This approach preserves the integrity of randomization, which theoretically balances confounding risk factors, both known and unknown. For example, in the Asymptomatic Carotid Atherosclerosis Study (ACAS), patients randomized to receive surgery were analyzed as such even if they had an angiographic complication after randomization (not related to surgery) or even if they did not undergo surgery at all.15 When crossover rates are high, the intention-to-treat analysis is less likely to detect a difference between two treatments.13 In the SPORT lumbar discectomy trial, the intention-to-treat analysis did not detect any benefit from surgery (crossover rate was 30%), although the as-treated analysis showed a significant benefit from surgery.12
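The dilution of a treatment effect by crossover can be made concrete with a short calculation. The Python sketch below uses entirely hypothetical counts (100 patients per arm, 30% crossover in each direction, and assumed improvement rates of 80% after surgery and 50% after nonoperative care); these are not SPORT data, and the names `Patient`, `make_group`, and `effect` are introduced only for illustration. The sketch also assumes that outcome depends only on the treatment received, which real crossover patients may well violate.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    assigned: str   # arm chosen by randomization
    received: str   # treatment actually received
    improved: bool  # hypothetical binary outcome


def make_group(assigned, received, n, n_improved):
    """Build n patients, the first n_improved of whom improved."""
    return [Patient(assigned, received, i < n_improved) for i in range(n)]


def success_rate(patients, arm, key):
    """Proportion improved among patients whose `key` attribute equals `arm`."""
    group = [p for p in patients if getattr(p, key) == arm]
    return sum(p.improved for p in group) / len(group)


def effect(patients, key):
    """Difference in improvement rate, surgery minus nonoperative care."""
    return (success_rate(patients, "surgery", key)
            - success_rate(patients, "nonoperative", key))


# Hypothetical trial with 30% crossover in both directions.
cohort = (
    make_group("surgery", "surgery", 70, 56)              # 80% improve
    + make_group("surgery", "nonoperative", 30, 15)       # 50% improve
    + make_group("nonoperative", "nonoperative", 70, 35)  # 50% improve
    + make_group("nonoperative", "surgery", 30, 24)       # 80% improve
)

itt = effect(cohort, "assigned")         # grouped by randomized arm
as_treated = effect(cohort, "received")  # grouped by treatment received

print(f"Intention-to-treat difference: {itt:.2f}")
print(f"As-treated difference:         {as_treated:.2f}")
```

With these assumed numbers the intention-to-treat difference is 0.12, whereas the as-treated difference is 0.30. The as-treated comparison recovers the larger effect but gives up the protection of randomization, because patients who cross over may differ systematically from those who do not.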

The validity of a study analysis is also compromised when significant clinical data are missing. Response bias can occur when a subject does not fully complete questionnaires at each time point of the study. If the reasons that subjects do not participate (e.g., anger over surgical outcome) differ between the arms of the study, then a response bias exists. In the first published study of SPORT, the degree of missing data was between 24% and 27%.16,17

Another difficulty in designing RCTs for spine surgery is the learning curve associated with the clinical application of a new technology. If a practitioner has not previously performed a procedure with a new technology, the complication rate is likely to be higher because of this learning curve. There has been a constant evolution of novel spine procedures, exemplified by the interbody fusion techniques. Current techniques for interbody fixation and fusion are changing at such a rapid pace that trials designed today to test these newer technologies might be obsolete and therefore irrelevant prior to the trials’ completion. A recent RCT compared use of femoral ring allograft versus a titanium cage in circumferential lumbar spine fusion.18 Clinical outcome was measured by the Oswestry Disability Index (ODI),19 Visual Analogue Scale (VAS),20 and Short-Form 36 (SF-36)21 with 2-year follow-up. The trial found greater clinical improvements in all outcome scales with femoral ring allograft than with titanium cages. These results, and the higher cost of titanium cages, prompted the authors to state that use of cages in lumbar fusion was not justified. However, the surgical procedure used in this study is now rarely performed. This “front-and-back” approach, using dorsal screw fixation in addition to a retroperitoneal ventral approach for placement of interbody graft, has been replaced by a single approach to achieve circumferential fusion. More recent lumbar techniques include minimally invasive transforaminal techniques (that possibly reduce muscle trauma), often supplemented with cages and recombinant bone morphogenetic protein (BMP). Although this RCT was well-designed, its results cannot be applied to more recent lumbar fusion techniques.

Informational Bias

One of the potential advantages of preserving motion when treating cervical spondylotic diseases is the opportunity to limit adjacent-level cervical disc degeneration following either anterior cervical discectomy and fusion (ACDF) or implantation of cervical disc arthroplasty. In one recently published RCT comparing ACDF with replacement using a Bryan cervical disc (Medtronic Sofamor Danek, Memphis, TN), the radiographic outcomes in patients who underwent ACDF were compared with those in patients who underwent cervical disc replacement.22 Radiologic evidence of adjacent-level change included new or enlarging osteophytes, new narrowing of the disc space, and calcification of the anterior longitudinal ligament. Measured at 20 months after surgery by plain radiograph, 23% of patients with single-level ACDF developed radiologic evidence of adjacent-level disease compared with 12.8% of patients treated with the Bryan artificial disc. Because the researchers could easily discern which patients had Bryan disc placement versus those with ACDF, the radiographic measurements could not be blinded. The researchers’ review of the radiographs was subjective and might have been biased in favor of those patients receiving the artificial disc replacement.
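To gauge how large the reported difference is, and therefore how much an unblinded reading would have to account for, the two percentages can be converted into standard effect measures. The Python sketch below uses only the 23% and 12.8% figures quoted above; the function name `effect_measures` is hypothetical, and because the group sizes are not given here, no confidence interval or significance test is attempted.

```python
def effect_measures(risk_control, risk_treated):
    """Return absolute risk reduction, relative risk, and number needed to treat."""
    arr = risk_control - risk_treated  # absolute risk reduction
    rr = risk_treated / risk_control   # relative risk
    nnt = 1 / arr                      # number needed to treat
    return arr, rr, nnt


# Adjacent-level radiographic change at 20 months, as quoted in the text.
arr, rr, nnt = effect_measures(risk_control=0.23, risk_treated=0.128)

print(f"Absolute risk reduction: {arr * 100:.1f} percentage points")
print(f"Relative risk:           {rr:.2f}")
print(f"Number needed to treat:  {nnt:.1f}")
```

The absolute difference is about 10 percentage points, corresponding to a relative risk of roughly 0.56. A difference of this size rests on subjective radiographic readings, so even a modest systematic tendency to read arthroplasty films more favorably could account for a meaningful share of it.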

Patients can also be affected by informational bias. Patients who received the artificial disc returned to work sooner than those who underwent ACDF. Again, both patients (many of whom entered the RCT for a chance to gain access to potentially “better” technology) and surgeons might have been biased: patients in the arthroplasty group might have returned, or been sent back, to work sooner than those in the fusion group because they “believed” that the artificial disc was superior. Studies with potentially subjective outcome measures are considered less valid because the treatment effects may be overestimated as a result of information bias.7

Another type of informational bias is the placebo effect, or the influence of the patient’s expectations on the treatment outcome. When comparing surgical treatment to nonoperative treatment, one way to limit the placebo effect is to perform a sham surgery in the control group. This raises difficult ethical questions in surgical RCTs because sham procedures might lead to harm in some control subjects without any potential clinical benefit. Although this is the case, some have argued that when genuine clinical equipoise exists in a surgical RCT, sham surgery is ethically justified.17 Two recent RCTs evaluated vertebroplasty for osteoporotic vertebral fractures and performed sham procedures.23,24 In both studies, subjects in the control arm were given conscious sedation and the periosteum of pedicles injected with bupivacaine. In one study, the pedicles were actually cannulated, but no polymethylmethacrylate was injected.23 Neither study found a beneficial effect from vertebroplasty as compared with the sham procedure. However, subjects reported improvement in symptoms in both groups. Although the results of these studies have been questioned due to the potential mechanism of pain relief in vertebroplasty and the injection of periosteum with anesthetic, the powerful effect of placebo was demonstrated.

Research Question

When designing a clinical trial, the most important aspect is to clearly identify the primary question of the study. This question should test a hypothesis. To give a simple example, a primary question might be whether surgery improves back pain or not. To test this hypothesis, the researcher must define several points:

Many of these points were discussed earlier in this chapter. It is challenging to define inclusion-exclusion criteria that reduce heterogeneity in the spine population but still allow the results of the trial to be generalizable. Similarly, obstacles exist in choosing the surgical procedure, as many new technologies render old procedures obsolete. Careful selection of outcome measures is critical for spine surgery RCTs. Many assessment tools are available for measuring functional outcomes after spine surgery.25

A functional outcomes scale must fulfill three criteria: (1) reliability (repeated measurements should be consistent within and between observers); (2) validity (it must measure the property intended); and (3) responsiveness (it must detect differences in severity among populations and reflect these differences quantitatively). An outcome measure may be disease-specific, such as the Oswestry Disability Index (ODI),19 or a health-related quality of life (HR-QOL) measure, such as the EuroQOL-5D26 or SF-36.21 Many RCTs use a combination of functional outcome measurements.9 For example, the CHARITÉ lumbar artificial disc trial measured outcomes using VAS, ODI, and SF-36 instruments.10 In some trials, a preference-based HR-QOL outcome (e.g., the EuroQOL-5D) is used because this score, which has been scaled to equal 0 for death and 1 for perfect health, can be used to calculate quality-adjusted life-years (QALYs) for an economic analysis.27,28
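The arithmetic behind the QALY is simple: each period of follow-up contributes its duration in years multiplied by the preference-based utility score during that period. The Python sketch below is a minimal illustration under stated assumptions; the function name `qalys` and the utility values and durations in the example are hypothetical and do not come from any trial cited in this chapter.

```python
def qalys(intervals):
    """Sum utility x duration over a list of (utility, years) pairs.

    Utility follows the scaling described in the text:
    0 for death and 1 for perfect health.
    """
    total = 0.0
    for utility, years in intervals:
        if not 0.0 <= utility <= 1.0:
            raise ValueError("utility must lie between 0 and 1")
        if years < 0:
            raise ValueError("duration cannot be negative")
        total += utility * years
    return total


# Hypothetical 2-year follow-up (assumed values, not trial data).
# Surgery: utility 0.5 during 6 months of recovery, then 0.8 for 18 months.
surgical = qalys([(0.5, 0.5), (0.8, 1.5)])
# Nonoperative care: utility stays at 0.5 for the full 2 years.
nonoperative = qalys([(0.5, 2.0)])

gain = surgical - nonoperative
print(f"Surgical arm:     {surgical:.2f} QALYs")
print(f"Nonoperative arm: {nonoperative:.2f} QALYs")
print(f"Incremental gain: {gain:.2f} QALYs")
```

In this example the surgical arm accrues 1.45 QALYs and the nonoperative arm 1.00, for an incremental gain of 0.45 QALYs over 2 years. An economic analysis would then relate that gain to the difference in cost between the two strategies.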

Influence

The primary goal of an RCT is to answer a question that will change clinical practice. The carotid trials (ACAS and NASCET) are good examples.29,30 There was uncertainty regarding the value of carotid endarterectomy (CEA), and both trials confirmed the utility of surgery. Today, CEA is considered the gold standard for the treatment of hemodynamically significant carotid stenosis, and newer therapies are being compared with CEA in ongoing well-designed RCTs (Carotid Revascularization Endarterectomy Versus Stenting Trial [CREST]).31

A trial designed to confirm an intervention that is already the standard of care is not likely to alter practice. Patchell et al. performed a clinical trial of 101 patients with spinal cord compression from metastatic disease to the spinal column. Patients were randomized over a 10-year period to receive either surgical decompression with radiotherapy or radiotherapy alone.32 The study was stopped after interim analysis because the primary end point, the ability to walk, was met in significantly more patients in the surgery plus radiotherapy group. This trial did not ultimately have broad influence because it demonstrated what was already widely considered to be the standard of care by the time the results of the trial were published.

Finally, reimbursement plays an important role in the adoption of newer technology in the United States. Even though well-designed RCTs demonstrated that cervical arthroplasty was not inferior to cervical fusion, the decision by the Centers for Medicare and Medicaid Services not to reimburse for these procedures has greatly limited their utilization.33-35

Cost

Well-designed clinical trials cost millions of dollars. For example, the NIH-funded SPORT trials cost $13.5 million.36 Multicenter trials require significant organization and mobilization of resources (e.g., steering committees, investigator meetings, data safety monitoring boards). In addition, data collection requires dedicated research coordinators and auditing systems to ensure adequate follow-up and high-quality data for analysis. This raises the question of whether the cost of a clinical trial is worth the data produced. Indeed, the cost of conducting high-quality clinical research can be prohibitive and might limit what can realistically be studied. Large clinical trials should generally be restricted to common diseases about which there is great uncertainty regarding treatment outcomes or for which there is a large differential in cost or safety.

References

1. Steinbrook R. Health care and the American Recovery and Reinvestment Act. N Engl J Med. 2009;360(11):1057-1060.

2. Dickman C.A., Sonntag V.K. Posterior C1-C2 transarticular screw fixation for atlantoaxial arthrodesis. Neurosurgery. 1998;43(2):275-280. discussion 280–281

3. Fehlings M., Kopjar B., Massicotte E., et al. Surgical treatment for cervical spondylotic myelopathy: one year outcomes of a prospective multicenter study of 316 patients. Spine J. 2008;8:S33-S34.

4. Pollock B.E., Driscoll C.L., Foote R.L., et al. Patient outcomes after vestibular schwannoma management: a prospective comparison of microsurgical resection and stereotactic radiosurgery. Neurosurgery. 2006;59(1):77-85. discussion 77–85

5. Chalmers T.C., Celano P., Sacks H.S., Smith H.Jr. Bias in treatment assignment in controlled clinical trials. N Engl J Med. 1983;309(22):1358-1361.

6. McKee M., Britton A., Black N., et al. Methods in health services research. Interpreting the evidence: choosing between randomised and non-randomised studies. BMJ. 1999;319(7205):312-315.

7. Concato J., Shah N., Horwitz R.I. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887-1892.

8. Gnanalingham K.K., Tysome J., Martinez-Canca J., Barazi S.A. Quality of clinical studies in neurosurgical journals: signs of improvement over three decades. J Neurosurg. 2005;103(3):439-443.

9. Fairbank J., Frost H., Wilson-MacDonald J., et al. Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial. BMJ. 2005;330(7502):1233.

10. Guyer R.D., McAfee P.C., Banco R.J., et al. Prospective, randomized, multicenter Food and Drug Administration investigational device exemption study of lumbar total disc replacement with the CHARITE artificial disc versus lumbar fusion: five-year follow-up. Spine J. 2009;9(5):374-386.

11. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med. 1987;317(3):141-145.

12. Weinstein J.N., Lurie J.D., Tosteson T.D., et al. Surgical vs nonoperative treatment for lumbar disk herniation: the Spine Patient Outcomes Research Trial (SPORT) observational cohort. JAMA. 2006;296(20):2451-2459.

13. Ghogawala Z.B.F., Carter B.S. Clinical equipoise and the surgical randomized controlled trial. Neurosurgery. 2008;62(6):N9-N10.

14. Weinstein J.N., Lurie J.D., Tosteson T.D., et al. Surgical versus nonsurgical treatment for lumbar degenerative spondylolisthesis. N Engl J Med. 2007;356(22):2257-2270.

15. Buchbinder R., Bombardier C., Yeung M., Tugwell P. Which outcome measures should be used in rheumatoid arthritis clinical trials? Clinical and quality-of-life measures’ responsiveness to treatment in a randomized controlled trial. Arthritis Rheum. 1995;38(11):1568-1580.

16. Weinstein J.N., Tosteson T.D., Lurie J.D., et al. Surgical vs nonoperative treatment for lumbar disk herniation: the Spine Patient Outcomes Research Trial (SPORT): a randomized trial. JAMA. 2006;296(20):2441-2450.

17. Flum D.R. Interpreting surgical trials with subjective outcomes: avoiding UnSPORTsmanlike conduct. JAMA. 2006;296(20):2483-2485.

18. McKenna P.J., Freeman B.J., Mulholland R.C., et al. A prospective, randomised controlled trial of femoral ring allograft versus a titanium cage in circumferential lumbar spinal fusion with minimum 2-year clinical results. Eur Spine J. 2005;14(8):727-737.

19. Fairbank J.C., Couper J., Davies J.B., O’Brien J.P. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271-273.

20. Wewers M.E., Lowe N.K. A critical review of visual analogue scales in the measurement of clinical phenomena. Res Nurs Health. 1990;13(4):227-236.

21. Ware J.E.Jr., Kosinski M., Keller S.D. SF-36: Physical and mental health summary scales: a manual for users of version 1, ed 2. Lincoln, RI: Quality Metric; 2001.

22. Kim S.W., Limson M.A., Kim S.B., et al. Comparison of radiographic changes after ACDF versus Bryan disc arthroplasty in single and bi-level cases. Eur Spine J. 2009;18(2):218-231.

23. Buchbinder R., Osborne R.H., Ebeling P.R., et al. A randomized trial of vertebroplasty for painful osteoporotic vertebral fractures. N Engl J Med. 2009;361(6):557-568.

24. Kallmes D.F., Comstock B.A., Heagerty P.J., et al. A randomized trial of vertebroplasty for osteoporotic spinal fractures. N Engl J Med. 2009;361(6):569-579.

25. Resnick D.K., Choudhri T.F., Dailey A.T., et al. Guidelines for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 2: assessment of functional outcome. J Neurosurg Spine. 2005;2(6):639-646.

26. EuroQol—a new facility for the measurement of health-related quality of life. The EuroQol Group. Health Policy. 1990;16(3):199-208.

27. Gold M. Panel on cost-effectiveness in health and medicine. Med Care. 1996;34(Suppl 12):DS197-DS199.

28. Resnick D.K., Choudhri T.F., Dailey A.T., et al. Guidelines for the performance of fusion procedures for degenerative disease of the lumbar spine. Part 3: assessment of economic outcome. J Neurosurg Spine. 2005;2(6):647-652.

29. Clinical alert: benefit of carotid endarterectomy for patients with high-grade stenosis of the internal carotid artery. National Institute of Neurological Disorders and Stroke, Stroke and Trauma Division. North American Symptomatic Carotid Endarterectomy Trial (NASCET) investigators. Stroke. 1991;22(6):816-817.

30. Endarterectomy for asymptomatic carotid artery stenosis. Executive Committee for the Asymptomatic Carotid Atherosclerosis Study. JAMA. 1995;273(18):1421-1428.

31. Roubin G.S., New G., Iyer S.S., et al. Immediate and late clinical outcomes of carotid artery stenting in patients with symptomatic and asymptomatic carotid artery stenosis: a 5-year prospective analysis. Circulation. 2001;103(4):532-537.

32. Patchell R.A., Tibbs P.A., Regine W.F., et al. Direct decompressive surgical resection in the treatment of spinal cord compression caused by metastatic cancer: a randomised trial. Lancet. 2005;366(9486):643-648.

33. Murrey D., Janssen M., Delamarter R., et al. Results of the prospective, randomized, controlled multicenter Food and Drug Administration investigational device exemption study of the ProDisc-C total disc replacement versus anterior discectomy and fusion for the treatment of 1-level symptomatic cervical disc disease. Spine J. 2009;9(4):275-286.

34. Mummaneni P.V., Burkus J.K., Haid R.W., et al. Clinical and radiographic analysis of cervical disc arthroplasty compared with allograft fusion: a randomized controlled clinical trial. J Neurosurg Spine. 2007;6(3):198-209.

35. Heller J.G., Sasso R.C., Papadopoulos S.M., et al. Comparison of BRYAN cervical disc arthroplasty with anterior cervical decompression and fusion: clinical and radiographic results of a randomized, controlled, clinical trial. Spine (Phila Pa 1976). 2009;34(2):101-107.

36. Birkmeyer N.J., Weinstein J.N., Tosteson A.N., et al. Design of the Spine Patient Outcomes Research Trial (SPORT). Spine (Phila Pa 1976). 2002;27(12):1361-1372.

37. Weinstein J.N., Tosteson T.D., Lurie J.D., et al. Surgical versus nonsurgical therapy for lumbar spinal stenosis. N Engl J Med. 2008;358(8):794-810.