Kinematics of the Aging Spine: A Review of Past Knowledge and Survey of Recent Developments, with a Focus on Patient-Management Implications for the Clinical Practitioner

Published on 11/04/2015 by admin

Last modified 22/04/2025

Adam K. Deitz, Alan C. Breen, Fiona E. Mellor, Deydre S. Teyhen, Kris W.N. Wong, Monohar M. Panjabi

KEY POINTS

• Functional testing of the spine (the flexion/extension and lateral bending x-rays that have been the standard of care for over 60 years) is used clinically in the detection of hypermobility and pseudarthrosis.

• Over the years, many investigators have published normative ranges of intervertebral range of motion (RoM) from asymptomatic subjects using the current standard of care; however, each of these studies was conducted at a single clinical site and thus did not account for the RoM variability attributable to the different imaging equipment and testing methods found in today’s clinical practice.

• By performing a meta-analysis of these studies to account for this variability among clinical sites, the authors put forward a new set of lumbar and cervical RoM thresholds for both ruling in and ruling out normal motion, hypomobility, and hypermobility.

• Many new technologies for assessing spine function have been proposed in the literature, and several of these have demonstrated the ability to deliver improved diagnostic efficacy. These newer technologies have also revealed important new insights into the function of the aging spine that have implications for the clinical practitioner.

• The authors put forward a set of suggested guidelines for the clinical use of functional testing, including suggested guidelines for the current standard of care for functional testing as well as for the newer technologies that have been proposed in the literature.

An Introduction to Functional Diagnostics of the Spine

Generally speaking, functional diagnostics are used to assess organ systems for the purpose of detecting dysfunction, identifying the underlying physiological defects, and indicating options for therapeutic intervention. For example, blood chemistry tests are used to assess liver function, while pulse rate monitoring and blood pressure testing are used to assess cardiovascular function. The spine is a series of multiarticulating joints whose primary functions are threefold: (1) to allow multidirectional motions between individual vertebrae, (2) to carry multidirectional external and internal loads, and (3) to protect the delicate spinal nerves and spinal cord. Therefore, functional diagnostics of the spine focus on the assessment and measurement of intervertebral motion under various environmental and movement conditions. The results are then used to help guide the management of patients suffering from various conditions of the spine.

In discussing spinal function as it relates to the aging spine, it is worthwhile to begin with a critical analysis of past knowledge and recent developments regarding spinal functional testing to establish a baseline understanding of the current state of orthopedic science. Such an analysis reveals that the functional testing method used in today’s clinical practice (the standard flexion/extension and lateral bending radiographs with which all practitioners are familiar) fails to deliver much useful diagnostic information and is particularly poorly suited to the management of the aging spine. This analysis further reveals that a comprehensive set of evidence-based guidelines for the interpretation of functional testing results has never before been put forward. This gap is especially problematic given that the clinical standard of care for functional testing has been part of medical practice for seven decades, has been adopted by the vast majority of spine practitioners, and is routinely used on a large number of patients suffering from a wide array of spine diseases.

Therefore the objectives of this chapter are to present this critical analysis of past knowledge and recent developments regarding functional testing of the spine for the purpose of highlighting for the clinical practitioner: (1) recommendations on how best to interpret functional testing results, (2) how the interpretation of these testing results is best applied to gain insights into the kinematics of the aging spine, and (3) how newer functional testing technologies should be assessed and adopted to improve the management of the aging spine.

The Current State of the Art: Diagnostic Efficacy of Today’s Functional Testing Method

The current clinical standard of care for performing functional testing of the spine was introduced in the 1940s¹ and has since been the subject of scores of published investigations. Today’s method is beset by multiple performance problems²,³ and, although many practitioners are unaware of the fact, has been shown to be ineffective in differentiating normal from abnormal spinal function.⁴⁻⁷ In holding true to the tenets of evidence-based medicine, it is critical that, as a starting point, practitioners understand the limitations of this method so that testing results are interpreted appropriately.

Range of Motion (RoM) Measurements

Today’s method for conducting functional testing of the spine (flexion/extension and lateral bending radiographs, which are referred to in this text as the clinical standard of care) involves capturing standard radiographs of the spine as subjects bend and then hold their spines fixed at the extremes of motion in either the sagittal plane (in the case of flexion/extension) or the coronal plane (in the case of lateral bending). These studies are separate from, but often used as an adjunct to, other medical imaging studies such as plain radiographs or CT scans in the diagnostic assessment of a patient’s spine. When performing these motions, each subject bends in each direction to his or her own maximum voluntary bending angle (MVBA).

These two images taken at the extremes of trunk bending within a single plane are then interpreted, either manually using a pen, ruler, and protractor or, more recently with the advent of digital imaging, on an imaging workstation, to derive range of motion (RoM) measurements. RoM measurements represent the total displacement between any two vertebrae during MVBA bending and are expressed both as angulations, measured in degrees and referred to in this text as the intervertebral angle (IVA) in either the coronal or sagittal plane, and as translations in the sagittal plane, measured in millimeters and referred to in this text as the intervertebral translation (IVT). See Figure 10-1 for a simplified diagram showing how IVA and IVT are derived from radiographic images.

FIGURE 10-1 Simplified diagram of how IVA and IVT are derived from radiographic images.
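As a rough illustration of the angular part of this derivation, the following Python sketch computes an IVA from the orientations of two adjacent vertebrae in the flexion and extension images. It is a minimal sketch under stated assumptions, not the measurement protocol of any cited study: the function names, the use of two endplate landmark points per vertebra, and the sign convention (flexion minus extension) are all hypothetical choices made for this example.

```python
import math


def endplate_angle_deg(anterior_xy, posterior_xy):
    """Orientation of an endplate line, in degrees, from two landmark points.

    Assumption: each vertebra is represented by one endplate line defined by
    an anterior and a posterior landmark in image coordinates.
    """
    dx = anterior_xy[0] - posterior_xy[0]
    dy = anterior_xy[1] - posterior_xy[1]
    return math.degrees(math.atan2(dy, dx))


def intervertebral_angle_deg(sup_flex, inf_flex, sup_ext, inf_ext):
    """IVA: change in the angle between two vertebrae from extension to flexion.

    All inputs are vertebral orientations in degrees. The sign convention
    (flexion minus extension) is an assumption of this sketch.
    """
    relative_in_flexion = sup_flex - inf_flex
    relative_in_extension = sup_ext - inf_ext
    return relative_in_flexion - relative_in_extension


# Hypothetical orientations (degrees) of the superior and inferior vertebrae.
iva = intervertebral_angle_deg(sup_flex=12.0, inf_flex=4.0,
                               sup_ext=-3.0, inf_ext=-1.0)
print(iva)  # 10.0
```

Because only differences of orientations enter the result, a rotation of the whole image changes all four inputs equally and leaves the IVA unchanged.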

RoM is defined by the rotation of the body (IVA) and the translation of a point on the body (IVT). While the rotation is unambiguous, the translation is not. The translation is different for different points of the vertebral body and, additionally, it is subject to magnification and distortion on radiographs. This ambiguity has led to: (1) the introduction of multiple techniques for selecting points on the vertebral body and measuring IVT;²,⁴,⁸,²¹,²² (2) attempts to define standardized displacement thresholds for what constitutes translational instability;⁹ and (3) the proposal of multiple systems for scoring and classifying translational instabilities (there have been the Meyerding scale,¹⁰ the Newman scale,¹¹ and the modified Newman scale¹² for scoring translational instabilities, as well as the Wiltse¹³ system for classifying them).

Despite the multiplicity of methods that have been proposed over the years, the Meyerding system has become the most widely used in clinical practice and has thus emerged as the standard system by which translational instability is graded. The Meyerding system categorizes the severity of a translational instability based upon IVT measurements expressed as a percentage of the total superior vertebral body length (also measured in millimeters): grade 1 is 0% to 25%, grade 2 is 25% to 50%, grade 3 is 50% to 75%, and grade 4 is 75% to 100%; over 100% is spondyloptosis, in which the vertebra completely falls off the supporting vertebra. One key advantage of the Meyerding system is that it is a relative grading system, meaning that it helps to control for distortion and magnification errors that can be associated with absolute measurements of displacement (millimeters) derived from radiographic images.
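The grading bands described above can be sketched in a few lines of Python. This is an illustrative example only: the function name and arguments are hypothetical, and because the published bands share their boundary values (25%, 50%, 75%), the sketch assumes that a value falling exactly on a boundary is assigned to the lower grade.

```python
def meyerding_grade(ivt_mm, vertebral_body_length_mm):
    """Grade a translational slip from IVT as a percentage of body length.

    Assumption: boundary values (exactly 25%, 50%, 75%, 100%) are assigned
    to the lower grade, since the text does not say how ties are handled.
    """
    if vertebral_body_length_mm <= 0:
        raise ValueError("vertebral body length must be positive")
    slip_percent = 100.0 * abs(ivt_mm) / vertebral_body_length_mm
    if slip_percent > 100.0:
        return "spondyloptosis"
    if slip_percent > 75.0:
        return "grade 4"
    if slip_percent > 50.0:
        return "grade 3"
    if slip_percent > 25.0:
        return "grade 2"
    return "grade 1"


# Hypothetical measurements: a 12 mm slip on a 40 mm vertebral body is 30%.
print(meyerding_grade(12.0, 40.0))  # grade 2
```

Because both inputs come from the same image, a uniform magnification error scales them equally and cancels in the ratio, which is the advantage of a relative grading system noted above.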

Although IVT measurements have been the subject of intense investigation over the years, they are not a topic about which there is currently much debate. This topic was thoroughly explored in studies published from the 1970s through the 1990s; however, in the past 15 to 20 years a de facto consensus has emerged with respect to the use of the Meyerding system as the clinical gold standard for grading translational instability cases. The same is not true for IVA measurements, as no consensus has emerged with respect to the clinical application of IVA despite a very large volume of recent investigational activity. Therefore, the remainder of this chapter will present a review of past and current knowledge with respect to IVA, with a particular focus on patient-management implications for treatment of the aging spine.

IVA is used clinically to assess intervertebral articulation in either the sagittal or coronal planes, and as such should theoretically be capable of detecting six specific types of intervertebral functional presentations (see Figure 10-2):

1. Normal Motion: IVA that is considered normal (i.e., between the second and ninety-eighth percentile of what is observed among normal healthy subjects)

2. Hypomobility: IVA that is abnormally low (i.e., below the second percentile). Note that stiffness and hypomobility are not the same thing; stiffness is a mechanical characteristic of the functional spinal unit (FSU), while hypomobility is a measurement representing the observed response of the FSU to gross spine bending. In that sense, hypomobility can be viewed as a proxy measurement of stiffness.∗

3. Rotational Hypermobility: IVA that is abnormally large (i.e., above the ninety-eighth percentile). In today’s medical practice, rotational hypermobility is considered a form of instability.

4. Immobility: The lack of any motion at all (IVA = 0°). In practice, the U.S. Food and Drug Administration (FDA) considers any IVA in the lumbar or cervical spine of up to 5° as effectively immobile for the purpose of evaluating arthrodesis status following a fusion, although the literature is equivocal and contradictory regarding the use of this 5° threshold,¹⁴,¹⁵ and recently published treatment guidelines endorse this use of IVA in assessing arthrodesis status only as an adjunct.¹⁶

5. Pseudarthrosis: The presence of motion in a level for which a fusion has been previously attempted. Although theoretically this would include any IVA greater than 0°, according to the FDA standards described above, this only includes IVA of greater than 5°.

6. Paradoxical Motion: The presence of motion in the direction opposite to that of the spine bend (IVA < 0°). The term “paradoxical motion” was coined by Kirkaldy-Willis,¹⁷ although it was first observed by Knutsson. It has been more recently discussed in other published studies.¹⁸ In today’s medical practice, paradoxical motion would be considered a form of instability.

FIGURE 10-2 Theoretical framework for the detection of six functional presentations based on IVA measurements.

However, there is a large gap between those six presentations that should theoretically be detectable, and those that are actually detectable with the current clinical standard of care. This gap is thoroughly explored in the following sections, and must be understood by the clinical practitioner in order to properly interpret functional testing results.

Measurement Variability in Range of Motion (RoM) Measurements

As with any quantitative diagnostic measurement parameter, measurement variability is the key driver of diagnostic efficacy in the application of such measurements to differentiate between the various types of patient presentations. Simply stated, measurement variability is the enemy of effective diagnosis: the higher the measurement variability, the less effective the resulting diagnosis. In the case of RoM measurements, it has been shown that measurement variability is high²,³ and diagnostic efficacy is low.⁴⁻⁷ The causes and effects of this measurement variability are well understood; however, the implications for the clinical practitioner have rarely been discussed in the published literature. Therefore, one of the main goals of this section is to present a data-driven analysis of RoM measurement variability and how this variability should be taken into account in the interpretation of functional testing results used in the diagnosis of spine disease and management of the aging spine.

RoM measurement variability is composed of variability between/within observers, and variability between/within subjects. Variability between observers is referred to as interobserver variability, while variability associated with a single observer taking multiple measurements at different points in time is called intraobserver variability (also called test/re-test variability). Similarly, variability between patients is referred to as intersubject variability, while the variability of any given patient between multiple tests taken at different points in time is referred to as intrasubject variability. For example, intersubject variability can include the effects of physiologic differences from patient to patient, whereas intrasubject variability can include variability in the willingness of a patient to perform bending motions from test to test (which can often be due to the influence of pain and/or fear of pain, among other things).

There is also a third component of RoM measurement variability that relates to the variability that exists between different testing sites. Different testing sites utilize different radiography platforms, and different imaging platforms can produce different types of image distortion, magnification, and other image variants. Further, different sites utilize different practices for patient positioning and image analysis. These variations among testing sites can directly contribute to RoM measurement variability and therefore must also be taken into account. For the purpose of this discussion, this variability among different testing sites will be referred to as intersite variability.

The different types of RoM measurement variability mentioned in the preceding paragraphs are interrelated in several ways that can be best understood through the concept of “accumulating” variability. As previously discussed, intrasubject variability is a measurement of the test/re-test variation within a given subject, while intersubject variability is a measurement of the variability across a population of subjects. However, since the RoM measurement from any given subject is affected by intrasubject variation, any measurement of intersubject RoM variability across multiple subjects would necessarily “accumulate” the combined effects of intrasubject variation and intersubject variation. The same concept holds true for measurements of interobserver RoM variability, namely that these measurements accumulate the effects of both intraobserver and interobserver variation.

This concept of “accumulation” of variability also applies to the overall relationship between observer-related variability (interobserver and intraobserver variability) and subject-related variability (intersubject and intrasubject variability). Subject-related variation in intervertebral motion exists as an inherent property of the physiology of the spine. In other words, there is a certain amount of variation that is inherent to the way the spines of different people move, or in the way a given person’s spine moves at different points in time. For this discussion, we will refer to this inherent variation as the “pure” intrasubject and intersubject variability. However, it is impossible to measure this “pure” intrasubject and intersubject variability without constructing an observational system to take measurements, and any such observational system is itself subject to both intraobserver and interobserver variability. Therefore, any measurement of intersubject variability, for this discussion called “observed intersubject variability,” necessarily “accumulates” the combined effects of both observer-related variability and subject-related variability.

See Figure 10-3 for a simplified conceptual diagram of how selected types of RoM measurement variability interrelate through the accumulation of measurement variability.

FIGURE 10-3 Simplified conceptual diagram of the “accumulation” of RoM measurement variability, which applies to both IVA and IVT measurements. Note that this diagram is considered simplified because it does not represent every possible type of measurement variability. For example, observed intrasubject variability is not represented. This simplified diagram represents the interrelationships between those types of measurement variability that are most important for the clinical practitioner to understand in evaluating the performance of today’s in vivo methods of spinal functional testing.
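One way to make the idea of accumulation concrete is a simple additive variance model. The expression below is an illustration only: it assumes that the four sources of variation are statistically independent, which the text does not claim, and the symbols are introduced here purely for convenience.

```latex
\[
\sigma^2_{\text{observed intersubject}}
  = \sigma^2_{\text{pure intersubject}}
  + \sigma^2_{\text{pure intrasubject}}
  + \sigma^2_{\text{interobserver}}
  + \sigma^2_{\text{intraobserver}}
\]
```

Under that assumption every term on the right-hand side is nonnegative, so the observed intersubject variability can never be smaller than the pure intersubject variability it is meant to estimate.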

Using Normative IVA Data to Detect Normal Motion, Hypomobility, and Hypermobility

As previously discussed, it is theoretically possible to use normative IVA data from a population of asymptomatic subjects to differentiate normal from hypomobile and hypermobile intervertebral motion (see Figure 10-2). However, with the current standard of care for conducting spinal functional testing, only hypermobility and pseudarthrosis can be detected with an acceptable level of statistical confidence. This fact, although not widely discussed, has very significant implications in terms of patient management, which are discussed later in this section. As a starting point to this discussion, however, it is necessary to first re-examine the conventional wisdom regarding what is currently considered “normal healthy” intervertebral rotation.

As a general biostatistical principle, a quantitative diagnostic value is considered an outlier and therefore abnormal if it lies more than two standard deviations above or below the mean value observed among a representative sample of normal healthy subjects (the mean plus and minus two standard deviations represents approximately 95.5% of all observed values). Therefore, the magnitude of such standard deviations will determine the specific ranges of IVA that should be considered normal versus hypomobile or hypermobile. Many investigators over the years have conducted studies of IVA values across asymptomatic populations for the purpose of producing such ranges, yet all of these investigations share the same Achilles’ heel: they are all single-site studies and therefore fail to account for intersite variability. Thus every single-site study underestimates IVA measurement variability and therefore produces unreliable ranges of what constitutes normal versus hypomobile or hypermobile intervertebral rotation. However, by conducting a meta-analysis of these studies it is possible to account for this intersite variability and produce more representative ranges of what constitutes normal IVA.
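The two-standard-deviation rule can be sketched in Python as follows. This is a minimal sketch, not the authors' computation: the function names are hypothetical, the coverage figure assumes normally distributed IVA values, and the mean and standard deviations used in the example are made-up numbers chosen only to show how an underestimated standard deviation narrows the apparent normal range.

```python
import math


def normal_range(mean_iva_deg, sd_iva_deg, k=2.0):
    """Return (hypomobility threshold, hypermobility threshold) as mean -/+ k SD."""
    return mean_iva_deg - k * sd_iva_deg, mean_iva_deg + k * sd_iva_deg


def coverage_within_k_sd(k=2.0):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2.0))


# Hypothetical values: same mean, but the multi-site SD is larger because it
# also includes intersite variability.
single_site = normal_range(12.0, 3.0)
multi_site = normal_range(12.0, 5.0)

print(single_site)  # (6.0, 18.0)
print(multi_site)   # (2.0, 22.0)
print(round(coverage_within_k_sd(), 4))  # 0.9545
```

In this made-up example, a reading of 20° would be flagged as hypermobile against the single-site range but would fall inside the wider multi-site range.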

In conducting this meta-analysis, a total of 22 published IVA datasets were identified (15 lumbar and 7 cervical). Each dataset was carefully examined and screened to ensure that: (1) the method for measuring IVA was consistent with the current clinical standard of care, and (2) the variability (standard deviation, or SD) among observed IVA values was published along with the mean. After applying this screen, three lumbar datasets and four cervical datasets qualified for this meta-analysis. See Table 10-1 for a list of all 22 datasets that were considered.

TABLE 10-1 IVA Datasets Consulted and Screened in This Analysis, and the Reason for Exclusion

After including all qualifying datasets, the following values were tabulated for the mean and standard deviation of observed IVA values taken from multiple populations of asymptomatic subjects across multiple sites (Table 10-2). The standard deviation values in the “Aggregated Across Sites” column at the far right of each table represent the standard deviation of the superset created by combining the observed values from all sites, and thus the observed intersite variability associated with the current standard of care for measuring IVA at each level, while the standard deviation values for each investigator represent the observed intersubject/intrasite variability at that investigator’s site.

TABLE 10-2 Normative IVA Data That Account for the Effects of Intersite Variability, Thereby Allowing for a More Representative Account of Mean IVA Values Than Has Ever Been Published in Any Single-Site Study
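When only each site's sample size, mean, and standard deviation are published, the standard deviation of the combined superset can still be recovered exactly. The Python sketch below shows one standard way to do so; it is offered as an illustration and is not taken from the chapter, the function name is hypothetical, it assumes each site reports a sample standard deviation (n − 1 denominator), and the raw values in the example are invented to check the arithmetic.

```python
import math
import statistics


def combine_sites(sites):
    """Combine per-site (n, mean, sample SD) into the superset mean and SD.

    The total sum of squares is the within-site sum of squares plus the
    between-site sum of squares of the site means about the combined mean.
    """
    total_n = sum(n for n, _, _ in sites)
    combined_mean = sum(n * mean for n, mean, _ in sites) / total_n
    within_ss = sum((n - 1) * sd ** 2 for n, _, sd in sites)
    between_ss = sum(n * (mean - combined_mean) ** 2 for n, mean, _ in sites)
    combined_sd = math.sqrt((within_ss + between_ss) / (total_n - 1))
    return combined_mean, combined_sd


# Invented IVA readings (degrees) from two hypothetical sites.
site_a = [10.0, 12.0, 14.0]
site_b = [16.0, 18.0, 20.0, 22.0]
summary = [
    (len(site_a), statistics.mean(site_a), statistics.stdev(site_a)),
    (len(site_b), statistics.mean(site_b), statistics.stdev(site_b)),
]
combined_mean, combined_sd = combine_sites(summary)

# The result matches the SD computed directly from the pooled raw values.
print(math.isclose(combined_sd, statistics.stdev(site_a + site_b)))  # True
```

The between-site term is what a single-site study cannot capture: whenever site means differ, the superset standard deviation exceeds what the within-site spread alone would suggest.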

Using these normative values that account for the effects of intersite variability, it is possible to produce threshold IVA values that represent hypomobility and hypermobility, as given in Table 10-3.

TABLE 10-3 IVA Thresholds for Hypomobility and Hypermobility

Effects of IVA Measurement Variability on the Diagnostic Efficacy of Functional Testing of the Spine

To quantitatively assess the diagnostic efficacy of using IVA to detect different functional presentations (hypomobility, hypermobility, normal motion, etc.), it would be necessary to have a gold standard method for identifying true positives and true negatives for each type of functional presentation. If such a gold standard method existed, it would then be possible to quantitatively assess diagnostic efficacy with the traditional diagnostic efficacy parameters of sensitivity (Sn), specificity (Sp), and the positive/negative likelihood ratios (+LR and −LR). However, the authors are unaware that any such gold standard exists,∗ and it is therefore impossible to measure these traditionally used diagnostic efficacy parameters. Therefore, in this discussion of the diagnostic efficacy associated with IVA measurements, these efficacy parameters will be described qualitatively rather than measured quantitatively.

As reflected in the hypomobility and hypermobility thresholds given in Table 10-3, the current standard of care for measuring IVA involves a high degree of measurement variability. This high degree of measurement variability, in turn, has disastrous consequences on the diagnostic efficacy of using IVA to detect intervertebral motion dysfunction. The first problem lies with the very low thresholds for detecting intervertebral hypomobility. Vertebral levels with IVA measurements of less than 2° to 5° are generally considered to be fused.^14,¹⁵ As previously discussed, the FDA considers any IVA of up to 5° as effectively immobile for the purpose of evaluating arthrodesis status following a fusion. Therefore, because the hypomobility thresholds are all below the FDA’s 5° threshold for what is considered a fused FSU (except at C4/C5; Table 10-3), it is impossible to use IVA to differentiate hypomobile motion from a fusion, effectively rendering hypomobility an undetectable condition. A second consequence of this overlap between what is considered normal and hypomobile motion with what is considered a fused FSU is that one is guaranteed reduced specificity in detecting immobility as well as reduced sensitivity in detecting normal motion (because a “true normal” with an observed IVA of less than 5° is both a false negative in the detection of normal motion as well as a false positive in the detection of immobility).

The second problem lies with the thresholds for detecting both intervertebral hypermobility and hypomobility. The thresholds for hypermobility are so high because IVA measurement variability is so large. Having such a high threshold for hypermobility (the average threshold for lumbar levels is 22° and for cervical levels is 26°, from Table 10-3) ensures that only the grossest of rotational hypermobilities will register as being definitively hypermobile; thus subtle hypermobilities remain undetected and register as “normal.” Similarly, with hypomobility, high IVA variability makes the hypomobility thresholds so low that only the grossest of hypomobilities could register as being definitively hypomobile. As a consequence, the sensitivity of using IVA to detect hyper/hypomobility as well as the specificity of using IVA to detect normal motion are both reduced (those patients who register as normal but who have a subtle hyper/hypomobility are a false positive in the detection of normal motion as well as a false negative in the detection of hyper/hypomobility).

A third problem arises when one tries to use IVA to rule out hypomobility or hypermobility. It is theoretically possible to rule out hypomobility if observed IVA is sufficiently high. For example, if IVA is confirmed to be above the mean for any level, then it would be possible to rule out hypomobility (even the subtle hypomobilities described in the previous paragraph). It is similarly possible to rule out hypermobility if observed IVA is sufficiently low. However, in producing threshold values to rule out hypomobility and hypermobility, one must consider the effects of interobserver variability in IVA measurements to be sure that a measurement is truly above or below the mean. In quantifying the interobserver variability at one investigational site, Lim et al.³ reported that the 95% confidence interval for the interobserver variability in lumbar IVA measurements is ±5.2°. However, as this study took place at only one site, it almost certainly underestimates the actual interobserver variability that exists across different clinical sites. Nonetheless, if one uses the Lim estimate and assumes that an IVA measurement must be 5.2° above/below the mean to be 95% confident that the observed IVA is actually above/below the mean, and if one further assumes that any IVA measurement above/below the mean rules out hypo/hypermobility, then one can produce the “rule-out” thresholds for hypomobility and hypermobility given in Table 10-4. However, there are some limitations associated with the data used to create these threshold values (as described in the caption for Table 10-4), so they should be considered nondefinitive until these limitations are addressed and new thresholds can be produced.

TABLE 10-4 IVA Thresholds for Ruling In/Out Hypomobility and Hypermobility
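The rule-out logic described above reduces to adding or subtracting the interobserver margin from the level mean. The Python sketch below illustrates that arithmetic; the function name and dictionary keys are hypothetical, the 5.2° margin is the Lim et al. estimate quoted in the text, and the 12.0° mean in the example is a made-up placeholder rather than a value from Table 10-2.

```python
LIM_INTEROBSERVER_MARGIN_DEG = 5.2  # 95% interval reported by Lim et al.


def rule_out_thresholds(mean_iva_deg, margin_deg=LIM_INTEROBSERVER_MARGIN_DEG):
    """Return nondefinitive rule-out thresholds for one vertebral level.

    Hypomobility is ruled out when IVA exceeds mean + margin; hypermobility
    is ruled out when IVA is below mean - margin.
    """
    return {
        "rule_out_hypomobility_above": mean_iva_deg + margin_deg,
        "rule_out_hypermobility_below": mean_iva_deg - margin_deg,
    }


# Hypothetical level mean of 12.0 degrees (placeholder, not from Table 10-2).
thresholds = rule_out_thresholds(12.0)
print({name: round(value, 1) for name, value in thresholds.items()})
```

Readings that fall between the two thresholds rule out neither condition under these assumptions.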

In conclusion, the diagnostic efficacy of using IVA to detect the following conditions can be summarized as:

• Immobility: Low specificity (high rate of false positives), so immobility should not be “ruled in” for IVA of 5° or less. May be definitively “ruled out” for IVA greater than 5°.

• Pseudarthrosis: May be definitively ruled in for IVA greater than 5°. Low sensitivity (high rate of false negatives) so pseudarthrosis should not be ruled out for IVA less than 5°.

• Hypomobility: Effectively undetectable (thresholds below what is considered fused). A nondefinitive rule-out diagnosis for hypomobility can be made if IVA is above the threshold values listed in Table 10-4.

• Normal Motion: A rule-in diagnosis of normal motion should be considered nondefinitive, because both sensitivity and specificity are low. May be ruled out with a high degree of confidence if IVA is above hypermobility thresholds (i.e., if hypermobility is ruled in).

• Hypermobility: May be definitively ruled in for IVA values above the hypermobility thresholds given in Table 10-3. Low sensitivity (i.e., high rate of false negatives), so hypermobility should not be ruled out if IVA is below the thresholds. May be nondefinitively ruled out if IVA is less than the threshold values given in Table 10-4.

The root cause of this poor diagnostic efficacy in the use of IVA in the detection of different functional presentations is the high degree of measurement variability associated with the current standard of care for measuring IVA. As a consequence, any reduction in IVA measurement variability would serve to increase the diagnostic efficacy of using IVA in the detection of the functional presentations given earlier.

Conclusions: Implications for the Practitioner Regarding the Clinical Application of RoM Measurements

The current standard of care for functional testing of the spine provides IVA results that can be overinterpreted if measurement variability is not properly accounted for. Based on a comprehensive analysis of the effects of this variability, it is possible to put forward a set of clinical practice suggestions that are consistent with the published literature and that properly account for the effects of all sources of measurement variability:

1. Definitive diagnoses that can be made using the current standard of care for functional testing of the spine:

• When an instability is suspected, any IVA measurement above the hypermobility thresholds given in Table 10-3 should be considered definitively hypermobile.

• When pseudarthrosis is suspected in a previously fused segment, any measurement above 5° should be considered definitive pseudarthrosis.

• Any measurement below −5° (i.e., 5° of motion in the direction opposite the bend) should be considered definitively paradoxical.

2. Nondefinitive diagnostic results possible with IVA measurements:

• Due to the significant false negative rate when it comes to the detection of hypermobility, any IVA measurement above 5° but below the hypermobility thresholds given in Table 10-3 should be considered nondefinitive, but potentially normal. It is currently impossible to definitively rule in normal motion using today’s clinical standard of care.

• Any IVA measurement ranging from −5° to 5° should be considered nondefinitive, but potentially hypomobile, immobile, paradoxical, or normal. If pseudarthrosis is suspected and an IVA of less than 5° is observed, a corroborative spine CT view can be used to assist in the detection of pseudarthrosis.¹⁶∗

• Hypomobility and hypermobility may be nondefinitively ruled out based on the threshold values given in Table 10-4.
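The definitive and nondefinitive categories listed above can be expressed as a small decision function. The Python sketch below is an illustration of that logic only, not a clinical tool: the function name, argument names, and returned labels are hypothetical, the order in which the rules are checked is an assumption of this sketch, and the 22° value in the example is the average lumbar hypermobility threshold quoted earlier, standing in for the level-specific values of Table 10-3.

```python
FUSION_THRESHOLD_DEG = 5.0  # the 5 degree threshold discussed in the text


def interpret_iva(iva_deg, hypermobility_threshold_deg, previously_fused=False):
    """Map one IVA measurement to a definitive or nondefinitive category."""
    if iva_deg < -FUSION_THRESHOLD_DEG:
        return "definitive: paradoxical motion"
    if previously_fused and iva_deg > FUSION_THRESHOLD_DEG:
        return "definitive: pseudarthrosis"
    if iva_deg > hypermobility_threshold_deg:
        return "definitive: hypermobility"
    if iva_deg > FUSION_THRESHOLD_DEG:
        return "nondefinitive: potentially normal"
    # Remaining case: IVA between -5 and 5 degrees inclusive.
    return "nondefinitive: potentially hypomobile, immobile, paradoxical, or normal"


# Placeholder threshold of 22 degrees; real use would look up the level in Table 10-3.
print(interpret_iva(25.0, 22.0))                        # definitive: hypermobility
print(interpret_iva(7.0, 22.0, previously_fused=True))  # definitive: pseudarthrosis
print(interpret_iva(10.0, 22.0))                        # nondefinitive: potentially normal
```

Note that no branch returns a definitive finding of normal motion, hypomobility, or immobility, which mirrors the conclusion that these cannot be definitively ruled in with the current standard of care.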

Technological Advances that Improve the Diagnostic Efficacy of Spinal Functional Testing

As stated throughout this text, the current standard of care for measuring IVA includes a high degree of both observer-related and subject-related variability. Technological developments in recent years have been effective at reducing both of these types of variability, and are discussed in this section. However, this section only includes those methods that could feasibly be adopted by the clinical practitioner; it does not discuss techniques that are purely investigational or are otherwise infeasible for immediate adoption (such as Roentgen Stereophotogrammetric Analysis,¹⁹ external skin-marker-based motion measurement techniques,²⁰ and a variety of in vitro measurement methods).

Reducing IVA Observer-Related Variability by Improving the Reliability of Image Analysis Techniques

With respect to observer-related variability, previous studies have confirmed widely variable IVA results from measurements of the same images taken by different observers. Lim et al. demonstrated that a difference of 9.6° must exist between the IVA measurements from two observations in order to be 95% confident that there really is a difference in IVA.³ This high degree of interobserver variability is a major contributor to overall observed measurement variability. However, recent advances have successfully reduced this interobserver variability through several novel techniques.

There have been improvements over the years with respect to the methods for landmarking the radiographic images and deriving IVA and IVT measurements from these images. Variability in IVA and IVT measurements can be introduced through distortion errors inherent to all radiographic images. Further, if patients move out of plane or have any significant axial rotation in their spines during imaging, the resulting IVA and IVT measurements can become more variable. A group led by W. Frobin found that interobserver variability in IVA and IVT measurements could be reduced simply by using a more sophisticated method of landmarking radiographic images.²¹,²² This technique was found to significantly reduce the variability in IVA and IVT measurements associated with radiographic image distortion and with out-of-plane positioning of the subject during imaging.

Multiple groups have developed software-based image analysis tools that have been shown to reduce this interobserver variability. For example, one of the authors of this chapter, Kris Wong, recently developed a software algorithm for automatically deriving IVA measurements from bending images. Wong et al. published two datasets of normative values: one derived manually,²³ and a second derived using automated software image processing algorithms.²⁴ Both datasets were measured from active flexion-extension bending of the lumbar spine. The average standard deviation across the lumbar levels measured in this study (a measure of the observed intersubject/intrasite variability) decreased by over 50%, from 2.8° to 1.3°, as a result of using automated software-driven image analysis rather than manual image analysis. Other groups have demonstrated similar results using commercially available image analysis software. Using an automated image analysis software program operated as a core lab service (QMA software operated by Medical Metrics, Inc., Houston, Texas) instead of a manual image analysis process, Reitman et al. published a cervical IVA dataset²⁵ of 155 asymptomatic subjects and Hipp & Wharton published a lumbar IVA dataset²⁶ of 67 asymptomatic subjects. The Reitman study reported an average standard deviation across cervical levels of 4.0°, while the Hipp & Wharton study reported an average standard deviation across all lumbar levels of 3.6°. While these are not the lowest published standard deviations among cervical or lumbar IVA datasets, they do represent a 20% (cervical) and 18% (lumbar) reduction relative to the average value for intersubject/intrasite variability (i.e., the average of all individual sites’ average standard deviation) from the datasets listed in Table 10-2 (5.0° cervical and 4.4° lumbar). From the Wong, Hipp & Wharton, and Reitman datasets it can be shown that using automated software image analysis methods as opposed to manual methods for measuring IVA can reduce interobserver variability and thus also reduce observed intersite variability.
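The percentage reductions quoted in this paragraph follow from a simple relative-difference calculation. The short Python sketch below reproduces that arithmetic from the rounded standard deviations given in the text; the helper name `percent_reduction` is an illustrative choice, not something taken from the cited studies.

```python
def percent_reduction(baseline_sd: float, new_sd: float) -> float:
    """Relative reduction, in percent, of new_sd compared with baseline_sd."""
    if baseline_sd <= 0:
        raise ValueError("baseline standard deviation must be positive")
    return 100.0 * (baseline_sd - new_sd) / baseline_sd


# (baseline SD, SD with automated analysis), in degrees, as quoted in the text.
comparisons = {
    "Wong et al., manual vs. automated (lumbar)": (2.8, 1.3),
    "Reitman et al. vs. Table 10-2 mean (cervical)": (5.0, 4.0),
    "Hipp & Wharton vs. Table 10-2 mean (lumbar)": (4.4, 3.6),
}

for label, (baseline, automated) in comparisons.items():
    print(f"{label}: {percent_reduction(baseline, automated):.0f}% reduction")
```

With these rounded inputs the three comparisons come out at roughly 54%, 20%, and 18%, in line with the "over 50%", 20%, and 18% figures stated above.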

Collecting Dynamic Images “During the Bend” through the Diagnostic Use of Fluoroscopy for Functional Testing of the Spine

With the current standard of care for conducting functional testing of the spine, only static images are collected while subjects hold static posture in their MVBAs; no dynamic images are collected, and no images are collected during the bend. Several research groups have addressed this potential shortcoming by using fluoroscopy to collect dynamic images at points throughout spine bending.²⁷–³³ The principal advantage of using fluoroscopy instead of standard radiographs is that a functional problem that is only present dynamically, or that is only visible at positions other than MVBA, would never be detectable using the current standard of care. However, although arguably superior to the current standard of care, this method of functional imaging has never become widely used in the United States, because most major American payer organizations have refused to reimburse practitioners for such a use of diagnostic fluoroscopy.

Reducing the Subject-Related IVA Variability Introduced through Uncontrolled Bending During Imaging

In addition to the diurnal variation that any given patient exhibits in spine bending MVBAs, there is also a high degree of variability in MVBA from subject to subject. This variability can be expected to be considerable, given the range of sensitivity or stoicism of subjects, their level of pain, and their fear or resilience in the face of it. In the cervical spine, there is a wide range of MVBA observed in normal asymptomatic subjects. The 95% confidence interval on observed sagittal plane cervical spine MVBA was measured to range from 34° to 82° of total gross motion — a very large range. The authors of that study, which measured both total gross cervical spine motion and cervical intervertebral motion (IVA), observed that “… this variation in gross motion between individuals had a highly significant effect on all measures of IVM [intervertebral motion].”²⁵ Variation in MVBA has also been measured in the lumbar spine among sufferers of chronic back pain.³⁴ The 95% confidence interval on observed sagittal plane lumbar MVBA was reported to range from 25° to 93° of total gross motion, an even larger range than was observed in the cervical spine among asymptomatics.

Clearly, this high degree of variability in MVBA plays a large role in driving the high levels of overall variability in IVA measurements. As discussed previously in this chapter, it is the high degree of variability in IVA that renders these measurements so clinically ineffective. Controlling the variability associated with MVBA bending therefore should be expected to reduce IVA measurement variability, and thus increase the diagnostic efficacy of functional testing of the spine.

One means of addressing the variability in MVBA bending is to normalize IVA measurements against the total range of motion across an entire spinal region. For example, in the lumbar spine the IVA from any given level can be divided by the total bending that occurs between L1 and S1 to express IVA as a percentage of total lumbar range of motion. By doing this, it is possible to reduce the effects of the variability introduced by MVBA bending. In the case of the lumbar spine, this method has been shown to be an effective means of addressing the variability inherent in MVBA bending.²⁹,³⁰ This method has also been shown to be successful in studies involving the cervical spine.²⁵
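As a minimal sketch of this normalization, the Python snippet below expresses each level's IVA as a percentage of total L1 to S1 motion. It assumes that total lumbar bending can be taken as the sum of the five segmental IVAs; the function name `normalize_iva` and the example angles are hypothetical and are not drawn from any of the cited datasets.

```python
from typing import Dict


def normalize_iva(iva_by_level_deg: Dict[str, float]) -> Dict[str, float]:
    """Express each level's IVA as a percentage of total regional motion.

    Assumption: total L1-S1 bending is approximated by the sum of the
    segmental IVAs supplied in iva_by_level_deg.
    """
    total = sum(iva_by_level_deg.values())
    if total <= 0:
        raise ValueError("total regional motion must be positive")
    return {level: 100.0 * iva / total for level, iva in iva_by_level_deg.items()}


# Hypothetical flexion-extension IVAs in degrees, for illustration only.
example = {"L1/L2": 10.0, "L2/L3": 12.0, "L3/L4": 13.0, "L4/L5": 15.0, "L5/S1": 10.0}

for level, share in normalize_iva(example).items():
    print(f"{level}: {share:.1f}% of total lumbar motion")
```

Because every level is divided by the same subject-specific total, a subject who bends less overall but distributes motion in the same proportions yields the same normalized values, which is the effect the paragraph describes.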

Another means of addressing IVA measurement variability caused by MVBA bending is through the use of passive rather than active spine bending. Dvorak and Panjabi (one of the authors of this chapter) published a study in 1991 in which they used a passive bending technique to decrease the variability in MVBA.⁸ In this study, an assistant applied a pulling force to subjects as they bent into flexion. The assistants attempted to pull the patients into passive flexion with as constant a force as possible, and in so doing provided a level of standardization in the bending angles of the patients. In this study of 41 patients, the authors reported an average standard deviation of 2.8° in the IVA measurements across the lumbar levels from passive lumbar bending, which represents a 36% reduction in the observed intersubject/intrasite variability as compared to the mean value of 4.4° for the average standard deviation across lumbar levels from the MVBA datasets listed in Table 10-2.

Another means of addressing the IVA variability caused by MVBA bending is to take IVA measurements from standardized bending angles (SBA). For the remainder of this text, IVA measurements taken from SBA will be referred to as sIVA, while IVA measurements taken from MVBA will be referred to as IVA. Wong (an author of this chapter) et al. developed a novel method of measuring sIVA that involved the use of an electrogoniometer connected to a fluoroscope, such that the electrogoniometer could trigger the capture of images of the lumbar spine at every 10° of lumbar bending.²³,²⁴,³¹ Once images were collected, automated image analysis software was used to derive sIVA measurements from the fluoroscopic images. In that study, the authors reported an average standard deviation of 1.3° in the measurements of sIVA across the lumbar levels, which represents a 72% reduction in the observed intersubject/intrasite variability as compared to the mean value of 4.4° for the average standard deviation across lumbar levels from the IVA datasets listed in Table 10-2.
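The idea of triggering image capture at standardized bending angles can be illustrated with a short Python sketch. This is not the implementation used by Wong et al., whose hardware and software details are not given here; the function `capture_triggers`, the simulated goniometer readings, and the assumption that bending proceeds in one direction away from the starting posture are all hypothetical.

```python
from typing import Iterable, List, Tuple


def capture_triggers(angles_deg: Iterable[float], step_deg: float = 10.0) -> List[Tuple[int, float]]:
    """Return (sample_index, target_angle) pairs at which a capture would fire.

    A capture fires the first time the goniometer reading reaches each
    successive multiple of step_deg (assumes bending in one direction only).
    """
    if step_deg <= 0:
        raise ValueError("step_deg must be positive")
    triggers: List[Tuple[int, float]] = []
    next_target = step_deg
    for index, angle in enumerate(angles_deg):
        # A single reading may cross more than one target if the subject moves quickly.
        while angle >= next_target:
            triggers.append((index, next_target))
            next_target += step_deg
    return triggers


# Simulated gross lumbar bending angles (degrees) from a hypothetical goniometer stream.
readings = [0.0, 4.0, 9.0, 11.0, 18.0, 21.0, 29.0, 31.0]
print(capture_triggers(readings))
```

Sampling every subject at the same gross bending angles, rather than at each subject's own maximum, is what distinguishes sIVA from IVA in the text above.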

Motion Control Technology Used in Combination with Digital Videofluoroscopy and Automated Image Analysis Software

A group in Bournemouth, England led by Alan Breen, one of the authors of this chapter, has developed a patient handling system intended to reduce subject-related variability by controlling and standardizing the bending of the subject during imaging. This system involves a powered articulating device that is capable of rotating the subject’s spine through a controlled and standardized sweep of spine bending during imaging. These devices are capable of providing controlled standardized spine bending in flexion/extension and lateral bending, cervical and lumbar spine motion, and standing active (weightbearing) as well as recumbent passive (nonweightbearing) spine bending. Using this device, sIVA can be measured in recumbent passive spine bending and both sIVA and IVA can be measured in standing active spine bending.

Breen et al. have integrated other recent technological developments (namely the use of digital videofluoroscopy plus the development of automated image analysis software to track vertebral bodies in sequential fluoroscopic images) together with these patient handling devices to produce a new system for conducting functional testing of the spine. Breen et al. have called this the OSMIA system, which stands for Objective Spinal Motion Imaging Assessment. Various components of this system have been discussed in a string of publications starting in 1988.³²,³⁵–³⁹ Performance and validation testing of the passive recumbent integrated system was published in 2006.⁴⁰ The results from this performance and validation testing suggest that the OSMIA system provides several important technical performance advantages relative to the current clinical standard of care.

The OSMIA system is intended to integrate all of the key technical performance benefits associated with other recent innovations in spinal functional testing into a single, integrated system. First, by measuring sIVA, the OSMIA system is intended to reduce subject-related variability similar to that observed by Wong et al. Second, by using digital videofluoroscopy imaging rather than standard radiographic imaging, the OSMIA system collects data “during the bend” in a way similar to previous investigators. Third, by using digital image-processing software to automatically track and measure movements of vertebral bodies, the OSMIA system is also intended to reduce observer-related variability.

See Figure 10-4 for an example of how the OSMIA system plots sIVA against the gross lumbar bending angle (the angle between the thorax and the pelvis). The OSMIA system has been tested on a normative cohort of 30 asymptomatic subjects, and among these subjects, motion patterns were generally similar to the pattern depicted in Figure 10-4.

FIGURE 10-4 An example of a plot of sIVA vs. the gross lumbar bending angle from the OSMIA system. The graph depicts a typical sinusoidal curve moving in the same direction as the trunk bend taken from sIVA collected at L4/L5 from a patient tested with the OSMIA system in passive recumbent lateral side bending to 40° in each direction.

In addition to the asymptomatic subjects tested with the OSMIA system, symptomatic patients have been tested prior to surgical fusion or dynamic stabilization procedures. Among this patient cohort, there is case evidence that many of the “theoretically detectable” functional presentations depicted in Figure 10-2 are detectable with the OSMIA system. See Figure 10-5 for case evidence of patients presenting with paradoxical motion, immobility, and intervertebral hypomobility.

FIGURE 10-5 Case evidence of patients with lumbar degenerative disc disease presenting with apparent paradoxical motion, immobility, and apparent hypomobility. These plots depict motion at the index level as measured directly presurgical to a fusion or dynamic stabilization procedure. These motion plots represent sIVA measurements from passive recumbent side bending. In contrast to the motion plot depicted in Figure 10-4, which includes both the left and right phases of lateral lumbar spine bending, these motion plots represent intervertebral motion from only right lateral bending (to 40° of right lateral bending). The dashed “Normal” line on each graph is representative of the motion plots that were observed among the asymptomatic cohort.

New Insights into the Biomechanics of the Aging Spine

Making use of these recent advances in functional testing technology, it is now possible to begin to sharpen our understanding of the biomechanics of the aging spine. Having these new capabilities opens up a new world of insights into in vivo spine biomechanics that has been effectively off limits due to the prohibitively high variability in IVA measurements associated with the current clinical standard of care.

Physiologic Variation in sIVA among Normal Subjects Is Very Low

By producing such a dramatic reduction in the observed measurement variability, the Wong et al. data yield two profound discoveries. First, it is clear that there is actually very little physiologic variation in the sIVA measurements among asymptomatic subjects. This fact has remained obscured by the high variability inherent in today’s standard of care for functional testing of the spine. In fact, there is so little physiologic variation that it becomes possible to define very tight ranges for the 95% confidence interval of observed sIVA values. These ranges are narrow enough that it is possible to dramatically outperform the current clinical standard of care by: (1) differentiating hypomobility from immobility, (2) differentiating hypomobility from normal motion, and (3) detecting hypomobility and hypermobility with much tighter thresholds, which improves both the sensitivity of hypomobility/hypermobility detection and the specificity of the detection of normal motion. See Table 10-5 for the ranges for the detection of flexion-extension hypomobility and hypermobility for the measurement system devised by Wong et al.
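To illustrate why a smaller standard deviation yields tighter detection thresholds, the following Python sketch derives lower and upper limits from a mean and standard deviation. It assumes, purely for illustration, that the 95% range is computed as the mean plus or minus 1.96 standard deviations under a normal distribution; the chapter does not state how the Table 10-5 thresholds were calculated, and the function names and the example mean of 12° are hypothetical.

```python
from typing import Tuple

Z_95 = 1.96  # two-sided 95% multiplier, assuming normally distributed sIVA values


def siva_thresholds(mean_deg: float, sd_deg: float, z: float = Z_95) -> Tuple[float, float]:
    """Return (hypomobility_threshold, hypermobility_threshold) in degrees."""
    if sd_deg < 0:
        raise ValueError("standard deviation cannot be negative")
    return mean_deg - z * sd_deg, mean_deg + z * sd_deg


def classify_siva(siva_deg: float, mean_deg: float, sd_deg: float) -> str:
    """Classify one sIVA value against the assumed 95% range."""
    lower, upper = siva_thresholds(mean_deg, sd_deg)
    if siva_deg < lower:
        return "hypomobile"
    if siva_deg > upper:
        return "hypermobile"
    return "within normal range"


# Hypothetical mean of 12 degrees; the two SDs are the averages quoted in the text.
for sd in (4.4, 1.3):
    low, high = siva_thresholds(12.0, sd)
    print(f"SD {sd} deg -> normal range {low:.1f} to {high:.1f} deg")
```

Under this assumption, the half-width of the normal range shrinks from about 8.6° at a standard deviation of 4.4° to about 2.5° at 1.3°, which is the sense in which the thresholds become much tighter.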

TABLE 10-5 Mean sIVA, sIVA Standard Deviation (SD), and Hypomobility and Hypermobility sIVA Thresholds for the Measurement System Described by Wong et al.

Rethinking the Conventional Wisdom Regarding Intervertebral Hypomobility and Age

A second profound finding of Wong et al. is that when sIVA is examined, vertebral levels in normal subjects become less hypomobile with healthy aging, not more hypomobile, as conventional wisdom has held. See Figure 10-6 for these results as reported by Wong et al. This has very significant implications for the management of the aging spine. While it has been shown that a patient’s MVBA decreases with advancing age,⁴¹–⁴³ Wong et al. have shown that this is not due to a decreased motion response of lumbar FSUs to gross lumbar bending. Therefore, intervertebral hypomobility as observed with sIVA should be considered to be the result of a pathological change, rather than a result of the normal aging process. Because intervertebral hypomobility is often associated with older patients with compromised disc height, it is important for practitioners to recognize intervertebral hypomobility observed with sIVA as being pathological, and not to assume that intervertebral hypomobility in older patients is to be expected as part of the normal aging process.

FIGURE 10-6 Plot of sIVA versus gross lumbar bending angle for four age-defined cohorts. Wong et al. took sIVA measurements from 100 asymptomatic volunteers, subdividing this group into four 25-patient age-defined cohorts (Group A = 21 to 30; Group B = 31 to 40; Group C = 41 to 50; and Group D = 51 and above). Note that in each graph, the oldest cohort appeared to have the greatest sIVA values.

Age-Related Differences in the Functional Presentations of Degenerative Spondylolisthesis Patients

After conducting a study of sIVA in normal asymptomatic subjects, Wong et al. used this new measurement system to examine sIVA in 91 degenerative spondylolisthesis sufferers. Among these 91 patients, Wong et al. found the following spinal segmental mobility patterns:⁴⁴

• 12/91 (13%): Immobility

• 27/91 (30%): Hypomobility

• 13/91 (14%): Normal

• 39/91 (43%): Hypermobility

A multiple regression analysis was then conducted to compare the predictive power of gender, age, grade of slippage, and disc height (as measured in the anatomical starting position) in predicting the mobility patterns that were observed among this population of degenerative spondylolisthesis sufferers. This analysis revealed that grade of slippage, followed by age, was a significant predictor of the observed mobility patterns. Specifically, younger age combined with grade 1 L4/5 degenerative spondylolisthesis predicted hypermobility, whereas older age combined with grade 2 or above predicted a hypomobility pattern. These findings are consistent with the findings of Takayanagi et al.,³³ who found that IVT and IVA in bending radiographs are both reduced in degenerative spondylolisthesis patients as compared to asymptomatic controls, and that both IVT and IVA decrease as the grade of slippage increases.

Suggestions for the Clinical Use of Functional Testing Methods

A review of past knowledge shows that the current standard of care for assessing spinal function is poorly suited to the management of the aging spine. Hypomobility appears to be a condition that is more often associated with the diseased aging spine than with diseased younger patients; however, this is the one mobility pattern that is completely undetectable with the current standard of care. Further, while the current standard of care is arguably more effective in detecting hypermobility than any other mobility pattern, this condition is most commonly associated with younger patients as opposed to older patients. With respect to the use of functional diagnostics to assist in the management of the aging spine, there is a strong case to be made for the adoption of improved methods.

Suggestions Regarding the Clinical Use of the Current Standard of Care

The current standard of care for conducting functional testing of the spine using standard radiographs and MVBA spine bending is the only method that is widely available to all practitioners, and will remain so until improved methods become commercially available. Therefore the authors put forward the suggestions given in Table 10-6 regarding the use of the current clinical standard of care for conducting functional diagnostics of the spine.

TABLE 10-6 Summary of Suggestions for the Clinical Use of the Current Standard of Care for Spinal Functional Testing

Suggestions Regarding the Clinical Use of Recently Developed Methods for Conducting Functional Testing of the Spine

There have been innovations in functional testing technology that offer the promise of definitively detecting those functional presentations most relevant to the aging spine (immobility, hypomobility, and normal motion). These innovations involve a set of three potential changes to the current clinical standard of care:

• The use of automated image analysis software to derive IVA measurements from radiographic images (as opposed to manual landmarking methods).

• The use of fluoroscopy to capture dynamic data regarding intervertebral motion during spine bending (as opposed to taking standard radiographs of patients holding static postures at the extremes of spine bending).

• The use of sIVA and IVA rather than IVA alone.

The authors have already put forward the improvements to diagnostic efficacy that are potentially attainable through the adoption of these improved methods. However, if any of these newer methods is to be adopted, it is critical that all issues affecting patient safety are fully explored. The authors have put the key considerations regarding patient safety associated with the adoption of these new methods for functional testing in Table 10-7.

TABLE 10-7 The Authors’ Suggestions Regarding the Key Patient-Safety Issues Related to the Adoption of Any of the Newer Methods for Conducting Functional Testing of the Spine

Change to Functional Testing Method

Key Patient-Safety Issues and Authors’ Suggestions

The use of automated image analysis software instead of manual landmarking techniques

• The software’s observer-related variability in IVA measurements must be validated to be lower than what has been reported for manual landmarking techniques.

• The accuracy and precision of the software in measuring IVA must be known.

• If the observer-related variability is low enough, and if the accuracy and precision are good enough, it may be feasible to institute different thresholds for the detection of pseudarthrosis, immobility, and paradoxical motion than are currently used.

The use of fluoroscopy to capture dynamic images of intervertebral motion during spine bending instead of standard radiographs to capture images of statically held spine bending postures

• Fluoroscopy imaging may be substituted for standard radiographic imaging for the purpose of conducting functional testing.

• However, as image contrast for fluoroscopy can be poorer than that of standard radiographs, fluoroscopic images may fail to detect certain conditions that require the high contrast provided by standard radiographs (such as infection, skeletal neoplasia, etc.).

• Therefore, for any patient for whom fluoroscopy is substituted for standard radiographs for conducting functional testing of the spine, a recently taken standard radiograph of the spine should also be available.

• The total dose of radiation to the patient associated with any fluoroscopy-based protocol for conducting functional testing should be measured and compared to that which would be received by the patient with standard radiographic imaging. Any increase in effective dose to the patient needs to be carefully evaluated.

The use of sIVA and IVA rather than IVA alone

• Using SBA instead of MVBA from which to take IVA measurements has been shown to reduce the subject-related variability in these measurements (i.e., sIVA has less measurement variability than IVA).

• The total observed intersite variability associated with sIVA would need to be validated before new thresholds for detecting hypomobility, normal motion, and rotational hypermobility are adopted.

• Testing protocols would likely need to include assessments of sIVA as well as IVA, as there are potentially valuable diagnostic insights to be gained from observing IVA at the physiologic operating ranges of gross trunk motion.

References

1. Knutsson F. The instability associated with disc degeneration in the lumbar spine. Acta Radiol. 1944;25:593-608.

2. Panjabi M., Chang D., Dvorak J. An analysis of errors in kinematic parameters associated with in vivo functional radiographs. Spine. 1992;17:200-205.

3. Lim M.R., Loder R.T., Huang R.C., Lyman S., et al. Measurement error of lumbar total disc replacement range of motion. Spine. 2006;31(10):E291-E297.