Chapter 44 The Cincinnati Knee Rating System
INTRODUCTION
The assessment of outcome following the treatment of knee injuries and disorders has received tremendous attention in the sports medicine literature. Over 40 knee rating instruments have been published since the mid-1980s; however, only a few have undergone rigorous testing of the psychometric properties of reliability, validity, and responsiveness.26,39,56 Historically, early scoring scales and systems designed to rate the outcome of anterior cruciate ligament (ACL) reconstruction were introduced into the medical community without undergoing an assessment of these properties.15,19,32,34,40,63,66 With no consensus regarding which variables to include in the measurement of patient outcome, it is not surprising that studies that compared the results of ACL reconstruction using different rating systems showed distinct differences in results and conclusions.1,6,7,22,55,57
In order to provide a comprehensive analysis of the knee condition and its impact on activity and function after ACL reconstruction (and other surgical procedures), clinical investigators have suggested that rating systems measure a variety of symptoms, sports and daily activity functions, patient satisfaction, and objective physical findings.1,57,65 Only two such systems are currently available that have established reliability, validity, and responsiveness: the Cincinnati Knee Rating System (CKRS)4,43 and the International Knee Documentation Committee (IKDC) system.23,24 Each of these rating systems measures pain, swelling, giving-way, functions of sports and daily activities, sports activity levels, patient perception of the knee condition, range of knee motion, joint effusion, tibiofemoral and patellofemoral crepitus, knee ligament subluxations, compartment narrowing on radiographs, and lower limb symmetry during single-leg hop tests.
The authors agree with Zarins65 that although the subjective assessment of symptoms and functional limitations is important, the final outcome of a specific treatment must also take into account objective measures that are appropriate for the diagnosis or injury under study. The determination of patient outcome according to subjective questionnaire-based data alone59 does not provide a complete understanding of the ability of the treatment protocol to restore normal knee function. Indeed, knee rating scales exist in which a patient may be rated as “excellent” even though an ACL reconstruction failed to restore normal or nearly normal knee stability, which is one of the major goals of the procedure.63 It is well known that after this injury patients may function well in the short term, but over time the knee joint deteriorates and functional limitations increase, frequently affecting daily activities.2,5,9,52 In addition, investigators should use instruments sensitive to the condition under study. Generic health or quality of life questionnaires have limited usefulness in studies comprising patients with a specific diagnosis and, therefore, should be used in addition to disease-specific rating systems. This chapter describes the rationale and methodology for the major components of the CKRS. The IKDC system is discussed in detail in Chapter 45, The International Knee Documentation Committee (IKDC) Rating System.
The CKRS was first published over 20 years ago in concert with the largest ACL natural history study conducted during that time period.52 In the early 1980s, the dilemma of the appropriate treatment for complete ACL ruptures stemmed in part from limited knowledge of the functional limitations caused by the injury and the lack of a rigorous rating system that graded symptoms and limitations according to the specific type of activity during which they occurred. Over the ensuing decade, additional scales and modifications were developed in order for the CKRS to provide a comprehensive assessment of the knee condition.4,43,44,51 An overall rating scheme was devised to provide a final rating, which is available in either a numerical or a categorical manner, as is discussed later. The major components of the CKRS are shown in Table 44-1. This system is one of the most commonly used instruments in the orthopaedic literature to measure the results of ACL reconstruction and has been considered a “gold standard” in the development and content and criterion validity analyses of other knee rating scales.21,23,25,36 The CKRS was initially designed and validated in athletically active populations; however, it is also useful for patients who have undergone other operative procedures such as articular cartilage restorative procedures, meniscus repairs or transplants, osteotomies, or patellofemoral procedures.
REVIEW OF ANALYSES USED TO MEASURE PSYCHOMETRIC PROPERTIES OF OUTCOME INSTRUMENTS
Reliability is the extent to which scores on an instrument are reproducible and is measured either between repeated administrations to the same subjects (test-retest) or between observers (interobserver). For test-retest reliability, patients complete the questionnaire at separate time periods; Deyo and coworkers12 recommended a minimum interval of 1 week between questionnaire administrations. Reliability is measured with product-moment correlations and intraclass correlation coefficients (ICCs). The ICC is the most commonly used statistic in modern studies.
Correlations between test-retest data should be greater than 0.70, which is considered the standard for adequate reliability for questionnaires.54 The use of ICC rather than the more common Pearson correlation coefficient was suggested by Deyo and coworkers12 and Lin31 to provide a more sensitive assessment of variability within data. The problem that can occur with the Pearson correlation coefficient is that duplicate measurements may be systematically different yet correlate highly and, as a result, be falsely interpreted. For example, if every patient scored exactly 5 points lower on a scale on the second administration, the test-retest correlation would be a perfect 1.0, despite the fact that every patient had a lower score. The ICC handles this problem because it not only assesses the strength of the correlation but also determines whether the slope and intercept in the regression line of test-retest data vary from those expected with duplicate results. In our example, the ICC would correspondingly be reduced to demonstrate the systematic difference between the test-retest data.
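The example above can be checked numerically. The following is a minimal sketch, not the calculation used in the cited studies: the scores are invented, and the one-way random-effects ICC built from between-subject and within-subject mean squares is only one common form of the statistic, assumed here for illustration.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def icc_one_way(x, y):
    """One-way random-effects ICC for two administrations per subject
    (an assumed form; other ICC variants exist)."""
    n, k = len(x), 2
    subject_means = [(a + b) / 2 for a, b in zip(x, y)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, subject_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: every patient scores exactly 5 points lower at retest
first_scores = [60, 70, 80, 90, 100]
second_scores = [s - 5 for s in first_scores]

r = pearson_r(first_scores, second_scores)
icc = icc_one_way(first_scores, second_scores)
print(f"Pearson r = {r:.3f}, ICC = {icc:.3f}")
```

With these invented scores the Pearson coefficient is a perfect 1.0, whereas the ICC falls below 1.0 because the systematic 5-point shift is counted as within-subject disagreement.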
Several measures have been described to determine the validity of an instrument, including content, construct, item-discriminant, convergent, and criterion. In general, validity is the psychometric criterion in which an instrument is tested to determine its ability to actually measure what it claims to measure.27 Content validity is the extent to which a question or instrument represents the area of interest and has been described in various manners. Face validity, one example of content validity, is determined by consulting both patients and experienced medical professionals regarding the development of a scale’s questions and their relevance to the diagnosis under study.16 For instance, for ACL reconstruction investigations, a questionnaire with good face validity would be believed by patients, surgeons, and therapists to measure the common problems caused by this injury, such as pain and instability. This represents a subjective analysis that is not statistically analyzed. Another method to determine content validity is to calculate floor (worst result) and ceiling (best result) effects.37,62 Scales in which the majority of patients score either the highest level or the lowest level do not allow for an assessment of deterioration or improvement over time. Floor and ceiling effects are present when greater than 30% of the population marks either the best possible or the worst possible scores on a scale.28
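The 30% criterion for floor and ceiling effects lends itself to a simple check. The sketch below uses invented scores on a 0 to 10 scale; the function name and the returned dictionary keys are hypothetical choices made for illustration.

```python
def floor_ceiling_effects(scores, worst, best, threshold=0.30):
    """Return the fraction of patients at the worst and best possible scores
    and whether each fraction exceeds the threshold (30% by default)."""
    n = len(scores)
    floor_fraction = sum(1 for s in scores if s == worst) / n
    ceiling_fraction = sum(1 for s in scores if s == best) / n
    return {
        "floor_fraction": floor_fraction,
        "ceiling_fraction": ceiling_fraction,
        "floor_effect": floor_fraction > threshold,
        "ceiling_effect": ceiling_fraction > threshold,
    }

# Hypothetical scores for ten patients; half report the best possible score
scores = [10, 10, 8, 10, 6, 10, 4, 8, 10, 2]
result = floor_ceiling_effects(scores, worst=0, best=10)
print(result)
```

In this invented sample, 50% of patients mark the best possible score, so a ceiling effect is flagged and the scale could not register further improvement in those patients.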
Construct validity is the extent to which a measure corresponds to expected theoretical concepts or hypotheses regarding the diagnosis. An instrument with construct validity will accurately differentiate patients whose outcome is expected to vary with regard to certain characteristics known to the disease process.14,61 Researchers develop hypotheses based on prior research and clinical experience in which the questionnaire scores are expected to be significantly different between selected patient groups. The hypotheses are tested using the F test and t test at the level of P < .01. In addition, construct validity is assessed by calculating Pearson product-moment correlation coefficients between scale items and either previously validated instruments or physician and patient assessments. A coefficient of r > 0.60 indicates a moderately strong correlation.
Item-discriminant (or -divergent) validity is present when variables hypothesized to be dissimilar (such as patient age and anteroposterior [AP] knee displacements) are indeed found to be statistically unrelated.14,38 Pearson correlations are performed to detect statistical dissimilarity, which is indicated by an r of 0.28 or less. In the opposite manner, convergent validity is present when variables believed to be similar within the questionnaire are indeed found to be statistically similar.
Internal consistency is determined by a coefficient α greater than 0.60.28 The underlying concept of this measure is that the consistency with which a patient answers from one question to the next can be used to provide an estimate of reliability for the total test score.41 A high coefficient α indicates that the items in a questionnaire are consistently measured or are homogeneous with regard to the measurement of the underlying diagnosis or attribute.30
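As an illustration of how such a coefficient is obtained, the sketch below assumes that coefficient α refers to the standard Cronbach formula, α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The item scores are invented.

```python
from statistics import variance

def coefficient_alpha(items):
    """Cronbach's coefficient alpha (assumed formula).
    items: list of per-item score lists, one score per patient in each list."""
    k = len(items)
    totals = [sum(patient_scores) for patient_scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))

# Hypothetical responses of five patients to three questionnaire items
items = [
    [2, 4, 6, 8, 10],
    [4, 4, 6, 8, 10],
    [2, 6, 6, 10, 8],
]
alpha = coefficient_alpha(items)
print(f"coefficient alpha = {alpha:.3f}")
```

Because the three invented items rise and fall together across patients, the resulting coefficient is well above the 0.60 threshold cited above.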
Responsiveness, or the ability of an instrument to detect clinically important change, is determined by calculating standardized response means (SRM) and effect sizes (ES) of the selected instrument categories. The magnitude of the SRM (mean change in score from preoperative to follow-up divided by the SD of the change in score)18 and the ES (mean change in score from preoperative to follow-up divided by the SD of the preoperative score)29 are interpreted using the Cohen standard of greater than 0.20 for small effects, greater than 0.50 for moderate effects, and greater than 0.80 for large effects.10 This analysis provides a more precise indication of the change in results over time than that obtained with the standard Student t test. An instrument’s sensitivity, by contrast, simply denotes its ability to measure any change, which by definition does not necessarily indicate one that is clinically meaningful.27
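The two responsiveness statistics differ only in their denominator, which the following sketch makes explicit. The preoperative and follow-up scores are invented, and the label "negligible" for values of 0.20 or less is an assumption added for completeness rather than part of the Cohen standard as quoted above.

```python
from statistics import mean, stdev

def responsiveness(pre, post):
    """Return (SRM, ES) for paired preoperative and follow-up scores."""
    changes = [after - before for before, after in zip(pre, post)]
    srm = mean(changes) / stdev(changes)  # SD of the change in score
    es = mean(changes) / stdev(pre)       # SD of the preoperative score
    return srm, es

def cohen_label(value):
    """Interpret an SRM or ES using the thresholds 0.20, 0.50, and 0.80."""
    magnitude = abs(value)
    if magnitude > 0.80:
        return "large"
    if magnitude > 0.50:
        return "moderate"
    if magnitude > 0.20:
        return "small"
    return "negligible"  # assumed label for values of 0.20 or less

# Hypothetical scores for six patients
pre = [2, 4, 2, 6, 4, 0]
post = [6, 8, 4, 8, 10, 6]

srm, es = responsiveness(pre, post)
print(f"SRM = {srm:.2f} ({cohen_label(srm)}), ES = {es:.2f} ({cohen_label(es)})")
```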
COMPONENTS OF THE CKRS
Rating of Symptoms
Pain, swelling, partial giving-way, and full giving-way are the major knee symptoms assessed in ACL investigations. Authors have proposed a variety of methods for the measurement of symptoms, from a binary system (“yes” or “no”19,34), to visual analog scales,17,21,39 to a severity rating (such as mild, moderate, severe) which can be done either alone32 or in combination with activities (such as “slight after strenuous sports”66).15,32,56,66
In 1983, Noyes and associates52 proposed that knee symptoms should be scored according to the activity during which they occurred: strenuous sports, recreational sports, or walking. This rationale provided an understanding of the impact of a chronically ACL-deficient knee, because 30% of 103 patients reported pain with walking alone, 47% with recreational sports, and 69% with strenuous sports in the authors’ natural history study.
The assessment of symptoms was later refined and the scale increased to a six-level gradient shown in Figure 44-1.43,44,50 Points are awarded for the highest activity level in which the patient is able to participate without incurring the symptom (Appendix A). If the symptom is present with activities of daily living, it is rated as either moderate (frequent, limiting) or severe (constant, not relieved). Definitions are provided for terms that might otherwise be ambiguous to patients, such as “moderate” sports (running, turning, twisting) and “strenuous” sports (jumping, hard pivoting).
FIGURE 44-1 Symptom Rating Scale.
(From Barber-Westin, S. D.; Noyes, F. R.; McCloskey, J. W.: Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati Knee Rating System in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med 27:402–416, 1999.)
When reporting symptoms before and after surgery, a distribution of the percentage of patients in each of the six levels should be shown (along with a mean and SD) for both time periods. An example is shown in Figure 44-2 for a group of patients who received a meniscus transplant.48 The data were also expressed in the body of the text as “the mean preoperative Cincinnati Knee Rating Scale pain score of 2.5 points (range, 0–6 points) improved to a mean of 5.8 points (range, 0–10 points) at follow-up (P < .0001). Before the meniscus allograft, thirty patients (79%) had moderate to severe pain with daily activities but at follow-up, only four patients (11%) had pain with daily activities.”48
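The recommended report (percentage of patients at each of the six levels, with a mean and SD, for both time periods) can be produced as in the sketch below. The scores are invented and do not reproduce the meniscus transplant data; the function and variable names are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

LEVELS = (0, 2, 4, 6, 8, 10)  # six-level symptom gradient

def symptom_distribution(scores):
    """Return (percentage of patients per level, mean, SD) for one time period."""
    invalid = [s for s in scores if s not in LEVELS]
    if invalid:
        raise ValueError(f"scores outside the six-level scale: {invalid}")
    counts = Counter(scores)
    percentages = {level: 100 * counts[level] / len(scores) for level in LEVELS}
    return percentages, mean(scores), stdev(scores)

# Hypothetical pain scores for ten patients at each time period
preoperative = [0, 2, 2, 2, 4, 4, 2, 0, 6, 2]
follow_up = [6, 6, 4, 8, 6, 10, 6, 2, 8, 6]

for label, scores in (("Preoperative", preoperative), ("Follow-up", follow_up)):
    percentages, average, sd = symptom_distribution(scores)
    print(f"{label}: mean {average:.1f}, SD {sd:.1f}")
    for level in LEVELS:
        print(f"  level {level:>2}: {percentages[level]:.0f}%")
```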
One problem may occur in the rating of symptoms when patients have not attempted to return to strenuous sports activities. In these situations, a potential bias may occur if the patient or clinician attempts to project the correct symptom level based on a hypothetical answer. For instance, if a patient returned to bicycling or swimming and had no pain with those activities, then the pain score awarded would be a level 6 (see Fig. 44-1). However, if the patient is asked whether she or he believes pain would occur with level 8 activities (running, twisting, turning) and she or he responds that it probably would not occur, a bias would occur if this score was assigned without further verification that this was indeed true. This is often the case with the symptom of giving-way, because patients frequently return asymptomatically to level 6 or 8 activities postoperatively, but state that they participated a few times at level 10 activities (jumping, hard pivoting) without problems. Two approaches address this problem. First, points are awarded only when there is a reasonable basis for the assessment, not on the patient’s speculation as to the level that may be possible.
Second, the clinician may use the modified symptom rating scale that was first introduced in an investigation of patients with varus osseous malalignment and ACL deficiency who were treated with multiple operative procedures.45 This modified scale consists of a four-level gradient. The levels of 0, 2, and 4 are the same as the original scale shown in Figure 44-1. The fourth and highest level (level 6) indicates that some type of sports participation is possible without the symptom (Fig. 44-3). This modified scale is intended for studies in which the majority of patients do not return to moderate or strenuous athletics indicated in levels 8 and 10. The reliability of this modified scale was previously shown to be adequate for patients and normal subjects.4 However, clinicians should be aware that the modified scales might have reduced sensitivity, especially if the results of a study that used the modified scale are compared with another study that used the original scale. This pertains only to the individual symptom results. The effect of the modified scale on the overall rating score, described later in this chapter, is small and has only a negligible impact when comparing the data of the overall scores between different populations.
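When individual symptom results from a study that used the original six-level scale must be set beside results reported on the modified four-level scale, one possible approach is to collapse the original levels 6, 8, and 10 into the single highest modified level. The sketch below shows that mapping; the conversion itself is an assumption made for illustration and is not a procedure described in the cited studies.

```python
ORIGINAL_LEVELS = (0, 2, 4, 6, 8, 10)

def to_modified_scale(original_score):
    """Map an original six-level symptom score to the modified four-level scale.
    Levels 0, 2, and 4 are unchanged; levels 6, 8, and 10 all become 6,
    meaning some type of sports participation is possible without the symptom."""
    if original_score not in ORIGINAL_LEVELS:
        raise ValueError(f"not a valid six-level score: {original_score}")
    return min(original_score, 6)

# Hypothetical original-scale scores
original_scores = [0, 2, 4, 6, 8, 10]
modified_scores = [to_modified_scale(s) for s in original_scores]
print(modified_scores)
```

The collapse discards the distinction between light, moderate, and strenuous sports, which is the loss of sensitivity noted above.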
Even though there is always the potential for a bias to exist regarding the scoring of subjective symptoms, the CKRS format allows for an accurate assessment of the activity levels the patients returned to on a routine basis. The scale was designed to not award points if a patient participates at a high activity level but has symptoms, thereby fulfilling the criteria of the “knee abuser.”52
Rating of Patient Perception of the Knee Condition
Modern knee rating systems incorporate some form of patient satisfaction, or rating of the patient’s perception of the knee condition, into the assessment of clinical outcome.4,23 In the CKRS, patients are asked to rate the overall condition of the knee by circling a number on a scale from 1 to 10 (Fig. 44-4). Four descriptors are provided to assist the patient in understanding the meaning of the numerical scale. Under the number 2 is the term “poor,” defined as “I have significant limitations that affect activities of daily living,” whereas under the number 10 are the terms “normal/excellent,” defined as “I am able to do whatever I wish (any sport) with no problems” (see Appendix A). For data reporting purposes, a distribution of responses is shown in a five-level gradient. Responses under numbers 1 and 2 are termed “poor”; those under numbers 3 and 4, “fair”; those under numbers 5 and 6, “good”; those under numbers 7 and 8, “very good”; and those under numbers 9 and 10, “normal.”
FIGURE 44-4 Patient Perception of the Knee Condition.
(From Barber-Westin, S. D.; Noyes, F. R.; McCloskey, J. W.: Rigorous statistical reliability, validity, and responsiveness testing of the Cincinnati Knee Rating System in 350 subjects with uninjured, injured, or anterior cruciate ligament-reconstructed knees. Am J Sports Med 27:402–416, 1999.)
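The five-level reporting gradient for patient perception pairs consecutive numbers on the 1 to 10 scale, which can be expressed compactly. The following sketch uses a hypothetical function name and invented ratings.

```python
from collections import Counter

CATEGORIES = ("poor", "fair", "good", "very good", "normal")

def perception_category(rating):
    """Map a 1-10 patient perception rating to its reporting category:
    1-2 poor, 3-4 fair, 5-6 good, 7-8 very good, 9-10 normal."""
    if not isinstance(rating, int) or not 1 <= rating <= 10:
        raise ValueError(f"rating must be an integer from 1 to 10: {rating!r}")
    return CATEGORIES[(rating - 1) // 2]

# Hypothetical ratings from eight patients
ratings = [2, 3, 6, 7, 8, 9, 10, 5]
distribution = Counter(perception_category(r) for r in ratings)
for category in CATEGORIES:
    print(f"{category}: {distribution[category]}")
```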
An example is shown in Figure 44-5 for a group of patients who received a meniscus transplant.48 The data were also expressed in the body of the text as “the mean preoperative patient perception score of 3.2 points (range, 1–6 points) improved to a mean of 6.2 points (range, 1–9 points) at follow-up (P = .0001). Two patients rated the knee condition as the same, and two as worse.”