Risk Assessment and Quality Improvement

Published on 13/02/2015 by admin

Filed under Cardiothoracic Surgery

Last modified 13/02/2015

Print this page

rate 1 star rate 2 star rate 3 star rate 4 star rate 5 star
Your rating: none, Average: 2 (3 votes)

This article have been viewed 2017 times

Chapter 41 Risk Assessment and Quality Improvement

Despite continued advances in perioperative care, cardiac surgery carries an appreciable risk for serious morbidity and for mortality. At the same time, there is a strong demand by the public, regulatory bodies, professional organizations, and the legal system for evidence that healthcare is being delivered to a high standard. In some jurisdictions, this demand has led to the reporting of mortality figures for individual institutions and surgeons.1 However, the likelihood of adverse outcomes for patients is highly dependent on their preoperative risk profiles. For example, the mortality risk provided in Chapter 10 for isolated mitral valve surgery is 6%. However, an otherwise well 50-year-old male undergoing mitral valve repair for severe mitral regurgitation has a mortality risk (predicted using the logistic EuroSCORE; see subsequent material) of 1.5%.

By contrast, a 70-year-old female, with poor left ventricular function, pulmonary hypertension, and renal failure, undergoing the same operation has a mortality risk of 35%. Thus, it is impossible to draw conclusions regarding the mortality figures of surgeons and institutions without first appreciating the risk profiles of their case mixes; hence, the role of scoring systems in which mortality rates are predicted on the basis of risk profiles. In the first part of this chapter, the applications, uses, and limitations of risk-adjusted scoring systems for patients undergoing cardiac surgery are discussed. In the second part, quality improvement in the intensive care unit (ICU) is reviewed.


General Versus Specific Risk-Scoring Systems

Risk-scoring systems may be classified as either general or specific. General scoring systems are widely used in ICU practice. Examples include the Acute Physiology and Chronic Health Evaluation (APACHE), which has undergone four iterations (I through IV); Mortality Probability Models (MPM) II; and the Simplified Acute Physiology Score (SAPS) II. With APACHE IV, mortality risk is calculated on the basis of multiple factors, including (1) acute derangements of 17 physiologic variables measured within 24 hours of ICU admission; (2) the patient’s age; (3) the presence of chronic, severe organ failure involving the heart, lungs, kidneys, liver, or immune system; (4) the admission diagnosis; (5) the pre-ICU length of hospital stay. (Downloadable risk calculators for APACHE IV are available from www.criticaloutcomes.cerner.com.) Although useful for most ICU patients, these general scoring systems are of limited value for cardiac surgery patients because many of the acute physiologic changes that occur in the first 24 hours are related to the effects of surgery and cardiopulmonary bypass and do not reflect patients’ risk for dying or suffering morbid events. Although APACHE IV has a mortality-prediction algorithm for use in patients who have had coronary artery bypass graft (CABG) surgery using postoperative day 1 variables, it may not be valid outside the United States and is not designed for use in patients undergoing other cardiac operations.2 General scoring systems are not discussed further.

Specific risk-scoring systems apply to particular diseases, operations, or groups of operations. Adult cardiac surgery, which has a small number of operations, a large number of patients, and a relatively high incidence of complications, is well suited to risk-adjusted scoring. A number of cardiac risk-scoring systems are in use, including the European System for Cardiac Operative Risk Evaluation (EuroSCORE), the French score, the Cleveland Clinic score, the Ontario Province Risk Score, the Parsonnet score, and the score based on the Society of Thoracic Surgeons (STS) National Cardiac Surgery Database.

Principles of Cardiac Risk-Scoring Systems

To appreciate the strengths and weaknesses of risk-scoring systems, it is important to consider how they are developed and validated.35

Development of Risk-Scoring Systems

The most widely used statistical tool for developing risk models is logistic regression.6 First, univariate regression analysis is used to identify statistical associations between predictor variables and the outcome variable. From these variables, a multivariate logistic regression model is developed in which independent predictor variables are identified and included in a regression equation of the following general form:

(41-1) image

Where Y is the outcome, β0 is the regression constant, X1 is a predictor variable, and β1 is a constant, reflecting the weighting given to X1. β0 is typically negative (e.g., −4.789594 for the EuroSCORE), whereas β1, β2, β3, etc., are typically positive and may be greater than 1. Outcome is calculated either directly, using the regression equation (logistic score) or by converting the β coefficients to integer values, which are then summed (additive score). Death is the primary outcome variable of cardiac risk scores. Some scores are also designed for morbidity prediction. Predictor variables must be chosen carefully. A large number of variables with multiple values (e.g., myocardial infarction within <6 hours; 6 to 24 hours; 1 to 7 days; 8 to 21 days; >21 days) may provide a highly accurate model but increases the likelihood of there being missing or incorrect data, leading to erroneous predictions. Variables that are likely to have missing data or are subject to observer bias should be removed from the model, even if they are independent outcome predictors. The number of predictor variables is limited by the size of the dataset from which the model is developed. As a rule of thumb, the number of deaths must be 10 times the number of predictor variables. Thus, for a mortality rate of 3%, a database of more than 5000 patients is required to develop a model with 15 predictor variables. With cardiac risk models, the number of variables ranges from 6 using the Ontario score to more than 20 using the STS score.

Other than database size, additional important issues are the time the data were collected, the geographic location of the patients in the database, whether the database is single-center or multicenter, and whether the data were collected prospectively or retrospectively (Table 41-1). For instance, the Parsonnet score—the first widely used cardiac risk-scoring system—was originally developed from a retrospective database of 3500 patients operated on between 1982 and 1987 in the United States. Given the changes in patient populations and surgical management since then, outcome predictions based on this patient population may no longer be valid, particularly outside the United States.

Validating Risk-Scoring Systems

Once a risk model has been developed it must be validated in other patients to see whether it functions for its intended purpose and how well it applies to various populations over time. There are two major statistical approaches to evaluating the predictive accuracy of a scoring system: calibration and discrimination.

Calibration refers to how well the model assigns appropriate risk. It is evaluated by dividing patients into groups (usually deciles) according to expected risk and then comparing expected to observed risk within those groups. A well-calibrated model shows a close agreement between the observed and expected mortality rates, both overall and within each decile of risk. Most cardiac risk models are well calibrated in the midrange but less so for patients at the extremes of risk, both high and low.7

Discrimination refers to how well the model distinguishes patients who survive from those who die (Fig. 41-1). It is evaluated by analyzing receiver operating characteristic (ROC) curves.8 ROC curves are plots of sensitivity (true positive rate) against 1− specificity (false positive rate) at various levels of predicted risk (cutoffs). They graphically display the tradeoff between sensitivity and specificity as shown in Figure 41-2. For randomly selected pairs of patients with different outcomes (survival or death), a model should predict a higher mortality risk for the patient who dies than for the patient who survives. The area under the ROC curve (c-statistic) varies between 0.5 and 1.0 and indicates the probability that a randomly chosen patient who dies has a higher risk than a randomly chosen survivor. A model with no discrimination has an area under the ROC curve of 0.5 (see Fig. 41-2, graph B), meaning that the chance of correctly predicting the death in a randomly chosen pair is 50%, which is no better than the toss of a coin. By contrast, a model with an area under the ROC curve of 1 (see Fig. 41-2, graph A) implies perfect discrimination, meaning that the model always assigns a higher risk to a randomly chosen death than to a randomly chosen survival. In such a situation, there is no tradeoff between sensitivity and specificity; they are always 100% at every percentage risk prediction. A good explanation of ROC curves with applets allowing graphic manipulation of the threshold and sample population separation is given at http://www.anaesthetist.com/mnm/stats/roc/index.htm.


Figure 41.2 Receiver operating characteristic (ROC) curves for mortality prediction. ROC curves are constructed by plotting sensitivity (true-positive fraction) against 1-specificity (false-positive fraction) for all levels (cutoffs) of predicted risk from 0% to 100%. Three curves are shown; they correspond to A, B, and C in Fig. 41-1. For curve A, as sensitivity rises there is no loss of specificity (and as specificity rises, there is no loss of sensitivity). This model has perfect discrimination, and the area under the ROC curve is 1.0 (see text). For curve B, as sensitivity rises there is loss of specificity by an exactly equal amount, and vice versa. This model has no discrimination, and the area under the ROC curve is 0.5 (see text). Curve C is more typical of cardiac risk scores. A sensitivity of 100% is associated with a specificity of about 30% (i.e., 1-specificity of 0.7). This is shown as point (i) on curve C. A specificity of 100% (i.e., 1-specificity of 0) is associated with a sensitivity of about 30%. This is shown as point (ii) on curve C. Point (iii) represents the risk score at which sensitivity and specificity are both optimal, in this case about 80% for each. The area under curve C is about 0.85, which represents a high degree of discrimination—most risk models in medicine and surgery perform well below this. Sensitivity and specificity are defined formally in Appendix 2.

The discriminatory ability of cardiac risk scores is limited by the presence of unknown independent predictors, random catastrophic events (e.g., a severe protamine reaction), difficulty in measuring complex clinical states, and factors that are rare but if present may be prognostically important (e.g., the presence of a pheochromocytoma in a patient undergoing CABG surgery). In validation studies, most cardiac risk scores for mortality have an area under the ROC curve of 0.7 to 0.80, indicating moderate discrimination.

Choice of Risk Models

The two most important cardiac surgery risk-scoring systems currently in use are the EuroSCORE and the system developed from the STS National Cardiac Surgery Database. The STS database was established in 1989 and is the largest cardiac surgery database in the world. In 2004 there were 638 contributing hospitals, and it contained information from more than 2 million surgical procedures.5,9 Participating hospitals submit their data to the STS on a biannual basis and receive detailed reports after analysis of the data from all participants. Reports contain risk-adjusted measures of mortality rates, which allow participating hospitals to benchmark their results against regional and national standards. The reports also provide data for research projects. Risk scoring is based on 24 patient-, cardiac-, and operation-related predictor variables. An online risk calculator is available from the STS website (www.sts.org) at The web calculator provides mortality predictions for five main operations: CABG surgery, aortic valve replacement, mitral valve replacement, CABG surgery plus aortic valve replacement, and CABG surgery plus mitral valve replacement. In addition, outcome predictions for five morbidities in CABG surgery are also provided: reoperation, permanent stroke, renal failure, deep sternal wound infection, and prolonged ventilation. Unlike other risk-scoring systems, the STS system has not released into the public domain the logistic regression equations it uses.

The EuroSCORE was developed from a prospective database of more than 19,000 patients involving 132 centers in eight European countries.10 Data were collected over a 3-month period in 1995. Two forms of the EuroSCORE have been developed—the additive score and the logistic score. Both are based on the same 17 predictor variables. The additive score is shown in Table 41-2. The EuroSCORE website (www.EuroSCORE.org) provides the β coefficients for the logistic regression equation and makes available online and downloadable risk calculators for the logistic score. In head-to-head comparisons, the EuroSCORE has been shown to have better discrimination than other scoring systems (i.e., a higher area under the ROC curve).1113 However, some investigators have found that the EuroSCORE (both additive and logistic) tends to overestimate observed mortality risk across a range of risk profiles (i.e., limited calibration).12,14,15 This may indicate that the EuroSCORE should be recalibrated to account for changes in patients and practices since it was developed in the 1990s. By contrast, other investigators have found that the additive EuroSCORE tends to underestimate observed risk in high-risk patients16,17 and in patients undergoing combined valve and CABG surgery.18

Table 41-2 The Additive EuroSCORE

  Patient-Related Factors Score
Age Per 5 years or part thereof over 60 years 1
Sex Female 1
Chronic pulmonary disease Long-term use of bronchodilators or steroids for lung disease 1
Extracardiac arteriopathy Any one or more of the following: claudication, carotid occlusion or >50% stenosis, previous or planned intervention in the abdominal aorta, limb arteries, or carotids 2
Neurologic dysfunction disease Severely affecting ambulation or day-to-day functioning 2
Previous cardiac surgery Requiring opening of the pericardium 3
Serum creatinine Greater than 200 m μ/l preoperatively 2
Active endocarditis Patient still under antibiotic treatment for endocarditis at the time of surgery 3
Critical preoperative state Any one or more of the following: ventricular tachycardia or fibrillation or aborted sudden death, preoperative cardiac massage, preoperative ventilation before arrival in the anesthetic room, preoperative inotropic support, intraaortic balloon counterpulsation or preoperative acute renal failure (anuria or oliguria <10 ml/hour) 3
  Cardiac-Related Factors Score
Unstable angina Rest angina requiring IV nitrates until arrival in the anesthetic room 2
LV dysfunction Moderate or LVEF 30% to 50% 1
  Poor or LVEF <30 3
Recent myocardial infarction Within 90 days 2
Pulmonary hypertension Systolic PA pressure >60 mmHg 2
  Operation-Related Factors Score
Emergency Carried out on referral before the beginning of the next working day 2
Other than isolated CABG Major cardiac procedure other than or in addition to CABG 2
Surgery on thoracic aorta For disorder of ascending, arch, or descending aorta 3
Post myocardial infarction septal rupture   4

CABG, coronary artery bypass graft; LV, left ventricular; LVEF, left ventricular ejection fraction; PA, pulmonary arterial.

Note: If a risk factor is present in a patient, a score is assigned. The weights are added to give an approximate percent predicted mortality.