
Chapter 3 Severity of illness and likely outcome from critical illness

At present, scoring systems are not sufficiently accurate to make outcome predictions for individual patients.

Clinical assessment of severity of illness is an essential component of medical practice. It influences the need and speed for supportive and specific therapy. Initial acuity may also indicate likely prognosis when other factors such as comorbidity and organisational aspects of critical care delivery are considered. It is intuitive to consider whether patterns and severity of physiological disturbance can predict patient outcome from an episode of critical illness.

Perhaps the earliest reference to grading illness is in an Egyptian papyrus which classified head injury by severity.1 More recently, it was the constellation of physiological disturbances seen in specific conditions which popularised the approach of linking physiological derangement to outcome in the critically ill. Examples include the Ranson score for acute pancreatitis,2 the Pugh modification of the Child–Turcotte classification, devised for patients undergoing portosystemic shunt surgery and now widely used to classify end-stage liver disease,3 the Parsonnet score for cardiac surgery4 and the Glasgow Coma Score (GCS) for acute head injury.5 The earliest attempt to quantify severity of illness in a general critically ill population was by Cullen et al.,6 who devised a therapeutic intervention score as a surrogate for severity of illness. This was followed in 1981 by the introduction of the Acute Physiology, Age and Chronic Health Evaluation (APACHE) scoring system by Knaus et al.7 Since then numerous scoring systems have been designed and tested in populations across the world.

The potential advantages of quantifying critical illness include:

The least controversial use for scoring systems has been as a method for comparing patient groups in clinical trials. While this has been widely accepted by the critical care community, there has been less enthusiasm for accepting the same systems as comparators of between-unit and even between-country performance, unless of course your performance is good. Most clinicians agree that scoring systems have limited value for individual patient decision pathways, although recently an APACHE II score above 25, under appropriate circumstances, has been included in the guidelines for the administration of recombinant human activated protein C for severe sepsis and septic shock. Much of the scepticism among clinicians is based on studies showing poor prognostic performance for many of the models proposed.8–14 The fundamental problem for these scores and prognostic models is poor calibration in the cohort of patients studied, often in countries whose health service infrastructure differs from that in which the models were first developed. Other calibration problems arise because users adhere poorly to the rules of the scoring methodology, because patient outcomes improve after new techniques and treatments are introduced, or because the models fail to include important prognostic variables. For example, it has become clear that prognosis is affected as much by local organisation, patient pathways, location prior to admission and preadmission state as by acute physiological disturbance.15,16

Scoring systems would be better calibrated if they were developed from a small number of countries with similar health services; however, this limits their international usefulness for clinical studies. International applicability can be improved by developing a system from a wider international cohort, but such a system can then be expected to calibrate poorly when applied in any single country. The Simplified Acute Physiology Score 3 (SAPS 3), developed from a worldwide cohort, therefore provides customisation formulae so that the risk-adjusted expected mortality can be related to the geographical location of the unit.16

Inevitably, as advances occur, risk-adjusted mortality predictions become outdated, with some older models overestimating expected mortality14,17 and others underestimating observed mortality.18 The designers of the scoring systems have recognised these shifting baselines and revise the models every few years; Table 3.1 outlines some characteristics of the upgraded systems.

PHYSIOLOGICAL DISTURBANCE

An insult which potentially interferes with normal organ function is usually followed by increasing compensatory activity directed at preserving vital organ function. Most compensatory mechanisms are mediated through the endocrine and autonomic nervous systems and act to maintain effective circulating volume, oxygenation and acid–base homeostasis, thereby ensuring normal mitochondrial and vital organ function. Hyperventilation, tachycardia, vasoconstriction and consequent oliguria – all signs of compensation – are therefore hallmarks of early untreated critical illness. Once these mechanisms are overwhelmed, signs of decompensation appear, such as hypotension, progressive coma, icterus and metabolic acidosis.

Most organs have a limited repertoire of responses to systemic illness: the brain, for example, responds with confusion, seizures or progressive coma, while respiratory dysfunction manifests as hyperventilation, hypoventilation, wheezing or coughing, with commensurate changes in blood gases. It is therefore not surprising that most systemic pathophysiological processes produce a common set of acute physiological disturbances, and that severity of illness can consequently be assessed from a limited number of vital-sign and biochemical observations. It is less clear, however, what magnitude of response should be expected for a given insult, so severity of illness for most conditions has traditionally been measured by the magnitude of the physiological response rather than the size of the insult. The physiological response is further confounded by:

Some scoring systems, such as the Mortality Probability Model (MPM II0) and SAPS 3, estimate severity of illness at or near admission to intensive care in order to avoid the confounding effect of supportive therapy.

PRINCIPLES OF SCORING SYSTEM DESIGN

CHOICE OF INDEPENDENT PHYSIOLOGICAL VARIABLES AND THEIR TIMING

The designers of the original APACHE and SAPS systems chose variables which they felt represented measures of acute illness. Based on expert opinion, each chosen variable was weighted equally on an arbitrary increasing linear scale, with the highest value given to the physiological value deviating furthest from normal.26,27 Premorbid conditions, age, emergency status and diagnostic details were also included in these early models, and from these parameters a score and a probability of hospital death could be calculated. Later upgrades to these systems (SAPS 2, APACHE III and the MPM28) used logistic regression analysis to determine which variables should be included to explain the observed hospital mortality. Variables were no longer given equal importance: each received its own weighting, and a logistic regression equation was used to calculate the probability of hospital death. The more recent upgrades, APACHE IV and SAPS 3, have continued to use logistic regression techniques to identify variables that have an impact on hospital outcome.
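As an illustration of how such a regression equation converts weighted variables into a mortality probability, the following sketch uses invented coefficients; they are not the published weights of any APACHE, SAPS or MPM model.

```python
import math

def predicted_mortality(intercept, weights, values):
    """Turn a weighted sum of prognostic variables into a probability of
    hospital death via the inverse-logit (logistic) transform."""
    logit = intercept + sum(w * x for w, x in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical intercept and weights for illustration only.
risk = predicted_mortality(
    intercept=-3.5,
    weights=[0.08, 0.02, 0.4],   # acute physiology score, age, emergency admission
    values=[20, 60, 1],
)
print(f"Predicted risk of hospital death: {risk:.0%}")   # roughly 43% with these numbers
```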

The extent of physiological disturbance changes during the course of critical illness. Scoring systems therefore need to predetermine the time at which the disturbance best reflects the severity of illness and best discriminates between likely survivors and non-survivors. Most systems are based on the worst value of each physiological parameter within the first 24 hours of ICU admission. Some systems, however, such as MPM II0, are based on values obtained within 1 hour either side of admission, which is designed to avoid the bias that treatment might introduce into the acute physiology values.29
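A minimal sketch of the 'worst value in the first 24 hours' convention is shown below; the heart-rate point bands are invented for illustration and are not the published bands of any particular system.

```python
# Hourly heart rates recorded during the first 24 hours of ICU admission
# (hypothetical data).
heart_rates_first_24h = [92, 118, 141, 124, 105]

def heart_rate_points(hr):
    """Assign points that increase the further the value lies from normal.
    The bands here are invented for illustration."""
    if 70 <= hr <= 109:
        return 0
    if 55 <= hr <= 69 or 110 <= hr <= 139:
        return 2
    return 4

# Score the single most deranged observation from the 24-hour window.
worst = max(heart_rates_first_24h, key=heart_rate_points)
print(f"Worst heart rate: {worst} beats/min, contributing {heart_rate_points(worst)} points")
```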

DEVELOPING A SCORING METHODOLOGY AND ITS VALIDATION

All the scoring systems have been based on a large database of critically ill patients, usually derived from at least one country and from several ICUs (Table 3.2). Typically, in the more recent upgrades of the common scoring systems, part of the database is used to develop a logistic regression equation with a dichotomous outcome (survival or death), while the rest is used to test the performance of the derived equation. The equation includes those variables which are statistically related to outcome, each given a weight within the equation. The regression equation can then be tested either against patients in the developmental dataset, using statistical techniques such as 'jack-knifing' and 'boot-strapping', or against a new set of patients (the validation dataset) who were in the original database but not in the developmental dataset. The aim of validation is to demonstrate that the derived model can be used not only to measure severity of illness but also to provide hospital outcome predictions.
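The development/validation split described above can be sketched as follows, assuming scikit-learn is available; the three predictor columns and the simulated outcomes are hypothetical stand-ins for the physiological and demographic variables a real database would contain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulate a database of critically ill patients with an assumed underlying
# logistic relationship between three predictors and hospital death.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                              # e.g. acute physiology, age, comorbidity
true_logit = -2.0 + 1.2 * X[:, 0] + 0.5 * X[:, 1]
died = rng.random(n) < 1 / (1 + np.exp(-true_logit))     # dichotomous outcome

# Part of the database develops the model; the rest validates it.
X_dev, X_val, y_dev, y_val = train_test_split(X, died, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)

# Discrimination of the derived equation on the validation dataset.
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {auc:.2f}")
```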

Table 3.2 Ability of scores to discriminate correctly between survivors and non-survivors when tested on similar casemixes. A value of 1 represents perfect prediction

Score Area under ROC curve
APACHE II 0.85
APACHE III 0.90
SAPS 2 0.86
MPM II0 0.82
MPM II24 0.84
SAPS 3 0.84
APACHE IV 0.88

ROC, receiver operating characteristic; APACHE, Acute Physiology, Age and Chronic Health Evaluation; SAPS, Simplified Acute Physiology Score; MPM, Mortality Probability Model.

Once a satisfactory equation has been developed, it can be used to calculate a probability of death for an individual patient. Similarly, an overall probability of death can be calculated for a group of patients; however, this methodology cannot indicate which of the patients in the cohort will die, because the models are not powerful enough to provide sufficiently accurate discrimination.
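The distinction can be illustrated with a toy cohort: summing individual predicted risks gives an expected number of deaths for the group, but says nothing about which patients they will be (the risks below are hypothetical).

```python
# Hypothetical per-patient probabilities of hospital death.
predicted_risks = [0.05, 0.12, 0.30, 0.45, 0.80]

expected_deaths = sum(predicted_risks)                    # 1.72 deaths expected in 5 patients
cohort_mortality = expected_deaths / len(predicted_risks)
print(f"Expected deaths: {expected_deaths:.2f}, cohort mortality: {cohort_mortality:.0%}")
```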

In a perfect model the aim would be that:

The performance of a mortality prediction model applied to a cohort of patients other than the developmental set is usually judged by two properties: first, its ability to predict which patients will survive and which will die (discrimination) and, second, how well it predicts the overall observed mortality (calibration). The appendix shows some commonly calculated measures for a scoring system.
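Calibration is often examined by grouping patients into bands of predicted risk and comparing expected with observed deaths in each band (formal assessments commonly use the Hosmer-Lemeshow statistic); a minimal sketch with hypothetical data follows.

```python
# Hypothetical predicted risks and observed outcomes for eight patients.
predicted_risk = [0.05, 0.08, 0.22, 0.28, 0.55, 0.61, 0.83, 0.90]
died           = [False, False, False, True, True, False, True, True]

# Compare expected (sum of predicted risks) with observed deaths per risk band.
bands = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
for low, high in bands:
    in_band = [(r, d) for r, d in zip(predicted_risk, died) if low <= r < high]
    expected = sum(r for r, _ in in_band)
    observed = sum(d for _, d in in_band)
    print(f"risk {low:.0%}-{high:.0%}: expected {expected:.1f}, observed {observed}")
```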

DISCRIMINATION

The discriminating power of a model can be determined by defining a series of threshold probabilities of death (for example 50, 70, 80 and 90%), classifying every patient whose calculated risk of death exceeds the threshold as a predicted non-survivor, and then comparing the expected deaths with those observed at each cut-off point. For example, the APACHE II system showed misclassification rates (patients predicted to die who survived plus those predicted to survive who died) of 14.4, 15.2, 16.7 and 18.5% at cut-off points of 50, 70, 80 and 90% respectively. These figures indicate that the model discriminated likely survivors from non-survivors best when any patient with a risk of death greater than 50% was assumed to be a non-survivor.
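The cut-off calculation described above can be sketched as follows, with hypothetical risks and outcomes.

```python
# Hypothetical predicted risks and observed outcomes.
predicted_risk = [0.10, 0.30, 0.75, 0.92, 0.60, 0.85]
died           = [False, False, True, True, False, True]

def misclassification_rate(threshold):
    """Fraction of patients wrongly classified when risks above the threshold
    are treated as predicted non-survivors."""
    wrong = sum((risk > threshold) != outcome
                for risk, outcome in zip(predicted_risk, died))
    return wrong / len(died)

for cutoff in (0.5, 0.7, 0.8, 0.9):
    print(f"cut-off {cutoff:.0%}: misclassified {misclassification_rate(cutoff):.0%}")
```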

A conventional approach to displaying discriminating ability is to plot sensitivity (true-positive rate) on the y-axis against the false-positive rate (1 – specificity) on the x-axis for several predicted-mortality cut-off points, producing a receiver operating characteristic (ROC) curve (Figure 3.1).

The area under the ROC curve (AUC) summarises the paired true-positive and false-positive rates at the different cut-off points (Figure 3.1) and so defines the overall discriminating ability of the model. A perfect model produces no false positives, so its curve follows the y-axis and has an area of 1; a non-discriminating model has an AUC of 0.5; models considered good have an AUC greater than 0.8. The AUC can therefore be used to compare the discriminating ability of severity-of-illness models.
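The AUC can also be read as the probability that a randomly chosen non-survivor was assigned a higher predicted risk than a randomly chosen survivor; the following sketch computes this directly from hypothetical data.

```python
# Hypothetical predicted risks and observed outcomes.
predicted_risk = [0.10, 0.80, 0.75, 0.92, 0.60, 0.85]
died           = [False, False, True, True, False, True]

non_survivor_risks = [r for r, d in zip(predicted_risk, died) if d]
survivor_risks     = [r for r, d in zip(predicted_risk, died) if not d]

# Fraction of non-survivor/survivor pairs in which the non-survivor was given
# the higher predicted risk (ties count as half).
pairs = [(ns, s) for ns in non_survivor_risks for s in survivor_risks]
auc = sum((ns > s) + 0.5 * (ns == s) for ns, s in pairs) / len(pairs)
print(f"AUC: {auc:.2f}")   # 1.0 = perfect discrimination, 0.5 = no better than chance
```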

COMMONLY USED SCORING SYSTEMS

GLASGOW COMA SCORE
