
Chapter 3 Severity of illness and likely outcome from critical illness

At present, scoring systems are not sufficiently accurate to make outcome predictions for individual patients.

Clinical assessment of severity of illness is an essential component of medical practice. It influences the need and speed for supportive and specific therapy. Initial acuity may also indicate likely prognosis when other factors such as comorbidity and organisational aspects of critical care delivery are considered. It is intuitive to consider whether patterns and severity of physiological disturbance can predict patient outcome from an episode of critical illness.

Perhaps the earliest reference to grading illness is in an Egyptian papyrus which classified head injury by severity.1 More recently, it was the constellation of physiological disturbances seen in specific conditions which popularised the approach of linking physiological derangement to outcome in the critically ill. Examples include the Ranson score for acute pancreatitis,2 the Pugh modification of the Child–Turcotte classification, devised for patients undergoing portosystemic shunt surgery and now widely used to classify end-stage liver disease,3 the Parsonnet score for cardiac surgery4 and the Glasgow Coma Score (GCS) for acute head injury.5 The earliest attempt to quantify severity of illness in a general critically ill population was by Cullen et al.,6 who devised a therapeutic intervention score as a surrogate for severity of illness. This was followed in 1981 by the introduction of the Acute Physiology, Age and Chronic Health Evaluation (APACHE) scoring system by Knaus et al.7 Since then numerous scoring systems have been designed and tested in populations across the world.

The potential advantages of quantifying critical illness include:

The least controversial use for scoring systems has been as a method for comparing patient groups in clinical trials. While this has been widely accepted by the critical care community, there has been less enthusiasm for accepting the same systems as comparators of between-unit and even between-country performance, unless of course your performance is good. Most clinicians agree that scoring systems have limited value for individual patient decision pathways, although recently an APACHE II score above 25, under appropriate circumstances, has been included in the guidelines for the administration of recombinant human activated protein C for severe sepsis and septic shock. Much of the scepticism among clinicians is based on studies showing poor prognostic performance for many of the models proposed.8–14 The fundamental problem for these scores and prognostic models is poor calibration in the cohort of patients studied, often in countries whose health service infrastructure differs from that in which the models were first developed. Other calibration problems arise because users adhere poorly to the rules of the scoring methodology, because patient outcomes improve after new techniques and treatments are introduced, or because the models fail to include important prognostic variables. For example, it has become clear that prognosis is affected as much by local organisation, patient pathways, location prior to admission and preadmission state as by acute physiological disturbance.15,16

Scoring systems would be better calibrated if they were developed from a small number of countries with similar health services; however, this limits their international usefulness for clinical studies. International applicability can be improved by developing a system from a wider international cohort, but such a system can then be expected to calibrate poorly when applied in any single country. The Simplified Acute Physiology Score 3 (SAPS 3), developed from a worldwide cohort, therefore provides customisation formulae so that the risk-adjusted expected mortality can be related to the geographical location of the unit.16

Inevitably, as advances occur, risk-adjusted mortality predictions become outdated, with some older models overestimating expected mortality14,17 and others underestimating observed mortality.18 The designers of the scoring systems have recognised these shifting baselines and revise the models every few years; Table 3.1 outlines some characteristics of the upgraded systems.

PHYSIOLOGICAL DISTURBANCE

An insult which potentially interferes with normal organ function is usually followed by increasing compensatory activity directed at preserving vital organ function. Most compensatory mechanisms are mediated through the endocrine and autonomic nervous systems and act to maintain effective circulating volume, oxygenation and acid–base homeostasis, thereby ensuring normal mitochondrial and vital organ function. Hyperventilation, tachycardia, vasoconstriction and consequent oliguria – all signs of compensation – are therefore hallmarks of early untreated critical illness. Once these mechanisms are overwhelmed, signs of decompensation appear, such as hypotension, progressive coma, icterus and metabolic acidosis.

Most organs have a limited repertoire of responses to systemic illness: the brain, for example, responds with confusion, seizures or progressive coma, while respiratory dysfunction manifests as hyperventilation, hypoventilation, wheezing or coughing, with commensurate changes in blood gases. It is therefore not surprising that most systemic pathophysiological processes produce a common set of acute physiological disturbances, and that severity of illness can consequently be assessed from a limited number of vital-sign and biochemical observations. It is less clear, however, what magnitude of response should be expected for a given insult, so severity of illness for most conditions has traditionally been measured by the magnitude of the physiological response rather than the size of the insult. The physiological response is further confounded by:

Some scoring systems, such as the Mortality Probability Model (MPM II0) and SAPS 3, estimate severity of illness at or near admission to intensive care in order to avoid the confounding effect of supportive therapy.

PRINCIPLES OF SCORING SYSTEM DESIGN

CHOICE OF INDEPENDENT PHYSIOLOGICAL VARIABLES AND THEIR TIMING

The designers of the original APACHE and SAPS systems chose variables which they felt represented measures of acute illness. Based on expert opinion, each chosen variable was weighted equally on an arbitrary increasing linear scale, with the highest value given to the physiological value deviating furthest from normal.26,27 Premorbid conditions, age, emergency status and diagnostic details were also included in these early models, and from these parameters a score and a probability of hospital death could be calculated. Later upgrades to these systems (SAPS 2, APACHE III and the MPM28) used logistic regression analysis to determine which variables should be included to explain the observed hospital mortality. Variables were no longer given equal importance: each received its own weighting, and a logistic regression equation was used to calculate the probability of hospital death. The more recent upgrades, APACHE IV and SAPS 3, have continued to use logistic regression techniques to identify variables that have an impact on hospital outcome.
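As an illustration of how such a regression equation converts weighted variables into a mortality probability, the following sketch uses invented coefficients; they are not the published weights of any APACHE, SAPS or MPM model.

```python
import math

def predicted_mortality(intercept, weights, values):
    """Turn a weighted sum of prognostic variables into a probability of
    hospital death via the inverse-logit (logistic) transform."""
    logit = intercept + sum(w * x for w, x in zip(weights, values))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical intercept and weights for illustration only.
risk = predicted_mortality(
    intercept=-3.5,
    weights=[0.08, 0.02, 0.4],   # acute physiology score, age, emergency admission
    values=[20, 60, 1],
)
print(f"Predicted risk of hospital death: {risk:.0%}")   # roughly 43% with these numbers
```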

The extent of physiological disturbance changes during the course of critical illness. Scoring systems therefore need to predetermine the time at which the disturbance best reflects the severity of illness and best discriminates between likely survivors and non-survivors. Most systems are based on the worst value of each physiological parameter within the first 24 hours of ICU admission. Some systems, however, such as MPM II0, are based on values obtained within 1 hour either side of admission, which is designed to avoid the bias that treatment might introduce into the acute physiology values.29
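A minimal sketch of the 'worst value in the first 24 hours' convention is shown below; the heart-rate point bands are invented for illustration and are not the published bands of any particular system.

```python
# Hourly heart rates recorded during the first 24 hours of ICU admission
# (hypothetical data).
heart_rates_first_24h = [92, 118, 141, 124, 105]

def heart_rate_points(hr):
    """Assign points that increase the further the value lies from normal.
    The bands here are invented for illustration."""
    if 70 <= hr <= 109:
        return 0
    if 55 <= hr <= 69 or 110 <= hr <= 139:
        return 2
    return 4

# Score the single most deranged observation from the 24-hour window.
worst = max(heart_rates_first_24h, key=heart_rate_points)
print(f"Worst heart rate: {worst} beats/min, contributing {heart_rate_points(worst)} points")
```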

DEVELOPING A SCORING METHODOLOGY AND ITS VALIDATION

All the scoring systems have been based on a large database of critically ill patients, usually derived from at least one country and from several ICUs (Table 3.2). Typically, in the more recent upgrades of the common scoring systems, part of the database is used to develop a logistic regression equation with a dichotomous outcome (survival or death), while the rest is used to test the performance of the derived equation. The equation includes those variables which are statistically related to outcome, each given a weight within the equation. The regression equation can then be tested either against patients in the developmental dataset, using statistical techniques such as 'jack-knifing' and 'boot-strapping', or against a new set of patients (the validation dataset) who were in the original database but not in the developmental dataset. The aim of validation is to demonstrate that the derived model can be used not only to measure severity of illness but also to provide hospital outcome predictions.
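The development/validation split described above can be sketched as follows, assuming scikit-learn is available; the three predictor columns and the simulated outcomes are hypothetical stand-ins for the physiological and demographic variables a real database would contain.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulate a database of critically ill patients with an assumed underlying
# logistic relationship between three predictors and hospital death.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))                              # e.g. acute physiology, age, comorbidity
true_logit = -2.0 + 1.2 * X[:, 0] + 0.5 * X[:, 1]
died = rng.random(n) < 1 / (1 + np.exp(-true_logit))     # dichotomous outcome

# Part of the database develops the model; the rest validates it.
X_dev, X_val, y_dev, y_val = train_test_split(X, died, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_dev, y_dev)

# Discrimination of the derived equation on the validation dataset.
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"Validation AUC: {auc:.2f}")
```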

Table 3.2 Ability of scores to discriminate correctly between survivors and non-survivors when tested on similar casemixes. A value of 1 represents perfect prediction

Score Area under ROC curve
APACHE II 0.85
APACHE III 0.90
SAPS 2 0.86
MPM II0 0.82
MPM II24 0.84
SAPS 3 0.84
APACHE IV 0.88

ROC, receiver operating characteristic; APACHE, Acute Physiology, Age and Chronic Health Evaluation; SAPS, Simplified Acute Physiology Score; MPM, Mortality Probability Model.

Once a satisfactory equation has been developed, it can be used to calculate a probability of death for an individual patient. Similarly, an overall probability of death can be calculated for a group of patients; however, this methodology cannot indicate which of the patients in the cohort will die, because the models are not powerful enough to provide sufficiently accurate discrimination.
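The distinction can be illustrated with a toy cohort: summing individual predicted risks gives an expected number of deaths for the group, but says nothing about which patients they will be (the risks below are hypothetical).

```python
# Hypothetical per-patient probabilities of hospital death.
predicted_risks = [0.05, 0.12, 0.30, 0.45, 0.80]

expected_deaths = sum(predicted_risks)                    # 1.72 deaths expected in 5 patients
cohort_mortality = expected_deaths / len(predicted_risks)
print(f"Expected deaths: {expected_deaths:.2f}, cohort mortality: {cohort_mortality:.0%}")
```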

In a perfect model the aim would be that:

The performance of a mortality prediction model applied to a cohort of patients other than the developmental set is usually judged by two properties: first, its ability to predict which patients will survive and which will die (discrimination) and, second, how well it predicts the overall observed mortality (calibration). The appendix shows some commonly calculated measures for a scoring system.
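Calibration is often examined by grouping patients into bands of predicted risk and comparing expected with observed deaths in each band (formal assessments commonly use the Hosmer-Lemeshow statistic); a minimal sketch with hypothetical data follows.

```python
# Hypothetical predicted risks and observed outcomes for eight patients.
predicted_risk = [0.05, 0.08, 0.22, 0.28, 0.55, 0.61, 0.83, 0.90]
died           = [False, False, False, True, True, False, True, True]

# Compare expected (sum of predicted risks) with observed deaths per risk band.
bands = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
for low, high in bands:
    in_band = [(r, d) for r, d in zip(predicted_risk, died) if low <= r < high]
    expected = sum(r for r, _ in in_band)
    observed = sum(d for _, d in in_band)
    print(f"risk {low:.0%}-{high:.0%}: expected {expected:.1f}, observed {observed}")
```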

DISCRIMINATION

The discriminating power of a model can be determined by defining a series of threshold probabilities of death (for example 50, 70, 80 and 90%), classifying every patient whose calculated risk of death exceeds the threshold as a predicted non-survivor, and then comparing the expected deaths with those observed at each cut-off point. For example, the APACHE II system showed misclassification rates (patients predicted to die who survived plus those predicted to survive who died) of 14.4, 15.2, 16.7 and 18.5% at cut-off points of 50, 70, 80 and 90% respectively. These figures indicate that the model discriminated likely survivors from non-survivors best when any patient with a risk of death greater than 50% was assumed to be a non-survivor.
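The cut-off calculation described above can be sketched as follows, with hypothetical risks and outcomes.

```python
# Hypothetical predicted risks and observed outcomes.
predicted_risk = [0.10, 0.30, 0.75, 0.92, 0.60, 0.85]
died           = [False, False, True, True, False, True]

def misclassification_rate(threshold):
    """Fraction of patients wrongly classified when risks above the threshold
    are treated as predicted non-survivors."""
    wrong = sum((risk > threshold) != outcome
                for risk, outcome in zip(predicted_risk, died))
    return wrong / len(died)

for cutoff in (0.5, 0.7, 0.8, 0.9):
    print(f"cut-off {cutoff:.0%}: misclassified {misclassification_rate(cutoff):.0%}")
```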

A conventional approach to displaying discriminating ability is to plot sensitivity (true-positive rate) on the y-axis against the false-positive rate (1 – specificity) on the x-axis for several predicted-mortality cut-off points, producing a receiver operating characteristic (ROC) curve (Figure 3.1).

The area under the ROC curve (AUC) summarises the paired true-positive and false-positive rates at the different cut-off points (Figure 3.1) and so defines the overall discriminating ability of the model. A perfect model produces no false positives, so its curve follows the y-axis and has an area of 1; a non-discriminating model has an AUC of 0.5; models considered good have an AUC greater than 0.8. The AUC can therefore be used to compare the discriminating ability of severity-of-illness models.
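The AUC can also be read as the probability that a randomly chosen non-survivor was assigned a higher predicted risk than a randomly chosen survivor; the following sketch computes this directly from hypothetical data.

```python
# Hypothetical predicted risks and observed outcomes.
predicted_risk = [0.10, 0.80, 0.75, 0.92, 0.60, 0.85]
died           = [False, False, True, True, False, True]

non_survivor_risks = [r for r, d in zip(predicted_risk, died) if d]
survivor_risks     = [r for r, d in zip(predicted_risk, died) if not d]

# Fraction of non-survivor/survivor pairs in which the non-survivor was given
# the higher predicted risk (ties count as half).
pairs = [(ns, s) for ns in non_survivor_risks for s in survivor_risks]
auc = sum((ns > s) + 0.5 * (ns == s) for ns, s in pairs) / len(pairs)
print(f"AUC: {auc:.2f}")   # 1.0 = perfect discrimination, 0.5 = no better than chance
```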

COMMONLY USED SCORING SYSTEMS

GLASGOW COMA SCORE
