Chapter 9 The medical literature
This chapter is presented in three parts:
Part A: the emergency physician’s guide to basic statistics
The table can now be completed with simple arithmetic.
Accuracy = (TP + TN)/(TP + TN + FP + FN)
Below is the original question with a prevalence of 10%.
Note that the PPV has now increased dramatically. If your confidence in basic statistics has now grown (and your headache has gone), try different combinations and permutations of parameters to confirm the effect on PPV and NPV. To start with, review a paper or even work through the detail for an investigation you perform on a regular basis. You may never view basic statistics with fear again!
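The arithmetic above can be sketched in a few lines of code. This is an illustration only: the sensitivity and specificity of 90% and the cohort size of 1000 are assumed values, not figures from the chapter's worked example, but the pattern they demonstrate, that PPV rises with prevalence while sensitivity and specificity stay fixed, is exactly the effect described above.

```python
# A minimal sketch of the 2x2 diagnostic-table arithmetic described above.
# Sensitivity, specificity and cohort size are illustrative assumptions.

def two_by_two(prevalence, sensitivity, specificity, n=1000):
    """Return TP, FN, FP, TN counts for a cohort of size n."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = diseased * sensitivity          # true positives
    fn = diseased - tp                   # false negatives
    fp = healthy * (1 - specificity)     # false positives
    tn = healthy - fp                    # true negatives
    return tp, fn, fp, tn

def summarise(tp, fn, fp, tn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
    }

# PPV rises with prevalence even though sensitivity and specificity
# are unchanged: about 8% at 1% prevalence, 50% at 10% prevalence.
low = summarise(*two_by_two(prevalence=0.01, sensitivity=0.9, specificity=0.9))
high = summarise(*two_by_two(prevalence=0.10, sensitivity=0.9, specificity=0.9))
print(low["PPV"], high["PPV"])
```

Re-running `summarise` with your own test's sensitivity and specificity is a quick way to repeat the exercise suggested above for an investigation you order regularly.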
Part B: an overview of EBM
The context
This excerpt from an editorial by Davidoff et al. in the BMJ in 1995 still holds true today. EBM comprises the latest information on the most effective or least harmful management for patients (Davidoff et al., 1995). The key processes in EBM are:
Critical appraisal is ‘the process of systematically examining research evidence to assess its validity, results and relevance before using it to inform a decision’ (Mark, 2008). It allows the reader to assess in a systematic way how strong or weak a paper is in answering a clinically relevant question or whether the paper can be relied on as the basis for making clinical decisions for patients.
Most readers will not limit their literature search in terms of date of publication (when) unless they are confident that a treatment or test was developed only recently, or an unlimited search yields numerous citations, which become unmanageable. As there is often a time lag between study citations being added to searchable databases, it is worthwhile considering conducting a more recent targeted search in relevant journals if the topic area is rapidly evolving. Wang and Bakhai (2006) and Pocock (1983) provide excellent further reading in the area of clinical trials, as do Greenhalgh’s ‘Education and debate’ series from the BMJ (Greenhalgh, 1997a) and Gordon Guyatt’s 2000 focus series from the JAMA (Guyatt, 2000).
Before looking for individual studies, we recommend a concerted search for meta-analyses, which offer a useful background perspective and, if one is lucky, may even answer the research question, using the summated ‘quality overall evidence’ available so far (Mark, 2008). Meta-analyses are formally designed and properly conducted critical appraisals of intervention trials that attempt to ‘aggregate’ outcome findings from individual studies if they show a consistent effect. The presence of compelling outcome effects, consistent in direction and size across individual studies of acceptable methodological quality that are not clinically heterogeneous, will likely be enough to tell you whether a proposed treatment or diagnostic test will be suited to your patient.
Critically appraising a meta-analysis will still save you time and effort if many intervention trials or studies of diagnostic performance of a certain test relevant to your objective have been carried out. You need to ascertain whether the methodological rigour and quality of the meta-analysis is sufficient for conclusions and recommendations contained in the meta-analysis to reliably fulfil your objectives. If a well-conducted and reported meta-analysis is available, re-examining individual studies in detail is less worthwhile, other than for personal interest. However, a meta-analysis will not include influential studies that become available after the date of publication of the meta-analysis. Carrying out a date-of-publication limited search for newly emerging studies is thus recommended, to see whether the conclusions reached in the meta-analysis remain consistent with the newer studies. For in-depth information on systematic reviews in health care, see the standard text by Matthias Egger et al. (2001).
Critical appraisal and clinical practice
In emergency medicine, critical appraisal of the evidence is most pertinent to time-critical conditions that require non-established or contentious urgent treatments that may be highly beneficial but also lead to significant harm. For example, this situation arises in thrombolytic treatment for acute ischaemic stroke, where treatment administered within three hours of symptom onset gives better neurofunctional outcome, but remains little used for fear of causing intracranial bleeding. ECASS III, a recently published RCT comparing IV alteplase with a placebo in ischaemic stroke, found alteplase to remain beneficial at three to four and a half hours after symptom onset (Hacke et al., 2008). The most recent Cochrane meta-analysis of thrombolysis trials in stroke, published in 2003, did not include ECASS III (Wardlaw et al., 2003). Evidence is in a constant state of evolution, so critical appraisal is a continuing process that aligns itself with continuing medical education and professional development. Nowadays, studies informing on therapeutic (in)effectiveness are easily and rapidly accessible through user-friendly information technology media such as the 24-hour medical cybrary. With the exception of acute resuscitation, there is never an excuse not to evaluate effectiveness prior to patient treatment.
Levels of evidence
Intervention and non-intervention studies can be stratified into several ‘levels of evidence’, according to their internal validity and dependability in informing treatment effects. A well-designed and conducted meta-analysis or randomised controlled blinded treatment trial is widely recognised as being able to offer the most reliable and least biased estimate of treatment benefit or harm (Wang et al., 2006), followed in descending order of quality of evidence by observational non-intervention studies such as case control studies and finally case series and case reports. The hierarchy is graded in various ways (e.g. levels I–IV of evidence or grade A–C recommendations), depending on the grading body. Several issues should be apparent at this stage:
A tool kit of EBM techniques
Critical appraisal of a single intervention study
Critical appraisal requires the following questions to be satisfactorily answered.
What is the research question?
P study participant characteristics at baseline, including disease severity; study setting
I experimental intervention or diagnostic test being investigated
C comparison or control group, usually the standard treatment/test, a placebo or usual care
O outcomes of interest; clinically meaningful for both the clinician and patient
T time period of the study observation or period of follow-up.
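To illustrate the framework, the PICOT elements for the stroke thrombolysis question discussed earlier in this chapter might be set out as follows. The wording of each element is paraphrased for illustration, not quoted from ECASS III, and the follow-up period shown is an assumption.

```python
# A hedged sketch: the thrombolysis-in-stroke question framed in PICOT
# terms. Entries are illustrative paraphrases, not quotations.
picot = {
    "P": "adults with acute ischaemic stroke, 3-4.5 h after symptom onset",
    "I": "intravenous alteplase",
    "C": "placebo",
    "O": "neurofunctional outcome, and intracranial bleeding as harm",
    "T": "follow-up over several weeks",  # assumed period, for illustration
}

# Assembling the elements into a single answerable research question.
question = (f"In {picot['P']}, does {picot['I']} compared with {picot['C']} "
            f"improve {picot['O']}, assessed over {picot['T']}?")
print(question)
```

Writing the question out this explicitly makes it much easier to judge, for each paper you appraise, whether its participants, intervention, comparator, outcomes and time frame actually match your own.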
Are the study results likely to be valid?
Was the trial design valid?
Was the conduct of the trial valid?
Was the analysis of the trial findings valid?
Intention to treat analysis
All patients should be analysed in the group to which they were randomised. Loss to follow-up greater than 20%, especially if differentially distributed between groups, will lead to post-randomisation bias if intention to treat (ITT) analysis is not used. ITT analysis means that patients are analysed according to the treatment group to which they were randomised, irrespective of whether they underwent the intended intervention or whether they adhered to protocol stipulations. ITT analysis results in an unbiased estimate of effect and more closely reflects what happens in real-life clinical practice, where patients have a range of compliance with treatment recommendations. In contrast, per protocol analysis is biased, since it includes only comparisons between patients who adhere to the treatment allocated to them. If the tested treatment works, the measured effect of the same treatment in the same study will be greater in magnitude for per protocol analysis (where only compliant patients are included in the analysis) compared with ITT analysis (where all patient outcomes are included in the analysis whether patients comply with the treatment or not).
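The inflation of the per protocol estimate can be demonstrated with a small simulation. All the figures below (recovery rates of 30% and 60%, a 30% non-compliance rate, trial size) are invented for illustration; the point is only that analysing compliers alone exaggerates the treatment effect relative to analysing patients as randomised.

```python
import random

random.seed(1)  # fixed seed for a reproducible illustration

# Hypothetical trial: treatment raises the recovery probability from
# 0.30 to 0.60, but 30% of the treatment arm never takes the drug.
N = 20000
P_CONTROL, P_TREATED, NONCOMPLIANCE = 0.30, 0.60, 0.30

itt_treat, pp_treat, control = [], [], []
for _ in range(N):
    if random.random() < 0.5:                      # randomised to treatment
        compliant = random.random() >= NONCOMPLIANCE
        p = P_TREATED if compliant else P_CONTROL  # non-compliers get no benefit
        outcome = random.random() < p
        itt_treat.append(outcome)                  # ITT: analyse as randomised
        if compliant:
            pp_treat.append(outcome)               # per protocol: compliers only
    else:                                          # randomised to control
        control.append(random.random() < P_CONTROL)

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

itt_effect = rate(itt_treat) - rate(control)  # diluted by non-compliers (~0.21)
pp_effect = rate(pp_treat) - rate(control)    # compliers only, inflated (~0.30)
print(round(itt_effect, 2), round(pp_effect, 2))
```

Note that here the ITT estimate reflects what the policy of offering the treatment achieves in practice, non-compliers included, which is usually the clinically relevant question.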
Statistical method
The use of post-hoc subgroup analyses, or of unjustified multiple outcome or interim comparisons, will likely lead to a false positive finding in a small study subset (the more analyses are done, the greater the risk of a false positive finding). However, it is reasonable to conduct post-hoc analysis adjustments if results indicate the method initially chosen is no longer valid. For example, a study may have been designed expecting normally distributed data, but non-normal distribution of data is unexpectedly encountered. In this situation, non-parametric methods will be required. Interested readers are referred to standard texts (Kirkwood et al., 2003) and user-friendly articles (Greenhalgh 1997b; 1997c).
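How quickly multiple comparisons erode the significance threshold can be shown with one line of arithmetic. For k independent comparisons each tested at significance level alpha, the chance of at least one false positive is 1 − (1 − alpha)^k. The sketch below assumes the conventional alpha of 0.05.

```python
# Family-wise false positive risk for k independent comparisons,
# each tested at alpha = 0.05: 1 - (1 - alpha)**k.
alpha = 0.05

for k in (1, 5, 10, 20):
    family_wise = 1 - (1 - alpha) ** k
    print(k, round(family_wise, 2))
# A single test carries the intended 5% risk, but ten comparisons
# carry roughly a 40% chance, and twenty roughly a 64% chance,
# of at least one spurious "significant" finding.
```

This is why a trial reporting one significant result among many unplanned subgroup comparisons should be read with considerable scepticism.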
What are the results?
Measures of treatment effect