CHAPTER 61 Psychiatric Epidemiology
Epidemiology is the study of the distribution and determinants of disease frequency in man. Epidemiological studies typically examine large groups of individuals, and by providing data on the distribution and frequency of diseases, they help describe the natural history of illness, assess service needs in the community or in special institutions, and shed light on the etiology of illness.
Epidemiology is based on two fundamental assumptions: first, that human disease does not occur at random, and second, that human disease has causal and preventive factors that can be identified through systematic investigation of different populations in different places or at different times. By measuring disease frequency, and by examining who gets a disease within a population, as well as where and when the disease occurs, it is possible to formulate hypotheses concerning possible causal and preventive factors.1
The frequency of disease or some other outcome within a population group is described using different concepts: the rate at which new cases are observed, or the proportion of a given population that exhibits the outcome of interest.
Incidence refers to the number of new events that develop in a population over a specified period of time (t0 to t1). If incidence is expressed as the number of events (outcomes) in proportion to the population at risk for the event, it is called the cumulative incidence (CI), and is calculated by the following equation:
$$\mathrm{CI} = \frac{\text{number of new cases between } t_0 \text{ and } t_1}{\text{number of persons at risk at } t_0}$$
The denominator equals the total number of persons at risk for the event at the start of the time period (t0) without adjustment for any subsequent reduction in the cohort size for any reason, for example, loss to follow-up, death, or reclassification to “case” status. Therefore, CI is best used to describe stable populations where there is little reduction in cohort size during the time period of interest. An example would be a study of the incidence of major depressive disorder (MDD) in a residential program. If, at the beginning of the study, 8 of the 100 residents have MDD, and of the 92 remaining patients, 8 develop MDD over the next 12 months, the CI for MDD would be (8/92 × 100) = 8.7% for this period (i.e., 1 year). Note that the denominator does not include those in the population with the condition at t0, since they are not at risk for newly experiencing the outcome.
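The residential-program example can be reproduced with a short Python sketch. The function name and its argument names are illustrative only; the figures are the ones given in the example above.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of the at-risk cohort that becomes a new case during the period."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must lie between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


# Figures from the MDD example: 100 residents, 8 of whom already have MDD at t0.
residents = 100
prevalent_at_t0 = 8

# Residents who already have MDD cannot newly develop it, so they are excluded.
at_risk = residents - prevalent_at_t0

ci = cumulative_incidence(new_cases=8, at_risk_at_start=at_risk)
print(f"12-month cumulative incidence = {ci:.1%}")  # prints 8.7%
```

Dividing by all 100 residents instead of the 92 at risk would understate the result (8.0% rather than 8.7%), which is why the prevalent cases are removed from the denominator first.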
When patients are followed for varying lengths of time (e.g., due to loss to follow-up, death, or reclassification to “case” status) and the denominator value representing the population at risk changes significantly, incidence density provides a more precise measure of the rate at which new events occur. Incidence density (ID) is defined as the number of events occurring per unit population per unit time:
$$\mathrm{ID} = \frac{\text{number of new cases between } t_0 \text{ and } t_1}{\text{total person-time at risk between } t_0 \text{ and } t_1}$$
The denominator is the population that is actively at risk for the event, and is adjusted as people no longer belong in that pool. In a study of psychosis, for instance, if a person develops hallucinations and delusions, he or she becomes “a case” and no longer contributes to the denominator. Similarly, a person lost to follow-up contributes to the denominator only so long as he or she is being tracked by the study. To illustrate, suppose that in a 100-person, 1-year study of human immunodeficiency virus (HIV) infection, 6 people are lost to follow-up at the end of 6 months, 4 become infected with HIV at the end of the third month, and the remaining 90 are followed for the full year without becoming infected. The person-years of observation would be calculated as follows: (90 × 1 year) + (6 × 0.5 year) + (4 × 0.25 year) = 94 person-years, and incidence density = (4 cases)/(94 person-years) = 4.26 cases/100 person-years of observation.
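The person-years bookkeeping in the HIV example can be sketched in Python as follows. The helper names are illustrative, and the sketch assumes, as the arithmetic above implies, that the 90 participants who are neither lost nor infected are observed for the full year.

```python
def person_time(groups):
    """Sum person-years over (number_of_people, years_observed) pairs."""
    return sum(n * years for n, years in groups)


def incidence_density(new_cases, total_person_time):
    """New cases per unit of person-time at risk."""
    if total_person_time <= 0:
        raise ValueError("total_person_time must be positive")
    return new_cases / total_person_time


# Follow-up pattern from the 100-person example in the text.
follow_up = [
    (90, 1.0),   # observed for the full year, never became cases
    (6, 0.5),    # lost to follow-up at the end of 6 months
    (4, 0.25),   # became cases at the end of the third month
]

person_years = person_time(follow_up)
rate = incidence_density(new_cases=4, total_person_time=person_years)

print(f"{person_years:g} person-years at risk")              # prints 94
print(f"{100 * rate:.2f} cases per 100 person-years")        # prints 4.26
```

Each participant contributes time only while at risk and under observation, so the 4 cases stop adding to the denominator at 3 months and the 6 dropouts at 6 months.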
Prevalence is the proportion of individuals who have a particular disease or outcome at a point or period in time. In most psychiatric studies, “prevalence” refers to the proportion of the population that has the outcome at a particular point in time, and is called the point prevalence:
$$P = \frac{\text{number of existing cases at time } t}{\text{total population at time } t}$$
In stable populations, prevalence (P) can be related to incidence density (ID) by the equation P = ID × D, where D is the average duration of the disease before termination (by death or remission, for example). At times, the numerator is expanded to include the number of all cases, existing and new, in a specified time period; this is known as a period prevalence. When the period of interest is a lifetime, it is a type of period prevalence called lifetime prevalence, which is the proportion of people who have ever had the specified disease or attribute in their lifetime.
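As a rough illustration of these measures, the Python sketch below computes a point prevalence and applies the relation P = ID × D. The function names are illustrative. The first calculation reuses the residential-program figures given earlier (8 of 100 residents with MDD at t0); the incidence density and duration in the second are hypothetical values chosen only to show the arithmetic.

```python
def point_prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the outcome at one point in time."""
    if population <= 0:
        raise ValueError("population must be positive")
    if not 0 <= existing_cases <= population:
        raise ValueError("existing_cases must lie between 0 and population")
    return existing_cases / population


def prevalence_from_incidence(incidence_density: float, mean_duration: float) -> float:
    """Stable-population relation P = ID x D (both arguments in the same time unit)."""
    return incidence_density * mean_duration


# From the MDD example: 8 of 100 residents have MDD at the start of the study.
p_t0 = point_prevalence(existing_cases=8, population=100)
print(f"Point prevalence at t0 = {p_t0:.1%}")   # prints 8.0%

# Hypothetical inputs: 2 new cases per 100 person-years, mean duration 2.5 years.
p_stable = prevalence_from_incidence(incidence_density=0.02, mean_duration=2.5)
print(f"Prevalence implied by ID x D = {p_stable:.1%}")   # prints 5.0%
```

The second function is meaningful only when incidence and duration are expressed in the same time unit and the population is stable, as the text stipulates.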
Lifetime prevalence is often used to convey the overall risk that a person will develop an illness, particularly psychiatric illnesses that have episodic courses or that require a certain duration of symptoms to qualify for a diagnosis (e.g., depression, anxiety, or posttraumatic stress disorder). In practice, however, an accurate lifetime prevalence rate is difficult to determine, since it often relies on subject recall and on sampling populations of different ages (not necessarily at the end of their respective “lifetimes”). It is also an overall rate that does not account for changes in incidence rates over time, nor for possible differences in mortality rates between those with and those without the condition.
There are a number of concepts that are helpful in the evaluation of assessment instruments. These involve the consistency of the results that the instrument provides, and its fidelity to the concept being measured.
Reliability is the degree to which an assessment instrument produces consistent or reproducible results when used by different examiners at different times. Lack of reliability may be the result of divergence between observers, imprecision in the measurement tool, or instability in the attribute being measured. Interrater reliability (Table 61-1) is the extent to which different examiners obtain equivalent results in the same subject when using the same instrument; test-retest reliability is the extent to which the same instrument obtains equivalent results in the same subject on different occasions.
Reliability is not sufficient for a measurement instrument—it could, for example, consistently and reliably give results that are neither meaningful nor accurate. However, it is a necessary attribute, since inconsistency would impair the accuracy of any tool. The demonstration of the reliability of an assessment tool is thus required before its use in epidemiological studies. The use of explicit diagnostic criteria, trained examiners to interpret data uniformly, and a structured assessment that obtains the same types of information from all subjects can enhance the reliability of assessment instruments.
There are several commonly used measures of the degree of consistency between sets of data, which in psychiatry are often used to quantify the degree of agreement between raters. The kappa statistic (κ) is used for categorical or binary data, and the intraclass correlation coefficient (ICC, usually represented as r) for continuous data. Both measures have the same range of values (−1 to +1), from perfect negative correlation (−1), to no correlation (0), to perfect positive correlation (+1). For acceptable reliability, a kappa value of 0.7 or greater is generally required; for the ICC, a value of 0.8 or greater is generally required.
Calculation of the kappa statistic (κ) requires only arithmetic computation, and accounts for the degree of consistency between raters with an adjustment for the probability of agreement due to chance. When the frequency of the disorder is very low, however, the kappa statistic will be low despite having a high degree of consistency between raters; it is not appropriate for the measurement of reliability of infrequent disorders.
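The kappa statistic is calculated as
$$\kappa = \frac{P_o - P_c}{1 - P_c}$$
where $P_o$ is the observed agreement and $P_c$ is the agreement expected by chance: $P_o = (a + d)/n$ and $P_c = [(a + c)(a + b) + (b + d)(c + d)]/n^2$. Here a and d are the cells of the 2 × 2 table in which the two raters agree, b and c are the cells in which they disagree, and n = a + b + c + d. Calculation of the ICC is more involved and is beyond the scope of this text.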
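The kappa calculation for two raters can be sketched in Python as below. The cell layout is an assumption, since the table itself is not reproduced here: a is taken as the number of subjects both raters call positive, d as the number both call negative, and b and c as the two kinds of disagreement. Both sets of counts are hypothetical and serve only to show how a rare disorder lowers kappa even when the raters usually agree.

```python
def kappa(a: int, b: int, c: int, d: int) -> float:
    """Two-rater kappa from a 2 x 2 agreement table.

    Assumed layout: a = both positive, d = both negative,
    b and c = the two disagreement cells.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")
    p_o = (a + d) / n                                      # observed agreement
    p_c = ((a + c) * (a + b) + (b + d) * (c + d)) / n**2   # chance agreement
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_c) / (1 - p_c)


# Hypothetical counts, common disorder: the raters agree on 80 of 100 subjects.
common = kappa(a=40, b=10, c=10, d=40)
print(f"common disorder: kappa = {common:.2f}")   # prints 0.60

# Hypothetical counts, rare disorder: the raters agree on 92 of 100 subjects.
rare = kappa(a=1, b=4, c=4, d=91)
print(f"rare disorder:   kappa = {rare:.2f}")     # prints 0.16
```

In the second table the raters agree more often (92% versus 80%), yet almost all of that agreement is expected by chance because nearly every subject is negative, so kappa falls to about 0.16.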
Validity is a term that expresses the degree to which a measurement instrument actually measures what it purports to measure. When translating a theoretical concept into an operational instrument that purports to assess or measure it, several aspects of validity need to be accounted for.
For any abstract concept, there are an infinite number of criteria that one might use to assess it. For example, if one wants to develop a questionnaire to diagnose bipolar disorder, one should ask about mood, thought process, and energy level, but probably not whether the subject owns a bicycle. Content validity is the extent to which the instrument adequately incorporates the domain of items that would accurately measure the concept of interest.
Criterion validity is the extent to which the measurement can predict or agree with criteria external to the construct being measured. Two types of criterion validity are generally distinguished: predictive validity and concurrent validity. Predictive validity is the extent to which the instrument’s measurements can predict an external criterion. For instance, if we devise an instrument to measure math ability, we might postulate that math ability should be correlated with better grades in college math courses. A high correlation between the measure’s assessment of math ability and college math course grades would indicate that the instrument predicts as it theoretically should, and has predictive validity. Concurrent validity refers to the extent to which the measurement correlates with another criterion at the same point in time. For example, if we devise a measure relying on visual inspection of a wound to determine infection, we can correlate it with a bacteriological examination of a specimen taken at the same time. A high correlation would indicate concurrent validity, and suggest that our new measure gives valid results for determining infection.
Construct validity refers to the extent to which the measure assesses the underlying theoretical construct that it intends to measure. This concept is the most complex, and both content and criterion validity point to it. An example of a measure lacking construct validity would be a test for assessing algebra skills using word problems that inadvertently assesses reading skills rather than factual knowledge of algebra. Construct validity also refers to the extent to which the construct exists as theorized and can be quantified by the instrument. In psychiatry, this is especially difficult since there are no “gold standard” laboratory (e.g., chemical, anatomical, physiological) tests, and the criteria, if not the very existence, of many diagnoses are disputed. Requirements proposed for establishing the validity of a diagnosis include an adequate clinical description of the disorder that distinguishes it from other similar disorders, and the ability to correlate the diagnosis with external criteria such as laboratory tests, familial transmission patterns, and consistent outcomes, including response to treatment.
Because there are no “gold standard” diagnostic tests in psychiatry, efforts to validate diagnoses have focused on increasing the reliability of diagnostic instruments, by defining explicit and observationally based diagnostic criteria (DSM-III and subsequent versions) or by employing structured interviews such as the Diagnostic Interview Schedule (DIS), and on conducting genetic and outcome studies for diagnostic categories. The selection of a “gold standard” criterion instrument in psychiatry, however, remains problematic.
If we assume that a reliable criterion instrument that provides valid results exists, the assessment of a new measurement instrument would involve comparing its results with those of the criterion instrument. The criterion instrument’s results are considered “true,” and a judgment of the validity of the new instrument’s results is based on how well they match the criterion instrument’s (Table 61-2).