61: Psychiatric Epidemiology

Published on 24/05/2015 by admin

Last modified 22/04/2025

CHAPTER 61 Psychiatric Epidemiology

Albert Yeung, MD, ScD, Raymond W. Kam, MD, MPH

KEY POINTS

• Epidemiology is the study of the distribution and determinants of disease frequency in humans to inform the natural history, service needs, and etiology of illness.

• The frequency of disease can be expressed using different measures, including cumulative incidence, incidence density, point prevalence, and lifetime prevalence.

• Epidemiological studies frequently rely on assessment instruments to evaluate psychiatric disorders. It is important to first establish the reliability (or consistency) and validity (or truthfulness) of these assessment instruments.

• Based on the recent National Comorbidity Survey (NCS) in the United States, the most common psychiatric disorders were major depression and alcohol dependence, followed by social and simple phobias. Approximately one in four respondents met criteria for a substance use disorder, one in four for an anxiety disorder, and one in five for an affective disorder in their lifetime.

• Epidemiological studies in the United States and in European countries showed that in general, individuals with a psychiatric disorder under-utilize mental health services. Among those who sought treatment, there was significant delay in seeking help.

OVERVIEW

Epidemiology is the study of the distribution and determinants of disease frequency in man. Epidemiological studies typically examine large groups of individuals, and by providing data on the distribution and frequency of diseases, they help describe the natural history of illness, assess service needs in the community or in special institutions, and shed light on the etiology of illness.

Epidemiology is based on two fundamental assumptions: first, that human disease does not occur at random, and second, that human disease has causal and preventive factors that can be identified through systematic investigation of different populations in different places or at different times. By measuring disease frequency, and by examining who gets a disease within a population, as well as where and when the disease occurs, it is possible to formulate hypotheses concerning possible causal and preventive factors.¹

EPIDEMIOLOGICAL MEASURES OF DISEASE FREQUENCY

The frequency of disease or some other outcome within a population group is described using different concepts: the rate at which new cases are observed, or the proportion of a given population that exhibits the outcome of interest.

Incidence refers to the number of new events that develop in a population over a specified period of time (t₀ to t₁). When incidence is expressed as the number of events (outcomes) in proportion to the population at risk for the event, it is called the cumulative incidence (CI), and is calculated by the following equation:

CI = (number of new cases from t₀ to t₁) / (number of persons at risk at t₀)

The denominator equals the total number of persons at risk for the event at the start of the time period (t₀) without adjustment for any subsequent reduction in the cohort size for any reason, for example, loss to follow-up, death, or reclassification to “case” status. Therefore, CI is best used to describe stable populations where there is little reduction in cohort size during the time period of interest. An example would be a study of the incidence of major depressive disorder (MDD) in a residential program. If, at the beginning of the study, 8 of the 100 residents have MDD, and of the 92 remaining patients, 8 develop MDD over the next 12 months, the CI for MDD would be (8/92 × 100) = 8.7% for this period (i.e., 1 year). Note that the denominator does not include those in the population with the condition at t₀, since they are not at risk for newly experiencing the outcome.
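The residential-program example can be checked with a few lines of Python. This is a minimal sketch; the function and variable names are illustrative and not part of the chapter.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of the at-risk population at t0 that becomes a case by t1."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must be between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


# Numbers from the residential-program example in the text.
residents = 100
cases_at_t0 = 8   # already have MDD, so not at risk of a new onset
new_cases = 8     # develop MDD during the 12 months

at_risk = residents - cases_at_t0  # 92
ci = cumulative_incidence(new_cases, at_risk)
print(f"12-month cumulative incidence: {ci:.1%}")  # 8.7%
```

Using 100 rather than 92 as the denominator would give 8.0%, which understates the risk among those who could actually develop a new episode.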

When patients are followed for varying lengths of time (e.g., due to loss to follow-up, death, or reclassification to “case” status) and the denominator value representing the population at risk changes significantly, incidence density provides a more precise measure of the rate at which new events occur. Incidence density (ID) is defined as the number of events occurring per unit population per unit time:

ID = (number of new cases from t₀ to t₁) / (total person-time of observation of the population at risk)

The denominator is the population that is actively at risk for the event, and is adjusted as people no longer belong in that pool. In a study of psychosis, for instance, if a person develops hallucinations and delusions, he or she becomes “a case” and no longer contributes to the denominator. Similarly, a person lost to follow-up would contribute to the denominator only so long as he or she is being tracked by the study. To illustrate, suppose that in a 1-year, 100-person study of human immunodeficiency virus (HIV) infection, 6 people are lost to follow-up at the end of 6 months and 4 develop HIV at the end of the third month, while the remaining 90 are followed for the full year. The person-years of observation would be calculated as follows: (90 × 1 year) + (6 × 0.5 year) + (4 × 0.25 year) = 94 person-years, and incidence density = (4 cases)/(94 person-years) = 4.26 cases/100 person-years of observation.
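The person-years arithmetic in the HIV example can be reproduced as follows. This is only a sketch: the `follow_up` list and the function name are illustrative choices, and each group is assumed to contribute exactly the follow-up time stated in the text.

```python
def incidence_density(events: int, person_time: float) -> float:
    """New events per unit of person-time actually observed."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time


# (number of people, years each contributed) from the HIV example in the text.
follow_up = [
    (90, 1.0),   # followed for the full year
    (6, 0.5),    # lost to follow-up at 6 months
    (4, 0.25),   # became cases at the end of month 3
]
person_years = sum(n * years for n, years in follow_up)
rate = incidence_density(events=4, person_time=person_years)

print(f"{person_years:g} person-years")                # 94 person-years
print(f"{100 * rate:.2f} cases per 100 person-years")  # 4.26
```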

Prevalence is the proportion of individuals who have a particular disease or outcome at a point or period in time. In most psychiatric studies, “prevalence” refers to the proportion of the population that has the outcome at a particular point in time, and is called the point prevalence:

Point prevalence = (number of existing cases at time t) / (total population at time t)

In stable populations, prevalence (P) can be related to incidence density (ID) by the equation P = ID × D, where D is the average duration of the disease before termination (by death or remission, for example). At times, the numerator is expanded to include the number of all cases, existing and new, in a specified time period; this is known as a period prevalence. When the period of interest is a lifetime, it is a type of period prevalence called lifetime prevalence, which is the proportion of people who have ever had the specified disease or attribute in their lifetime.
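As a rough numerical illustration of P = ID × D, the sketch below uses made-up values for incidence density and average duration; they are not estimates from any study. The relation is an approximation for stable populations and is generally most accurate when prevalence is low.

```python
def steady_state_prevalence(incidence_density: float, mean_duration: float) -> float:
    """Approximate prevalence in a stable population: P = ID x D.

    Both inputs must use the same time unit
    (e.g., cases per person-year and years).
    """
    if incidence_density < 0 or mean_duration < 0:
        raise ValueError("inputs must be non-negative")
    return incidence_density * mean_duration


# Hypothetical inputs chosen only for illustration.
id_per_person_year = 0.02   # 2 new cases per 100 person-years
duration_years = 2.5        # average time from onset to remission or death

p = steady_state_prevalence(id_per_person_year, duration_years)
print(f"Approximate point prevalence: {p:.1%}")

# Rearranged, the same relation gives the duration implied by P and ID.
implied_duration = p / id_per_person_year
print(f"Implied average duration: {implied_duration:.1f} years")
```

Doubling the average duration while holding ID fixed doubles the prevalence, which is why chronic conditions can be common even when new cases are relatively rare.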

Lifetime prevalence is often used to convey the overall risk of developing an illness, particularly a psychiatric illness that has an episodic course or requires a certain duration of symptoms to qualify for a diagnosis (e.g., depression, anxiety, or posttraumatic stress disorder). In practice, however, an accurate lifetime prevalence rate is difficult to determine, since it often relies on subject recall and on sampling populations of different ages (not necessarily at the end of their respective “lifetimes”). It is also an overall rate that does not account for changes in incidence rates over time, nor for possible differences in mortality rates between those with and without the condition.

CRITERIA FOR ASSESSMENT INSTRUMENTS

There are a number of concepts that are helpful in the evaluation of assessment instruments. These involve the consistency of the results that the instrument provides, and its fidelity to the concept being measured.

Reliability is the degree to which an assessment instrument produces consistent or reproducible results when used by different examiners at different times. Lack of reliability may be the result of divergence between observers, imprecision in the measurement tool, or instability in the attribute being measured. Interrater reliability (Table 61-1) is the extent to which different examiners obtain equivalent results in the same subject when using the same instrument; test-retest reliability is the extent to which the same instrument obtains equivalent results in the same subject on different occasions.

Table 61-1 Interrater Reliability

Reliability is not sufficient for a measurement instrument—it could, for example, consistently and reliably give results that are neither meaningful nor accurate. However, it is a necessary attribute, since inconsistency would impair the accuracy of any tool. The demonstration of the reliability of an assessment tool is thus required before its use in epidemiological studies. The use of explicit diagnostic criteria, trained examiners to interpret data uniformly, and a structured assessment that obtains the same types of information from all subjects can enhance the reliability of assessment instruments.

There are several commonly used measures of the degree of consistency between sets of data, which in psychiatry are often used to quantify the degree of agreement between raters. The kappa statistic (κ) is used for categorical or binary data, and the intraclass correlation coefficient (ICC, usually represented as r) for continuous data. Both measures have the same range of values (−1 to +1), from perfect negative correlation (−1), to no correlation (0), to perfect positive correlation (+1). For acceptable reliability, a kappa value of 0.7 or greater is generally required; for the ICC, a value of 0.8 or greater is generally required.

Calculation of the kappa statistic (κ) requires only arithmetic computation, and accounts for the degree of consistency between raters with an adjustment for the probability of agreement due to chance. When the frequency of the disorder is very low, however, the kappa statistic will be low despite having a high degree of consistency between raters; it is not appropriate for the measurement of reliability of infrequent disorders.

κ = (P_o − P_c)/(1 − P_c)

where P_o is the observed agreement and P_c is the agreement expected by chance. P_o = (a + d)/n and P_c = [(a + c)(a + b) + (b + d)(c + d)]/n². Calculation of the ICC is more involved and is beyond the scope of this text.
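A short sketch can make the P_o and P_c definitions concrete using the standard form κ = (P_o − P_c)/(1 − P_c). The cell layout is an assumption, since Table 61-1 is not reproduced here: a and d are taken to be the two agreement cells, b and c the two disagreement cells. Both example tables are hypothetical.

```python
def kappa(a: int, b: int, c: int, d: int) -> float:
    """Kappa for two raters and a binary rating.

    Assumed layout: a = both rate "case", d = both rate "non-case",
    b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    p_o = (a + d) / n
    p_c = ((a + c) * (a + b) + (b + d) * (c + d)) / n**2
    if p_c == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_c) / (1 - p_c)


# Hypothetical tables, each with 100 subjects and 90% observed agreement.
common = kappa(a=40, b=5, c=5, d=50)   # disorder rated present in about 45%
rare = kappa(a=1, b=5, c=5, d=89)      # disorder rated present in about 6%

print(f"common disorder: kappa = {common:.2f}")  # 0.80
print(f"rare disorder:   kappa = {rare:.2f}")    # 0.11
```

The raters agree on 90 of 100 subjects in both tables, yet kappa drops sharply for the rare disorder because almost all of that agreement is what chance alone would produce.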

Validity is a term that expresses the degree to which a measurement instrument actually measures what it purports to measure. When translating a theoretical concept into an operational instrument that purports to assess or measure it, several aspects of validity need to be accounted for.

For any abstract concept, there are an infinite number of criteria that one might use to assess it. For example, if one wants to develop a questionnaire to diagnose bipolar disorder, one should ask about mood, thought process, and energy level, but probably not whether the subject owns a bicycle. Content validity is the extent to which the instrument adequately incorporates the domain of items that would accurately measure the concept of interest.

Criterion validity is the extent to which the measurement can predict or agree with constructs external to the construct being measured. There are two types of criterion validity generally distinguished, predictive validity and concurrent validity. Predictive validity is the extent to which the instrument’s measurements can predict an external criterion. For instance, if we devise an instrument to measure math ability, we might postulate that math ability should be correlated to better grades in college math courses. A high correlation between the measure’s assessment of math ability and college math course grades would indicate that the instrument can correctly predict as it theoretically should, and has predictive validity. Concurrent validity refers to the extent to which the measurement correlates to another criterion at the same point in time. For example, if we devise a measure relying on visual inspection of a wound to determine infection, we can correlate it to a bacteriological examination of a specimen taken at the same time. A high correlation would indicate concurrent validity, and suggest that our new measure gives valid results for determining infection.

Construct validity refers to the extent to which the measure assesses the underlying theoretical construct that it intends to measure. This concept is the most complex, and both content and criterion validity point to it. An example of a measure lacking construct validity would be a test for assessing algebra skills using word problems that inadvertently assesses reading skills rather than factual knowledge of algebra. Construct validity also refers to the extent that the construct exists as theorized and can be quantified by the instrument. In psychiatry, this is especially difficult since there are no “gold standard” laboratory (e.g., chemical, anatomical, physiological) tests, and the criteria if not the existence of many diagnoses are disputed. To establish the validity for any diagnosis, certain requirements have been proposed, and include an adequate clinical description of the disorder that distinguishes it from other similar disorders and the ability to correlate the diagnosis to external criteria such as laboratory tests, familial transmission patterns, and consistent outcomes, including response to treatment.

Because there are no “gold standard” diagnostic tests in psychiatry, efforts to validate diagnoses have focused on increasing the reliability of diagnostic instruments (by defining explicit and observationally based diagnostic criteria, as in DSM-III and subsequent versions, or by employing structured interviews, such as the Diagnostic Interview Schedule [DIS]) and on conducting genetic and outcome studies for diagnostic categories. The selection of a “gold standard” criterion instrument in psychiatry, however, remains problematic.

Assessment of New Instruments

If we assume that a reliable criterion instrument that provides valid results exists, the assessment of a new measurement instrument would involve comparing the results of the new instrument to those of the criterion instrument. The criterion instrument’s results are considered “true,” and a judgment of the validity of the new instrument’s results is based on how well they match the criterion instrument’s (Table 61-2).

Table 61-2 Validity of a New Instrument

Sensitivity is the proportion of true cases, as identified by the criterion instrument, who are identified as cases by the new instrument (also known as the true positive rate).

Specificity is the proportion of non-cases, as identified by the criterion instrument, who are identified as non-cases by the new instrument (also known as the true negative rate).

For any given instrument, there are tradeoffs between sensitivity and specificity, depending on where the threshold limits are set to distinguish “case” from “non-case.” For example, in the Hamilton Depression Scale (HAM-D), the cutoff value for the diagnosis of MDD (often set at 15) determines whether an individual is identified as a “case” or a “non-case.” If the value were instead set at 5, which most clinicians would consider “normal” or not depressed, the HAM-D would be an unusually sensitive instrument (e.g., using a structured clinical interview as the criterion instrument), since almost anyone with even a modicum of depressive thinking would be considered a “case,” as would anybody typically considered to have major depression. However, the test would not be especially specific, since it would misclassify many people without depression as cases. Conversely, if the cutoff value were set at 25, sensitivity would be low but specificity high.

In practice, the threshold values in any given evaluation instrument, whether creatine kinase (CK) levels for determining myocardial infarction, the number of colonies on a petri dish to determine infection, or criteria to determine attention-deficit/hyperactivity disorder (ADHD) (e.g., 6 of 9 from group one, 6 of 9 from group two), are chosen to balance the need for both sensitivity and specificity. To improve both these measures without a tradeoff, either the instrument itself or its administration must be improved, or efforts made to ensure maximum stability of the attribute being measured (e.g., administering them concurrently, or in similar circumstances, such as at the same time of day, or a similar clinical setting).
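The effect of moving a cutoff can be seen with a small, entirely hypothetical set of HAM-D totals. The scores below are invented for illustration, and the criterion instrument's classification is taken as the truth; only the cutoffs 5, 15, and 25 come from the text.

```python
# Hypothetical HAM-D totals; the criterion instrument's verdict is taken as "true".
case_scores = [12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
non_case_scores = [2, 3, 4, 6, 7, 8, 9, 10, 12, 16]


def sensitivity_specificity(cutoff: int) -> tuple[float, float]:
    """Classify score >= cutoff as "case" and compare with the criterion."""
    true_pos = sum(score >= cutoff for score in case_scores)
    true_neg = sum(score < cutoff for score in non_case_scores)
    return true_pos / len(case_scores), true_neg / len(non_case_scores)


results = {cutoff: sensitivity_specificity(cutoff) for cutoff in (5, 15, 25)}
for cutoff, (sens, spec) in results.items():
    print(f"cutoff {cutoff:>2}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

With these invented scores, a cutoff of 5 catches every case but only 30% of non-cases are correctly excluded, while a cutoff of 25 reverses the pattern; the intermediate cutoff of 15 gives 80% sensitivity and 90% specificity.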

Two other useful measures are the positive predictive value (PPV) and the negative predictive value (NPV). PPV is the proportion of those with a positive test who are true cases as determined by the criterion instrument; NPV is the proportion of those with a negative test who are true non-cases as determined by the criterion instrument.
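All four measures come from the same two-by-two comparison of the new instrument against the criterion instrument. The sketch below uses hypothetical counts, and the class and field names are illustrative rather than taken from Table 61-2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """New instrument vs. criterion instrument (criterion treated as truth)."""
    true_pos: int   # case by both instruments
    false_pos: int  # case by the new instrument only
    false_neg: int  # case by the criterion instrument only
    true_neg: int   # non-case by both instruments

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def ppv(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)


# Hypothetical counts for 200 subjects, 50 of whom are true cases.
table = TwoByTwo(true_pos=45, false_pos=10, false_neg=5, true_neg=140)
print(f"sensitivity {table.sensitivity:.1%}, specificity {table.specificity:.1%}")
print(f"PPV {table.ppv:.1%}, NPV {table.npv:.1%}")
```

Unlike sensitivity and specificity, PPV and NPV shift with how common the disorder is in the sample, so values obtained in a clinic population may not carry over to a community survey.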

Study Designs

There are six basic study types, presented here in the order of their respective ability to infer causality.

DEVELOPMENT OF ASSESSMENT TOOLS

In 1972, Cooper and colleagues ⁴ published a United States/United Kingdom study that showed high variability in the diagnosis of psychotic disorders. It highlighted the need for having explicit operational criteria for case identification. The development of such diagnostic criteria with the publishing of the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III) in 1980 represented a notable step toward increasing the reliability and validity of psychiatric diagnoses.

Standardized Instruments for Case Assessment

The clinical interview is generally used to diagnose psychiatric illness. However, differences in personal styles and theoretical frameworks, among other factors, can affect the process and conclusions of a psychiatric interview. To increase interrater reliability, a number of standardized interview instruments have been developed. The first was the Present State Examination (PSE), initially used in the International Pilot Study of Schizophrenia sponsored by the World Health Organization (WHO). Because the PSE was designed for use by psychiatrists or experienced clinicians, however, its use in larger epidemiological studies was impractical. In 1978, epidemiologists at the National Institute of Mental Health (NIMH) began developing a comprehensive diagnostic instrument for large-scale epidemiological studies that could be administered by either laypeople or clinicians. The result was the Diagnostic Interview Schedule (DIS), which drew on the then newly published DSM-III (1980) and on elements of other research instruments, including the PSE, the Renard Diagnostic Interview (RDI), the St. Louis criteria, and the Schedule for Affective Disorders and Schizophrenia (SADS). The DIS has been used extensively in the United States and many other countries for surveys of psychiatric illness. Over time, the DIS has undergone revisions, first to incorporate DSM-III-R and then DSM-IV diagnoses. The WHO and the NIMH have also jointly developed the Composite International Diagnostic Interview (CIDI), which is structurally similar to the DIS and provides both ICD-10 and DSM-IV diagnoses.

CONTEMPORARY STUDIES IN PSYCHIATRIC EPIDEMIOLOGY

The Baseline National Comorbidity Survey (NCS)

The NCS, conducted between 1990 and 1992, was the first national survey of mental disorders in the United States. Face-to-face structured diagnostic interviews were administered by nonclinicians to a representative sample of all people living in households within the continental United States. The 8,098 NCS respondents were selected from over 1,000 neighborhoods in over 170 counties distributed over 34 states, and assessed with a modified CIDI.

The most important CIDI modifications involved the use of diagnostic stem questions, which were a small number of initial questions to assess core features of psychiatric disorders. Follow-up questions would only be asked when the subject responded positively. Another innovation of the NCS was the use of a two-phase clinical interview design for patients with evidence of schizophrenia or other nonaffective psychoses. Because prior studies had shown that these types of patients could not provide reliable self-reports, they were reinterviewed and diagnosed by experienced clinicians using a structured clinical interview.

In order to collect information on nonrespondents to the study, the NCS also systematically evaluated about one-third of nonrespondents using telephone interviews. Using the results of the nonresponse survey, the NCS study was able to adjust for the bias due to the lower rates of survey participation, especially among patients with anxiety disorders.

The NCS General Findings

DSM-III-R disorders were more prevalent than had been expected. About 48% of the sample reported at least one lifetime disorder, and 30% of respondents reported at least one disorder in the 12 months preceding the interview. The most common disorders were major depression and alcohol dependence, followed by social and simple phobias. As a group, substance use and anxiety disorders were more prevalent than affective disorders, with approximately one in four respondents meeting criteria for a substance use disorder in their lifetime, one in four for an anxiety disorder, and one in five respondents for an affective disorder (Table 61-4).

Table 61-4 Lifetime and 12-Month Prevalence Estimates for Psychiatric Disorders, NCS Results

Disorder	Lifetime Prevalence Estimate (%)	12-Month Prevalence Estimate (%)
Major depression	17.1	10.3
Mania	1.6	1.3
Dysthymia	6.4	2.5
Generalized anxiety disorder	5.1	3.1
Panic disorder	3.5	2.3
Social phobia	13.3	7.9
Simple phobia	11.3	8.8
Agoraphobia without panic	5.3	2.8
Alcohol abuse	9.4	2.5
Alcohol dependence	14.1	7.2
Drug abuse	4.4	0.8
Drug dependence	7.5	2.8
Antisocial personality disorder	2.8	—
Nonaffective psychosis*	0.5	0.3

* Nonaffective psychosis: schizophrenia, schizophreniform disorder, schizoaffective disorder, delusional disorder, and atypical psychosis.

Source: Adapted from Tsuang and Tohen (2002).⁵

There were no differences by gender in the overall prevalence of psychiatric disorders. For individual disorders, men were more likely than women to have substance use disorders and antisocial personality disorder, whereas women were more likely than men to have anxiety and affective disorders (with the exception of mania).

NCS-Replication Survey (NCS-R)

The NCS-R was conducted between 2001 and 2003 and involved a new sample of 10,000 respondents in the same nationally representative sampling segments as the baseline NCS. Lifetime prevalence estimates of DSM-IV disorders per the NCS-R are as follows: anxiety disorders, 28.8%; mood disorders, 20.8%; impulse-control disorders, 24.8%; substance use disorders, 14.6%; and any disorder, 46.4%. First onset of mental illness was found most often in childhood to early adulthood, with one-half of all cases starting by age 14 years and three-fourths by age 24 years.

Mental Health Services Utilization

From the NCS to the NCS-R, there was no change in the overall prevalence of mental disorders, but the rate of treatment increased over the intervening decade. Among patients with a psychiatric disorder, 20.3% received treatment between 1990 and 1992 compared to 32.9% between 2001 and 2003 (p < 0.001). Nevertheless, most patients with mental disorders in the NCS-R study still did not receive treatment. For those who did, there was significant delay, ranging from 6 to 8 years for mood disorders and 9 to 23 years for anxiety disorders. The unmet need for mental health services has been greatest in traditionally underserved groups (including elderly persons, racial-ethnic minorities, residents of rural areas, and those with low incomes or without insurance).

The European Study of the Epidemiology of Mental Disorders (ESEMeD) Project

The ESEMeD is a cross-sectional epidemiological study, conducted between January 2001 and August 2003, that assessed psychiatric disorders in a sample representing about 212 million noninstitutionalized adults from Belgium, France, Germany, Italy, the Netherlands, and Spain.⁶ Individuals were assessed in person at their homes using computer-assisted psychiatric interview (CAPI) instruments, and data from 21,425 respondents were collected. A stratified, multistage, clustered-area probability sample design was used.

Epidemiology

Using DSM-IV criteria, about one in four respondents reported a lifetime history of any mental disorder. MDD, specific phobia, alcohol abuse, and dysthymia were the most common mental disorders, with estimated lifetime prevalence rates of 12.8%, 7.7%, 4.1%, and 4.1%, respectively. The lifetime prevalence of other mental disorders was low (less than 3%). About 10% of respondents were diagnosed with any mental disorder in the 12 months preceding the diagnostic interview, with MDD and specific phobia being the most common (prevalence estimates of 3.9% and 3.5%, respectively). Only 0.7% of respondents reported a history of an alcohol abuse disorder in the preceding 12 months.

Under-utilization of Health Services

The ESEMeD results suggest that the use of health services is limited among individuals with mental disorders in the European countries studied. Of the participants with a mental disorder in the preceding 12 months, only 25.7% consulted a formal health service during that period. The rates were higher for individuals with a mood disorder (36.5%) or anxiety disorder (26.1%). Among those who had used formal health services in the previous 12 months, approximately two-thirds had contacted a mental health professional. Psychotropic drug utilization was generally low (32.6%) by United States standards, with only 21.2% of those diagnosed with MDD in the preceding 12 months having received antidepressants during that period.

EPIDEMIOLOGY OF MAJOR PSYCHIATRIC DISORDERS

Schizophrenia

Epidemiology

Based on the NCS, the lifetime prevalence rate for schizophrenia is 1.4 per thousand, and 6.9 per thousand for the five nonaffective psychoses (schizophrenia, schizophreniform disorder, schizoaffective disorder, delusional disorder, and atypical psychoses).

Risk Factors

Genetic loading is a robust risk factor for schizophrenia (Table 61-5). The prevalence of schizophrenia in a monozygotic twin of a schizophrenia patient is 50%, and 15% in a dizygotic twin. The prevalence for a child with two schizophrenic parents is 46.3%, and 12.8% for a child with one schizophrenic parent.

Table 61-5 Prevalence of Schizophrenia in Specific Populations

Population	Prevalence (%)
General population	0.3
First-degree relatives of parents of schizophrenic patients	5.6
Children with one schizophrenic parent	12.8
Dizygotic twins of a schizophrenic patient	15.0
Children of two schizophrenic parents	46.3
Monozygotic twins of a schizophrenic patient	50.0