CHAPTER 7 Understanding and Applying Psychological Assessment
OVERVIEW
Psychological assessment is a consultation procedure that can greatly enhance clinicians’ understanding of their patients and facilitate the treatment process. To a considerable extent, psychological assessment consultations are underutilized in the current mental health care environment. This is unfortunate given the evidence that psychological tests generally produce reliability and validity coefficients similar to many routine diagnostic medical tests.1 This chapter will provide a detailed review of what a psychological assessment comprises and discuss the potential benefits of an assessment consultation. This will be accomplished by reviewing the methods used to construct valid psychological instruments, the major categories of psychological tests (including detailed examples of each category), and the application of these instruments in clinical assessment. Issues relating to the ordering of psychological testing and the integration of information from an assessment report into the treatment process will also be presented.
PSYCHOMETRICS: THE SCIENCE OF TEST DEVELOPMENT
Reliability and Validity
To be meaningfully employed for either research or clinical activities, psychological tests must possess adequate reliability and validity. Reliability represents the repeatability, stability, or consistency of a subject’s test score. It is usually expressed as some form of correlation coefficient ranging from 0 to 1.0. Research instruments can have reliability coefficients as low as .70, whereas clinical instruments should have reliability coefficients in the high .80s to low .90s. The higher standard applies because research instruments are interpreted in aggregate as group measures, whereas clinical instruments are interpreted for a single individual. A number of reliability statistics are available for evaluating a test: internal consistency (the degree to which the items in a test perform in the same manner), test-retest reliability (the consistency of a test score over time, with retest intervals typically ranging from a few days to a year), and interrater reliability (the agreement between observers scoring the same subject on an observer-judged rating scale). The kappa statistic is considered the best estimate of interrater reliability because it reflects the degree of agreement between raters after accounting for the agreement expected by chance. Reliability is reduced by error in the test score, which can be introduced by variability in the subject (changes in the subject over time), in the examiner (rater error, rater bias), or in the test itself (e.g., administration under different instructions).
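The chance correction behind kappa can be made concrete. The minimal Python sketch below, an illustration rather than a validated scoring tool, computes Cohen’s kappa for two raters by subtracting the agreement expected from each rater’s category base rates. It also computes Cronbach’s alpha, a common internal-consistency coefficient that the text does not name, so treat it as an added example. The function names and the ratings are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same cases into categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: share of cases the two raters coded identically.
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: for each category, the product of the raters' marginal rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    # Kappa: agreement beyond chance, as a fraction of the maximum possible.
    return (p_observed - p_chance) / (1 - p_chance)


def cronbach_alpha(responses):
    """Cronbach's alpha; responses holds one list of k item scores per respondent."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [pvariance([r[i] for r in responses]) for i in range(k)]
    total_variance = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical present/absent ratings of 10 patients by two clinicians.
a = ["yes"] * 5 + ["no"] * 5
b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
print(round(cohens_kappa(a, b), 2))  # 0.6
```

In this example the clinicians agree on 80% of cases, but because each rates half the sample as “yes,” 50% agreement would be expected by chance alone, so kappa is only .60.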
MAJOR CATEGORIES OF PSYCHOLOGICAL TESTS
Intelligence Tests
Alfred Binet (1857-1911) is credited with developing the first true measure of intelligence. Binet and Theodore Simon were commissioned by the French School Board to develop a test to identify students who might benefit from special education programs. Binet’s 1905 and 1908 scales form the basis of our current intelligence tests. In fact, it was the development of Binet’s 1905 test that marked the beginning of modern psychological testing. His approach was practical and effective: he developed a group of tests with sufficient breadth and depth to separate underachieving children with normal intellectual ability from those who were underachieving because of lower intellectual ability. In addition to mathematics and reading tasks, Binet also tapped into other areas (such as object identification, judgment, and social knowledge). About a decade later at Stanford University, Lewis Terman translated Binet’s test into English, added items, and made some scoring revisions. Terman’s test is still in use today and is called the Stanford-Binet Intelligence Scales.2
Intelligence is a hard construct to define. Wechsler wrote that “intelligence, as a hypothetical construct, is the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with the environment.”3 This definition helps clarify what the modern IQ tests try to measure (i.e., adaptive functioning) and why intelligence or IQ tests can be important aids in clinical assessment and treatment planning. If an IQ score reflects aspects of effective functioning, then IQ tests measure aspects of adaptive capacity. The Wechsler series of IQ tests covers most of the life span. The series starts with the Wechsler Preschool and Primary Scale of Intelligence (ages 4 to 6 years), progresses to the Wechsler Intelligence Scale for Children–III (ages 6 to 16 years), and ends with the Wechsler Adult Intelligence Scale–III (ages 16 to 89 years).4 Although the current discussion will focus on the measurement of adult intelligence, all of the Wechsler scales provide three major IQ scores: the Full Scale IQ (FSIQ), Verbal IQ (VIQ), and Performance IQ (PIQ). All three IQ scores have a mean of 100 and a standard deviation (SD) of 15. Because 15 points equals one SD, a difference of 15 points or more between a subject’s VIQ and PIQ is generally considered both statistically significant and clinically meaningful. Table 7-1 presents an overview of the IQ categories.
Full Scale IQ Score | Intelligence (IQ) Category | Percent of Population (Theoretical Normal Curve) |
---|---|---|
≥130 | Very superior | 2.2 |
120-129 | Superior | 6.7 |
110-119 | High average | 16.1 |
90-109 | Average | 50.0 |
80-89 | Low average | 16.1 |
70-79 | Borderline | 6.7 |
≤69 | Mentally retarded | 2.2 |
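The normal-curve reasoning behind Table 7-1 and the VIQ-PIQ comparison can be sketched with Python’s standard library. The sketch assumes IQ scores are normally distributed with a mean of 100 and an SD of 15 and treats band boundaries as continuous, so it only approximates the tabled percentages (e.g., about 2.3% rather than 2.2% at or above 130). The critical-difference helper uses the standard error of the difference between two scores; the reliabilities passed to it are illustrative assumptions, not values from the WAIS-III manual.

```python
from statistics import NormalDist

IQ_NORMS = NormalDist(mu=100, sigma=15)  # Wechsler IQ metric described in the text


def percentile_rank(iq):
    """Percentage of the normative population expected to score below iq."""
    return 100 * IQ_NORMS.cdf(iq)


def percent_in_band(low, high):
    """Percentage of the population between low and high on a continuous scale."""
    return 100 * (IQ_NORMS.cdf(high) - IQ_NORMS.cdf(low))


def critical_difference(reliability_x, reliability_y, sd=15.0, z=1.96):
    """Smallest difference between two scores that is significant at the z level.

    Uses the standard error of the difference: sd * sqrt(2 - r_xx - r_yy).
    """
    se_diff = sd * (2 - reliability_x - reliability_y) ** 0.5
    return z * se_diff


print(round(percentile_rank(115), 1))      # 84.1 (one SD above the mean)
print(round(percent_in_band(110, 120), 1))  # 16.1, the "High average" band
# Illustrative reliabilities only; substitute the values from the test manual.
print(round(critical_difference(0.97, 0.94), 1))  # 8.8
```

With those assumed reliabilities, a difference of roughly 9 points would already reach significance at p < .05, so a 15-point (one SD) VIQ-PIQ split clears that threshold comfortably.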
The Wechsler IQ tests are composed of 10 or 11 subtests, which were developed to tap two primary intellectual domains: verbal intelligence (VIQ; Vocabulary, Similarities, Arithmetic, Digit Span, Information, and Comprehension) and nonverbal, or performance, intelligence (PIQ; Picture Completion, Digit Symbol, Block Design, Matrix Reasoning, and Picture Arrangement). All the Wechsler subtests are constructed to have a mean scaled score of 10 and an SD of 3. Because 3 scaled-score points equal one SD, a difference of 3 or more points between two subtests is generally considered clinically meaningful. All IQ scores and subtest scaled scores are also adjusted for age.4
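The one-SD rule of thumb for subtests can be sketched the same way. In this minimal example, the profile and the `flag_subtest_differences` helper are hypothetical, and the flat 3-point cutoff is the text’s rule of thumb; a formal analysis would use the critical values tabled in the test manual.

```python
from itertools import combinations
from statistics import NormalDist

SUBTEST_NORMS = NormalDist(mu=10, sigma=3)  # Wechsler scaled-score metric


def subtest_percentile(scaled_score):
    """Percentage of same-age peers expected to score below this scaled score."""
    return 100 * SUBTEST_NORMS.cdf(scaled_score)


def flag_subtest_differences(scores, threshold=3):
    """Return (subtest_a, subtest_b, difference) for pairs differing by >= threshold.

    threshold=3 is the one-SD rule of thumb, not a manual-derived critical value.
    """
    flagged = []
    for (name_a, score_a), (name_b, score_b) in combinations(scores.items(), 2):
        if abs(score_a - score_b) >= threshold:
            flagged.append((name_a, name_b, score_a - score_b))
    return flagged


# Hypothetical age-adjusted scaled scores for one examinee.
profile = {"Vocabulary": 12, "Similarities": 11, "Block Design": 8, "Matrix Reasoning": 10}
print(flag_subtest_differences(profile))
# [('Vocabulary', 'Block Design', 4), ('Similarities', 'Block Design', 3)]
```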
The VIQ and PIQ can be broken down further into indexes that provide more specific information about verbal and nonverbal abilities. Table 7-2 provides an outline of the WAIS-III subtests and their location on the verbal and nonverbal indexes. The Verbal Comprehension Index (VCI) and the Working Memory Index (WMI) are the two indexes subsumed under the VIQ. Because the subtests involved in the VCI are not timed, they are considered a purer measure of verbal ability (i.e., word knowledge, verbal abstract reasoning, and general information). As the name implies, the WMI measures the ability to hold verbal information in mind while manipulating, processing, and responding to it. For example, the Arithmetic subtest, which is part of the WMI, is administered orally, and the individual is not permitted to use paper and pencil to solve the word problems. The PIQ is composed of the Perceptual Organization Index (POI) and the Processing Speed Index (PSI). Subtests in the POI measure nonverbal reasoning, attention to detail, and the integration of visual and motor systems. Although two of the three subtests that make up the POI are timed, fast responding is not the focus of this index. The PSI is a measure of how quickly individuals can process and respond to visually presented information.4
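The subtest-to-index-to-IQ hierarchy can be represented as a small lookup table. The index composition below is an assumption based on the published WAIS-III structure (it includes two subtests, Letter-Number Sequencing and Symbol Search, that the text does not list) and should be checked against Table 7-2. The sketch only sums scaled scores per index; converting those sums to index scores requires the age-based norm tables, which are not reproduced here.

```python
# Index composition assumed from the published WAIS-III structure; check against Table 7-2.
WAIS_III_INDEXES = {
    "VCI": ("Vocabulary", "Similarities", "Information"),
    "WMI": ("Arithmetic", "Digit Span", "Letter-Number Sequencing"),
    "POI": ("Picture Completion", "Block Design", "Matrix Reasoning"),
    "PSI": ("Digit Symbol", "Symbol Search"),
}

# Each index is subsumed under one of the two IQ domains described in the text.
IQ_DOMAIN = {"VCI": "VIQ", "WMI": "VIQ", "POI": "PIQ", "PSI": "PIQ"}


def index_sums(scaled_scores):
    """Sum of scaled scores per index; norm tables would convert these to index scores."""
    sums = {}
    for index, subtests in WAIS_III_INDEXES.items():
        missing = [s for s in subtests if s not in scaled_scores]
        if missing:
            raise KeyError(f"{index} is missing: {', '.join(missing)}")
        sums[index] = sum(scaled_scores[s] for s in subtests)
    return sums


def indexes_in_domain(domain):
    """List the indexes that fall under 'VIQ' or 'PIQ'."""
    return [index for index, d in IQ_DOMAIN.items() if d == domain]


all_average = {s: 10 for subtests in WAIS_III_INDEXES.values() for s in subtests}
print(index_sums(all_average))   # {'VCI': 30, 'WMI': 30, 'POI': 30, 'PSI': 20}
print(indexes_in_domain("PIQ"))  # ['POI', 'PSI']
```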
Objective (Self-Report) Tests of Personality and Psychopathology
Hathaway and McKinley (1943) published the original version of the Minnesota Multiphasic Personality Inventory (MMPI) at the University of Minnesota.5 (Although the original version of the MMPI was produced in 1943, the official MMPI manual was not published until 1967.5) The purpose of the test was to differentiate psychiatric patients from normal individuals and to place patients accurately in the proper diagnostic group. A large item pool was generated, and hundreds of psychiatric patients were interviewed and asked to respond to each of the items. The same was done with a large sample of people who were not receiving psychiatric treatment. The results showed that while the item pool did exceptionally well in differentiating the normal group from the clinical group, differentiating one psychiatric group from another was more difficult. A major confounding factor was that patients with different conditions tended to endorse the same items; this led to scales with a high degree of item overlap (i.e., items appeared on more than one scale). This method of test development, known as empirical keying (described earlier), selects items for their ability to discriminate between criterion groups rather than for their content. It was innovative for its time because most personality tests preceding it were based solely on items that test developers theorized would measure the construct in question (rational test development). The second innovation introduced with the MMPI was the development of validity scales intended to identify the response style of test takers. In response to criticisms that some items contained outdated language and that the original normative group was considered a “sample of convenience,” the MMPI was revised in 1989. The MMPI-2 is the result of this revision process, and it is the version of the test in current use.6
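Empirical keying can be illustrated with a toy item-selection routine. The sketch keeps any item whose true-response rate differs between a clinical group and a comparison group by at least an arbitrary cutoff, regardless of what the item says, and records the direction of the keyed answer. The data, cutoff, and function names are hypothetical; a real development effort would test the differences statistically and cross-validate the scale in new samples.

```python
def endorsement_rates(responses):
    """Proportion answering True to each item; one list of bools per respondent."""
    n = len(responses)
    return [sum(person[i] for person in responses) / n for i in range(len(responses[0]))]


def empirically_keyed_items(clinical, comparison, min_difference=0.25):
    """Select items whose endorsement rate differs between groups, ignoring content.

    Returns {item_index: keyed_answer}, where keyed_answer is the response more
    typical of the clinical group. min_difference is an arbitrary illustrative cutoff.
    """
    keyed = {}
    pairs = zip(endorsement_rates(clinical), endorsement_rates(comparison))
    for item, (p_clinical, p_comparison) in enumerate(pairs):
        if abs(p_clinical - p_comparison) >= min_difference:
            keyed[item] = p_clinical > p_comparison
    return keyed


def raw_scale_score(answers, keyed):
    """Count the items a respondent answered in the keyed direction."""
    return sum(answers[item] == key for item, key in keyed.items())


T, F = True, False
clinical = [[T, T, F, T], [T, F, F, T], [T, T, F, F], [T, F, T, T]]
comparison = [[F, T, F, T], [F, F, T, T], [T, T, F, T], [F, F, F, T]]
scale = empirically_keyed_items(clinical, comparison)
print(scale)                                 # {0: True, 3: False}
print(raw_scale_score([T, F, F, F], scale))  # 2
```

Item 3 is keyed “false” even though most clinical respondents endorse it, because the comparison group endorses it even more often; only the difference between groups matters.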
The Minnesota Multiphasic Personality Inventory–2
The Minnesota Multiphasic Personality Inventory–2 (MMPI-2) is a 567-item, true/false, self-report test of psychological function.6 As mentioned earlier, the MMPI was designed both to separate subjects into “normals” and “abnormals” and to subcategorize the abnormal group into specific classes.7 The MMPI-2 contains 10 Clinical Scales that assess major categories of psychopathology and six Validity Scales designed to assess test-taking attitudes. MMPI-2 raw scores are transformed into standardized T-scores with a mean of 50 and an SD of 10. A T-score of 65 or greater (1.5 SDs above the mean) indicates clinically significant psychopathology on the MMPI-2. An interesting feature of the MMPI-2 is that over 300 “new” or experimental scales have been developed for the test over the years. This is made possible by the empirical keying method described earlier: groups of items that have been shown to reliably differentiate two or more samples or populations can be added to the MMPI-2 as a clinical or supplemental scale. The addition of these scales helps sharpen and individualize the clinical interpretation of the MMPI-2 results.
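Converting raw scores to T-scores and applying the 65 cutoff can be sketched as follows. This is a simplification under stated assumptions: it uses linear T-scores, whereas the MMPI-2 uses uniform T-scores for most of its clinical scales, and the scale names and normative means and SDs are hypothetical placeholders rather than MMPI-2 norms.

```python
def linear_t_score(raw, norm_mean, norm_sd):
    """Linear T-score: mean 50, SD 10 in the normative sample."""
    return 50 + 10 * (raw - norm_mean) / norm_sd


def elevated_scales(raw_scores, norms, cutoff=65):
    """Return {scale: T} for scales at or above the cutoff."""
    flagged = {}
    for scale, raw in raw_scores.items():
        mean, sd = norms[scale]
        t = linear_t_score(raw, mean, sd)
        if t >= cutoff:
            flagged[scale] = t
    return flagged


# Hypothetical scale names and normative (mean, SD) pairs, not MMPI-2 values.
norms = {"Scale A": (20.0, 5.0), "Scale B": (12.0, 4.0)}
print(elevated_scales({"Scale A": 30, "Scale B": 13}, norms))  # {'Scale A': 70.0}
```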
The MMPI-2 validity scales are the Lie (L), Infrequency (F), Correction (K), Variable Response Inconsistency (VRIN), True Response Inconsistency (TRIN), and F Back (FB) scales. The L scale was designed to identify respondents who attempt to minimize pathology to the extent that they deny even minor faults to which most individuals will admit. It is commonly thought of as an unsophisticated attempt to appear healthier than one actually is (i.e., faking good). The F scale contains items of unusual or severe pathology that are infrequently endorsed by most people. Therefore, elevation of the F scale is thought of as either a “cry for help” or a more intentional attempt to appear worse off psychologically (i.e., faking bad). Like the L scale, the K scale is purported to measure defensiveness, but data have suggested that persons with a higher level of education tend to score higher on the K scale items than on the L scale items.8