Screening and Assessment Tools

Published on 21/03/2015 by admin

Filed under Pediatrics

Last modified 22/04/2025

Print this page

This article have been viewed 3400 times

CHAPTER 7 Screening and Assessment Tools

7A. Measurement and Psychometric Considerations

GLEN P. AYLWARD, TERRY. STANCIN

In a general pediatric population, practitioners can expect 8% of their patients to experience significant developmental or behavioral problems between the ages of 24 and 72 months, this rate increasing to 12% to 25% during the first 18 years.^1,² Therefore, consideration and interpretation of tests and rating scales are part of the clinician’s day-to-day experience, regardless of whether the choice is made to administer evaluations or review test or rating scale data obtained by other professionals.

This chapter is an introduction to the section on assessment and tools. It contains topics such as: discussion of descriptive statistics (e.g., mean, median, mode), distributions of scores and standard deviations, transformation of scores (percentiles, z-scores, T-scores), psychometric concerns (sensitivity, specificity, positive and negative predictive values), test characteristics (reliability, validity), and age and grade equivalents. Many of these topics are also elaborated in greater detail in subsequent chapters of this text. A more thorough discussion of psychological assessment methods can be found in Sattler’s text.³

Developmental and psychological evaluations usually include measurement of a child’s development, behavior, cognitive abilities, or levels of achievement. Comprehensive child assessments involve a multistage process that incorporates planning, collecting data, evaluating results, formulating hypotheses, developing recommendations, and conducting follow-up evaluations.³ Test data provide samples of behavior, with scores representing measurements of inferred attributes or skills. These scores are relative and not absolute measures, and rating scales and test instruments are typically used to compare a child to a standardized, reference group of other children. Approximately 5% of the general population obtains scores that fall outside the range of “normal.” However, the range of normal is descriptive, not diagnostic: it describes problem-free individuals, but does not provide a diagnosis for them.³ No test is without error, and scores may fall outside the range of normal simply as a result of chance variation or issues such as refusal to take a test. Three major sources of variation that may affect test data include characteristics of a given test, the range of variation among normal children, and the range of variation among children who have compromised functioning.

Selection of which test to use depends on the referral questions posed, as well as time and cost constraints. Testing results vary in terms of levels of detail, complexity, and definitiveness of findings. The first level of testing is screening, the results of which are suggestive. The second level is administration of more formal tests designed to assess development, cognition, achievement, language, motor, adaptive, or similar functions, the results being indicative. The third tier involves administration of test batteries to assess various areas and abilities; these results are assumed to be definitive. This third tier typically includes a combination of formal tests or test batteries, history, interview, rating scales, and observations. The primary goal of more detailed testing is to delineate patterns of strengths and weaknesses so as to provide a diagnosis and guidance for intervention and placement purposes. Results gain meaning through comparison with norms. A caveat is that tests differ markedly in their degree of accuracy.

In general, regardless of whether a measurement tool is designed to be used as an assessment or a screening instrument, the normative sample on which the test is based is critical. Test norms that are to be applied nationally should be representative of the general population. Demographics must proportionately reflect characteristics of the population as a whole, taking into account factors such as region (e.g., West, Midwest, South, Northeast), ethnicity, socioeconomic status, and urban/rural setting. If a test is developed with a nonrepresentative population, characteristics of that specific sample may bias norms and preclude appropriate application to other populations. Adequate numbers of children need to be included at each age across the age span evaluated by a given test so as to enhance stability of test scores. Equal numbers of boys and girls should be included. Clinical groups should also be included for comparison purposes. Convenience samples, or those obtained from one geographic location are not appropriate for development of test norms.

Tests generally need to be reduced and refined by eliminating psychometrically poor items during the development phase. Conventional item analysis is one such approach and involves evaluation of an item difficulty statistic (percentage of correct responses) and patterns of responses. The use of item discrimination indexes (item-total correlations) and item validity (discrimination between normative and special groups, by T-tests or chi square analyses) is routine. More recent tests such as the Bayley Scales of Infant and Toddler Development—Third Edition (BSID-III)⁴ or the Stanford-Binet V⁵ employ inferential norming⁶ or item response theory.⁷ Item response theory analyses involve difficulty calibrations for dichotomous items and step differences for polychotomous items, the goal being a smooth progression of difficulty across each subtest (e.g., as in the Rasch probabilistic model⁸). Item bias and fairness analysis are also components; this procedure is called differential item functioning.⁹ See Roid⁵ or Bayley⁴ for a more detailed description of these procedures.

STANDARDIZED ASSESSMENTS

Standardized normreferenced assessments (SNRAs) are the tests most typically administered to infants, children, and adolescents. The most parsimonious definition of SNRAs is that they compare an individual child’s performance on a set of tasks presented in a specific manner with the performance of children in a reference group. This comparison is typically made on some standard metric or scale (e.g., scaled score).¹⁰ Although there may be some allowance for flexibility in rate and order of administration procedures (particularly in the case of infants), administration rules are precisely defined. The basis for comparison of scores is that tasks are presented in the same manner across testings, and there are existing data that represent how similar children have performed on these tasks. However, if this format is modified, additional variability is added, precluding accurate comparison of the child’s data and those of the normative group.

A major issue facing users of SNRAs is identification of the question to be answered from the results of testing. One of two contrasting questions is probably the reason for testing: (1) How does this child compare with his or her referent group? or (2) What are the limits of the child’s abilities, regardless of comparison to a referent group? SNRAs are suited to answer the first question. Examiners can subsequently test limits or alter procedures to clarify clinical issues such as strengths and weaknesses after the standard administration is completed. However, these data, although clinically useful, should not be incorporated into the determination of the test score because of the reasons cited previously. Also, no single SNRA in isolation can provide all the answers regarding a child’s development or cognitive status; rather, it is a component of the overall evaluation.

Use of SNRAs is not universally endorsed, particularly with regard to infant assessment, because of concerns regarding one-time testing in an unfamiliar environment, different objectives for testing, and linkage to intervention, instead of diagnosis. Therefore, emphasis is placed on alternative assessments that rely on criterion-referenced and curriculum-based approaches. In actuality, curriculum-based assessment is a type of a criterion-referenced tool. These assessments can help to answer the second question posed previously and could also better delineate the child’s strengths. Both provide an absolute criterion against which a child’s performance can be evaluated. In criterion-referenced tests, the score a child obtains on a measurement of a specific area of development reflects the proportion of skills the child has mastered in that particular area (e.g., colors, numbers, letters, shapes). For example, in the Bracken Basic Concepts Scale—Revised,¹¹ in addition to normreferenced scores, examiners can also determine the percentage of mastery of skills in the six areas included in the School Readiness Composite. More specifically, in the colors subtest, the child is asked to point to colors named by the examiner. This raw score can be converted to a percentage of mastery, which is computed regardless of age. Similarly, other skills such as knowledge of numbers and counting or letters can be gauged. In curriculum-based evaluations, the emphasis is on specific objectives that are to be achieved, the potential goal being intervention planning.^12,¹³ The Assessment, Evaluation, and Programming System for Infants and Children¹⁴ and the Carolina Curricula for Infants and Toddlers with Special Needs¹⁵ are examples of curriculum-based assessments. Therefore, SNRAs, criterion-referenced tests, and curriculum-based tests each have a role, depending on the intended purpose of the evaluation.

PRIMER OF TERMINOLOGY USED TO DETECT DYSFUNCTION

The normal range is a statistically defined range of developmental characteristics or test scores measured by a specific method. Figure 7A-1 depicts a normal distribution or bell-shaped curve. This concept is critical in the development of test norms and provides a basis for the following discussion.

FIGURE 7A-1 The normal distribution.

Descriptive Statistics

The mean (M) is a measure of central tendency and is the average score in a distribution. Because it can be affected by variations caused by extreme scores, the mean can be misleading in scores obtained from a highly variable sample. In Figure 7A-1, the mean score is 100.

The mode, also a measure of central tendency, is the most frequent or common score in a distribution.

The median is defined as the middle score that divides a distribution in half when all the scores have been arranged in order of increasing magnitude. It is the point above and below which 50% of the scores fall. This measure is not affected by extreme scores and therefore is useful in a highly variable sample. In the case of an even number of data points in a distribution, the median is considered to be halfway between two middle scores. Noteworthy is the fact that in the normal distribution depicted in Figure 7A-1, the mean, mode, and median are equal (all scores = 100), and the distribution is unimodal.

The range is a measure of dispersion that reflects the difference between the lowest and highest scores in a distribution (highest score − the lowest score +1). However, the range does not provide information about data found between two extreme values in the test distribution, and it can be misleading when the clinician is dealing with skewed data. In this situation, the interquartile range may be more useful: The distribution of scores is divided into four equal parts, and the difference between the score that marks the 75th percentile (third quartile) and the score that marks the 25th percentile (first quartile) is the interquartile range.¹⁶

The standard deviation (SD) is a measure of variability that indicates the extent to which scores deviate from the mean. The standard deviation is the average of individual deviations from the mean in a specified distribution of test scores. The greater the standard deviation, the more variability is found in test scores. In Figure 7A-1, SD = 15 (the typical standard deviation in normreferenced tests). In a normal distribution, the scores of 68% of the children taking a test will fall between +1 and −1 standard deviation (square root of the variance). In general, most intelligence and developmental tests that employ deviation quotients have a mean of 100 and a standard deviation of 15. Scaled scores, such as those found in the Wechsler tests, have a mean of 10 and a standard deviation of 3 (7 to 13 being the average range). If a child’s score falls less than 2 standard deviations below average on an intelligence test (i.e., IQ < 70), he or she may be considered to have a cognitive-adaptive disability (if adaptive behaviors are also impaired).

Skewness refers to test scores that are not normally distributed. If, for example, an IQ test is administered to an indigent population, the likelihood that more children will score below average is increased. This is a positively skewed distribution (the tail of the distribution approaches high or positive scores, i.e. the right portion of the x-axis). Here, the mode is a lower score than the median, which, in turn is lower than the mean. Probabilities based on a normal distribution will yield an underestimate of the scores at the lower end and an overestimate of the scores at the higher end. Conversely, if the test is administered to children of high socioeconomic status, the distribution might be negatively skewed, which means that most children will do well (the tail of the distribution trails toward lower scores or the left portion of the x-axis). In negatively skewed distributions, the value of the median < mean < mode scores at the lower end will be overestimated, and those at the upper end will be underestimated. Skewness has significant ramifications in interpretation of test scores. In fact, the meaning of a score in a distribution depends on the mean, standard deviation, and the shape of the distribution.

Kurtosis reflects the shape of the distribution in terms of height or flatness. A flat distribution, in which more scores are found at the ends of the distribution and fewer in the middle, is platykurtic, in comparison with the normal distribution. Conversely, if the peak is higher than the normal distribution, scores do not spread out and instead are compressed and cluster around the mean. This is called a leptokurtic distribution.

Transformations of Raw Scores

LINEAR TRANSFORMATIONS

Linear transformations provide information regarding a child’s standing in comparison to group means. The z-score is a standard score (standardization being the process of converting each raw score in a distribution into a z-score: raw score − the mean of the distribution, divided by the standard deviation of the distribution) that corresponds to a standard deviation; that is, a z-score of +1 is 1 standard deviation above average and a z-score of −1 is 1 standard deviation below average. The mean equals a z-score of 0; therefore scores between z-scores of −1 and +1 are in the average range. Stated differently, if a child receives a z-score of +1, he or she obtained a score higher than those of 84% of the population (see Fig. 7A-1).

The T-score is another linear transformation and can be considered a z-score × 10 + 50. The mean T-score is 50, and the standard deviation is 10. Therefore a z-score of 1 equals a T-score of 60. T-scores are often found in psychopathology-related test instruments such as the Minnesota Multiphasic Personality Inventory—A, the Conners rating scales, or the Child Behavior Checklist, on which T-scores of 70 or greater are considered to be clinically relevant (approximately the 98th percentile); these cutoffs are depicted in many scoring forms.

AREA TRANSFORMATIONS

A percentile (the technical slang is “centile”) tells the practitioner how an individual child’s performance compares to a specified norm group. If a percentile score is 50, half of the children tested will score above this, and half will score below. A score that is 1 standard deviation below average is at approximately the 16th percentile; a score 1 standard deviation above average is at the 84th percentile. Clinicians must be aware that small differences in scores in the center of the distribution produce substantial differences in percentile ranks, whereas greater raw score differences in outliers do not have as much of an effect on percentile scores. Oftentimes, the third percentile is considered to be a clinical cutoff (e.g., in the case of the infant born small for gestational age). Deciles are bands of percentiles that are 10 percentile ranks in width (each decile contains 10% of the normative group). Quartiles are percentile bands that are 25 percentile ranks in width; each quartile contains 25% of the normative group. Percentiles require the fewest assumptions for accurate interpretation and can be applied to virtually any shape of distribution. This metric is most readily understood by parents and professionals and is recommended as the preferred way to describe how a child’s score compares within a group of scores. For example, a Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV) Full Scale IQ score of 70 indicates that fewer than 3% of children of a similar age score lower on that measure of intelligence; conversely, more than 97% of children taking the test have a higher score.

The stanine is short for standard nine, and this metric divides a distribution into nine parts. The mean = 5, and the SD = 2, with the third to seventh stanine being considered the average range. Approximately 20% of children score in the fifth stanine, 17% each in the fourth and sixth stanines, and 12% each in the third and seventh stanines (78% in total). Stanines are frequently encountered with group administered tests such as the Iowa Tests of Basic Skills, the Metropolitan Achievement Tests, or the Stanford Achievement Tests. The interrelatedness of these scores is depicted in Figure 7A-1.

PSYCHOMETRIC CONCERNS

Appropriate interpretation of test data necessitates consideration of other important test characteristics. As mentioned previously, when a child’s normreferenced test results are interpreted, the extent to which the child’s characteristics are represented in the normative sample from which scores were derived is a critical concern. Moreover, caution is recommended when test results for children from cultural and ethnic minorities drive academic or clinical decisions, unless there is adequate representation of this diversity in standardization samples and validation studies.

Sensitivity and Specificity

Frequently, interpretation of test results must take into account how well the instrument performs with set cutoff scores. Sensitivity is a measure of the proportion of children with a specific problem who are positively identified by a test, with a specific cutoff score. Children who have a disorder but are not identified by the test are considered to have false-negative scores. In developmental/behavioral pediatrics, the “gold standard” (criterion used to determine the presence of a given problem) often is not definitive but rather is a reference standard. Comparison with an imperfect “gold standard” may lead to erroneous conclusions that a screening test is inaccurate. As a result, sensitivity may be better conceptualized as copositivity. Desired sensitivity rates are 70% to 80%, and sensitivity is the true positive rate of a test.

Specificity is a measure of the proportion of children who actually are normal and who also are correctly determined by a given test to not have a problem. Children who are normal but who are incorrectly determined by a test cutoff score to be delayed or learning disabled are considered to have false-positive scores. Specificity is the true negative rate of a test. Again, in cases such as developmental screening, the presence of a reference (and not “gold”) standard makes the term conegativity more appropriate. A specificity rate of 70% to 80% is desirable. However, in the case of screening, it is better to have a higher sensitivity rate, perhaps at the cost of lowered specificity, so as to enhance identification of infants and children who might be at risk.

Cutoff scores can be adjusted to enhance sensitivity. By making criteria more inclusive, fewer children with true abnormalities will be missed; however, a more restrictive cutoff will also increase the probability of false-positive findings (overidentifying “normal” children as being abnormal). Conversely, if the cutoff score is made more exclusive to enhance specificity, the number of normal children inaccurately identified as abnormal is decreased, but some of those who are truly abnormal will be erroneously called normal (false-negative findings). Sensitivity and specificity are described in Figure 7A-2.

FIGURE 7A-2 Example highlighting sensitivity and specificity.

Positive predictive value refers to the proportion of children with a positive test result who actually are delayed or learning disabled. This reflects the probability of having a problem when the test result is positive. The lower the prevalence of a disorder, the lower is the positive predictive value. Sensitivity may be a better measure in low-prevalence problems. In developmental screening, positive predictive values often are in the range of 30% to 50%.

Negative predictive value refers to the proportion of children with a negative test result who indeed do not have developmental delays or learning problems. It is the probability of not having the disorder when the test result is negative. This value is influenced by the frequency or prevalence of a problem; in low-prevalence problems, specificity may be a better measure.

Frequency of a Disorder/Problem

Prevalence rate refers to the number of children in the population with a disorder, in relation to the total number of children in the population, measured at a given time. The incidence rate indicates the risk of developing a disorder: namely, new cases of a problem that develop over a period of time. The relationship between incidence and prevalence can best be illustrated by the following: prevalence rate = the incidence rate × the duration of the disorder. In essence, the predictive value of screening takes into account sensitivity and specificity of the screening procedure and the prevalence of the disorder.

Base rate is the naturally occurring rate of a given disorder. For example, the base rate of learning disabilities would be much higher in children referred to a learning and attention disorders clinic than in the general population. If a screening instrument were used to detect learning disabilities for this group, sensitivity and specificity values would differ from those found in the general pediatric population. For example, in the follow-up of low-birth-weight infants, the base rate for major handicaps (moderate to severe mental retardation; cerebral palsy; epilepsy; deafness or blindness) is 15%; therefore, in 85% of this population, the findings would be true negative. Low base rates increase the possibility of false-positive results. High base rates do not leave much room for improvement in terms of locating true-positive scores and result in an increase in false-negative findings. Tests can be most helpful in decision making when the base rate is in the vicinity of 0.50. Therefore, particularly in the case of screening, the relatively low base rates of developmental problems in very young children may increase the probability of false positive findings. However, in such situations, this scenario is more desirable than the converse: false negative findings.

Relative risk provides an alternative strategy for evaluating test accuracy.^17,¹⁸ This approach involves use of the likelihood ratio, which indicates the increased probability that the child will display a developmental problem, if the results of an earlier screening test were abnormal or suspect. This approach recognizes that not all children at early risk will later manifest a developmental problem, but there is a greater likelihood that they will. If a problem or disorder is rare, relative risk and odds ratios are nearly equal.

Test Characteristics

RELIABILITY

Measurement is the ability to assign numbers to individuals in systematic ways as a means of inferring properties of these individuals. Reliability refers to consistency or accuracy in measurement. Reliability focuses on how much error is involved in measurement or how much an obtained score varies from the “true score.” An observed test score = true score + measurement error. Internal consistency is a measure of whether all components of a test evaluate a cohesive construct or set of constructs (e.g., verbal ability or visual-motor skills). Stated differently, high internal consistency means that all items are highly intercorrelated. This is measured with Cronbach’s alpha, split-half reliability, or the Kuder-Richardson reliability estimate. Cronbach’s alpha is used to evaluate how individual items relate to the test as a whole (intercorrelation among items); split-half reliability relates half of the test items to the remaining half, often by an odd-even split; and the Kuder-Richardson reliability estimate is used for dichotomous (i.e., “yes”/“no”) items. Test-retest reliability is particularly pertinent in developmental and psychological testing because it takes into account the “true score” and error, addressing whether the same score would be obtained if a specific test were readministered. The length of time between the two administrations of the test is critical in regard to this measurement; that is, the sooner the test is readministered, the greater the reliability estimate is. In general, test-retest correlations of 0.70 are considered moderate, 0.80 moderate to high, and 0.90, high (scores >0.85 are desirable, although explicit, evidence-based criteria have not been defined yet). Tests with more items tend to have higher reliability, because of the likelihood of a greater variance in scores. Interrater reliability refers to how well independent examiners agree on results of a test. Alternate forms involve use of parallel tests, so as to prevent carryover (score inflation) if the parallel test is administered soon after the first. For example, the Peabody Picture Vocabulary Test—III has two forms, as does the Wide Range Achievement Test—4.

Reliability is affected by test length (longer tests are more reliable), test-retest interval (longer interval lessens reliability), variability of scores (greater variance increases reliability estimate), guessing (increased guessing decreases reliability), variations in test situation, and practice effects.³

VALIDITY

Validity refers to whether a test measures what it is supposed to measure for a specific purpose. A test may be valid for some uses and not others. For example, the Peabody Picture Vocabulary Test—III may be a valid measure of receptive vocabulary, but it is not a valid measure of overall cognitive ability or even overall language ability. It is important to keep in mind that test validation is context specific. In order to determine whether an assessment method is “psychometrically sound” or “valid,” the clinician must consider how it is being used. For example, an intelligence test may be a valid method for determining a child’s cognitive abilities but may have limited validity for treatment design and planning (see previous discussion of SNRAs). Similarly, a test may have demonstrated evidence as a valid measure of severity of general anxiety but not of phobias; a certain behavior rating scale may be valid as a measure of current clinical symptoms but may not have validity for treatment planning or for predicting outcomes. Thus, the purpose of the assessment needs to be considered in order to properly evaluate the psychometric characteristics of an assessment method.

Content validity determines whether the items in the test are representative of the domain the test purports to measure: that is, whether the test does cover the material it is supposed to cover. Construct validity concerns whether the test measures a particular psychological construct or trait (e.g., intelligence). Criterion-related validity involves the current relationship between test scores and some criterion, such as results of another test. Criterion-related validity can be concurrent (convergent) or predictive. In both instances, the results of a test under consideration are compared to an established reference standard to determine whether findings are comparable. In concurrent validity, the two tests (e.g., a screen such as the Bayley Infant Neurodevelopmental Screener and a “reference standard” such as the BSID-II) are administered at the same time, and the results are correlated. With predictive validity, a screening test might be given at one time, followed by administration of the reference standard at a later date (e.g., the BSID-II is given to children aged 36 months, and the Wechsler Preschool and Primary Scales of Intelligence—III at age 4½ years). Discriminant validity shows how well a screening test detects a specific type of problem. For example, autism might be the condition of concern, and a screening test such as the Modified Checklist for Autism in Toddlers (M-CHAT) is used to distinguish children with this disorder from those with mental retardation without autism. Face validity involves whether the test appears to measure what it is supposed to measure. Test-related factors (examiner-examinee rapport, handicaps, motivation), criterion-related factors, or intervening events could affect validity.

With regard to the interrelatedness among reliability and validity, reliability essentially sets the upper limit of a test’s validity, and reliability is a necessary but not sufficient condition for valid measurement. A specific test can be reliable, but it may be invalid when used to evaluate a function that it was not designed to measure. However, if a test is not reliable, it cannot be valid. Stated differently, all valid tests are reliable, unreliable tests are not valid, and reliable tests may or may not be valid.¹⁹

Practitioners should also be cognizant of the fact that testing can involve a speed test, in which items are relatively easy but there is a specific time limit and it is difficult to answer all of the items. The infamous 2-minute math test is an example. A power test involves progressively more difficult items, this difficulty being determined by the limits of a child’s knowledge base.

Age and Grade Equivalents

Age- and grade-equivalent scores are based on raw scores and portray the average age or grade placement of children who obtained a particular raw score. Although these metrics are useful in explaining results to parents and make conceptual sense, age and grade-equivalent scores are uneven units of measurement. For example, a six-month difference in performance at the age of 2 years is much more significant than a 6-month lag at age 8 years. Moreover, a 9-year-old with an age equivalent of 7 years is quite different from a 4-year-old functioning at a 7-year age equivalent, or an average 7-year-old. These measures lack precision, and in some test manuals, the same standard scores can produce somewhat different age/grade equivalents. Both metrics assume that growth is consistent throughout the school year and tend to exaggerate small differences in performance. These measures also vary from test to test. Furthermore, with achievement testing, it is necessary to know whether age or grade norms were used to obtain standard scores. For example, if age norms are used and the child had been retained in grade, he or she would be at a significant disadvantage because he or she would not have been exposed to the more advanced material. Conversely, if a child failed second grade and is being tested in early fall while repeating second grade, he or she may receive inflated scores if grade norms are used.

The IQ/DQ ratio (developmental quotient) is computed as mental age (obtained by the use of a test score) ÷ the child’s chronologic age and then multiplied by 100. Although developmental age refers to a level of functioning, DQ reflects the rate of development.¹⁹ IQ/DQ ratio scores are not comparable at different age levels because the standard deviation (variance) of the ratio does not remain constant. As a result, interpretation is difficult, and these scores generally are not used very much in contemporary standardized testing. Instead, the deviation IQ/DQ is employed. The deviation IQ is a method of estimation that allows comparability of scores across ages and is used with most major psychological and developmental test instruments. The deviation IQ/DQ is norm referenced and normally distributed, with the same standard deviation; typically, M = 100 and SD = 15. Therefore, a deviation IQ of 85 obtained at age 6 should have the same meaning as a score of 85 obtained at age 9.

The standard error of measurement (SEM) is an estimate of the error factor in a test that is the result of sampling or test characteristics, taking into account the mean, standard deviation, and size of the sample. The larger the standard error of measurement, the greater the uncertainty associated with a given child’s score. The SEM is produced by multiplying the standard deviation of the test by the square root of (1− the reliability coefficient of the test). In 95% of cases, the interval of approximately two times (1.96) the SEM above or below a child’s score would contain the “true” score: a 95% confidence interval. Stated differently, a 95% confidence interval indicates that if a test is given 100 times with different samples, scores will fall in this interval 95% of the time. In a 90% confidence interval, an interval of 1.64× the SEM above and below a child’s score would contain the “true” score. Such estimates are important in test-retest situations or in the case of a child who does not receive services because of missing a cutoff score by only a few points (e.g., a WISC-IV Full Scale IQ score of 72).

A final concern is the Flynn effect,²⁰ in which test norms increase approximately 0.3 to 0.5 points per year, which is equivalent to a 3- to 5-point increment per decade. This finding has ramifications in comparisons of scores obtained on earlier versions of tests to more contemporary scores (e.g., WISC-Revised to the WISC—Third Edition or WISC-IV; BSID to BSID-II; Stanford-Binet form LM to the 5^th edition). Caution is warranted when the practitioner attributes a decline in scores to a loss of cognitive ability, because in actuality this decline may be attributable to the fact that a newer test has mean scores that are considerably lower than those of an earlier version of the test (e.g., 5-8 points).²⁰ This issue would also have ramifications for children whose IQ score on an older version of a test is in the low 70s but decreases to below the cutoff for mild mental retardation on a newer version.

Although some practitioners may administer tests, all have occasion to respond to inquiries from parents about their child’s test performance or diagnosis derived from testing. The physician’s role includes explaining test results to parents, acknowledging parental concerns and advocating for the child, providing additional evaluation, or referring to other professionals.²¹

REFERENCES

1 Costello EJ, Edelbrock C, Costello AJ, et al. Psychopathology in pediatric primary care: The new hidden morbidity. Pediatrics. 1988;82:415-424.

2 Lavigne JV, Binns HJ, Christoffel KK, et al. Behavioral and emotional problems among preschool children in pediatric primary care: Prevalence and pediatricians’ recognition. Pediatrics. 1993;91:649-657.

3 Sattler JM. Assessment of Children, 4th ed. San Diego: Jerome M. Sattler, 2001.

4 Bayley N. Bayley Scales of Infant and Toddler Development, Third Edition: Technical Manual. San Antonio, TX: PsychCorp, 2005.

5 Roid GH. Stanford-Binet Intelligence Scales for Early Childhood, Fifth Edition: Manual. Itasca, IL: Riverside, 2005.

6 Wilkins C, Rolfhus E, Weiss L, et al: A Simulation Study Comparing Inferential and Traditional Norming with Small Sample Sizes. Paper presented at annual meeting of the American Educational Research Association, Montreal, Canada, 2005.

7 Wright BD, Linacre JM. WINSTEPS: Rasch Analysis for All Two-Facet Models. Chicago: MESA, 1999.

8 Rasch G. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press, 1980.

9 Dorans NJ, Holland PW. DIF detection and description: Mantel-Haenszel and standardization. In: Holland PW, Wainer H, editors. Differential Item Functioning. Mahwah, NJ: Erlbaum; 1993:35-66.

10 Gyurke JS, Aylward GP. Issues in the use of normreferenced assessments with at-risk infants. Child Youth Fam Q. 1992;15:6-8.

11 Bracken BA. Bracken Basic Concepts Scale-Revised. San Antonio, TX: The Psychological Corporation, 1998.

12 Greenspan SI, Meisels SJ. Toward a new vision for the developmental assessment of infants and young children. In: Meisels SJ, Fenichel E, editors. New Visions for the Developmental Assessment of Infants and Young Children. Washington, DC: Zero to Three: National Center for Infants Toddlers and Families; 1996:11-26.

13 Meisels S. Charting the continuum of assessment and intervention. In: Meisels SJ, Fenichel E, editors. New Visions for the Developmental Assessment of Infants and Young Children. Washington, DC: Zero to Three: National Center for Infants Toddlers and Families; 1996:27-52.

14 Bricker D. Assessment, Evaluation and Programming System for Infants and Children, Volume 1: AEPS Measurement for Birth to Three Years. Baltimore: Paul H. Brookes, 1993.

15 Johnson-Martin N, Jens K, Attermeir S, et al. The Carolina Curriculum, 2nd ed. Baltimore: Paul H. Brookes, 1991.

16 Urdan T. Statistics in Plain English. Mahwah, NJ: Erlbaum, 2001.

17 Frankenburg WK, Chen J, Thornton SM. Common pitfalls in the evaluation of developmental screening tests. J Pediatr. 1988;113:1110-1113.

18 Frankenburg WK. Preventing developmental delays: Is developmental screening sufficient? Pediatrics. 1994;93:586-593.

19 Salvia J, Ysseldyke JE. Assessment, 8th ed. New York: Houghton Mifflin, 2001.

20 Flynn JR. Searching for justice. The discovery of IQ gains over time. Am Psychol. 1999;54:5-20.

21 Aylward GP. Practitioner’s Guide to Developmental and Psychological Testing. New York: Plenum Medical, 1994.

7B. Surveillance and Screening for Development and Behavior

FRANCES P. GLASCOE, PAUL H. DWORKIN

More than three decades have elapsed since the identification of developmental, behavior, and psychosocial problems as the so-called “new morbidity” of pediatric practice.¹ During the ensuing years, profound societal change, with public policy mandates for deinstitutionalization and mainstreaming, has further influenced the composition of pediatric practice. Studies have documented the high prevalence of developmental and behavioral issues within the practice setting, including disorders of high prevalence and lower severity such as specific learning disability, attention-deficit/hyperactivity disorder, and speech and language impairment, as well as problems of higher severity and lower prevalence such as mental retardation, autism, cerebral palsy, hearing impairment, and serious emotional disturbance.²

The critical influence of the early childhood years on later school success and the well-documented benefits of early intervention provide a strong rational for the early detection of children at risk for adverse developmental and behavioral outcomes. Neurobiological, behavioral, and social science research findings from the 1990s, the so-called decade of the brain, have emphasized the importance of experience on early brain development and on subsequent development and behavior and the extent to which the less differentiated brain of the younger child is particularly amenable to intervention.³

In this chapter, we highlight links between early detection and early intervention. Much has been written on this topic and the American Academy of pediatrics has recently revised its policy statement on developmental screening. The new statement includes expert opinion on how to provide quality developmental surveillance (the process of incorporating medical/developmental history, knowledge of the family, parents’ concerns, screening test results, and clinical observation) in order to make informed decisions about any needed referrals. Thus, this chapter offers a review of evidence and challenges in surveillance and screening, reconciles both approaches, includes a list of quality screening measures, describes effective early identification initiatives, and provides suggestions for enhancing the well-child visits to facilitate early detection of developmental and behavioral problems.

BACKGROUND

Early identification and intervention affords the opportunity to avert the inevitable secondary problems with loss of self-esteem and self-confidence that result from years of struggle with developmental dysfunction. Federal legislation, the Individuals with Disabilities Education Act (IDEA) of 2004, and related state legislation mandate early detection and intervention for children with developmental and behavioral disabilities. Surveys indicate that parents have strong interest in promoting children’s optimal development.^4,⁵

Perhaps the most compelling rationale for early detection is the effectiveness of early intervention. Researchers have documented the benefits of early intervention in children with mental retardation and physical handicaps, particularly when improved family functioning is a measured outcome.⁶ More recently, the benefits of early intervention for children at environmental risk has also been demonstrated. For example, enrollment and participation of disadvantaged children in Head Start programs contribute to a decreased likelihood of grade repetition, less need for special education services, and fewer school dropouts.⁷ Detection is also supported by the clearer delineation of adverse influences on children’s development. For example, the effect of such diverse factors as low-level lead exposure and adverse parent-infant interaction on child development has implications for early identification.

By virtue of their access to young children and their families, child health providers are particularly well positioned to participate in early identification of children at risk for adverse outcomes through ongoing monitoring of development and behavior. Clinicians’ knowledge of medical and genetic factors also facilitates early identification of conditions associated with developmental problems. Furthermore, through their relationships with children and their families, pediatricians and other child health providers are familiar with the social and familial factors that place children at environmental risk. Professional guidelines emphasize the importance of early detection by child health providers. The American Academy of Pediatrics’ Committee on Children with Disabilities; Medicaid’s Early Periodic Screening, Diagnosis, and Treatment (EPSDT) program; and Bright Futures (guidelines for health supervision of infants, children, and adolescents developed by the American Academy of Pediatrics and the Maternal and Child Health Bureau) all encourage the effective monitoring of children’s development and behavior and the prompt identification of children at risk for adverse outcomes.^8,⁹ The emphasis on the primary care practice as a comprehensive medical home for all children also supports the office as the ideal medical setting for developmental and behavioral monitoring.¹⁰

Despite this strong rationale, results of surveys of parents and child health providers demonstrate that current practices widely vary and suggest the need to strengthen developmental monitoring and early detection. Only about half of parents of children aged between 10 and 35 months recall their children’s ever having received structured developmental assessments from their child health providers.¹¹ Parents also report gaps in the discussion of development and related issues with pediatric providers.¹² Most pediatricians employ informal, nonvalidated approaches to developmental screening and assessment. The majority of pediatricians do not incorporate within their practice such tools as those recommended by Bright Futures to aid in early detection.¹³

Not surprisingly, the early detection of children at risk for adverse developmental and behavioral outcomes has proved elusive. Fewer than 30% of children with such disabilities as mental retardation, speech and language impairments, learning disabilities, and serious emotional/behavioral disturbances are identified before school entry.¹³ This lack of detection precludes the opportunity and benefits of timely, early intervention. Although nearly half of parents have some concerns for their child’s development or behavior, such concerns are infrequently elicited by child health providers.¹⁴

Multiple factors have been cited as barriers to effective developmental monitoring. Child health providers report inadequate time during the office visit to deliver developmental services, including monitoring and early detection. A professionally administered developmental test (e.g., the Denver-II) cannot be adequately performed in a child health supervision visit that lasts, on average, less than 20 minutes and in which other content must be delivered. Other recognized barriers include the inadequate training of child health providers and ineffective administrative and clinical practices, including staffing and record keeping. Despite the assigning of a value to the billing code for developmental screening (96110) by the Centers for Medicare and Medicaid Services, reimbursement for developmental services in general and for developmental monitoring specifically by third-party payers remains inadequate. Health care organizations do not measure or prioritize the developmental content of child health supervision services. Furthermore, even if at-risk children are identified, the linkage of such children and their families to developmentally enhancing programs and services is often inefficient and challenging.

DEVELOPMENTAL SURVEILLANCE

Currently, child health providers employ a variety of techniques to monitor children’s development and behavior. History taking during a health supervision visit typically includes a review of age-appropriate developmental milestones. Unfortunately, recall of such milestones is notoriously unreliable and typically reflects parents’ prior conceptions of children’s development.¹⁵ Although the accuracy in determining the age of performing certain tasks is certainly improved by the use of diaries and records, the wide range of normal acquisition for such milestones limits their value in assessing children’s developmental progress. Child health providers may also question parents as to their predictions for their child’s development. Predictions (typically elicited with questions such as “when your child becomes an adult, do you think he or she will be above average, average, or below average?”) are also unhelpful in developmental monitoring, because parents are likely to expect average functioning for children with delays and predict overachievement for children developing at an average pace, a phenomenon dubbed the presidential syndrome.¹⁵

During the physical examination, child health providers may interact with children by using an informal collection of age-appropriate tasks. The lack of a standardized approach to measuring developmental progress makes interpretation of children’s performance on such tasks difficult. The reliance of child health providers on “clinical judgment,” based on subjective impressions during the performance of the history and physical examination, are also fraught with hazard. Such impressions are unduly influenced by the extent to which a child is verbal and sociable in a setting that may be frightening, an effect likely to restrict affect and deter spontaneous demonstrations of pragmatic language skills. Studies have documented the poor correlation between provider’s subjective impressions of children’s development and the results of formal assessments. Clinical judgment identifies fewer than 30% of children with developmental disabilities.¹⁵ The reliance on subjective impressions undoubtedly contributes to the late identification of children with such developmental issues as mild mental retardation.

According to research findings and expert opinion, surveillance and screening constitute the optimal approach to developmental monitoring.¹⁶ As originally described by British investigators, surveillance encompasses all activities relating to the detection of developmental problems and the promotion of development through anticipatory guidance during primary care.¹⁷ Developmental surveillance is a flexible, longitudinal, continuous process in which knowledgeable professionals perform skilled observations during child health care.¹⁷ Although surveillance is most typically performed during health supervision visits, clinicians may perform opportunistic surveillance during sick visits by exploring the child’s understanding of illness and treatment.^18a

The emphasis of developmental surveillance is on skillfully observing children and identifying parental concerns. Components include eliciting and attending to parents’ opinions and concerns, obtaining a relevant developmental history, skillfully and accurately observing children’s development and parent-child interaction, and sharing opinions and soliciting input from other professionals (e.g., visiting nurse, child care provider, preschool and school teacher), particularly when concerns arise. Developmental history should include an exploration of both risk and protective factors, including environmental, genetic, biological, social, and demographic influences, and observations of the child should include a careful physical and neurological examination. Surveillance stresses the importance of viewing the child within the context of overall well-being and circumstance.¹⁷

The most critical component of surveillance is eliciting and attending to parents’ opinions and concerns. Research has elucidated the value of information available from parents. Although there are several ways to obtain quality information, research on parents’ concerns is voluminous. Concerns are particularly important indicators of developmental problems, particularly for speech and language function, fine motor skills, and general functioning (e.g., “He’s just slow”).^15,¹⁸ Although concerns about self-help skills, gross motor skills, and behavior are less sensitive indicators of developmental functioning, such opinions should serve as clinical “red flags,” mandating closer clinical assessment and developmental promotion.^15,¹⁸ The manner in which parental concerns are elicited is important. Asking parents whether they have worries about their children’s development is unlikely to be useful, because they may be reluctant to acknowledge fears and interpret “development” as merely reflecting physical growth. In contrast, asking parents whether they have any concerns about the way their child is behaving, learning, and developing, followed by more specific inquiry about functioning in specific developmental domains, is more likely to yield valid and clinically useful responses.^18,¹⁹ Clinicians must be mindful of the complex relationship between concerns and disability (some concerns are predictors of developmental status only at certain ages), the critical importance of eliciting concerns rather than relying on parents to volunteer, and the value of an evidence-based approach to interpreting concerns.^18,²¹

Parents’ estimations are also accurate indicators of developmental status. For example, a study conducted in primary care demonstrated the extent to which parents’ estimates of cognitive, motor, self-help, and academic skills correlate with findings on developmental assessments.²² Parental responses to the question, “Compared with other children, how old would you say your child now acts?” are important indicators of developmental delay, although such questions are more challenging for parents than elicitations of concerns.²²

In contrast to the limitations of parents’ recall of developmental milestones, contemporaneous descriptions of children’s current skills and achievements are useful indicators of developmental status. Similar to the solicitation of parental concerns, the format of questions eliciting parental report is important. Recognition questions such as “Does your child use any of the following words?” are more likely to yield helpful information than are such identification questions as “What words does your child say?” that rely on parents’ spontaneous recall and report. Parental report is likely to yield higher estimates of children’s functioning than is professional assessment. This discrepancy is less likely to result from parental inaccuracy or exaggeration than from parents’ reports on newly emerging skills that are inconsistency demonstrated in the familiar and supportive home environment.

Parents’ opinions and concerns must be considered within the context of cultural influences. Parents’ appraisals and descriptions are influenced by expectations for children’s normal development, and such expectations vary among different ethnic groups. For example, in a study of Latino (primarily Puerto Rican), African American, and European American mothers, Puerto Rican mothers expected personal and social milestones to be normally achieved at a later age than did the other groups, whereas first steps and toilet training were expected at an older age by European American mothers.²³ Such differences were often explained by underlying cultural beliefs, values, and childrearing practices. For example, the older age for achievement of self-help skills is consistent with the Puerto Rican concept of familismo and its emphasis on caring for children.

USE OF SCREENING TOOLS

The effectiveness of developmental surveillance is enhanced by incorporating valid measures of parents’ appraisals and descriptions of children’s development and skilled professional observations. The process is enhanced by the periodic use of evidence-based screening tools (meaning that measures are repeatedly administered over time), including parent-completed questionnaires and professionally administered tests. Screening tools that elicit information from parents may be used on a routine basis to supplement data gathering during health supervision visits, may be used periodically at select ages (e.g., 9, 18, and 24 months), or may be used in a targeted manner to further explore the significance of parental concerns. Similarly, professionally administered screening tests may be administered periodically to help ensure that children do not elude early identification, or they may be used when concerns arise (so-called second-stage screening) or when parents are not able to provide information.

Table 7B-1 includes descriptions of screening tools that are highly accurate: that is, based on nationally representative samples, fulfilling psychometric criteria (see Chapter 7A), and having both sensitivity and specificity of at least 70% to 80%. Two types of tools are presented: those relying on information from parents and those requiring direct elicitation of children’s skills. The latter are useful in practices with staff (e.g., nurses, pediatric nurse practitioners) who have the time and skill to administer relatively detailed screens. Such measures are also useful in early intervention programs. Information is included on purchasing, cost, time to administer, scores produced, and age ranges of the children tested.

TABLE 7B-1 Developmental, Mental Health/Behavioral and Academic Screens

COMBINING SCREENING AND SURVEILLANCE

We now present an algorithm for combining surveillance and screening into an effective, evidence-based process for detecting and addressing developmental and behavioral issues. The American Academy of Pediatrics recently revised its policy statement on early detection.⁸ We include the elements of the statement, as follows.

1. Review the patient’s chart for medical risk factors. Take note of such potentially teratogenic exposures as radiation or medications, infectious illnesses, fever, addictive substances, and trauma, and review results of neonatal screens, including phenylketonuria, hypothyroidism, and other metabolic conditions. Also consider the perinatal history, including birth weight, gestational age, Apgar scores, and any medical complications. In addition, postnatal medical factors to be considered include chronic respiratory or allergic illness, recurrent otitis, head trauma, and sleep problems, including symptoms of obstructive sleep apnea.

2. Identify psychosocial risk factors. Common risk factors for developmental and behavioral problems include parents with less than a high school education, parental mental health or substance abuse problems, four or more children in the home, single-parent family, poverty, frequent household moves, limited social support, parental history of abuse as a child, and ethnic minority status. Four or more risk factors are associated with developmental performance that is well below average, which, in turn, has an adverse effect on future success in school.²⁴ The presence of multiple risk factors suggests the need for enrichment or remedial programs, regardless of screening results. Examples include Head Start, after-school tutoring, parenting training, social work services, mentoring, quality child care, and summer school. A measure such as the Family Psychosocial Screen (available at www.pedstest.com) is often helpful for identifying psychosocial risk factors and can be used as a standard intake form for new patients.

3. Elicit parents’ concerns and observations. Careful attention to wording is essential. Although facilitating conversations with parents can be informally accomplished, several helpful sources suggest well-worded questions. For example, Bright Futures contains useful trigger questions. A parent-completed measure, the Parents’ Evaluation of Developmental Status questionnaire (see Table 7B-1), has empirically tested wording and weighs the types of concerns parents raise, assigns levels of risk, and identifies optimal responses to concerns.

4. Conduct a physical examination. Examination should include attention to growth parameters, head shape and circumference, facial and other body dysmorphology, eye findings (e.g., cataracts in various inborn errors of metabolism), vascular markings, and signs of neurocutaneous disorders (e.g., café-au-lait spots in neurofibromatosis, hypopigmented macules in tuberous sclerosis). Vision and hearing screening are essential.

5. Administer/score developmental screening tests. Use of parent report measures, completed before the visit or in the waiting/examination room, reduces the amount of time needed for screening. Positive results may be followed by additional screening of social-emotional functioning (e.g., Ages & Stages Questionnaires: Social-Emotional and the Modified Checklist for Autism in Toddlers; see Table 7B-1) to better identify the areas of delay and types of services needed. Note that the AAP’s new statement recommends use of an autism specific screen like the M-CHAT at both 18 and 24 months, regardless of performance on broadband tools like PEDS or the ASQ.

6. Provide additional medical screens when developmental-behavioral screens are positive. When indicated, common health-related causes for delays and disorders should include screens for iron deficiency and lead toxicity. Unless suggested by parental report (e.g., seizure activity) or clinical findings (e.g., microcephaly, expanding head circumference), neurophysiological (e.g., electroencephalogram) and neuroimaging (e.g., computed tomographic scan, magnetic resonance imaging) studies are not routinely indicated. Developmental delay may suggest the need for metabolic screening for ammonia, organic, and amino acids (or referral for such screens). A progressive loss of milestones suggests the possible need to screen for human immunodeficiency virus (HIV).

7. Explain results to parents. When parents’ concerns have been elicited, the process of explaining findings can begin with a simple affirmation of parents’ observations. It is important to present results in person and to maintain a positive outlook about available services and their potential to improve outcome. Because screening/surveillance activities are not diagnostic in nature, the clinician should avoid diagnostic labels in favor of euphemisms, such as “developmental delay,” “behind other children,” and “having difficulties with…” When a parent reports conflicting perceptions within the family about the possibility of problems, the clinician should offer to explain findings to other family members. Asking parents whether they know other families with children who have developmental-behavioral differences may be helpful in clarifying discussions.

8. If indicated, make referrals for subspecialty medical services. When medical factors are identified, an appropriate response is referral for further evaluation.

9. Seek nonmedical interventions. Nonmedical interventions need not await a complete diagnosis. All children with apparent delays or disorders should be referred promptly to appropriate programs and services. Public programs, including those mandated by such legislation as the IDEA, should be available through community-based agencies or the public schools without cost to the family and generally provide a range of high-quality therapies and evaluations, including speech-language, physical, and occupational therapy; assistive technology evaluations; and behavioral interventions. Most IDEA programs do not provide a detailed diagnosis but rather define functional skills and deficits. As a consequence, a referral may also need to be made to a multidisciplinary diagnostic service. Because such centers typically have long waiting lists and because a final diagnosis is not necessary for initiating intervention, it is best to make such referrals concurrent with a referral to an IDEA program. Other services should be sought (e.g., Head Start, after school tutoring, quality daycare, parent training) for children with psychosocial risk factors who do not fulfill specific eligibility requirements for early intervention or special education. Referral letters to programs and services should include suggestions for the types of evaluations needed (e.g., speech-language therapy, occupational and physical therapy, social-emotional assessment, intelligence testing, academics). Programs offered through IDEA often require documentation of hearing and vision status. Some programs require the completion of specific referral forms. Parental consent should be obtained for sharing information, including copies of subsequent evaluations.

10. Offer developmental promotion. Regardless of whether a child has developmental problems, parents need advice and encouragement on promoting optimal development, particularly in domains associated with school success: language, academic/preacademic skills, and cognition. Clinicians may provide patient education materials, lists of informative and factual Web sites, lists of parent training services, and contacts for social support programs. Group discussions for parents on developmental topics are another potential strategy but require careful planning and organization. Developmental promotion is assisted by a well-organized system for filing and retrieving parent-focused materials (see www.dbpeds.org for materials and links).

11. Establish a medical home. For children with developmental-behavioral problems and/or complex health care needs, primary care contact and perspectives are of critical importance to promote optimal health and development. The American Academy of Pediatrics National Center of Medical Home Initiatives for Children with Special Needs (www.medicalhomeinfo.org) provides an essential guide for organizing practices to ensure continuity of care, manage multiple referrals and comprehensive records, coordinate appointments, and communicate with various providers.

SYSTEMWIDE APPROACHES TO SURVEILLANCE AND SCREENING

State wide and countywide efforts to enhance collaboration among medical and nonmedical providers offer some of the most promising evidence for the effectiveness of surveillance and screening. Documented outcomes include large increases in screening rates during EPSDT visits;²⁵ a fourfold increase in early intervention enrollment, resulting in a match between the prevalence of disabilities and receipt of services²⁶; a 75% increase in identification of children from birth to age 3 with autism spectrum disorder²⁷; improvement in reimbursement for screening²⁸; and, interestingly, increased attendance at well-child visits when parents’ concerns are elicited and addressed.²⁵

Among the numerous initiatives—national, international, and regional—we selected a few to highlight because they employed varied models and gathered outcome data to support their successes (and challenges).

The Assuring Better Child Health and Development (ABCD) Program

Created by The Commonwealth Fund, the ABCD Program has identified policy strategies for state Medicaid agencies to strengthen the delivery and financing of early childhood services for low-income families. The emphasis is on assisting participating states in developing care models that promote healthy development, including the mental development of young children. Models include developmental screening, referral, service coordination, and educational materials and resources for families and clinical providers. The program has resulted in improvements in screening, surveillance, and assessment. Most notably, work in North Carolina facilitated a 75% increase in screening, increased enrollment rates in early intervention from 2.6% to 8% (in line with the Centers for Disease Control and Prevention’s prevalence projections), while simultaneously lowering referral age ²⁶ (http://www.nashp.org; http://www.cdc.gov/ncbdd/child/interventions.htm).

Help Me Grow

A program of the Connecticut Children’s Trust Fund, Help Me Grow links children and families to community programs and services by using a comprehensive statewide network. Components of the program include the training of child health providers in effective developmental surveillance; the creation of a triage, referral, and case management system that facilitates access for children and families to services through Child Development Infoline; the development and maintenance of a computerized inventory of regional services that address developmental and behavioral needs of children and their families; and data gathering to systematically document capacity issues and gaps in services. The program has increased identification rates of at-risk children by child health providers and increased referral rates of such children to programs and services. For example, chart reviews conducted in participating practices noted an increase in documented developmental or behavioral concerns from 9% before training to 18% after training. Furthermore, training resulted in significant differences in referral rates for certain conditions. Behavioral conditions were involved in 4% of referrals from trained practices, in comparison with 1% from untrained practices. Four percent of referrals from trained practices were for parental support and guidance, in comparison with fewer than 1% from untrained practices ²⁹ (http://www.infoline.org/Programs/helpmegrow.asp).

Promoting Resources in Developmental Education (PRIDE)

This program is a 3-year project funded by the Duke Endowment through a partnership of the Children’s Hospital, the Center for Developmental Services (a colocation of agencies serving children with developmental disorders), the regional office of the state’s early intervention system (BabyNet), the local school district’s Childfind and Parents-as-Teachers programs, and a parent-to-parent mentoring program for parents of children with special health care needs. The goal of PRIDE is earlier identification and intervention for children in Greenville County, South Carolina with developmental delays and improved support for their parents.

The program has targeted key players in the lives of infants and toddlers as follows: Parents sign up around the time of their child’s birth to receive milestone cards every 3 to 6 months during the first 3 years that describe the key developmental attainments, activities to promote development at that age, and red flags for potential developmental problems. Parents are instructed to discuss any concerns with their physician. Primary care physicians are provided with information and tools (the Parents Evaluation of Developmental Status questionnaire) to improve their system of developmental screening. A nurse practitioner employed by PRIDE as the “physician office liaison” works closely with practices, initially by setting up lunch meetings with physicians and staff that are also attended by the PRIDE developmental-behavioral pediatrician. With the agreement of the physicians, the liaison then assists the office staff in implementing the system and provides a “Resource Guide” with information on local developmental services and forms to facilitate referrals. Child care providers have the opportunity to attend educational sessions (for credit hours) in which they learn about child development, signs of developmental problems, and services that are available for these children. The training sessions are provided in collaboration with local programs that promote higher quality child care and early education (Success By 6 and First Steps), and the attendees receive “toolkits” with information on the topics discussed. Initial results of the program indicate success; 16 of 17 local pediatric practices (which previously had no standardized system of developmental screening) now utilize the Parents Evaluation of Developmental Status questionnaire. Over the first 18 months of the program, referrals to early intervention have increased almost 100% and referrals to the school’s Childfind program by 30%. Other service providers have seen increases in new referrals of up to 30%. The average age at referral to early intervention has also dropped slightly.

Not surprisingly, increasing rates of referral raised the likelihood of even longer waiting lists for tertiary-level developmental-behavioral pediatric evaluations. To address this challenge, the PRIDE staff sought funding from The Commonwealth Fund to study the feasibility and cost effectiveness of a model of “midlevel” developmental-behavioral pediatrics assessment (as a step between telephone triage/record review and comprehensive diagnostic evaluation) for children younger than 6 years.³⁰

First Signs

This national and international training effort is devoted to early detection of children with disabilities, with a particular focus on autism spectrum disorders. This detection is accomplished through a mix of print materials and broadcast press, direct mail, public service announcements, presentations (to medical and nonmedical professionals), a richly informative website (www.firstsigns.org), and detailed program evaluation. Although First Signs initiatives have been conducted in several states, including New Jersey, Alabama, Delaware, and Pennsylvania, the Minnesota campaign is highlighted here because of that state’s assistance in program evaluation. Minnesota is divided into discrete service regions. Centralized train-the-trainers forums were conducted to prepare 130 professionals as outreach trainers. These individuals were from all regions of the state, and most were early interventionists, family therapists, and other nonmedical service providers. They then provided more than 165 workshops to 686 medical providers, to whom they offered individualized training tailored for health care clinics, as well as training for more than 3000 early childhood specialists. First Signs Screening Kits (which include video, information about and in some cases copies of appropriate screening tools, wall charts and parent handouts on warning signs) were distributed to more than 900 practitioners and clinics. In addition, public service announcements were aired across the state in collaboration with the Autism Society of Minnesota. Within 12 months, there was a 75% increase in the number of young children identified in the 0- to 2-year age group and an overall increase of 23% in detection of autism spectrum disorders among all children aged 0 to 21 years in that same period. The state has now expanded the initiative to include childcare providers and is educating them about red flags and warning signs. In addition, physicians with the Minnesota Chapter of the American Academy of Pediatrics Committee for Children with Disabilities have begun incorporating First Signs information into physician training program at the University of Minnesota.²⁷

Blue Cross/Blue Shield of Tennessee

Blue Cross/Blue Shield of Tennessee requested that child health providers use standardized, validated screening at all EPSDT visits. To facilitate compliance, Blue Cross/Blue Shield of Tennessee piloted a program in 34 high-volume, Medicaid-managed care practices. Outreach nurses, called regional clinical network analysts, trained providers on site how to administer, score, interpret, and submit reimbursement for the Parents’ Evaluation of Developmental Status questionnaire (the standardized developmental-behavioral surveillance and screening instrument that elicits parents’ concerns about their children). After training, screening rates increased from 0% to 43.5% during the pilot phase. At the same time, the practices experienced a 16% increase in attendance at scheduled well-child visits, which suggests that focusing on parents’ concerns may increase their adherence to visit schedules. Blue Cross/Blue Shield of Tennessee, together with the Tennessee Chapter of the American Academy of Pediatrics, is now providing training across the state.²⁵ More information can be found through the Center for Health Care Strategies, “Best Clinical and Administrative Practices for Statewide” developmental and behavioral screening initiatives as established by the Center for Health Care Strategies [http://www.chcs.org/]

Healthy Steps for Young Children

This a national initiative improves traditional pediatric care with the assistance of an in-office child development specialist, whose duties include expanded discussions of preventive issues during well-child and home visits, staffing a telephone information line, disseminating patient education materials, and networking with community resources and parent support groups. Now in its 12th year, Healthy Steps followed its original cohort of 3737 intervention and comparison families from 15 pediatric practices in varied settings. In comparison with controls, Healthy Steps families received significantly more preventive and developmental services, were less likely to be dissatisfied with their pediatric primary care, and had improved parenting skills in many areas, including adherence to health visits, nutritional practices, developmental stimulation, appropriate disciplinary techniques, and correct sleeping position. In practices serving families with incomes below $20,000, use of telephone information lines increased from 37% before the intervention to 87% after; office visits with someone who teaches parents about child development increased from 39% to 88%; and home visits increased from 30% to 92%. Low-income families receiving Healthy Steps services were as likely as high-income parents to adhere to age-appropriate well-child visits at 1, 2, 4, 12, 18, and 24 months.^31,³² One program evaluation suggests that Healthy Steps offers a benefit comparable with that of Head Start at about one-tenth the cost,³³ although this claim is somewhat premature because Head Start data now extend to more than 35 years of follow-up research with a proven return rate of $17.00 for each $1.00 spent on early intervention, with savings realized through reductions in teen pregnancy, increases in high school graduation and employment rates, and decreased adjudication and violent crime.⁷ Nevertheless, Healthy Steps is extremely promising and inexpensive and includes a strong evaluation component that will answer questions about its long-term effect.

CONCLUSION

In summary, both expert opinion and research evidence support surveillance and screening as the optimal clinical practice for monitoring children’s development and behavior, promoting optimal development, and effectively identifying children at risk for delays. The effectiveness of surveillance is enhanced by incorporating valid measures of parents’ appraisals and descriptions of children’s development and behavior and skilled professional observations. Developmental monitoring should combine surveillance at all health supervision visits with the periodic use of evidence-based screening tools, including parent-completed questionnaires and professionally administered tests.

To be effective, identification must lead to intervention through referral to appropriate programs and services. Surveillance and screening activities must ensure access to medical evaluations, developmental assessments, and intervention programs. Ultimately, collaboration among health care providers, parents, and early intervention and other social service providers is crucial for effectively addressing the challenges of detection and timely enrollment in early intervention programs and services.

Establishing effective surveillance and screening in primary care is nevertheless challenging.¹³ Effective initiatives consistently offer training to providers, office staff, and nonmedical professionals. Implementation details are numerous (e.g., incorporation into existing office workflow, ordering and managing screening materials, gathering and organizing lists of referral resources and patient education handouts, identifying measures that work well with available personnel, and determining how best to communicate with nonmedical providers).^18,^18a,^26,³⁴ Ultimately, helping health care providers recognize the need to adopt effective detection methods is the critical first step.

DIRECTIONS FOR FURTHER RESEARCH

Although much is known about the accuracy of screening tools, studies are needed to determine their accuracy when used repeatedly (given that screening measures both overrefer and underrefer to some extent). Other rich topics of inquiry include the following: How do surveillance methods enhance development and early detection, and which specific techniques most enhance decision making? Does improved reimbursement have a positive effect on provider behavior? How can surveillance and screening be incorporated into electronic health records? In the absence of regional and state initiatives, can primary care professionals engage in effective self-study and thus positive practice change? What teaching methods and content best help residents master efficient surveillance and screening techniques that work well in primary care? Perhaps the most critical area in need of further inquiry is determining the longitudinal outcomes of families and children when surveillance and screening are used together.

REFERENCES

1 Haggerty RJ, Roughman KJ, Pless IB. Child Health and the Community. New York: Wiley, 1975.

2 Dobos AE, Dworkin PH, Bernstein BA. Pediatricians’ approaches to developmental problems: Has the gap been narrowed? J Dev Behav Pediatr. 1994;15:34-38.

3 Institute of Medicine. From Neurons to Neighborhoods: The Science of Early Childhood Development. Washington, DC: National Academies Press, 2000.

4 Blumberg SJ, Halfon N, Olson LM. The national survey of early childhood health. Pediatrics. 2004;113:1899-1906.

5 Young KT, Davis K, Schoen C, et al. Listening to parents. A national survey of parents with young children. Arch Pediatr Adolesc Med. 1998;152:255-262.

6 Shonkoff JP, Hauser-Cram P. Early intervention for disabled infants and their families: a quantitative analysis. Pediatrics. 1987;80:650-658.

7 Shonkoff JP, Meisels SJ. Handbook of Early Childhood Intervention, 2nd ed. New York: Cambridge University Press, 2000.

8 American Academy of Pediatrics, Council on Children with Disabilities. Identifying infants and young children with developmental disorders in the medical home: An algorithm for developmental surveillance and screening. Pediatrics. 2006;118:403-420.

9 Green M, Palfrey JS, editors. Bright Futures: Guidelines for Health Supervision of Infants, Children, and Adolescents, 2nd ed., Arlington, VA: National Center for Education in Maternal and Child Health, 2002.

10 American Academy of Pediatrics, Medical Home Initiatives for Children with Special Needs Project Advisory Committee. The medical home. Pediatrics. 2002;110:184-186.

11 Halfon N, Regalado M, Sareen H, et al. Assessing development in the pediatric office. Pediatrics. 2004;113:1926-1933.

12 Bethell C, Reuland CHP, Halfon N, et al. Measuring the quality of preventive and developmental services for young children: National estimates and patterns of clinicians’ performance. Pediatrics. 2004;113:1973-1983.

13 Silverstein M, Sand N, Glascoe FP, et al. Pediatricians’ reported practices regarding developmental screening: Do guidelines work? And do they help? Pediatrics. 2005;116:174-179.

14 Inkelas M, Glascoe FP, Regalado M, et al: National Patterns and Disparities in Parent Concerns about Child Development. Paper presented at the annual meeting of the Pediatric Academic Societies, Baltimore, 2002.

15 Glascoe FP, Dworkin PH. The role of parents in the detection of developmental and behavioral problems. Pediatrics. 1995;95:829-836.

16 Regalado M, Halfon N. Primary care services promoting optimal child development from birth to age 3 years: review of the literature. Arch Pediatr Adolesc Med. 2001;12:1311-1322.

17 Dworkin PH. British and American recommendations for developmental monitoring: The role of surveillance. Pediatrics. 1989;84:1000-1010.

18 Glascoe FP. Collaborating with Parents: Using Parents’ Evaluations of Developmental Status to Detect and Address Developmental and Behavioral Problems. Nashville: Ellsworth & Vandermeer, 1998.

18a Houston HL, Davis RH. Opportunistic surveillance of child development in primary care: is it feasible? (Comparative Study Journal Article). J R Coll Gen Pract. 1985;35(271):77-79.

19 Glascoe FP. Toward a model for an evidenced-based approach to developmental/behavioral surveillance, promotion and patient education. Ambul Child Health. 1999;5:197-208.

20 Rydz D, Shevell MI, Majnemer A, et al. Developmental screening. Child Neurol. 2005;20:4-21.

21 Glascoe FP. Do parents’ discuss concerns about children’s development with health care providers? Ambul Child Health. 1997;2:349-356.

22 Glascoe FP, Sandler H. The value of parents’ age estimates of children’s development. J Pediatr. 1995;127:831-835.

23 Pachter LM, Dworkin PH. Maternal expectations about normal child development in four cultural groups. Arch Pediatr Adolesc Med. 1997;151:1144-1150.

24 Glascoe FP. A re over-referralson developmental screening tests really a problem? Arch Pediatr Adolesc Med. 2001;155:54-59.

25 Smith PK: BCAP Toolkit: Enhancing Child Development Services in Medicaid Managed Care. Center for Health Care Strategies, 2005. (Available at: http://www.chcs.org/; accessed 10/13/06.)

26 Pinto-Martin J, Dunkle M, Earls M, et al. Developmental stages of developmental screening: Steps to implementation of a successful program. Am J Public Health. 2005;95:6-10.

27 Glascoe FP, Sievers P, Wiseman N: First Signs Model Program makes great strides in early detection in Minnesota: Clinicians and educators play major role in increased screenings. American Academy of Pediatrics’ Section on Developmental and Behavioral Pediatrics Newsletter. August, 2004. (Available at: www.dbpeds.org; accessed 10/13/06.)

28 Inkelas M, Regalado H, Halfon N: Strategies for integrating developmental services and promoting medical homes. National Center for Infant and Early Childhood Health Policy, 2005. (Available at: http://www.healthychild.ucla.edu; accessed 10/13/06.)

29 McKay K. Evaluating model programs to support dissemination. An evaluation of strengthening the developmental surveillance and referral practices of child health providers. J Dev Behav Pediatr. 2006;27(1 Suppl):S26-S29.

30 Kelly D: PRIDE. American Academy of Pediatrics’ Section on Developmental and Behavioral Pediatrics Newsletter. March, 2006 (Available at: www.dbpeds.org; accessed 10/13/06.)

31 McLearn KT, Strobino DM, Hughart N, et al. Narrowing the income gaps in preventive care for young children: Families in Healthy Steps. J Urban Health. 2004;81:206-221.

32 McLearn KT, Strobino DM, Minkovitz CS, et al. Developmental services in primary care for low-income children: Clinicians’ perceptions of the Healthy Steps for Young Children Program. J Urban Health. 2004;81:556-567.

33 Zuckerman B, Parker S, Kaplan-Sanoff M, et al. Healthy Steps: A case study of innovation in pediatric practice. Pediatrics. 2004;114:820-826.

34 Hampshire A, Blair M, Crown N, et al. Assessing the quality of child health surveillance in primary care. A pilot study in one health district. Child Health Care Dev. 2002;28:239-249.

7C. Assessment of Development and Behavior

TERRY. STANCIN, GLEN P. AYLWARD

“Assessment is a means to an end, not an end in itself.

—Jerome M. Sattler, 2001

Assessment of child development and behavior involves a process in which information is gathered about a child so that judgments can be made. This process generally includes a multistage approach, designed to gain sufficient understanding of a child so that informed decisions can be made.¹ In contrast to psychological testing (which includes the administration of tests), assessment is the process in which data from clinical sources and tools (including history, interviews, observations, formal and informal tests), preferably obtained from multiple perspectives, are interpreted and integrated into relevant clinical decisions.

Developmental and behavioral assessments may be conducted for several purposes.^1,² Screening involves procedures to identify children who are at risk for a particular problem and for whom there are available effective interventions. Diagnosis and case formulation procedures help determine the nature, severity, and causes of presenting concerns and often result in classification or a label. Prognosis and prediction methods result in generating recommendations for possible outcomes. Treatment design and planning assessment strategies aid in selecting and implementing interventions to address concerns. Treatment monitoring methods track changes in symptoms and functioning targeted by interventions. Finally, treatment evaluation procedures help investigators examine consumer satisfaction and the effectiveness of interventions.

The purpose of this chapter is to describe methods and tools for assessing children’s development and behavior. In accordance with current discussions within the child psychology literature,² we advocate the development of integrated evidence-based assessment strategies for childhood problems with emphasis placed on research concerning the reliability, validity, and clinical utility of commonly used measures in assessment and treatment planning of developmental and behavioral problems (i.e., what methods have been shown to be useful and valid for what purpose). We describe general information about clinical interviewing and observational methods required to conduct comprehensive child assessments (for more extensive discussions, see McConaughy³). To help guide the pediatric practitioner’s and researcher’s appropriate use of assessment results, we provide information on the range of methods used for assessing developmental abilities, intelligence and cognitive abilities, behavioral and emotional functioning, and specialized testing, including neuropsychological testing and measures of functional outcome. However, we do not attempt to address the complex manner in which information, obtained from different assessment data sources, is weighted and synthesized in the formulation of clinical judgments. The discussions of assessment tools is not meant to be all-inclusive—there are literally thousands of developmental and behavioral assessment measures in the literature—nor an endorsement of one instrument over others. Rather, it is a sampling the array of instruments available to clinicians and researchers (Table 7C-1). We present implications and recommendations for future research concerning measures of psychological assessment as they pertain to the field of developmental behavioral pediatrics.

TABLE 7C-1 Illustrative Behavioral and Developmental Assessment Methods

Method	Applications	Illustrative Methods
Structured/sem ¡structured interviews	Diagnostic assessments	Diagnostic Interview for Children-IV (DISC-IV)⁸ Diagnostic Interview for Children and Adolescents (DICA-IV)⁹
Structured/sem ¡structured interviews	Assessment and treatment planning	Comprehensive Assessment to Intervention System (CAIS)⁷ Child and Adolescent Psychiatric Assessment (CAPA)¹² Semistructured Parent Interview (SPI)³ Semistructured Clinical Interview for Children and Adolescents (SCICA)¹⁰
Standardized cognitive methods	Developmental assessments	Manual of Developmental Diagnosis ³² Cattell Infant Intelligence Scale ³¹ Bayley Scales of Infant and Toddler Development-Third Edition (BSID-III)²⁷ Battelle Developmental Inventory-Second Edition (BDI-2)³⁹ Mullen Scales of Early Learning (MSEL)⁴¹ Differential Ability Scales (DAS)⁴³ McCarthy Scales of Children’s Abilities (MSCA)⁴⁴
	Intelligence assessment	Kaufman Brief Intelligence Test, Second Edition (KBIT-2)⁴⁵ Stanford-Binet Intelligence Scale–Fifth Edition ²⁶ Stanford-Binet Intelligence Scales for Early Childhood–5 (SB-5)⁴⁶ Kaufman Assessment Battery for Children–Second Edition (KABC-II)⁴⁷ Wechsler Preschool and Primary Scale of Intelligence–Third Edition (WPPSI-III)⁴⁸ Wechsler Intelligence Scale for Children–Fourth Edition (WISC-IV)⁴⁹ Wechsler Abbreviated Scale of Intelligence (WASI)⁵⁰
	Achievement	Kaufman Test of Educational Achievement–II (KTEA-II)⁵² Peabody Individual Achievement Test–Revised (PIAT-R)⁵³ Peabody Individual Achievement Test–Normative Update ⁵⁴ Wechsler Individual Achievement Test–II (WIAT-II)⁵⁵ Wide Range Achievement Test–3 (WRAT-3)⁵⁶ Woodcock-Johnson III Tests of Achievement (WJ III)⁵⁸
	Neuro psychological assessments	Children’s Memory Scale (CMS)⁶⁰ NEPSY—A Developmental Neuropsychological Assessment (NEPSY)⁶¹ Behavior Rating Inventory of Executive Function (BRIEF)⁶² Wide Range Assessment of Memory and Learning (WRAML; 2nd edition: WRAML-2)^63,⁶⁴
Global behavior rating scales	Broad measures of pathology	Achenbach System of Empirically Based Assessment (ASEBA)^11,⁶⁷^–⁷¹ Caregiver completed: Child Behavior Checklist (CBCL/1½-5, CBCL/6-18) Teacher Report Form Youth Self-Report Form Behavior Assessment System for Children–Second Edition (BASC-2)⁷⁵ Parent Rating Scales Teacher Rating Scales Self-Report of Personality Infant-Toddler Social-Emotional Assessment Scale (ITSEA)⁷⁶ Minnesota Multiphasic Personality Inventory–Adolescent (MMPI-A)⁷⁸
Peer reports	Broad measure of pathology	Peer-Report Measure of Internalizing and Externalizing Behavior (PMIEB)⁸¹
Observational coding methods	Assessment of parent child interactions	Dyadic Parent-Child Interaction Coding System (DPICS)⁸³
Problem-specific questionnaires and rating scales	Depression	Children’s Depression Inventory (CDI)⁸⁷ Mood and Feeling Questionnaire (MFQ)⁸⁸ Reynolds Child Depression Scale (RCDS)⁸⁹ Reynolds Adolescent Depression Scale (RADS)⁹⁰ Children’s Depression Rating Scale Revised (CDRS-R)⁹¹ Preschool Feelings Checklist ⁹²
	Anxiety	Multidimensional Anxiety Scale for Children (MASC)⁹⁴ Social Phobia and Anxiety Inventory for Children (SPAI-C)⁹⁵ Social Anxiety Scale for Children (SAS-C)⁹⁶ and Social Anxiety Scale for Adolescents (SAS-A)⁹⁷ Revised Children’s Manifest Anxiety Scale (RCMAS)⁹⁸
	Attention-deficit/hyperactivity disorder (ADHD)	ADHD Rating Scale-IV ⁹⁹ Vanderbilt ADHD Diagnostic Parent Rating Scales ¹⁰⁰
	Autism spectrum disorders	Autism Diagnostic Interview-Revised (ADI-R)¹⁰⁶ Autism Diagnostic Observation Schedule (ADOS)¹⁰⁷ Social Communication Questionnaire (SCQ)¹⁰⁸ Childhood Autism Rating Scale (CARS)¹¹⁰
Family assessment methods	Parent and family assessment	Parenting Stress Index (PSI) 3^rd ed¹¹³
Functional outcome methods	Global functioning	Children’s Global Assessment Scale (CGAS)¹¹⁵ Child and Adolescent Functional Assessment Scale (CAFAS)¹¹⁶
	Adaptive behavior	Vineland Adaptive Behavior Scales (Vineland–II)¹¹⁷
	Health-related quality of life	PedsQL 4.0¹¹⁸

CASE ILLUSTRATIONS

The following case examples are referred to throughout the discussion of assessment methods:

Case 1: Jane is a 21-month-old (corrected age) girl who was born at 27 weeks’ gestation, with a birth weight of 850 g, having a grade III intraventricular hemorrhage, bronchopulmonary dysplasia, and hyperbilirubinemia. Her young, single mother resides in low-income housing and may have used cocaine during pregnancy. Her score on the revised Bayley Scales of Infant Development (BSID-II) Mental Developmental Index (MDI) was 90 at age 12 months (corrected age). Her developmental status is being evaluated at a high-risk infant follow-up clinic at this time to determine need for early intervention services.

Case 2: Rachel is a 15-year-old girl with mild cerebral palsy with no identified learning disorders who presents with depressed mood and falling grades in her ninth grade placement. Academic strengths have been language arts, but she has always been weak in math. Historically she has been a B and C student, but in her freshman year, she is in danger of failing math. Rachel complains about trouble getting work done this year, especially in algebra. Her mother is puzzled that Rachel has requested counseling.

Case 3: Jose is a 9-year-old third grader referred to the Developmental-Behavioral Pediatrics Clinic because of the following problems: poor academic performance, disruptive behavior, and trouble getting along with peers. His first language is Spanish, but he is considered fluent in English. He was born in Puerto Rico, and his parents do not speak or read English. A note from his teacher indicates concerns that Jose has a short attention span and fails to complete many assignments.

“WHAT MEASURE SHOULD I USE?”

Kazdin ⁴ noted that in clinical situations, this question suggests a misunderstanding of the assessment process, because it is unlikely that any one measure or method can suitably capture child functioning. Although some measures have been shown to be perform better than others, a single “gold standard” tool does not exist for assessing most aspects of children’s functioning. Valid child assessment often requires data from multiple sources, including interviews, direct observations, standardized parents’ and teachers’ rating scales, self-reports, background questionnaires, and standardized tests. Multiple methods are needed not only to evaluate different facets of problems but also because of the high rate of comorbidity in children with developmental and behavioral conditions. In clinical settings, methods should be tailored to address the specific referral questions and assessment goals; therefore, preordained “assessment batteries” should be avoided. Moreover, clinical assessments often have multiple goals, such both diagnosis and treatment planning. Diagnostic methods, shown to be evidenced-based (e.g., structured diagnostic interviews or rating scales), are often not helpful in treatment planning, whereas a functional analysis of impairment (i.e., identification of environmental contexts and socially valid target behaviors) are more useful.⁵ Different methods of data collection yield different information, and one is not inherently better than the other; each method contributes unique elements. Moreover, assessments must adopt a framework that maintains a correct developmental perspective, including use of methods and procedures that fit a child’s developmental stage.

INTERVIEWS

Clinical assessment interviews are face-to-face interactions with bidirectional influence for the purpose of planning, implementing, or evaluating treatment.³ The interview is a fundamental technique for gathering assessment data for clinical purposes and is considered by many clinicians to be an essential component. Interviews provide respondents the opportunity to offer personal reflections of concerns and historical events. Thoughts, feelings, and other private experiences are conveyed in conversation that is not readily obtainable in any other format. The interview often serves a dual purpose. Not only does a clinical interview provide valuable assessment data, but it also is probably the first opportunity for a clinician to begin to build a positive therapeutic relationship that is the foundation for effective behavioral change. In practice, most clinical assessment interviews use unstructured or semistructured formats in order to obtain detailed information about a particular presenting problem. Greater flexibility in interview formats is often desirable when the clinical goals include not just reaching a diagnosis but also establishing a therapeutic relationship with a family and developing a treatment plan.

An effective clinical interview needs to establish a condition of trust and rapport so that the interviewee can feel comfortable in divulging personal information.⁶ It is important to outline the purpose and nature of the interview at the outset and to discuss issues and limits of confidentiality. Effective interviewing requires listening skills, strategic use of open-ended and direct questions, and verbal and nonverbal empathic communications. The clinician needs to offer careful statements that reflect, paraphrase, reframe, summarize, and restate to verify accurate interpretation of client statements.⁶ At the same time, the clinician is gathering verbal and nonverbal information conveyed by the client. Most interviewers take notes during interviews.

Most clinical assessments of children begin with a parent interview, the content of which depends on its purpose. Interviewing in the context of a developmental-behavioral problem usually focuses on identification and analysis of parental concerns so that an intervention plan can be developed and implemented. Psychosocial interviews typically elicit parent perceptions about the specific nature of the problem (including antecedents and consequences of the problem), family relations and home situation, social and school functioning, developmental history, and medical history. A practical interview format that is well suited for primary care settings is the Comprehensive Assessment to Intervention System, developed by Schroeder and Gordon.⁷ This behaviorally oriented format clusters information in six areas for quick response: referral question, social context of question, general information about the child’s development and family, specifics of the concern and functional analysis of behavior, effects of the problem, and areas for intervention. Schroeder and Gordon used this system both in their telephone call-in service and in their pediatric psychology office practices.

Child interviews are generally viewed as an essential component of clinical assessments and can be conducted with children as young as age 3 years.³ Child clinical interviews are useful for establishing rapport, learning the child’s perspective of functioning, selecting targets for interventions, identification of the child’s strengths and competencies, and assessing the child’s view of intervention options. Moreover, child interviews offer an opportunity to observe the child’s behavior, affect, and interaction style directly. However, competent interviewing of children and adolescents interviews requires considerable skills and knowledge of development. For example, preschool children often respond better in interviews that the interviewer conducts while sitting at the child’s level on the floor or at a small table and with toys, puppets, and manipulative items. School-age children may end communication if they feel barraged by too many direct questions, especially if asked “why” about motives, or if questions are abstract or rhetorical. Adolescent interviews may require additional attention to matters of confidentiality, trust, and respect.

Interviews of children and adolescents may include a brief observational, descriptive report of clinician impressions, summarized as a behavioral observations or a mental status examination. Key areas of psychological functioning are examined, including general appearance and behavior (physical appearance, nonverbal behaviors, attitudes), emotional expression (mood and affect), characteristics of speech and language, form (how thoughts are organized) and content (e.g., delusions, obsessions, suicidal/homicidal ideation) of thought, perceptual disturbances (e.g., hallucinations, dissociation), cognition (orientations, attention, memory), and judgment and insight (developmentally appropriate).

Structured and Semistructured Diagnostic Interviews

Assessment data obtained from unstructured clinical interviews tend to vary considerably and are largely interviewer dependent. As a result, unstructured interviews have particularly poor reliability and validity. When the primary assessment goal is to provide a diagnosis or a specific judgment with high interassessor reliability, as would be desired in research studies on specific psychiatric diagnoses, standardized, structured psychiatric interviews are often preferable. Structured interviews contain specific, predetermined questions with a format designed to elicit information efficiently and thoroughly. Key questions are followed by specified branch questions with restricted, closed (“yes”/“no”) or brief responses.

An example of a structured interview is the National Institute of Mental Health Diagnostic Interview for Children—IV.⁸ This instrument is a highly structured interview with nearly 3000 questions designed to assess more than 30, psychiatric disorders and symptoms listed in the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV)^8a in children and adolescents aged 9 to 17 years. Parent and child versions in English and Spanish are available, and lay interviewers can administer it for epidemiological research. The Diagnostic Interview for Children and Adolescents⁹ is another structured diagnostic interview for children ages 6 to 17. This instrument consists of nearly 1600 questions that address 28 DSM-IV diagnoses relevant to children. Interrater reliability estimates of individual diagnoses range from poor to good, and diagnoses are moderately correlated with clinicians’ diagnoses and self-rated measures.

Structured interviews result in higher interrater (or interobserver) reliability because there is little opportunity for the interviewer to influence the content of data collected. Although sometimes considered to be the “gold standard” for psychiatric diagnostic and epidemiological research, standardized interviews are not impervious to reporter bias. In addition, structured diagnostic interviews tend to rely on DSM-IV symptoms which may not be developmentally appropriate, particularly for very young children. Moreover, structured diagnostic interviews may take 1 to 3 hours to complete, which renders them impractical for most clinical settings, especially because they typically do not assess background and family factors that are necessary for developing and implementing an intervention plan.

Semistructured interviews combine aspects of traditional and behavioral interviewing techniques. Specific topic areas and questions are presented, but, in contrast to structured interviews, more detailed responses are encouraged. Semistructured formats also support use of empathic communication described previously (e.g., reflecting, paraphrasing). For example, the Semistructured Parent Interview ³ contains sample questions organized around six topic areas: concerns about the child (open ended), behavioral or emotional problems (eliciting elaboration to begin a functional analysis of behavior), social functioning, school functioning, medical and developmental history, and family relations and home situations. Like other semistructured formats, the Semistructured Parent Interview encourages parent interviews built around a series of open-ended questions to introduce a topic, followed by more focused questions about specific areas of concern.

The Semistructured Clinical Interview for Children and Adolescents (SCICA)¹⁰ is an interview designed for children aged 6 to 16. It is part of the Achenbach System of Empirically Based Assessment (ASEBA)¹¹ and was designed to be used separately or in conjunction with other ASEBA instruments (e.g., Child Behavior Checklist [CBCL], Teacher Report Form). The SCICA contains a protocol of questions and procedures assessing children’s functioning across six broad areas: (1) activities, school, and job; (2) friends; (3) family relations; (4) fantasies; (5) self-perception and feelings; and (6) problems with parent/teacher. There are additional optional sections pertaining to achievement tests, screening for motor problem, and adolescent topics (e.g., somatic complaints, alcohol and drug abuse, trouble with the law). Interview information (observations and self-report) are scored on standardized rating forms and aggregated into quantitative syndrome scales and DSM-IV—oriented scales. Test-retest, interrater, and internal consistency evaluations indicate excellent to moderate estimates of reliability. Accumulating evidence for validity of the SCICA includes content validity, as well as criterion-related validity (ability to differentiate matched samples of referred and nonreferred children).

The Child and Adolescent Psychiatric Assessment ¹² is another semistructured diagnostic interview for children and adolescents aged 9 to 17. One interesting feature of this instrument is the inclusion of sections assessing functional impairment in a number of areas (e.g., family, peers, school, and leisure activities), family factors, and life events.

Motivational Interviewing

Motivational interviewing is an empirically supported interviewing approach gaining considerable attention in medical and mental health settings. More than an assessment strategy, motivational interviewing is a brief, client-centered directive intervention designed to enhance intrinsic motivation for behavior change through the exploration and reduction of patient ambivalence.¹³ Based on a number of social and behavioral principles, including decisional balance, self-perception theory, and the transtheoretical model of change,¹⁴ motivational interviewing combines rogerian and strategic techniques into a directive and yet patient-centered and collaborative encounter. Assessment from a motivational interviewing perspective involves addressing the patient’s ambivalence about making a change in behavior, exploring the negative and positive aspects of this choice, and discussing the relationship between the proposed behavior change (e.g., compliance with mediations) and personal values (e.g., health). This information is elicited in an empathic, accepting, and nonjudgmental manner and is used by the patient to select goals and create a collaborative plan for change with the provider.

The effectiveness of motivational interviewing with children and young adolescents has not been established. However, there is emerging evidence of its utility with adolescents and young adults, particularly in the areas of risk behavior, program retention, and substance abuse.^15,¹⁶

TESTING METHODS: DEVELOPMENTAL AND COGNITIVE

Infancy and Early Childhood

Since the 1980s, there has been increased interest in the developmental evaluation of infants and young children.^17,¹⁸ This began with the 1986 Education of the Handicapped Act Amendments (Public Law 99-457) and continues with the Individuals with Disabilities Education Improvement Act of 2004 (Public Law 108-446), a revision of the Individuals with Disabilities Education Act (IDEA). These laws involve provision of early intervention services and early childhood education programs for children from birth through 5 years of age. Developmental evaluation is necessary to determine whether children qualify for such intervention services. Part C of the IDEA revision (Section 632) delineates five major areas of development: cognitive, communication, physical, social-emotional, and adaptive. However, definitions of delay vary, criteria being set on a state by state basis. These can included a 25% delay in functioning in comparison with same-aged peers, 1.5 to 2.0 standard deviations below average in one or more areas of development, or performance on a level that is a specific number of months below a given child’s chronological age. However, pressure to quantify development has caused professionals working with infants and young children to attribute a degree of preciseness to developmental screening and assessment that is neither realistic nor attainable. Additional problems include test administration by examiners who are not adequately trained and use of instruments that have varying degrees of psychometric rigor.¹⁹ Nonetheless, developmental evaluation is critical, because timely identification of children with developmental problems affords the opportunity for early intervention, which enhances skill acquisition or prevents additional deterioration.

Again, choice of the type of developmental assessment that is administered is driven by the purposes of the evaluation: for example, determination of eligibility for early intervention or early childhood education services, documentation of developmental change after provision of intervention, evaluation of children who are at risk for developmental problems because of established biomedical or environmental issues, documentation of recovery of function, or prediction of later outcome. Assessment of infants and young children is in many ways unique, because it occurs against a backdrop of qualitative and quantitative developmental, behavioral, and structural changes, the velocity of change being greater during infancy and early childhood than at any other time. The rapidly expanding behavioral repertoire of the infant and young child and the corresponding divergence of cognitive, motor, and neurological functions pose distinct evaluation challenges.^18,¹⁹

Another significant testing concern in this age range is test refusal.²⁰ Test refusal, where a child either declines to respond to any items, or eventually stops responding when items become increasingly difficult, occurs in 15% to 18% of preschoolers.²¹^–²⁴ Occasional refusals occur in 41% of young children. In addition to the immediate ramifications problematic test-taking behaviors have on actual test scores, there is evidence that early high rates of refusals are associated with similar behaviors at later ages, and with lower intelligence, visual perceptual, neuropsychological, or behavioral scores in middle childhood.²²^–²⁵ Non-compliance has been reported to occur in verbal production tasks, gross motor activities, or toward the end of the testing session, and it occurs more in children born at biologic risk or those from lower socioeconomic households. Children who refuse any aspect of testing differ from those who refuse some items, or who are compliant and cooperative to a certain point and then refuse more difficult items. This situation prompted inclusion of the Test Observation Checklist (TOC) in the Stanford-Binet Scales for Early Childhood, 5th Edition (SB5).²⁶

A distinction is often made between developmental tests and intelligence tests,²⁷ and both are used in the age range under discussion. The assessment of intelligence originated from the need to determine which children would be able to learn in a classroom and which would be mentally deficient. In fact, this was the original purpose of the Binet test. Intelligence tests have become more psychometrically sophisticated but still assess different facets of primary cognitive abilities such as reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory. In contrast, the purpose of early developmental measures such as the Bayley Scales of Infant Development (BSID)²⁸ or the Gesell Developmental Schedules²⁹ was to be diagnostic of developmental delays, providing a benchmark of developmental acquisitions (or lack thereof) in comparison to same-aged peers. Nonetheless, this distinction is often blurred, perhaps because there is no specific age at which a child shifts from “development” to “intelligence” (although the culmination of the infancy period is often indicated), nor is there a clear-cut transformation from a delay to a deficit. Developmental tests also tend to include motor and social-adaptive skills. Both tests of development and intelligence are driven by the theoretical model of the test developer and the constructs measured by the test. Those that assess the former are considered more dynamic or fluid; those that assess intelligence are more consistent and predictive. Herein, we discuss both developmental and intelligence tests that are used in children in this age level.

Developmental Assessment Instruments

GESELL DEVELOPMENTAL SCHEDULES/CATTELL INFANT INTELLIGENCE TEST

The Gesell Developmental Schedules ^29,³⁰ and the Cattell Infant Intelligence Test³¹ are the oldest developmental test instruments and exemplify the blurring of developmental and intelligence testing boundaries. The most recent version of the former is Knobloch and associates’ Manual of Developmental Diagnosis (for children aged 1 week to 36 months).³² Gesell specified key ages at which major developmental acquisitions occur: 4, 16, 28, and 40 weeks and 12, 18, 36, and 48 months. Gross motor, fine motor, adaptive, language, and personal-social areas are assessed, with 1 to 12 items at each age. A developmental quotient is computed for each area with the formula maturity age level/chronological age ×100. The Cattell test is essentially an upward extension of the Gesell schedule over the first 21 months and a downward extension of early versions of the Stanford-Binet tests from age 22 months and older (the Cattell age range is 2 to 36 months). A major drawback of both instruments is the limited standardization sample size (e.g., 107 for the Gesell schedule, 274 for the Cattell test). As a result, neither is used frequently at this time, although the Cattell test does yield so-called IQ scores below 50 (the floor of the BSID).

BAYLEY SCALES OF INFANT DEVELOPMENT ^27,^28,³³

The original BSID ²⁸ evolved from versions administered to infants enrolled in the National Collaborative Perinatal Project. It was the reference standard for the assessment of infant development, administered to infants 2-30 months of age. The BSID was theoretically eclectic and borrowed from different areas of research and test instruments. It contained three components—the MDI, the Psychomotor Developmental Index (PDI), and the Infant Behavior Record (M = 100, SD = 16)—and was applicable for children aged 2 to 30 months. The BSID subsequently was revised as the BSID-II,³³ this partly because of the upward drift of approximately 11 points on the MDI and 10 points on the PDI, reflecting the Flynn effect³⁴ (M = 100, SD = 15). As a result, the BSID-II scores were 12 points lower on the MDI and 10 points lower on the PDI in comparison with the original BSID.³⁵ The Behavior Rating Scale was developed to enable assessment of state, reactions to the environment, motivation, and interaction with people. The age range for the BSID-II was expanded to 1 to 42 months. Unfortunately, this instrument had 22 item sets and basal and ceiling rules that differed from the original BSID. These rules were controversial in that if correction is used to determine the item set to begin administration, or if an earlier item set is employed because of developmental problems, scores tend to be somewhat lower, because the child is not automatically given credit for passing the lower item set. It was also criticized because it did not provide area scores compatible with IDEA requirements such as cognitive, motor communication, and social and adaptive function.³⁵

For the newest version of the BSID, the Bayley Scales of Infant and Toddler Development—Third Edition (BSID-III),²⁷ norms were based on responses of 1700 children. The BSID-III assesses development (at ages 1 to 42 months) across five domains: cognitive, language, motor, social-emotional, and adaptive. Like its predecessors, the BSID-III is a power test. Assessment of the first three domains is accomplished by item administration, whereas the latter two are evaluated by means of caregiver’s responses to a questionnaire. A Behavior Observation Inventory is completed by both the examiner and the caregiver. The Language scale includes a Receptive Communication and an Expressive Communication scaled score; the Motor Scales includes a Fine Motor and a Gross Motor score. The BSID-III Social-Emotional Scale is an adaptation of the Greenspan Social-Emotional Growth Chart: A Screening Questionnaire for Infants and Young Children.³⁶ The Adaptive Behavior Scale is composed of items from the Parent/Primary Caregiver Form of the Adaptive Behavior Assessment System—Second Edition;³⁷ it measures areas such as communication, community use, health and safety, leisure, self-care, self-direction, functional preacademic performance, home living, and social and motor skills and yields a General Adaptive Composite score.

Scaled scores (M = 10, SD = 3), composite scores (M = 100, SD = 15), percentile ranks, and growth scores are provided, as are confidence intervals for the scales and age-equivalent scores for subtests. Growth scores are new and, with caution, are used to plot the child’s growth over time for each subtest in a longitudinal manner. This metric is calculated on the basis of the subtest total raw score and ranges from 200 to 800 (M = 500, SD = 100). As in the original BSID, there are basal rules (passing the first three items at the appropriate age starting point) and a ceiling or discontinue rules (a score of 0 for five consecutive items).

The correlation between the BSID-III Language Composite and the BSID-II MDI is 0.71; that between the Motor Composite and the BSID-II PDI is 0.60; and that between the Cognitive Composite and the BSID-II MDI is 0.60. The moderate correlation between the older PDI and MDI and their BSID-III counterparts underscores the significant differences between the old and new BSIDs. However, in contrast to the expected Flynn effect (see Chapter 7A and Flynn³⁴), the BSID-III Cognitive and Motor composite scores are approximately 7 points higher than the corresponding BSID-II MDI and PDI. This phenomenon has also been reported with the Peabody Picture Vocabulary Test—Third Edition,³⁸ and the Battelle Developmental Inventory—Second Edition³⁹ (Box 7C-1).

BOX 7C-1 CASE 1: DEVELOPMENTAL ASSESSMENT DISCUSSION

The toddler in Case 1 was given a developmental assessment that included the BSID-III. Results are shown in the table below.

BSID-III results indicate that the child had average cognitive abilities, low-average language skills, borderline motor abilities (Gross Motor worse than Fine Motor scores), low-average social-emotional functioning, and borderline adaptive skills. Her low average language may be influenced by the nonoptimal environment; the motor deficits are most likely attributable to the grade III bleed. The Cognitive composite score is 5 points higher than the previous BSID-II MDI score that the child had received at age 1; this is in contrast to the Flynn effect (whereby scores generally increase 0.5 points per year) but is within the 7-point increment that is found when the BSID-II and BSID-III scores are compared (BSID-III scores are somewhat higher than BSID-II scores). On the basis of these data, early intervention services geared toward language and adaptive skills are recommended. Moreover, the motor deficits will require occupational and physical therapy services.

McCARTHY SCALES OF CHILDREN’S ABILITIES (MSCA)⁴⁴

The MSCA essentially bridges developmental and IQ tests.¹⁷ It is most useful in the 3- to 5-year age range (age range, 2½ to 8½ years). Some clinicians would question viewing the MSCA as a developmental test; however the term IQ was avoided initially, with the test considered to measure the child’s ability to integrate accumulated knowledge and adapt it to the tasks of the scales. Eighteen tests in total are divided into Verbal (five tests), Perceptual-Performance (seven tests), Quantitative (three), Memory (four tests), and Motor (five) categories. Several tests are found on two scales. The Verbal, Perceptual-Performance, and Quantitative scales are combined to yield the General Cognitive Index (M = 100, SD = 16; 50 is the lowest score). The mean scale standard score (T-score) for each of the five scales is 50 (SD = 10). The MSCA is attractive because in enables production of a profile of functioning (with age-equivalent scores) and it includes motor abilities; conversely, the test was devised in 1972, and hence there is inflation of scores vis-à-vis the Flynn effect (i.e., increments in test norms over time result in lower scores on newer tests than those obtained on measures with older norms; see Chapter 7A for a discussion of the Flynn effect³⁴). Short forms of the MSCA are available, but these are not useful in the younger age ranges.¹⁷

Intelligence Assessment Instruments

KAUFMAN BRIEF INTELLIGENCE TEST, SECOND EDITION (KBIT-2)⁴⁵

The KBIT-2 was released 14 years after the original Kaufman Brief Intelligence Test and is applicable for ages 4 to 90 years. It is particularly useful as an estimate of IQ, for screening, and in time-limited situations. The test produces Verbal, Nonverbal, and Composite IQ scores (M = 100, SD = 15), as well as 90% confidence intervals, age-equivalent scores, and scaled scores for two of the three subtests. The Verbal scale consists of two subtests: Verbal Knowledge (60 items measuring both receptive vocabulary and range of general information; child points to the picture matching the word or question) and Riddles (48 items measuring verbal comprehension, reasoning, vocabulary knowledge, and deductive reasoning, based on two or three clues). The Riddles subtest replaces the Definitions from the original Kaufman Brief Intelligence Test, thereby circumventing reading. Matrices is the nonverbal scale (46 items with meaningful stimuli [people, objects] and abstract stimuli [designs, symbols]). Discrepancies between Verbal and Nonverbal scores are of interest. The KBIT-2 Verbal score is approximately 1 point lower than that of the original Kaufman Brief Intelligence Test, the KBIT-2 Nonverbal score is 3 points lower, and the KBIT-2 Composite is, on average, 2 points lower. The KBIT-2 composite score is typically within 2 points of the Wechsler Intelligence Scale for Children—Fourth Edition (WISC-IV), composite score, and correlations with the Verbal Comprehension Index, Perceptual Reasoning Index, and the Full Scale IQ (FSIQ) are 0.79, 0.56, and 0.77, respectively.

STANFORD-BINET INTELLIGENCE SCALES, FIFTH EDITION/STANFORD-BINET INTELLIGENCE SCALES FOR EARLY CHILDHOOD—5 (EARLY SB5)^26,⁴⁶

The 10 subtests of the Early SB5 are drawn from the SB5, and the norms are derived from approximately 1660 children aged 7 years 3 months or younger. The test is applicable from age 2 to 7¼ years (the SB5 extends to adulthood). The 10 subtests constitute the FSIQ, and various combinations of these subtests constitute other scales. An Abbreviated Battery IQ scale consists of two routing subtests: Object Series/Matrices and Vocabulary. Routing subtests enable the examiner to know the level at which to begin subsequent subtests. The Nonverbal IQ scale consists of five subtests measuring the factors of nonverbal fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory. The Verbal IQ scale is composed of five subtests measuring verbal ability domains in the same five factor areas as for the Nonverbal IQ scale. The Early SB5 also includes the Test Observation Checklist. The test differs markedly from the fourth edition of the Stanford-Binet Intelligence Tests. Nonverbal IQ, Verbal IQ, and FSIQ scores are obtained (M = 100, SD = 15), as are total factor index scores (sum of verbal and nonverbal scaled scores) for fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory; scaled scores (M = 10, SD = 3) can be computed for each of the nonverbal and verbal domains. Optional change-sensitive scores and age-equivalent scores are also computed. The SB5 FSIQ is approximately 3.5 points lower than the that of the fourth edition. The SB5 FSIQ is approximately 5 points lower than the FSIQ for the Wechsler Intelligence Scale for Children—Third Edition (WISC-III).

KAUFMAN ASSESSMENT BATTERY FOR CHILDREN—SECOND EDITION ⁴⁷

This battery, with norms based on scores from 3025 children, is applicable in children aged 3 to 18 years (the original Kaufman Assessment Battery for Children ceiling was 12) and contains 18 core and supplementary subtests (the number of core and supplementary tests administered varies, depending on age). It is similar to the original battery in that there is a simultaneous and sequential processing approach, vis-à-vis the Luria neuropsychological model. However, the test also uses the Cattell-Horn-Carroll abilities model that includes fluid crystallized intelligence. As a result, interpretation is based on the model that is selected; the number of scales produced is also model-dependent. The five areas assessed include (1) simultaneous processing (eight subtests; e.g., triangles, face recognition, pattern reasoning, block counting, gestalt closure), (2) sequential processing (word order, number recall, hand movements), (3) planning (a new scale applicable for ages 7 to 18; includes pattern reasoning, story completion), (4) learning (four subtests, e.g., Atlantis, Rebus), and (5) knowledge (optional and only for the Cattell-Horn-Carroll model; includes riddles, verbal knowledge, and expressive vocabulary, some of which were previously achievement tests).

For subjects at age 3 years, a Mental Processing Index (from the Luria model) and a Fluid Crystallized Index (FCI-from the Cattell-Horn-Carroll model) are derived. For children by age 7 years, the full array of scores can be derived; this includes the Mental Processing Index, a Global Score, a Fluid-Crystallized Index, and a Nonverbal Index (four or five subtests, depending on age, and including language-reduced instructions and nonverbal responses). The number of core subtests for the Cattell-Horn-Carroll mode is 7 to 10, depending on age, and the number of core subtests for the Luria approach is 5 to 8. Subtest scale scores have a mean of 10 (SD = 3); the index score mean is 100 (SD = 15). As with the SB5 and WISC-IV, intraindividual differences can be computed.

WECHSLER PRESCHOOL AND PRIMARY SCALE OF INTELLIGENCE—THIRD EDITION (WPPSI-III)⁴⁸

Whereas the Wechsler Preschool and Primary Scale of Intelligence—Revised was a downward extension of the Wechsler Intelligence Scale for Children, this is not the case with the WPPSI-III. The current version, with norms based on scores of 1700 children, contains 14 subtests (7 new, 7 revised) and has two age ranges: from 2 years 6 months to 3 years 11 months and from 4 years 0 months to 7 years 3 months. In the first age range, FSIQ, Verbal IQ, and Performance IQ scores are obtained, through four core subtests. Seven core subtests are applicable to the second age range. Supplemental and optional subtests are used to obtain a General Language Composite in the younger children and a Processing Speed Quotient in the older children. Inclusion of the Picture Concepts, Matrix Reasoning, and Word Reasoning subtests allows for better assessment of fluid reasoning. For IQ and composite scores, M = 100 and SD = 15; for scaled scores, M = 10, SD = 3. Children tested with the WISC-III and the WPPSI-III at overlapping ages had a WISC-III FSIQ score that was, on average, 4.9 points higher than the WPPSI-III FSIQ score; correlations with the BSID-II MDI score were 0.80; those with the Differential Ability Scales General Conceptual Ability composite were 0.87. As in many of the newer IQ tests, various composite scores allow for testing of more specific cognitive abilities and better interpretation of findings.

WECHSLER INTELLIGENCE SCALE FOR CHILDREN—FOURTH EDITION ⁴⁹

The WISC-IV, with norms based on responses from 2200 children, is applicable to ages 6 years 0 months to 16 years 11 months, and contains 15 subtests (10 core, 5 supplementary). The Verbal IQ and Performance IQ scores of the WISC-III are no longer used. Gone also are the Picture Arrangement, Object Assembly, and Mazes subtests from the WISC-III, to decrease the emphasis on performance time. Instead, the WISC-IV contains a Verbal Comprehension Index (Similarities, Vocabulary, Comprehension, Information,^* and Word Reasoning^*), a Perceptual Reasoning Index (Block Design, Picture Concepts, Matrix Reasoning, Picture Completion^*), a Working Memory Index (Digit Span, Letter-Number Sequencing, Arithmetic^*), and a Processing Speed Index (Coding, Symbol Search, Cancellation^*). In addition to these four index scales, a measure of general intellectual function (FSIQ) is produced. The more narrow domains and emphasis on fluid reasoning reflect contemporary thinking with regard to intelligence per se. For index and FSIQ scores, M = 100 and SD = 15; the mean scaled score is 10 (SD = 3). The WISC-IV is highly correlated with WISC-III indexes (rs = 0.72 to 0.89). The FSIQ score is approximately 2.5 points less than that of its predecessor; the Verbal Comprehension Index score is 2.4 points less than the WISC-III Verbal IQ score; the Perceptual Reasoning Index score is 3.4 points less than the Performance IQ score; the Working Memory Index score is 1.5 points lower than the Freedom from Distractibility Index score; and the Processing Speed Index score is 5.5 points lower than its WISC-III counterpart. In comparison with the Wechsler Abbreviated Scale of Intelligence (WASI) (described next), the WISC-IV FSIQ score is 3.4 points lower, the Verbal Comprehension Index score is 3.5 points lower than the WASI Verbal IQ, and the Perceptual Reasoning Index score is 2.6 points lower. A General Ability Index (containing three verbal comprehension and three perceptual reasoning subtests), can be computed; this is less sensitive to the influence of working memory and processing speed and therefore is useful with children who have learning disabilities or attention-deficit/hyperactivity disorder (ADHD) (Box 7C-2).

BOX 7C-2 CASE 2: COGNITIVE ASSESSMENT DISCUSSION

Because of concerns related to academic ability and performance, Rachel was administered the WISC-IV. These results revealed that Rachel’s cognitive abilities have developed very unevenly (probably in relation to underlying cerebral palsy). Her verbal comprehension abilities are within the high average range and represent a significant strength for her. Significant weaknesses are perceptual reasoning and processing speed, which are in the borderline range of functioning. Tasks that required abstract perceptual reasoning were particularly difficult for her. Despite cognitive weaknesses, Rachel’s cluster scores on the Woodcock-Johnson III Tests of Achievement were all in the average range or better. This suggests that she has been able to use her verbal abilities to compensate for weaknesses in other areas. However, she has struggled in some academic subject areas, especially algebra, as the content has become more abstract.

Academic Achievement Assessment Results on the Woodcock-Johnson III Tests of Achievement, Form B (Actual Grade: 9)
Cluster	Standard Score	Grade Equivalent
Oral Language	113	13.3
Total Achievement	103	10.4
Broad Reading	106	11.0
Broad Math	93	7.9
Broad Written Language	108	12.9
Math Calculation Skills	97	8.9
Written Expression	101	9.9
Academic Skills	106	11.5
Academic Fluency	104	10.9
Academic Applications	94	8.0
Academic Knowledge	110	13.9

WECHSLER ABBREVIATED SCALE OF INTELLIGENCE ⁵⁰

The WASI is applicable to ages 6 years 0 months through 89 years. Verbal IQ, Performance IQ, and either FSIQ-4 (with four subtests) or FSIQ-2 (two subtests) scores are obtained. Although subtests are similar to those found in other Wechsler scales, the actual items differ. Subtests include Vocabulary, Matrices, Block Design, and Similarities (the first two are used to compute the FSIQ-2). T-scores are used for subtests (M = 10, SD = 5). The WASI is very useful in both clinical and research settings, because of its reduced administration time. The downside is a reduction in the amount of information obtained, particularly in terms of more specific indexes of cognitive abilities. The scores are generally a few points higher than those of more detailed tests, but they still are comparable; the correlation between the FSIQ-2 and WISC-IV FSIQ scores is 0.86; between the FSIQ-4 and WISC-IV scores, 0.83 (comparable with the correlation among the WISC-III and WISC-IV FSIQ scores). Very small differences are noted on the subtest level.

Achievement Testing

Use of individually-administered achievement tests has increased dramatically since the introduction of Public Law 94-142 (Education of All Handicapped Children Act), and these tests continue to be a critical component in the evaluation of children with academic difficulties under the IDEA revision of 2004. The major reason is that achievement tests enable the delineation of aptitude-achievement discrepancies, a hotly debated requirement for establishment of a learning disability (versus response to treatment intervention). It is assumed that such tests identify children who need special instructional assistance; help recognize the nature of a child’s difficulties/deficiencies, thereby clarifying the nature of the learning problem; and assist in planning, instruction, and intervention. Unfortunately, achievement tests do not adequately meet these needs. In general, standard scores (with percentiles) are the most precise metric; age and grade-equivalent scores are least useful. With regard to the Wechsler tests, the Verbal IQ (or Verbal Comprehension Index) and FSIQ are most highly correlated with achievement, particularly reading; the Performance IQ (Perceptual Reasoning Index), with mathematics.⁵¹ Achievement tests differ in terms of content and type of response required (e.g., multiple choice vs. recall of information), and these differences sometimes cause one test to produce lower scores than another.

KAUFMAN TEST OF EDUCATIONAL ACHIEVEMENT—II ⁵²

This test is available in two formats: the Comprehensive form (with parallel forms A and B) and the Brief form. The mean score is 100 (SD = 15). Noteworthy is the fact that this test’s norms were based on the scores of the same population as for the Kaufman Assessment Battery for Children—Second Edition. The Comprehensive form, applicable from ages 4 years 6 months to 25, assesses reading (letter/word recognition, comprehension), math (computation, concepts and application), written language (spelling, written expression), and oral language (listening comprehension and written expression). Several reading-related skill areas are also assessed (e.g., phonological awareness). The Brief form (for ages 4 years 6 months to 90 years) measures reading (word recognition and comprehension), math computation and application problems, and written expression (written language and spelling) and yields a battery composite score as well. Age- and grade-equivalent scores are provided. The test differs significantly from the original Kaufman Test of Educational Achievement and from the version with normative data update.

PEABODY INDIVIDUAL ACHIEVEMENT TEST—REVISED—NORMATIVE UPDATE ^53,⁵⁴

This test is applicable for kindergarten through grade 12 (ages 5 to 19). It differs from others in that spelling and math are presented in a multiple-choice format and in other subtests such as reading comprehension, the student selects a picture that best illustrates the sentence that was read. The test includes scores for general information, reading recognition, reading comprehension, total reading, math, spelling, written expression, written language, and the total test. The normative update version is the same test, with updated norms. Some clinicians argue that the multiple-choice format may yield higher test scores because of the recognition, as opposed to recall, format.

WECHSLER INDIVIDUAL ACHIEVEMENT TEST—II ⁵⁵

This test is applicable for prekindergarten through college (ages 4 to 85). This is an updated form of the original Wechsler Individual Achievement Test. There are four composite scores: (1) Reading (word reading, pseudoword decoding, reading comprehension); (2) Mathematics (numerical operations, math reasoning); (3) Written Language (spelling, written expression); and (4) Oral Language (listening comprehension, oral expression). Standard scores (M = 100, SD = 15), age or grade-equivalent scores, and quartile scores are reported. Reading rate can also be assessed, and the test form includes qualitative observational descriptions for various subtests. The test is linked to Wechsler IQ tests, and aptitude/achievement discrepancy tables are included.

WIDE RANGE ACHIEVEMENT TEST—3 ⁵⁶

This is the seventh edition of the Wide Range Achievement Test and is applicable for ages 5 to 75 years. There are two equivalent forms (Blue, Tan) and each contains reading (read letters, pronounce words), spelling (write letters, words from dictation) and arithmetic (40 computation problems) tests. The test is based on norms by age and not grade. Critics of this test argue that it is outdated and provides very gross estimates of academic achievement because it contains few items within each content area; conversely, it is easy and quick to administer. An Expanded Version is also available ⁵⁷ that contains a group (G) form with reading/reading comprehension, math, and nonverbal reasoning (some tests are multiple choice), and an individual (I) form that assesses reading, mathematics, listening comprehension, oral expression, and written language. The Expanded Version group form is applicable to grades 2 to 12; the Individual form, to ages 5 to 24.

WOODCOCK-JOHNSON III TESTS OF ACHIEVEMENT (WJ III)⁵⁸

The WJ III has two parallel forms (A and B) that are divided into a standard battery (12 subtests) and an extended battery (10 tests); therefore, there are 22 subtests in all. The latter provides the opportunity for more in-depth diagnostic evaluation of specific academic functions (e.g., word attack, oral comprehension). The WJ III contains a reading cluster, an oral language cluster, a math cluster, a written language cluster, and an academic knowledge cluster. Clusters are designed to correspond with IDEA areas. The standard battery provides 10 cluster scores, and the extended battery provides an additional 9 cluster scores. Broad reading, broad math, and broad written language are often used to provide an overview of the child’s achievement. The WJ III is used by many school systems. Of note is the fact that the WJ III Tests of Achievement norms were based on the scores of the same population as those of the WJ III Tests of Cognitive Abilities and are designed to be used in combination. Standard scores (M = 100, SD = 15), percentile scores, and age and grade-equivalent scores are the most helpful metrics. Computer scoring is necessary.

Neuropsychological Testing

There are three approaches to neuropsychological testing of children, and all involve the assessment of brain-behavior relationships. The first approach entails modification of traditional neuropsychological batteries such as the Halstead-Reitan Neuropsychological Battery or the Luria-Nebraska Neuropsychological Battery, to form corresponding children’s batteries.⁵⁹ The second approach involves interpretation of standard tests such as those measuring intelligence, with the use of a neuropsychological “mind-set.” In this case, results from standardized tests are tied into neuropsychological constructs and functions (e.g., the Kaufman Assessment Battery for Children—Second Edition). The third approach includes tests or rating scales designed to assess specific areas of neuropsychological function. Neuropsychological testing generally is more specific in terms of pinpointing strengths and deficits, and the results more precisely describe brain-behavior relationships. Neuropsychological testing may elucidate more subtle problems that contribute to cognitive, academic, or social difficulties; these problems may not be apparent from results of more routine measures used to detect learning disabilities. Noteworthy is the fact that standard intellectual assessment is typically part of a neuropsychological workup. Selected tests from this third approach are discussed as follows.

CHILDREN’S MEMORY SCALE ⁶⁰

The Children’s Memory Scale assesses learning and memory function with nine subtests. There are two levels: one for ages 5 to 8 years and one for ages 9 to 16 years. The Children’s Memory Scale includes three domains: Auditory/Verbal, Visual/Nonverbal, and Attention/Concentration, each with two core and one supplemental test. The first two domains have an immediate-memory component and a delayed-memory component (tested 30 minutes later). Eight index scores are produced: verbal immediate, verbal delayed, delayed recognition, learning visual immediate, visual delayed, attention/concentration, and general memory (global memory function). Core subtests include: Stories, Word Pairs, Dot Locations, Faces, Numbers, and Sequences. Word Lists, Family Pictures, and Picture Locations are the supplementary tests. The general memory score is moderately correlated with IQ scores.

NEPSY—A DEVELOPMENTAL NEUROPSYCHOLOGICAL ASSESSMENT (NEPSY)⁶¹

The NEPSY is based on Luria’s theoretical model,⁵⁹ is applicable for ages 3 to 12 years, and consists of 27 subtests that encompass five domains: (1) Attention and Executive Functions (e.g., Tower test, Auditory Attention and Response Set, Visual Attention); (2) Language (Speeded Naming, Comprehension of Instructions, Phonological Processing); (3) Sensorimotor Functions (e.g., Fingertip Tapping, Visuomotor Precision); (4) Visuospatial Functions (Design Copying, Arrows, Block Construction); and (5) Learning and Memory (e.g., Memory for Faces, Names, Sentence Repetition). There is an 18-subtest core assessment. In general, each domain contains five to six subtests. Subtest scaled scores are obtained (M = 10, SD = 3), and these can be combined into summary domain scores (M = 100, SD = 15). Correlations with the Children’s Memory Scale range from 0.36 to 0.60.

BEHAVIOR RATING INVENTORY OF EXECUTIVE FUNCTION (BRIEF)⁶²

Executive function is an umbrella construct that refers to interrelated neuropsychological functions that are responsible for purposeful, problem-solving, goal-directed behavior. Executive function is involved in guiding, directing, regulating, and managing cognitive, behavioral, and emotional functions. The BRIEF measures executive function in an ecological manner: namely, it is a questionnaire given to parents and/or teachers, thereby assessing executive function in home and school environments. The BRIEF is applicable for school-aged children (5 to 18 years), although a preschool version is also available (BRIEF-P). In addition, a BRIEF-SR (self-report) version has become available for ages 11 to 18 years, requiring a fifth grade reading level. Each version consists of 86 items scored “never” (1), “sometimes” (2), or “often” (3). There are eight clinical scales: Inhibit (controlling impulses, modifying behavior), Shift (cognitive flexibility, transitioning), Emotional Control (emotional modulation), Initiate (beginning a task/activity, in-dependently generating ideas), Working Memory (holding information in mind, persistence), Plan/Organize (anticipating future events, setting goals), Organization of Materials (workspace, play areas, orderliness), and Monitor (work checking, keeping track of how behaviors affect others). The first three scales combine to form the Behavioral Regulation Index; the remaining five constitute the Metacognition Index. The Global Executive Composite is computed from the combination of the Behavioral Regulation Index and Metacognition Index. There are also two validity scales, the Inconsistency and Negativity scales, that assist in detecting response biases. T-scores and percentiles are computed from raw scores and can be graphed on the reverse side of the scoring summary sheet. T-scores higher than 65 (1.5 standard deviations above average) are considered to have reached a clinical threshold. There are different norms for boys and girls. The BRIEF is particularly useful in evaluating children with ADHD, traumatic brain injury, autism spectrum disorders (ASDs), and learning disorders and those who experience cognitive, behavioral, or academic problems and whose initial test results are inconclusive.

WIDE RANGE ASSESSMENT OF MEMORY AND LEARNING (WRAML)/WIDE RANGE ASSESSMENT OF MEMORY AND LEARNING—2 (WRAML-2)^63,⁶⁴

The WRAML (ages 5-17) and WRAML-2 (ages 5-90) are designed to test visual and verbal memory. The WRAML-2 contains six core subtests (the WRAML has nine): Story Memory, Verbal Learning, Design Memory, Picture Memory, Finger Windows, and Number/Letter Memory. Verbal Memory Index (Story Memory, Verbal Learning), Visual Memory Index (Design Memory, Picture Memory) and Attention/Concentration (Finger Windows, Number/Letter Memory) summary scores are obtained (M = 100, SD = 15). There are optional Sentence Memory, Sound-Symbol, Verbal Working Memory, and Symbolic Memory subtests. Delayed recall and recognition memory can also be assessed. A General Memory Index is computed from the core subtests. Scores on Memory Screening, consisting of the first four core subtests (taking 20 minutes), correlate highly with those of the General Memory Index (r = 0.91). In contrast to the WRAML, there is no Learning Index in the WRAML-2. The WRAML-2 also allows assessment of primary/recency effects, immediate/delayed recall, rote versus meaningful information, visual/verbal differences, working memory, short-term memory, sustained attention, and recognition versus retrieval memory. This test is useful in evaluation of children with learning disorders, those suspected of having verbal processing problems, and those suspected of having ADHD.

TESTING METHODS: BEHAVIORAL AND EMOTIONAL

Assessment of social, emotional, and behavioral adjustment of children typically begins with a parent or caregiver interview regarding the nature, severity, and frequency of concerns. Most child assessment techniques rely on caregiver reports because it is presumed that adults who interact daily with a child are the most knowledgeable informants about a child’s functioning. School-aged children and adolescents should also have the opportunity to provide their own perceptions and information about their symptoms. Younger children (younger than 10 years) can provide assessment information, but their self-descriptions tend to be less reliable; therefore, direct and multiple observations and interviews may be necessary.

A criticism of reliance on caregiver reports in child assessments is that they are subject to reporter bias. However, all reports are subject to “bias,” including those from the child, parents, clinicians, teachers, and other observers. All reports are to some extent limited (or “biased”) by the perspectives, knowledge, recall, and candor of the informants. Because there is no unbiased “gold standard” source of data about children’s problems, data from multiple sources are always needed. Regardless of the child’s age, behavioral and emotional assessment strategies almost always should include information obtained from multiple sources, including parents, teachers, and the child, as well as by direct observation of the child. Data from multiple informants with different perspectives provide critical information about how the child functions in different settings such as at home, at school, and with friends. Even when there is discrepant information obtained from caregivers (as is often true), multiple vantage points are useful in determining the scope and functional effect of behavior problems.⁶⁵

Assessment of child and adolescent emotional and behavioral problems is further complicated because of the high rate of comorbidity, heterogeneity, and severity of concerns. Children referred for assessments often meet diagnostic criteria for multiple disorders or display symptoms associated with multiple disorders. Thus, it is often important to assess not only a referred problem but also a broad range of social, emotional, and behavioral domains. For example, in their review of evidence-based assessment of conduct problems, McMahon and Frick ⁶⁶ concluded that because of the high rate of comorbid disorders (e.g., ADHD, depressive and anxiety disorders, substance use problems, language impairment, and learning difficulties), initial assessments of youth with conduct problems should include broadband measures to screen for all conditions, followed by disorder-specific scales, interview strategies, and standardized testing of conduct and comorbid disorders.

Behavioral Rating Scales

Behavior rating scales are an extremely useful and efficient method for obtaining data on child functioning. Most rating scales use a standard questionnaire, checklist, or Likert-response format for surveying areas of interest and usually are completed by caregivers without much assistance. Rating scales include brief screening measures that assess global, broad-based measures, and problem-specific scales.

Broad-based behavioral assessment instruments assess multiple dimensions of behavior in children. Most are empirically developed taxonomies that are symptom driven and do not necessarily correspond to specific diagnostic schemas. On rating scales, informants rate the child on a broad range of social competencies and problematic behaviors. Results produce empirically derived factor scores on broad dimensions (e.g., internalizing and externalizing problems) and specific symptom areas (e.g., depression or aggressiveness) based on age and gender norms. Parent, teacher, and self-report forms are available for crossinformant comparisons. Rating scales yield very useful information about a child’s functioning in comparison with children of the same age and gender, and generally are viewed as necessary components of most child assessments.

ACHENBACH SYSTEM OF EMPIRICALLY BASED ASSESSMENT/CHILD BEHAVIOR CHECKLIST ^11,⁶⁷^–⁷¹

The CBCL was one of the first broad-based rating scales of behavior in children to be developed, and it continues to be the most widely used method for behavioral assessments in children. Achenbach began work on what would become the CBCL in the 1960s in an effort to differentiate child and adolescent psychopathology.⁶⁸ At that time, the DSM provided just two categories for childhood disorders: Adjustment Reaction of Childhood and Schizophrenic Reaction, Childhood Type. Achenbach and collaborators applied an empirically based approach to child psychopathology much like what was used in the development of the Minnesota Multiphasic Personality Inventory. This approach involved recording problems for large samples of children and adolescents, performing multivariate statistical analyses to identify syndromes of problems that co-occur, using reports to assess competencies and adaptive functioning, and constructing age and gender-specific profiles of scales on which to display individuals’ scores.¹¹ These taxonomic procedures revealed that most behavior problems in children could be broadly divided into “internalizing” and “externalizing” conditions. This pioneering work had enormous influence on clinical and research assessment practices and established the empirical foundation for contemporary conceptualizations of child psychopathology.

The CBCL was published first in 1983 as a measure of behavior problems in children aged 4 to 18 years. Currently, there are ASEBA materials for ages 1½ to older than 90 years. There are forms for preschoolers (1½ to 5 years, parent and teacher/daycare versions)⁶⁹ and school-aged children (parent, teacher versions for children aged 6 to 18 years and youth self-report for ages 11 to 18 years),⁶⁷ as well as for adults (18 to 59 years)⁷⁰ and older adults (60 to older than 90 years)⁷¹ (both with caregiver and self-report formats). For each problem listed, informants provide ratings on the following scale: 0 = “not true,” 1 = “somewhat or sometimes true,” and 2 = “very true or often true.” Hand-scored and computer-scored profiles are available, as are Spanish-language forms.

The Child Behavior Checklist for Ages 1½-5 (CBCL/1½-5) obtains parents’ ratings of 99 problem items along with descriptions of concerns and competencies. Scales are based on parent ratings of 1728 preschool children; norms are based on a national sample of 700 children. Raw scores can be translated into standard T-scores, yielding interpretative information on three summary scales (Internalizing, Externalizing, and Total Problems), as well as on clinical syndromes scales (Emotionally Reactive, Anxious/Depressed, Somatic Complaints, Withdrawn, Attention Problems & Aggressive Behavior, and Sleep Problems). A Language Development Survey is included to screen for language delays. DSM-oriented scales pertaining to affective problems, anxiety problems, pervasive developmental problems, attention-deficit/hyperactivity problems, and oppositional defiant problems are now available.

The Child Behavior Checklist for Ages 6-18 (CBCL/6-18) similarly obtains reports from parents, close relatives, and/or guardians regarding school-aged children’s competencies and behavioral/emotional problems. The competency scale includes 20 items about a child’s activities, social relations, and school performance. Specific behavioral and emotional problems are described in 118 items that are rated along the 0-to-2 scale described previously, along with two open-ended items for reporting additional problems. A scoring profile provides raw scores, T-scores, and percentiles for three competence scales (Activities, Social, and School); Total Competence; eight crossinformant (clinical scale) syndromes; and Internalizing, Externalizing, and Total Problems (broad scales). The eight clinical scales scored from the CBCL/6-18 Teacher Report Form and Youth Self-Report are Aggressive Behavior; Anxious/Depressed; Attention Problems; Rule-Breaking Behavior; Social Problems; Somatic Complaints; Thought Problems; and Withdrawn/Depressed. Now available are also six DSM-oriented scales associated with affective problems, anxiety problems, somatic problems, attention-deficit/hyperactivity problems, oppositional defiant problems, and conduct problems. The school-age scales are based on new factor analyses of parents’ ratings of nearly 5000 clinically referred children, and norms are based on results from a nationally representative sample of 1753 children aged 6 to 18 years ¹¹ (Box 7C-3).

BOX 7C-3 CASE 2: BEHAVIORAL AND EMOTIONAL ASSESSMENT DISCUSSION

The behavior problem profiles obtained on the CBCL/6-18 and the Youth Self-Report for Rachel are shown in the following two illustrations. On the CBCL problem scales (completed by her mother), Rachel’s Total Problems, Internalizing, and Externalizing scores and syndrome scales were all in the normal ranges for girls aged 12 to 18. Similarly, a teacher completed a Teacher Report Form, and results were all within the normal range. However, on the Youth Self-Report problem scales, Rachel reported more problems than are typically reported by teenage girls, particularly withdrawn behavior, somatic complaints, problems of anxiety or depression, problems in social relationships, thought problems, attention problems, and problems of an aggressive nature. Rachel’s responses on the Minnesota Multiphasic Personality Inventory—Adolescent indicated that she was experiencing high levels of general distress. Elevations on clinical scales 2,3,7,8,0 suggested that she may have felt anxious, lonely, and pessimistic much of the time and may have felt isolated from others and inferior. In other words, Rachel reported having high levels of internalizing symptoms, as well as difficulties managing social relationships and aggression. Crossinformant comparisons indicate that adults in Rachel’s life were not aware of the level of her internal distress. Discrepancies between Rachel’s self-report of symptoms and the ratings by her mother became a springboard for validating Rachel’s need for mental health attention and led to better communication within the family.

ASEBA materials are backed by extensive research in their development and have been used in more than 6000 studies pertaining to a broad range of behavioral health topics. There is strong support for its use with multidimensional child assessments in pediatric settings, (e.g., Mash and Hunsley ²; Riekert et al,⁷² Stancin and Palermo⁷³), although criticisms have been raised about the validity of the CBCL for populations of chronically ill children.⁷⁴

BEHAVIOR ASSESSMENT SYSTEM FOR CHILDREN—SECOND EDITION (BASC-2)⁷⁵

The BASC-2 is another broad, multidimensional rating scale system designed to measure behavior and emotions of children and adolescents. It includes a Parent Rating Scale, a Teacher Rating Scale, and a Self-Report of Personality. Norms are provided for ages 2 years 0 months through 21 years 11 months (Teacher Rating Scale and Parent Rating Scale) and 8 years 0 months through college age (Self-Report of Personality). T-scores and percentiles for a general population and clinical populations are available for interpretation. Computer scoring and Spanish language forms are available. The Parent Rating Scale requires approximately a fourth grade reading level; forms pertaining to three age levels—preschool (ages 2 to 5), child (ages 6 to 11), and adolescent (ages 12 to 21)—measure adaptive and problem behaviors in the community and home setting. The Parent Rating Scale contains 134 to 160 items and entails use of a four-choice response format. Clinical scales include Aggression, Anxiety, Attention Problems, Atypicality, Conduct Problems, Depression, Hyperactivity, Somatization, and Withdrawal. Adaptive scales include Activities of Daily Living, Adaptability, Functional Communication, Leadership, and Social Skills. The Teacher Rating Scale similarly measures adaptive and problem behaviors in the preschool or school setting. An additional clinical domain in the Teacher Rating Scale is Learning Problems; Study Skills are measured on the Adaptive Scales. The Self-Report of Personality provides insight into a child’s or adolescent’s thoughts and feelings, including scales such as Anxiety, Attention Problems, Sense of Inadequacy, Social Stress, Interpersonal Relations, and Self Esteem (among others). One strong advantage of the BASC-2 over other rating scales is the inclusion of validity and response set indexes that may be used to judge the quality of responses.

INFANT-TODDLER SOCIAL-EMOTIONAL ASSESSMENT SCALE (ITSEA)⁷⁶

The ITSEA provides a comprehensive analysis of emerging social-emotional development of infants and toddlers aged 12 to 36 months. It includes parallel parent and child care provider forms that contains 166 items focusing on behavioral and emotional problems and competencies. A national normative sample consisted of 600 children, with clinical groups that included children with autism, language delays, prematurity, and other disorders. English and Spanish forms are available with computer or hand scoring that yield T-scores for 4 broad domains, 17 specific subscales, and 3 index scores. An interesting feature of the ITSEA is its companion measure, the Brief Infant-Toddler Social-Emotional Assessment Scale.⁷⁷ This measure contains 42 items, is completed by a parent or caregiver, and can be used first to screen for possible concerns and then followed with the ITSEA for more comprehensive evaluation.

MINNESOTA MULTIPHASIC PERSONALITY INVENTORY—ADOLESCENT (MMPI-A)⁷⁸

Although it is not a behavior rating scale per se, the MMPI-A is a self-report questionnaire that yields indices pertaining to the nature and severity of symptoms in relation to peers with psychiatric disorders. Norms are based on a nationally representative sample of more than 1600 male and female adolescents in the United States. The MMPI-A scoring yields T-scores for 7 validity scales, 10 clinical scales, 15 content scales, and other supplementary scales and indices. The MMPI-A is a lengthy measure (478 true/false items) that requires at least a sixth grade reading level; therefore, some adolescents find it to be difficult to complete. However, a shortened 350-item version yielding basic results can be administered to save administration time.

Projective Techniques

Projective assessment techniques encourage a respondent to “project” issues, concerns, and perceptions onto ambiguous stimuli such as an inkblot or a picture. The basic premise is that when the child is faced with an ambiguous stimulus or one requiring perceptual organization, underlying psychological issues affecting the child will influence interpretation of these stimuli. The most commonly used projective techniques with children include use of child human figure or family drawings, storytelling responses to pictures or photographs, and reactions to Rorschach inkblots. Once the mainstay of personality assessment, projective assessment techniques have fallen out of favor in the era of evidence-based assessment techniques. However, some techniques continue to have clinical utility and validity with specific assessment purposes. They can provide clues that subsequently can be pursued with interviews and other techniques. For example, family drawings can be a helpful source of qualitative information about a child’s view of family relations, especially with younger children with more limited verbal expressions. Responses to incomplete sentences, story cards, and “3 wishes” (“if you could have 3 wishes, what would they be?”) can reveal insights into a child’s internal representations of relationships. In addition, the Rorschach has been shown to be a valid method for examining perceptual accuracy in youth with possible thought disorders when used with validated scoring systems such as John E. Exner’s system for scoring the Rorschach test.⁷⁹

Assessing Peer Relationships

Peer perspectives contain unique and important information about children but are usually missing in multi-informant clinical assessments. Peers play critical social roles in children’s lives and have access to information that adults may not have and that children may be reluctant to self-report. For example, social acceptance within a peer group is an important aspect of a child’s functional status, but it can be difficult to assess accurately by interview or parent report. Sociometric assessments that use peer nomination methods have been developed as a systematic way of gathering information about the extent to which a child is accepted or rejected within a peer group.⁸⁰ Strategies may involve asking children by interview or on paper to nominate three classmates with whom they most like to play (positive nominations/peer acceptance) and three classmates with whom they would least like to play (negative nominations/peer rejection). An alternative method is for children to rate how much they like to play with each classmate, for example, on a scale from 1 (“I don’t like to”) to 5 (“I like to a lot”). Using various statistical classification schemes, children can be considered to be popular, accepted, rejected, neglected, or controversial.

Peer nomination assessment instruments have been used to measure specific domains of child functioning besides peer acceptance. Techniques often involve presenting children in a classroom with a list of behavioral descriptions and asking them to select which of their peers best match each descriptor. Peer nomination approaches with acceptable reliability and validity have been developed to obtain peer ratings for a number of specific behavioral or emotional problem domains in children, such as ADHD symptoms, aggression and withdrawal, and depression.⁸¹

The Peer-report Measure of Internalizing and Externalizing Behavior ⁸¹ was developed to assess a broad range of peer-reported externalizing and internalizing child psychopathology. As with other peer-nomination inventories, students are provided with classroom roster sheets that contain listings of all of the children in the classroom. Then, they are asked to select up to three classmates (either gender) who best fit the description read to them (e.g., “worry about things a lot” or “get mad and lose their temper”). Preliminary reports suggest that this measure demonstrates adequate reliability and validity as a broad measure of psychopathology.

Peer nomination procedures may be useful for psychosocial screening in the classroom, evaluating the effectiveness of mental health interventions on social behaviors in school settings, and conducting research on a range of child social behaviors. However, for clinical purposes, it may difficult (and impractical) to obtain peer ratings of an individual child. For this reason, parent, teacher, and rating scales of behavior in children can be used to as a more practical alternative for multi-informant assessments of peer relationships and functioning. For example, the ASEBA scales (e.g., the CBCL Teacher Report Form) include positive peer relationship items on the competence scales and a social problems scale highlighting peer difficulties on the problem scales.

Testing for Specific Problems

PARENT-CHILD INTERACTIONS

Parent-child interaction problems contribute significantly to the origin and maintenance of a wide range of behavior problems in children. Therefore, treatment of children in mental health settings, especially children with negative, externalizing behaviors, often focuses on promoting optimal parenting styles and parent-child interactions. For these reasons, assessment of parent-child interactions is essential when treatment interventions are planned for children with a wide range of behavioral problems.⁸²

Parent-child interactions may be assessed through observation, Q-sorts (cards with descriptive labels are “sorted” into piles as to how well they pertain to a child), or rating scales. Qualitative assessments through observations may be conducted in vivo or by using videotape recordings of parent-child interactions. The Dyadic Parent-Child Interaction Coding System ⁸³ is widely used in clinical and research settings to code direct observations in a standardized laboratory setting. Observations (through a one-way mirror or videotape) are made during three standard parent-child interaction settings: child-led play, parent-led play, and cleaning up. Parent and child verbalizations and physical behaviors are coded along 25 categories. Reliability and validity studies provide good support for use of the Dyadic Parent-Child Interaction Coding System to evaluate baseline and post treatment behaviors, as well as to measure ongoing treatment progress.⁸³ In addition to this structured method of observation, it is sometimes useful to observe parent-child interactions in more naturalistic settings.⁸⁴

The classic research method for assessing the quality of parent-child relationships is the laboratory Strange Situation Paradigm developed by Ainsworth (described by Shaddy and Colombo ⁸⁵). Strength and quality of an infant’s attachment to a caregiver are assessed by placing the child in situations in which he or she is alone with the caregiver, separated from caregiver and introduced to a stranger, and then reunited with the caregiver. The infant can be classified as securely attached, ambivalent/resistant, avoidant, or disorganized on basis of reactions in those situations.

Information about parent-child interactions in clinical settings can be obtained from sorting techniques and rating scales. The Attachment Q-Set (as described by Querido and Eyberg)⁸² is a measure of a child’s attachment related behaviors. Parents sort 90 behavioral dimensions of security, dependency, and sociability into piles according to the extent to which they describe the child. Results of the Q-set are related to results obtained by exposing infants to the Strange Situation Paradigm. In addition, there are a variety of measures by which to assess various dimensions of parent-child relationships and interactions through the use of rating scales and checklists.⁸²

DEPRESSION

Self-report questionnaires and rating scales are usually preferred over parent or teacher rating scales for screening depression in children and teens and for monitoring symptoms during treatment. However, they tend to have limited sensitivity and specificity and therefore should be used cautiously.⁸⁶ Moreover, they can be influenced by respondent bias if the child does not want to divulge information. The most widely used depression rating scale for children and adolescents is the Children’s Depression Inventory.⁸⁷ This instrument includes 27 items covering a range of depressive symptoms and associated features and it can be used in youth ages 7-17. Research on the Children’s Depression Inventory has generally shown it to have good internal consistency, test-retest reliability, and sensitivity to change, but the evidence for discriminant validity is more limited.⁸⁶

The Mood and Feeling Questionnaire ⁸⁸ is a 32-item measure of depression (and there is an even briefer 13-item version) that has been shown to have good estimates of reliability, discriminant validity, and sensitivity to change for children aged 8 to 18 years.⁸⁶ The Reynolds Child Depression Scale⁸⁹ and the Reynolds Adolescent Depression Scale⁹⁰ are 30-item scales for youth aged 8 to 12 and 13 to 18. These scales have also been shown to be internally consistent and stable, although there is more limited evidence of discriminant validity and sensitivity to change.⁸⁶

The Children’s Depression Rating Scale ⁹¹ is an interesting hybrid measure that combines separately obtained responses from a child and an informant along with the clinician’s behavioral observations. Seventeen items assess cognitive, somatic, affective, and psychomotor symptoms; cutoff scores provide estimates of level of depression. Moderate reliability, convergent validity, and sensitivity to treatment have been demonstrated, but, as with most measures of depression, it does not distinguish between depression and anxiety very well.⁸⁶

Assessment of depression in infants and preschool children is very challenging because of the difficulty of eliciting self-report information in a reliable or valid manner. Caregiver reports obtained with broadband measures (such as the CBCL/1½-5 or Teacher Report Form 1-5) may be a useful alternative or adjunctive tool. A new parent report screening measure of preschool depression is the Preschool Feelings Checklist.⁹² This 20-item checklist of depressive symptoms in young children was shown to have high internal consistency and to be correlated highly with the Diagnostic Interview for Children—IV and the CBCL on a sample of 174 preschool children from a primary care setting. Moreover, preliminary study suggested that it had acceptable sensitivity and specificity when a cutoff score of 3 was used.⁹²

ANXIETY

Screening for anxiety disorders is most often done with rating scales, although data supporting their use are sparse, and several scales have been shown to measure different anxiety constructs.⁹³ The Multidimensional Anxiety Scale for Children⁹⁴ is a youth self-report rating scale that assesses anxiety in four domains: physical symptoms, social anxiety, harm avoidance, and separation/panic. Children aged 8 to 19 are asked to rate how true 39 items are for them. Internal consistency reliability coefficients of subscales and total scores range from 0.74 to 0.90, although interrater reliability is lower (0.34 to 0.93). The Multidimensional Anxiety Scale for Children has some support for use as a screener for anxiety disorders, as does the Social Phobia and Anxiety Inventory for Children,⁹⁵ the Social Anxiety Scale for Children⁹⁶ and the Social Anxiety Scale for Adolescents.⁹⁷ The Revised Children’s Manifest Anxiety Scale,⁹⁸ although widely used, does not appear to discriminate between children with anxiety disorders and those with other psychiatric conditions and therefore should be used cautiously as a screening or diagnostic tool.⁹³ However, it does appear to be sensitive to change and therefore may be a useful tool for monitoring treatment effects.

ATTENTION-DEFICIT/HYPERACTIVITY DISORDER

ADHD is one of the most common childhood mental health disorders and a frequent diagnostic consideration in developmental-behavioral pediatric settings. Despite the vast literature on ADHD psychopathology and treatment, considerably less research has been directed toward determining best assessment practices.⁵ The most efficient empirically based assessment methods for diagnosing ADHD are parent and teacher symptom rating scales based on DSM-IV criteria (e.g., the ADHD Rating Scale ⁹⁹ or the Vanderbilt ADHD Diagnostic Scales¹⁰⁰) or derived from a rational or empirical basis (e.g., BASC or CBCL).¹⁰¹ Broadband rating scales (such as the BASC or CBCL) were not recommended for diagnosing ADHD in the American Academy of Pediatrics Diagnostic Guidelines¹⁰² because broad domain factors (e.g., externalizing) do not discriminate children referred for ADHD from nonreferred peers.^103,¹⁰⁴ However, a more recent review⁵ challenged this recommendation, concluding that the Attention Problems subscales within the CBCL and BASC do accurately identify children with ADHD. Because of their ability to identify other comorbid conditions and impairments, broadband measures (which also have advantages of extensive normative information across gender and developmental ages) are probably more efficient than DSM-IV—based rating scales for diagnosing ADHD.⁵

As with any disorder, ADHD should not be diagnosed with symptom rating scales alone. Clinical interviews and other sources of data are needed to establish pertinent history, to rule out other disorders that may better account for symptoms (e.g., autism, low intellectual functioning, post-traumatic stress disorder, adjustment problems), and to assess comorbid conditions. Interestingly, DSM-based structured interviews have not been shown to add incremental validity to parent and teacher rating scales.⁵ Behavioral observation assessment procedures have been shown to be empirically valid in numerous studies but practically impossible in most clinical settings, although parent and teacher proxy observational measures have been developed.⁵ Measures of child functioning and impairment in key domains including peer relationships, family relationships, and academic settings, should be included in an ADHD assessment and are likely to be more useful for treatment purposes than are global ratings of impairment. Moreover, assessment of ADHD needs to emphasize situational contexts and socially valid target behaviors (i.e., functional analysis of behavior) necessary for treatment planning (Box 7C-4).

BOX 7C-4 CASE 3: BEHAVIORAL ASSESSMENT RESULTS

Jose is reported to have a short attention span and to display social and academic impairment. Parent and teacher CBCL measures were obtained to broadly examine the nature and severity of behavior problems (the Spanish version was administered to parents). Clinically significant scores were on the following parent and teacher subscales: Social Problems, Attention Problems, and Aggressive Behavior. Scores on the Teacher Report Form Attention Problems subscales were further clinically significant for Inattention (98th percentile) and Hyperactivity-Impulsivity (97th percentile). Scores on the Vanderbilt ADHD Diagnostic Scales, used to collect information about the presence of DSM-IV symptoms, showed that Jose’s mother and teacher considered him to display symptoms associated with ADHD, combined type. Maternal reports on the Vanderbilt scales were considered cautiously because of possible language and cultural differences from the normative sample. Clinical interviews and academic screening to rule out communication problems and learning disorders led to diagnoses of ADHD and Oppositional Defiant Disorder. Idiographic measures of daily behavior problems targeted out-of-seat behaviors during instruction, schoolwork completion, and peer aggression for behavioral interventions. A treatment plan was developed to include a trial of stimulant medication, home and classroom behavioral interventions, parent training in behavior management skills, and a social skills group.

AUTISM SPECTRUM DISORDERS

Empirically based procedures for assessing ASDs have emerged since the 1990s, greatly improving the accuracy and validity of the diagnoses and the ability to plan and evaluate interventions. Ozonoff and associates ¹⁰⁵ summarized the current state of the art with regard to assessment of ASDs and recommended a core assessment battery that includes collecting diagnostic information from parents and by direct observation along with standardized measures of intelligence, language, and adaptive behavior. One ASD-specific measure is the Autism Diagnostic Interview—Revised,¹⁰⁶ a comprehensive, semistructured diagnostic parent interview that elicits current behavior and developmental history. It yields three algorithm scores measuring social difficulties, communication deficits, and repetitive behaviors; these scores have been shown to distinguish children with autism from children with other developmental delays. I is very labor intensive in terms of training (3 days) and administration time (3 hours) and therefore has been used more in research than in clinical settings.¹⁰⁵

The Autism Diagnostic Observation Schedule (ADOS)¹⁰⁷ is a widely used semistructured, interactive assessment of ASD symptoms. It includes four graded modules and can be used with a broad range of patients from the very young and nonverbal to high-functioning, verbal adults. Modules 1 and 2, geared toward developmentally younger children, assess social interest, joint attention, communication behaviors, symbolic play, and atypical behaviors. Modules 3 and 4 assess higher level functioning individuals, with a focus on conversational reciprocity, empathy, insight into social relationships, and special interests. Administration time is typically less than an hour. For either pair of modules there are empirically derived cutoff scores for autistic disorder and for broader ASDs (such as Asperger syndrome). Studies on the psychometric properties of the Autism Diagnostic Observation Schedule indicate excellent reliability (interrater, internal consistency, and test-retest reliability) for each module, as well as excellent diagnostic validity.¹⁰⁵

A parent-report alternative to the Autism Diagnostic Interview—Revised for children older than 4 years is the Social Communication Questionnaire.¹⁰⁸ This instrument has a lifetime-behavior version helpful for diagnostic purposes, as well as a current-behavior version that can be used for evaluating a person’s change over time.¹⁰⁵ Currently, the widely popular Gilliam Autism Rating Scale¹⁰⁹ has not been subjected to sufficient psychometric study to recommend its use.¹⁰⁵ Several parent report measures have been developed to help diagnose other ASD disorders (e.g., Asperger syndrome), but at present, there is not sufficient empirical study to recommend their use. A clinically practical method of direct observation for children older than 24 months is the Childhood Autism Rating Scale.¹¹⁰ Little training is necessary to rate 15 items on a 7-point scale (from “typical” to “severely deviant”); the results yield a composite score that is correlated highly with that of the Autism Diagnostic Interview—Revised (although it may overidentify children with mental retardation as having ASD).

Family Assessment

Evaluations in developmental and behavioral pediatrics often include a family assessment in order to understand the interpersonal dynamics of the family system.¹¹¹ Using an unstructured interview format, a clinician may inquire about family structure, roles, and functioning and explore each family member’s perception of a presenting issue or problem. This assessment approach is often useful in family therapy sessions. Structured interviews may be employed to ensure that specific areas or topics are covered. Genograms are graphic representations of families that begin with a family tree and may include additional details about family structure, cohesiveness or conflicts, timelines of events, and family patterns (e.g., domestic violence, substance abuse, divorce, suicides, health conditions, presence of behavioral disorder). Formal, validated observational approaches to family assessment typically involved trained observers who coded ratings during live or videotaped observations of family interactions and are mostly confined to research settings.

There are many family self-report questionnaires targeting different aspects of functioning that may be useful in family assessments, especially in research settings.¹¹² Although questionnaires have psychometric appeal, they carry biases of the individual completing them, which is counter to the spirit of family assessment. Moreover, questionnaires may have limited utility when specific treatment recommendations are developed in clinical settings for a particular family’s set of concerns.¹¹¹ A popular example of a parent report family questionnaire with research and clinical applications is the Parenting Stress Index.¹¹³ This index consists of 120 items about child characteristics, parent personality, and situational variables, and it yields a Total Stress Score, as well as scale scores for child and parent characteristics. It has been translated and validated for use with a variety of international populations and has been shown to be useful in a clinical contexts.

Functional Outcomes

Measures of global functioning are typically ratings of a clinician’s judgment about a child or adolescent’s overall functioning in day-to-day activities at school, at home, and in the community.¹¹⁴ Measures of global functioning are useful for identifying need for treatment, as well as for monitoring treatment effects and predicting treatment outcome. The importance of global functioning is reflected in the placement of the Global Assessment of Functioning—which stipulates that impairment in one of more areas of functioning is necessary in order to meet criteria for a diagnosis—as Axis V on the DSM-IV. The Global Assessment of Functioning is a scale of a mental health continuum from 1 to 100 with 10 anchor descriptions; higher scores reflect better functioning. For example, a score between 31 and 40 would be given for a child with major functional impairment in several areas (frequently beats up younger children, is unruly at home, and is failing in school); a score between 61 and 70 is given to a child with mild symptoms (mild depressed mood) or some difficulties in functioning (disruptive in school) but who generally functions fairly well and who has good social relationships. Shaffer and colleagues modified the anchors of the Global Assessment of Functioning to pertain better to youth, creating the Children’s Global Assessment Scale (CGAS).¹¹⁵ This instrument yields one score and has been used in a large number of psychiatric outcome studies, especially medication-related research.¹¹¹

A widely used measure of functioning is the Child and Adolescent Functional Assessment Scale.¹¹⁶ This measure is a clinician-rated instrument consisting of behavioral descriptions (e.g., is expelled from school, bullies peers) grouped into levels of impairment for each of five domains: role performance (school/work, home, community), behavior toward others, moods/self-harm, substance use, and thinking. The Child and Adolescent Functional Assessment Scale has been shown to have considerable criterion-related and predictive validity and is widely used to evaluate outcome in clinical settings and in clinical research.¹¹¹

Adaptive functioning measures such as the Vineland Adaptive Behavior Scales ¹¹⁷ are used to assess personal and social skills needed for everyday living and are especially useful for identifying children with mental retardation, developmental delays, and pervasive developmental disorders. The Vineland scales include survey interview and parent/caregiver rating forms that yield domain and adaptive behavior composite standard scores (M = 100, SD = 15), percentile ranks, adaptive levels, and age-equivalent scores for individuals from birth to age 90 years. Domains assessed include Communication, Daily Living Skills, Socialization, Motor Skills, and an optional Maladaptive Behavior Index.

Health-related quality-of-life (HRQOL) measures have been developed to evaluate functional outcomes in clinical and health services research. HRQOL measures differ from more traditional measures of health status and physical functioning by also assessing broader psychosocial dimensions such as emotional, behavioral, and social functioning. The Pediatric Quality of Life Inventory (PedsQL 4.0)¹¹⁸ is an example of an HRQOL measure that has been developed and validated for use in pediatric settings. The PedsQL 4.0 Generic Core Scales assess physical, emotional, social, and school functioning with child self-report (ages 5 to 18) and parallel parent proxy-report formats (for children aged 2 to 18 years). Physical Health and Psychosocial Health summary scores are transformed to a scale of 0 to 100 in which higher scores reflect better health-related quality of life. The PedsQL 4.0 had excellent internal consistency reliability in a large pediatric sample, distinguished healthy children from those with chronic health conditions, and was related to other indicators of health status.¹¹⁸

SUMMARY AND IMPLICATIONS FOR CLINICAL CARE

Interviews, psychological tests, rating scales, and other measurement strategies are central in the comprehensive assessment of behavior and development of children. Use of assessment techniques in the cases featured in this chapter highlight the contributions of multi-informant, multimethod evidence-based approaches to the clinical care of children referred for developmental and behavioral services. As a result of the comprehensive evaluation, the teenager in Case 2 (Rachel) received a diagnosis of Major Depression, single episode, along with Cognitive Disorder not otherwise specified. Treatment recommendations included individual cognitive behavior therapy to focus on adaptive coping, a trial of antidepressive medication, family education, and educational adjustments to allow her to have more time to complete school work. She opted to continue to take advanced language courses but enrolled in slower paced math courses. Interventions were very successful; subsequent assessments were used to verify treatment effects.

A psychological evaluation is complete when assessment data have been organized, synthesized, integrated, and presented, usually in the form of a written report.^1,¹⁷ Reports are usually independent documents written with an intended audience in mind. They should include assessment findings, such as relevant history, current problems, assets, and limitations, as well as behavioral observations and test interpretations. A typical report includes the following sections or elements: identifying information, reason for referral, sources of assessment information (including tests administered if any), behavioral observations, results and impressions, recommendations, and summary.

A major concern in developmental and behavioral assessment has been the misuse of test data.¹ For example, deviations from standardized procedures in test administration, disrespect for copyrights, use of tests for purposes without adequate research support, interpretation of results without taking into account appropriate norms or reference groups, and use of a single test score for making decisions about a child are among more common problems with test use. Led by a consortium of professional associations (including the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education), the Joint Committee on Testing Practices has ongoing workgroups charged with improving quality of test use. Several documents have been created to guide professionals who might develop or use educational or psychological tests, including Standards for Educational and Psychological Testing¹¹⁹ and the Code of Fair Testing Practices in Education (Revised).¹²⁰

Another important clinical issue pertains to what qualifications are necessary for psychological test administrators. Although a thorough review of these issues is beyond the scope of this chapter, the Joint Committee on Testing Practices has developed guidelines that address this issue.^121,¹²² Most discussions about user qualifications emphasize knowledge and skills necessary to administer and interpret tests in the context in which a particular measure is being used, as opposed to a particular professional degree or license. Some instruments can be administered with relatively little training in psychometric issues (e.g., clinical rating scales such as the Vanderbilt ADHD Diagnostic Scales), whereas other instruments require extensive training and supervised experience (e.g., individually administered ability tests such as the BSID or Wechsler tests). To be qualified to administer most of the instruments discussed in this chapter, a test user should have extensive knowledge and skills related to psychometrics and measurement, selection of appropriate tests, test administration, and other variables that influence test data. Such knowledge and skills generally require advanced graduate level coursework in psychology and supervised clinical experience. Psychologists (among others) are generally those who are qualified to use psychological tests properly.

Proper use of tests in clinical assessments require high level skills and professional judgments in order to make valid interpretation of scores and data collected from multiple sources, with the use of proper test selection, administration, and scoring procedures.¹²² When selecting methods, the clinician evaluates whether the construction, administration procedures, scoring, and interpretation of the methods under consideration match the current assessment need, knowing that mismatches may invalidate test interpretation. Instrument selection also is influenced by practical considerations such as training, familiarity, personal preference, and availability of test materials. Cost considerations may also factor into instrument selection. Test development can be very costly, especially if normative samples are broadly developed. Therefore, it may not be financially feasible to purchase test materials for all clinical assessments.

We wish to emphasize the importance of adhering to standardized administration procedures in using psychological tests. Valid interpretation of measurement results cannot be made if there are deviations in administration or scoring procedures. For example, interpretations based on test procedures that have been altered or shortened for convenience or other reasons without accompanying psychometric study are not valid or clinically sound. Likewise, interpretation of assessment results should never rely solely on test scores.¹ Clinical judgments should be made by integrating assessment and observational data, taking into consideration whether results are congruent with other pieces of information, discrepancies from different sources, and factors affecting the reliability and validity of results (e.g., motivation of child, language barriers).

Use of standardized ability, achievement, and behavioral tests has come under attack since the 1980s. Critics have argued that intelligence and achievement tests used to allocate limited educational resources penalize children whose family, cultural, and socioeconomic status are different from middle-class European American children.¹ Specifically, it has been argued that intelligence and achievement tests are culturally biased and thus harmful to African American children and other ethnic minorities. Other experts have been critical of test use to label children or have argued that normreferenced tests are imperfect in what they measure and therefore have little or no utility in the classroom. Dialog on these criticisms has led to improved test practices, including more representative normative groups, increased availability of tests in languages other than English, increased awareness of cultural factors among clinicians administering and interpreting tests, and use of criterion- or curriculum-based assessments.

Computers are playing more of a role in clinical assessments. They can facilitate administration and scoring of some tests and interview methods, recording of observational data, preparation of reports, and transmittal of assessment information.¹ For example, the CBCL’s computer scoring program yields several score profiles, including useful crossinformant comparisons along with a narrative report.⁶⁷ Computer-administered assessment methods have several advantages, including eliminating human clinicians’ biases, calculation errors, and memory difficulties. Computers will probably be used more extensively in the future to assist in selecting assessment instruments, making diagnoses, designing interventions, and monitoring treatment effects. However, it unlikely that computers will supplant the clinician, who will still be needed to integrate computer-generated results into meaningful recommendations. In fact, there are potential dangers of using computer-generated reports, and knowledgeable professionals understand that these reports should be used cautiously when being incorporated into assessment reports.