CHAPTER 62 Statistics in Psychiatric Research
The word “statistics” derives from a term used for “numbers describing the state”; that is, the original statistics were numbers used by rulers of states to better understand their populations. Thus, the first statistics were simply counts of things (such as the population of a town, or the amount of grain it produced). Today, we call these kinds of simple counts or averages “descriptive statistics,” and they are used in almost every research study to describe the demographic and clinical characteristics of the participants.
Modern psychiatric research also involves two additional classes of statistics: psychometric statistics and inferential statistics. Most psychiatric studies will involve all three classes of statistics.
In psychiatric research, demographic variables (such as gender and height) can be measured objectively. However, most of our studies also require the measurement of variables that are not as objective (e.g., clinical diagnoses and rating scales of psychopathology). Here, we usually cannot directly measure the characteristics we are really interested in, so instead we rely on a subject’s score on either self-report or investigator-administered scales. Psychometrics is concerned with how reproducible a subject’s score is (i.e., how reliable it is), and how closely it measures the characteristic we are really interested in (i.e., how valid it is).
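As one illustration of quantifying reliability, the sketch below computes Cohen's kappa, a widely used chance-corrected index of agreement between two raters. The chapter text here does not name a specific reliability coefficient, so the choice of kappa, the function name `cohens_kappa`, and the ratings themselves are assumptions made purely for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    n = len(rater_a)
    # Proportion of subjects on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category counts.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight patients, invented for illustration only.
clinician_1 = ["improved", "improved", "not improved", "improved",
               "not improved", "not improved", "improved", "not improved"]
clinician_2 = ["improved", "not improved", "not improved", "improved",
               "improved", "not improved", "improved", "not improved"]

kappa = cohens_kappa(clinician_1, clinician_2)
print(f"kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement and a kappa near 0 indicates agreement no better than chance; note that this addresses reliability only and says nothing about validity.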
Psychiatric researchers study relatively small samples of subjects, usually with the intent to generalize their findings to the larger population from which their sample was drawn. This is the realm of inferential statistics, which is based on probability theory. The telltale p-values and asterisks denoting statistical significance in the text and tables of a Results section signal that inferential statistics are being reported.
All three kinds of statistics (descriptive, psychometric, and inferential) are present in most published papers in psychiatric research, and they must be considered in a particular order (psychometric, then descriptive, then inferential) for the following reasons. First, without reliable and valid measures, neither of the other kinds of statistics will be meaningful. For example, if we rely solely on clinicians’ judgments of patient improvement, but the study clinicians rarely agree on whether a particular patient has improved, any additional statistics will be meaningless. Likewise, a variable can be measured very reliably, as with a patient’s cell phone number, and still not be valid for any of the purposes of the study. Second, descriptive statistics are needed to condense the many individual subjects’ scores into summary statistics (such as counts, proportions, averages [or means], and standard deviations) that can then be compared between groups. Inferential statistics would be impossible without first having these summary statistics. Third, without inferential statistics and their computed probability values, the researcher cannot generalize any positive findings beyond the particular group being studied (and this is, after all, the usual goal of a research study).
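The summary statistics named above (counts, proportions, means, and standard deviations) can be sketched with Python's standard `statistics` module. The participant records, the group labels, and the helper `describe` are hypothetical and invented for illustration; they are not data from any study.

```python
from statistics import mean, stdev

# Hypothetical records, invented for illustration only:
# (treatment group, sex, baseline symptom score)
participants = [
    ("drug", "F", 24), ("drug", "M", 28), ("drug", "F", 22), ("drug", "M", 26),
    ("placebo", "F", 25), ("placebo", "F", 27), ("placebo", "M", 23), ("placebo", "F", 29),
]


def describe(records, group):
    """Summarize one group: count, proportion female, mean and SD of scores."""
    rows = [r for r in records if r[0] == group]
    scores = [r[2] for r in rows]
    n_female = sum(1 for r in rows if r[1] == "F")
    return {
        "n": len(rows),
        "proportion_female": n_female / len(rows),
        "mean_score": mean(scores),
        "sd_score": stdev(scores),  # sample standard deviation (n - 1 denominator)
    }


summary = {g: describe(participants, g) for g in ("drug", "placebo")}
for group, stats in summary.items():
    print(group, stats)
```

Each group's individual scores are reduced to a handful of numbers, which is the form in which groups are later compared by inferential tests.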
Table 62-1 illustrates the characteristics of each class, as well as the order in which the classes must be considered, since each successive class rests on the foundation of the preceding class.
|Class of Statistic||Purpose||Examples|
To provide a concrete example of these sometimes abstract concepts, consider a fictional study based on the simplest research design in psychiatric research: a randomized double-blind trial of a new drug versus a placebo pill for obsessive-compulsive disorder (OCD).
Figures 62-1 through 62-3 contain the annotated Method and Results sections for this fictional study, showing how the various psychometric statistics are presented in the Method section, while descriptive statistics are presented in the Method and Results sections, and inferential statistics are presented in the Results section (for definitions of terms used in these figures, refer to the section on statistical terms and their definitions).
Researchers should test only a few carefully selected hypotheses (specified before collecting their data!) if their obtained p-values are to have any meaning. The more statistical tests you perform, the greater the chance of finding at least one significant result by chance alone. Table 62-2 illustrates this phenomenon.
|Number of Statistical Tests Performed at p < .05||Probability of at Least One False-Positive Finding*|
One should not be impressed by a researcher who conducts eight t-tests, finds one significant at p < .05, and proceeds to interpret the findings as confirming his theory. Table 62-2 shows us that with eight statistical tests at p < .05, the researcher had roughly a one-in-three chance (about 34% if the tests are independent) of finding at least one result significant by chance alone.
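The arithmetic behind this kind of table can be sketched as follows, under two assumptions that the text here does not spell out: the tests are statistically independent, and every null hypothesis is actually true. In that case the probability of at least one false-positive finding across k tests is 1 − (1 − α)^k. The function name below is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance of one or more false positives across independent tests
    when every null hypothesis is true."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie between 0 and 1")
    # (1 - alpha) ** n_tests is the chance that no test is falsely significant.
    return 1 - (1 - alpha) ** n_tests


# Print the probability for an illustrative range of test counts.
for k in (1, 2, 5, 8, 10, 20):
    print(f"{k:2d} tests: {prob_at_least_one_false_positive(k):.3f}")
```

For eight tests at α = .05 this works out to approximately 0.34. When tests are correlated, as repeated t-tests on the same subjects often are, the exact figure differs, but the qualitative point stands: more tests mean more opportunities for a chance finding.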
The two key determinants in choosing a statistical method are (1) your research goal, and (2) the level of measurement of your outcome (or dependent) variable(s). Table 62-3 illustrates the key characteristics of the various levels of measurement and provides examples of each.
|Level of Measurement||Description of Level||Examples|
|Continuous (also known as interval or ratio)||A scale on which there are approximately equal intervals between scores|
Once the level of measurement of your outcome variable has been determined, you will decide whether your research question will require you to compare two or more different groups of subjects, or to compare variables within a single group of subjects. Tables 62-4 and 62-5 will help you choose the appropriate statistical method once you have made these decisions. (Note that these tables consider only univariate statistical tests; multivariate tests are beyond the scope of this chapter.)