Construct validity is perhaps the most critical of the subtypes of validity. It establishes whether the test actually achieves what it is supposed to achieve: it measures the extent to which a test correctly distinguishes both the presence and the absence of the condition that the test is supposed to detect. In colloquial terms, construct validity measures whether the test actually works, and how well it works.
In the past, it was the habit of medicine to believe that all test results were correct and true; that if a test result was positive, the condition truly was present; and that if the test result was negative, the condition was definitely not present. This tradition has been refuted and supplanted.
There is a science of testing tests. It involves comparing, in the same sample of patients, the results of a test of unknown validity with the results of some other test whose validity is beyond question. That latter test is known as the criterion standard, formerly known as the ‘gold’ standard.
In reality, no test is perfect; and no criterion standard is absolute. The working definition of a criterion standard is that it is a test about whose results there is substantially less dispute than the test undergoing scrutiny. Examples of a criterion standard might include imaging findings, operative findings, or what a pathologist finds at postmortem. In practice, the criterion standard is usually a test that allows a more direct detection of the condition in question than the test under scrutiny, and which is less subject to errors of observation.
When a test is compared with a criterion standard, the results can be expressed as a contingency table (Table 16.1). Such a table shows the number of patients who have the condition according to the criterion standard, and how many do not; and in how many of each category the test in question was positive or negative. Four cells emerge. The ‘a’ cell is the number of patients in whom the condition is present and in whom the results of the test are positive. These are patients with true-positive responses. The ‘b’ cell contains those patients who do not have the condition but in whom the test was nevertheless positive. These responses are false-positive. The ‘c’ cell represents those patients who have the condition but the test is negative. These responses are false-negative, for the test failed to detect the condition when it should have done so. The ‘d’ cell represents those patients who do not have the condition and in whom the test is negative. The test correctly identified these patients as not having the condition, and their responses are true-negative.
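As a sketch of the bookkeeping involved, the four cells can be tallied from paired test and criterion-standard results. The data and variable names below are hypothetical, chosen only to illustrate the classification into the four cells:

```python
# Tally the four cells of a 2x2 contingency table from paired results.
# Each pair is (test_positive, condition_present_by_criterion_standard).
# The pairs below are hypothetical, for illustration only.
pairs = [
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]

a = sum(1 for test, cond in pairs if test and cond)          # true-positive
b = sum(1 for test, cond in pairs if test and not cond)      # false-positive
c = sum(1 for test, cond in pairs if not test and cond)      # false-negative
d = sum(1 for test, cond in pairs if not test and not cond)  # true-negative

print(a, b, c, d)  # 2 1 1 2
```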
From such a table, several descriptive statistics can be derived, which can be used to quantify the virtues of a diagnostic test, or the lack thereof. Paramount amongst these are the sensitivity and the specificity of the test.
Sensitivity is the extent to which the test correctly detects the condition that the test is supposed to detect. Conceptually, this is read down the first column of the table. Numerically, sensitivity is the ratio of ‘a’ to ‘a+c’, for ‘a’ is the number of patients known to have the condition in whom the test was positive, while ‘a+c’ is the total number of patients who had the condition. Sensitivity is also known as the true-positive rate, for it describes the proportion of cases who should have been positive that the test actually did find, correctly, as positive.
Specificity is the extent to which the test correctly detects the absence of the condition. Conceptually, it is read up the second column of the table. Numerically, specificity is the ratio of ‘d’ to ‘b+d’, for ‘d’ is the number of patients known not to have the condition in whom the test was negative, while ‘b+d’ is the total number of patients who did not have the condition. Specificity is also known as the true-negative rate, for it describes the proportion of cases who should have been negative that the test actually did find, correctly, as negative.
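A minimal sketch of these two ratios, using hypothetical cell counts:

```python
# Sensitivity and specificity from the four cells of a contingency table.
# Cell counts are hypothetical, chosen only to illustrate the arithmetic.
a, b, c, d = 90, 20, 10, 180  # TP, FP, FN, TN

sensitivity = a / (a + c)  # true-positive rate: 90 / 100 = 0.9
specificity = d / (b + d)  # true-negative rate: 180 / 200 = 0.9

print(sensitivity, specificity)
```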
A companion statistic is the false-positive rate. This is the proportion of cases who did not have the condition but in whom the test was, incorrectly, positive. Numerically, it is the ratio of ‘b’ to ‘b+d’. It is also the complement of the specificity, i.e. false-positive rate = 1 − specificity.
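Continuing with hypothetical counts, the complementary relationship between the false-positive rate and the specificity can be checked directly:

```python
# False-positive rate as the complement of specificity.
# Cell counts are hypothetical, for illustration only.
b, d = 20, 180  # FP, TN

false_positive_rate = b / (b + d)  # 20 / 200 = 0.1
specificity = d / (b + d)          # 180 / 200 = 0.9

# The two rates partition the non-diseased column, so they sum to 1.
print(false_positive_rate, 1 - specificity)
```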
Failure to recognize both the occurrence and the prevalence of false-positive responses has been one of the major transgressions of medicine in the past. It is both false and illusory to assume that every test result that is positive is correctly positive. A test can be positive for reasons other than the sought-for condition being present. Unless the prevalence of false-positive results is known, the validity of the test remains in question, for an investigator cannot otherwise tell if a positive result is true-positive or false-positive.
The significance of false-positive responses can be realized by analyzing the contingency table. The total number of positive responses to the test is the number of true-positive cases (‘a’) and the number of false-positive cases (‘b’). The confidence that an investigator can have, that a given positive response is true-positive, is determined by the ratio of ‘a’ to ‘b’, or the ratio of ‘a’ to ‘a+b’. The greater the value of ‘b’, the less confidence an investigator can have that a given positive response is true-positive. In other words, false-positive responses compromise diagnostic confidence.
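The diagnostic confidence described here, a/(a+b), is what is usually called the positive predictive value. A sketch of how a growing ‘b’ erodes it, using hypothetical counts:

```python
# How false-positives compromise diagnostic confidence.
# a is held fixed while b grows; all counts are hypothetical.
a = 90  # true-positives

for b in (10, 90, 810):  # increasing numbers of false-positives
    confidence = a / (a + b)  # probability a given positive is true-positive
    print(f"b = {b:3d}  ->  confidence = {confidence:.2f}")
# Confidence falls from 0.90 to 0.50 to 0.10 as b increases.
```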