

1. Diagnostic Tests

2. Introduction

Suppose we have a quantitative measurement $X_i$ on experimental or observed units $i = 1, \ldots, n$, and a characteristic $Y_i = 0$ or $Y_i = 1$ (e.g. case/control status). The measurement $X_i$ is thought to be related to the characteristic $Y_i$ in the sense that units with higher $X_i$ values are more likely to have $Y_i = 1$. We can make a prediction about $Y_i$ based on $X_i$ by setting a threshold value $T$, and predicting $Y_i = 1$ when $X_i > T$. This is called a "diagnostic test."
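A minimal sketch of this prediction rule in Python (not from the slides; the data and threshold below are made up for illustration):

```python
import numpy as np

def diagnostic_test(x, T):
    """Predict Y_i = 1 whenever the measurement X_i exceeds the threshold T."""
    return (np.asarray(x) > T).astype(int)

# Hypothetical biomarker values for five units, with threshold T = 2.0.
x = [0.5, 1.8, 2.3, 3.1, 1.2]
print(diagnostic_test(x, T=2.0))  # [0 0 1 1 0]
```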

3. Applications of diagnostic testing

• Cancer detection: The amount or concentration of a protein $X_i$ in serum obtained from person $i$ may be used to predict whether the person has a particular form of cancer.
• Credit scoring: A person's credit score at the time that he or she receives a loan may be used to predict whether the loan is repaid on time.

4. Labeling conventions

• The labeling of outcome categories as 1 or 0 is arbitrary in principle: for example, we could label cancer as 1 and non-cancer as 0, or vice versa. But in practice, label 1 is typically used for the rarer category, or the category that would require some action or intervention. Label 0 usually denotes a default category that requires no action.
• Depending on the situation, it may be that either larger or smaller values of $X$ are associated with higher probabilities that $Y_i = 1$. In the latter case we can work with $-X_i$, or use prediction rules of the form $X_i < T$ rather than $X_i > T$.

5. Diagnostic testing terminology

A diagnostic test is a balance between two types of successful predictions and two types of errors:

• Successful predictions:
  True positive: a situation in which $X_i > T$ and $Y_i = 1$, for example when a person with cancer is predicted to have cancer.
  True negative: a situation in which $X_i < T$ and $Y_i = 0$, for example when a cancer-free person is predicted to be cancer-free.
• Errors:
  False positive: a situation in which $X_i > T$ but $Y_i = 0$, for example when a person is predicted to have cancer but actually does not.
  False negative: a situation in which $X_i < T$ but $Y_i = 1$, for example when a person is predicted to be cancer-free but actually has cancer.

6. Marginal categories

• The actual status of a unit is positive or negative:
  Positive: everyone with $Y_i = 1$ (all true positives and false negatives). The proportion of positives is often called the "prevalence."
  Negative: everyone with $Y_i = 0$ (all false positives and true negatives).
• The predicted status of a unit is "called positive" or "called negative":
  Called positive: everyone with $X_i > T$ (all true positives and false positives).
  Called negative: everyone with $X_i < T$ (all true negatives and false negatives).

7. The relationships among all these terms are summarized as follows:

                Called positive    Called negative
    Positive    True positive      False negative
    Negative    False positive     True negative
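A minimal sketch (Python/numpy, not part of the slides) that tabulates the four cells of this table from observed data; the function name is illustrative:

```python
import numpy as np

def confusion_counts(x, y, T):
    """Cross-tabulate actual status y (0 or 1) against the calls (x > T)."""
    x, y = np.asarray(x), np.asarray(y)
    called = x > T
    tp = int(np.sum(called & (y == 1)))   # true positives
    fn = int(np.sum(~called & (y == 1)))  # false negatives
    fp = int(np.sum(called & (y == 0)))   # false positives
    tn = int(np.sum(~called & (y == 0)))  # true negatives
    return tp, fn, fp, tn
```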

8. Sensitivity and specificity

A common way to evaluate a diagnostic test is in terms of sensitivity and specificity.

Sensitivity: the proportion of positive units that are called positive; the population value is $P(X_i > T \mid Y_i = 1)$.
Specificity: the proportion of negative units that are called negative; the population value is $P(X_i < T \mid Y_i = 0)$.

Since sensitivity and specificity are calculated conditionally on case/control status ($Y_i$), they can be estimated using either a population sample or a case/control sample.

• 1-specificity is called the "false positive rate" (FPR).
• 1-sensitivity is called the "false negative rate" (FNR).
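These proportions are straightforward to estimate from data; a sketch under the same assumptions as the earlier snippets:

```python
import numpy as np

def sens_spec(x, y, T):
    """Estimate sensitivity P(X > T | Y = 1) and specificity P(X < T | Y = 0)."""
    x, y = np.asarray(x), np.asarray(y)
    sensitivity = np.mean(x[y == 1] > T)  # proportion of positives called positive
    specificity = np.mean(x[y == 0] < T)  # proportion of negatives called negative
    return sensitivity, specificity

# The error rates then follow directly:
# fpr = 1 - specificity, fnr = 1 - sensitivity
```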

9. Example: Suppose we have a biomarker $X_i$ for colon cancer such that 75% of people with colon cancer have $X_i > T$ and 5% of people without colon cancer have $X_i > T$. Thus the sensitivity is 75% and the specificity is 100% − 5% = 95%. We then screen 1000 people from a population with 15% colon cancer prevalence. We should expect the following results:

                Called positive                Called negative
    Positive    1000 · 0.15 · 0.75 = 112.5     1000 · 0.15 · 0.25 = 37.5
    Negative    1000 · 0.85 · 0.05 = 42.5      1000 · 0.85 · 0.95 = 807.5

The overall error rate is 80/1000 = 8%, and there is a rough balance between false positives and false negatives. Most of the people who have colon cancer are detected.

10. Example: Now suppose we are screening for pancreatic cancer, with a prevalence of 0.5%, using a test with the same sensitivity and specificity. We expect to get:

                Called positive                 Called negative
    Positive    1000 · 0.005 · 0.75 = 3.75      1000 · 0.005 · 0.25 = 1.25
    Negative    1000 · 0.995 · 0.05 = 49.75     1000 · 0.995 · 0.95 = 945.25

The overall error rate improves to 51/1000 ≈ 5%. The errors overwhelmingly consist of cancer-free false positives. Note that we could get an error rate of 0.5% by predicting everybody to be cancer-free.
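Both tables follow the same arithmetic; a small Python sketch (the helper name is illustrative, the inputs are the slides' numbers):

```python
def expected_counts(n, prevalence, sensitivity, specificity):
    """Expected 2x2 screening table (TP, FN, FP, TN) for n screened people."""
    pos, neg = n * prevalence, n * (1 - prevalence)
    tp = pos * sensitivity        # positives called positive
    fn = pos * (1 - sensitivity)  # positives called negative
    fp = neg * (1 - specificity)  # negatives called positive
    tn = neg * specificity        # negatives called negative
    return tp, fn, fp, tn

print(expected_counts(1000, 0.15, 0.75, 0.95))   # colon: (112.5, 37.5, 42.5, 807.5)
print(expected_counts(1000, 0.005, 0.75, 0.95))  # pancreatic: (3.75, 1.25, 49.75, 945.25)
```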

11. Sensitivity and specificity for normal populations

Suppose that $X \mid Y = 0$ is normal with mean $\mu_0$ and standard deviation $\sigma_0$, and $X \mid Y = 1$ is normal with mean $\mu_1$ and standard deviation $\sigma_1$. Then

$$\text{Sensitivity} = P(X > T \mid Y = 1) = P\left(\frac{X - \mu_1}{\sigma_1} > \frac{T - \mu_1}{\sigma_1} \,\Big|\, Y = 1\right) = P\left(Z > \frac{T - \mu_1}{\sigma_1}\right) = 1 - P\left(Z \le \frac{T - \mu_1}{\sigma_1}\right),$$

where $Z$ is standard normal. $P(Z \le \star)$ can be obtained from a normal probability table.

Exercise: Derive a similar formula for specificity.
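In code, the normal CDF replaces the table lookup; a sketch assuming scipy is available (the example parameters are made up):

```python
from scipy.stats import norm

def normal_sens_spec(T, mu0, sigma0, mu1, sigma1):
    """Sensitivity and specificity when X|Y=0 ~ N(mu0, sigma0^2), X|Y=1 ~ N(mu1, sigma1^2)."""
    sensitivity = 1 - norm.cdf((T - mu1) / sigma1)  # P(X > T | Y = 1)
    specificity = norm.cdf((T - mu0) / sigma0)      # P(X < T | Y = 0)
    return sensitivity, specificity

# Controls N(0, 1), cases N(2, 1), threshold T = 1:
print(normal_sens_spec(1.0, 0.0, 1.0, 2.0, 1.0))  # (0.841..., 0.841...)
```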

12. Positive and negative predictive values

Another way to evaluate a diagnostic test is based on the positive and negative predictive values.

Positive predictive value (PPV): the proportion of units called positive that are positive; the population value is $P(Y_i = 1 \mid X_i > T)$.
Negative predictive value (NPV): the proportion of units called negative that are negative; the population value is $P(Y_i = 0 \mid X_i < T)$.

• 1-PPV is called the "false discovery rate": the proportion of called positives that are negative.

13. Relationships between sensitivity, specificity, positive predictive value, and negative predictive value

If we know the prevalence, we can use Bayes' theorem to convert between sensitivity/specificity and positive/negative predictive values. For example:

$$P(Y_i = 1 \mid X_i > T) = \frac{P(X_i > T \mid Y_i = 1) \, P(Y_i = 1)}{P(X_i > T)},$$

that is, PPV = sensitivity · prevalence / P(positive call).

Exercise: Derive a similar relationship for NPV.

Note: If prevalence/P(positive call) is approximately 1, then the PPV and sensitivity are similar.

Note: PPV depends on prevalence, so it cannot be estimated from a case/control sample unless we have an independent estimate of the prevalence.
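A sketch of this conversion in Python (not from the slides); note that slide 14 rounds $P(\text{positive call})$ for pancreatic cancer to 0.05, so the exact values printed here differ slightly:

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Convert sensitivity/specificity to PPV/NPV via Bayes' theorem."""
    p_call_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    ppv = sensitivity * prevalence / p_call_pos
    npv = specificity * (1 - prevalence) / (1 - p_call_pos)
    return ppv, npv

print(ppv_npv(0.75, 0.95, 0.15))   # colon: (0.7258..., 0.9556...), i.e. about 0.73 and 0.96
print(ppv_npv(0.75, 0.95, 0.005))  # pancreatic: (0.0701..., 0.9987...)
```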

14. Example: The probability of being called positive in the colon cancer example above is 0.15 · 0.75 + 0.85 · 0.05 = 0.155. Thus the positive predictive value is 0.75 · 0.15/0.155 ≈ 0.73.

Exercise: Show that the negative predictive value for the colon cancer example is about 0.96.

Example: For the pancreatic cancer example, the probability of being called positive is 0.005 · 0.75 + 0.995 · 0.05 ≈ 0.05, so the positive predictive value is approximately 0.75 · 0.005/0.05 = 0.075.

Exercise: Show that the negative predictive value for the pancreatic cancer example is approximately 0.995.

Note that pancreatic cancer screening looks easier than colon cancer screening based on overall error rate (5% versus 8%), but PPV reveals that the pancreatic cancer test produces a high fraction of false positives.

15. Which cancer is truly easier to detect? It depends on the follow-up:

• Suppose that for colon cancer there is a secondary test that can quickly and safely differentiate the 113 true positives from the 43 false positives, and there is a treatment that substantially helps 50% of people whose colon cancer is detected at screening. Then the 43 false positives only need to go through the inconvenience and stress of a secondary test, and half of the 113 true positives have substantially improved outcomes.
• Suppose that for pancreatic cancer the only way to confirm the disease is by an invasive procedure that has a 10% rate of serious complications, and therapy only improves the outcome for 20% of people with the disease. Then about 5 healthy people (10% of the roughly 50 false positives) are put at serious risk in order to detect about 4 of the 5 people with pancreatic cancer, of whom only about one on average will benefit from treatment.

Note: the numbers used for the colon and pancreatic cancer examples are made up, but are roughly realistic.

16. ROC curves

Suppose we want to evaluate how much information a measurement $X_i$ contains about a characteristic $Y_i$, but we don't yet want to fix a specific threshold value $T$. A graphical approach is to plot sensitivity on the vertical axis against 1-specificity on the horizontal axis for all possible values of $T$.

[Figure: an ROC curve plotting sensitivity against 1-specificity, with three thresholds marked:]

                Specificity    Sensitivity
    Red         0.93           0.31
    Blue        0.76           0.62
    Green       0.50           0.84

17. The following facts constrain a plot of sensitivity against 1-specificity:

• As $T$ decreases, the sensitivity is non-decreasing.
• As $T$ decreases, the specificity is non-increasing, so 1-specificity is non-decreasing.
• When $T$ is $+\infty$, the sensitivity and 1-specificity are both 0 (no unit is called positive).
• When $T$ is $-\infty$, the sensitivity and 1-specificity are both 1 (every unit is called positive).

18. ROC curves

A plot of sensitivity against 1-specificity is called a "Receiver Operating Characteristic curve," or "ROC curve." Due to the constraints discussed above, an ROC curve is a non-decreasing path from (0, 0) to (1, 1).
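As a closing sketch (Python with numpy/matplotlib; the simulated normal data are made up for illustration), the ROC curve can be traced by sweeping the threshold from high to low:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Simulated measurements: controls X|Y=0 ~ N(0, 1), cases X|Y=1 ~ N(2, 1).
x0 = rng.normal(0.0, 1.0, 500)
x1 = rng.normal(2.0, 1.0, 500)

# Sweep T from high to low so the curve runs from (0, 0) to (1, 1).
thresholds = np.linspace(5, -5, 201)
sens = [np.mean(x1 > t) for t in thresholds]  # sensitivity at each T
fpr = [np.mean(x0 > t) for t in thresholds]   # 1-specificity at each T

plt.plot(fpr, sens)
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line for an uninformative X
plt.xlabel("1-Specificity")
plt.ylabel("Sensitivity")
plt.show()
```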
