12 Bayes Rule

Most of you have had a lateral flow test for COVID-19. This is the one you do at home with a test kit that looks like a pregnancy test. Suppose you get a positive result from a lateral flow test. What is the probability that you actually have COVID-19?

First, we need to define two parameters associated with diagnostic tests: sensitivity and specificity.

Sensitivity is the probability of a positive test given that you have COVID. The name makes sense - how sensitive is the test for detecting the disease?

Specificity is the probability of a negative test given that you don’t have COVID. This name is less intuitive. You might be more familiar with the ‘false positive rate’, which is (1-specificity), which is the probability of a positive test given that you don’t have COVID.

To find out the probability you have COVID given a positive test, we need to reverse the conditional probability associated with sensitivity. For this we need Bayes rule:

\[P(B|A) = \frac{P(A|B)*P(B)}{P(A|B)*P(B)+P(A|!B)*P(!B)}\]

Remember, \(P(A|B)\) is the conditional probability that A happens given that B is true. In terms of sensitivity and specificity, we’ll let

A = positive, you got a positive COVID-19 test, and

B = COVID, you have COVID-19

In terms of our test:

sensitivity is \(P(positive | COVID) = P(A|B)\), and

specificity is \(P(!positive | !COVID) = P(!A|!B)\)

(I’m using R’s convention of ‘!’ to mean ‘not’)

Bayes rule needs one more parameter, \(P(B)\). For a diagnostic test, this is \(P(COVID)\), which is the probability that the person getting tested has the disease. This is sometimes called the ‘baserate’ or ‘prior probability’.

Putting this together:

\[P(COVID|positive) = \frac{P(positive | COVID)*P(COVID)}{P(positive | COVID)*P(COVID) + P(positive | !COVID)*P(!COVID)}\]

\[= \frac{sensitivity*baserate}{sensitivity*baserate + (1-specificity)*(1-baserate)}\]
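This formula is easy to wrap in a small R function so you can plug in different test characteristics (the function name `bayes_posterior` and the example numbers are my own, for illustration only):

```r
# Posterior probability of disease given a single positive test.
# (Helper name and example values are illustrative, not from the text.)
bayes_posterior <- function(sensitivity, specificity, baserate) {
  true_pos  <- sensitivity * baserate              # P(positive and COVID)
  false_pos <- (1 - specificity) * (1 - baserate)  # P(positive and not COVID)
  true_pos / (true_pos + false_pos)
}

# With a 50-50 prior, a test with 90% sensitivity and 90% specificity
# gives a posterior of 0.9:
bayes_posterior(sensitivity = 0.9, specificity = 0.9, baserate = 0.5)  # 0.9
```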

Let’s talk about some real numbers. The reports out there vary a lot. A meta-analysis from August 2021 found that the sensitivity of lateral flow tests ranges from as low as 38% to as high as 99%. That’s not very helpful. We’ll consider a recent paper in PLOS Biology which reports that when done at home, the sensitivity of the lateral flow test is estimated at 58% (it’s 79% when done by a trained scientist). Let’s use that 58% number.

The same meta-analysis states that the specificity ranges from 92.4% to nearly 100%. That’s a big range. Let’s pick something in the middle, like 96%.

Estimating the baserate is complicated because it depends on who is taking the test - or more formally, what population is the test-taker coming from? If this person is taking the test because he or she is feeling symptomatic, then the probability of infection is much higher than if this is a mandatory test due to work requirements. Is this person vaccinated? Vaccinated individuals have a much lower baserate than unvaccinated.
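To see just how much the baserate matters, here is a short sketch that holds the test fixed and varies the population. The sensitivity and specificity anticipate the 58% and 96% values used below; the candidate baserates are my own illustrative choices:

```r
# Same test, different populations: posterior after one positive test
# as a function of the baserate (prior probability of infection).
sensitivity <- 0.58
specificity <- 0.96
baserates <- c(0.001, 0.01, 0.1, 0.3)  # illustrative values
posterior <- sensitivity * baserates /
  (sensitivity * baserates + (1 - specificity) * (1 - baserates))
round(posterior, 2)  # 0.01 0.13 0.62 0.86
```

The same positive result means almost nothing for a routine screening test in a low-prevalence population, but is strong evidence for a symptomatic person in the middle of an outbreak.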

Let’s take a specific example. My youngest son, Teddy, is going to a school in York, UK, this year. The school requires each kid to take a lateral flow test at home twice a week. He hasn’t had a cough or any other symptoms, but there have been about 8 positive cases in his class over the past month (!). Let’s set the baserate pretty high and say he has a 1% chance of having COVID on any given day.

Sure enough, Teddy tested positive last week.

We’re ready to calculate the probability that he actually has COVID given his positive test. Here’s some R code that sets up the variables and then uses Bayes Rule to calculate the probability of infection, pCOVIDpositive:

sensitivity <- 0.58
specificity <- 0.96
baserate <- 0.01

# Bayes rule: P(COVID | positive test)
pCOVIDpositive <- sensitivity*baserate/
       (sensitivity*baserate + (1-specificity)*(1-baserate))

Here’s the math with our numbers:

\[P(COVID|positive) = \frac{0.58 *0.01}{0.58 *0.01 + (1-0.96) *(1-0.01)} = 0.13\]

This might seem low to you. It does to me - I had to check the code. Why does this seem so low? There is a lot of psychological research on this ‘base rate fallacy’. Wikipedia has a good page on it, which shows that we tend to equate the two reversed conditional probabilities, P(COVID | positive) and P(positive | COVID), and fail to take into account the effects of the baserate and false positives.

Mathematically, the probability is low because the low baserate shrinks the numerator, \(sensitivity*baserate\), and because false positives inflate the \((1-specificity)*(1-baserate)\) term in the denominator.
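If the algebra still feels suspicious, a quick simulation makes the same point. This sketch (my own sanity check, not from the text) generates a large population with a 1% baserate, gives everyone a test with the stated sensitivity and specificity, and asks what fraction of the positive testers are actually infected:

```r
# Simulate a population and check P(COVID | positive) empirically.
set.seed(1)
n <- 1e6
sensitivity <- 0.58
specificity <- 0.96
baserate <- 0.01

has_covid <- runif(n) < baserate
positive <- ifelse(has_covid,
                   runif(n) < sensitivity,        # true positives
                   runif(n) < (1 - specificity))  # false positives

# Fraction of positive testers who really have COVID; lands near 0.13
mean(has_covid[positive])
```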

By the way, you can copy this R markdown script and play with the numbers yourself at:


The text will change each time you knit the document!

OK, so Teddy tested positive. Now what? Naturally, we tested him again, and he tested positive again. Does he really have COVID? Let’s calculate. Thanks to Bayes rule, we have a new baserate - we now think the probability of him having COVID is 0.13. Here is the new calculation:

sensitivity <- 0.58
specificity <- 0.96
baserate2 <- pCOVIDpositive  # note the new baserate based on the previous test

pCOVIDpositive2 <- sensitivity*baserate2/
       (sensitivity*baserate2 + (1-specificity)*(1-baserate2))

\[P(COVID|positive) = \frac{0.58 *0.13}{0.58 *0.13 + (1-0.96) *(1-0.13)} = 0.68\]

The probability that he has COVID jumped up from 0.13 to 0.68. But you might expect it to be even higher. After all, if someone doesn’t have COVID, the probability of two false positives is only \((1-specificity)^{2} = (1-0.96)^{2} = 0.0016\). This means that if Teddy doesn’t have COVID, then with probability \(1- 0.0016 = 0.9984\) at least one of the two tests would have come back negative - so two positives seem to all but guarantee infection. But again, this intuition neglects the influence of the baserate.
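As a sanity check on the sequential updating, both positives can be folded into a single application of Bayes rule. Assuming the two tests are independent given infection status (an assumption, not something the text states), the evidence ‘two positives’ has probability \(sensitivity^2\) if Teddy has COVID and \((1-specificity)^2\) if he doesn’t:

```r
sensitivity <- 0.58
specificity <- 0.96
baserate <- 0.01  # prior before any test

# One-shot update on the joint evidence "positive twice"
pCOVIDtwoPositives <- sensitivity^2 * baserate /
  (sensitivity^2 * baserate + (1 - specificity)^2 * (1 - baserate))
round(pCOVIDtwoPositives, 2)  # 0.68, matching the two-step calculation
```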

After a positive test in the UK you’re required to get a much more sensitive PCR test. Thankfully, Teddy’s PCR test came back negative.