Here’s How to Understand What a “95% Accurate” Test Is Actually Telling You
An article on the misuse of health screening tests highlights a concept everyone should know
Welcome to the first Range Widely of 2022. I hope your new year is off to a wonderful start!
If you missed last week’s post with my favorite opening lines from 2021 (how dare you spend time with your family when you could be reading my newsletter), you can read it here.
And please feel free to join the conversation; I respond to many of the comments below each post.
Over the weekend, the New York Times ran a long article about prenatal screening tests that mislead expectant parents into thinking that a child will have a rare disorder, when in fact they will not. Obviously, this has tremendous implications. The article cites an instance in which an expectant mother ended a pregnancy based on a false test result.
The piece is worth your time, with excellent visualizations, dramatic personal stories, and insight into the growing prenatal screening industry. But whether or not you have time to read it, I want to highlight an idea at the core of the story — one that I wrote about in the last chapter of Range, and that I think is important for pretty much everyone to understand, with implications far beyond prenatal screening.
Pretend You’re A Doctor…
…and here’s the scenario:
You’re using a new test for Disease X, which afflicts 1 out of every 1,000 adults. The new test has perfect “sensitivity,” i.e. it detects every single true positive case of Disease X. It also has a false positive rate of 5%. Your last patient doesn’t have obvious symptoms, but you just got their positive test result. What is the chance that they actually have Disease X?
For a 2014 study, the above question was given to doctors and med students. The most common answer they gave was that the patient has a 95% chance of actually having the disease. The correct answer, however, is that there is only about a 2% chance that the patient actually has the disease — 1.96% to be exact.
The Explanation
Let’s say we randomly test 10,000 people for Disease X; remember, only 1 in 1,000 people get Disease X. If our sample is representative of the general public, 10 of those 10,000 will have Disease X. Because the test has perfect sensitivity, all 10 of those people will get a true positive result.
But given the 5% false positive rate, about 5 out of every 100 people who don’t have the disease will get a false positive. Since nearly everyone in the group is disease-free (9,990 of the 10,000), that’s roughly 500 false positives in 10,000 tests.
So in our batch of 10,000 test results, there are 10 true positives, and 500 false positives — 510 positives overall. Thus, the chance of a patient who tests positive actually having the disease is 10/510, or 1.96%.
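If it helps to see that arithmetic spelled out, here is a minimal Python sketch of the same head count. The function and variable names are my own, chosen for illustration. One small difference from the round numbers above: the sketch applies the false positive rate only to the 9,990 people who don't have the disease, so it gets 499.5 false positives rather than 500. The answer still rounds to 1.96%.

```python
def screening_counts(population, prevalence, sensitivity, false_positive_rate):
    """Expected true and false positives if everyone in `population` is tested."""
    sick = population * prevalence
    healthy = population - sick
    true_positives = sick * sensitivity
    # False positives can only come from people who do not have the disease.
    false_positives = healthy * false_positive_rate
    # Share of all positive results that are true positives.
    chance_of_disease = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, chance_of_disease


# Disease X from the scenario: 1 in 1,000 prevalence, perfect sensitivity,
# 5% false positive rate, 10,000 people tested.
tp, fp, chance = screening_counts(
    population=10_000,
    prevalence=1 / 1_000,
    sensitivity=1.0,
    false_positive_rate=0.05,
)
print(f"true positives: {tp:.1f}, false positives: {fp:.1f}, chance of disease: {chance:.2%}")
```

Try changing the prevalence or the false positive rate and watch how quickly the final number moves.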
So The Screening Test Is Total Crap, Right?
No. At least, I don’t think so. But it needs to be understood in context. If I were the patient who got that positive, I should think that there’s about a 2% chance I actually have the disease, whereas before I took the test, I thought I had a 1 in 1,000 chance of having the disease.
I still probably don’t have it, but I’m more suspicious. And if the disease is really bad, I might want to do follow-up testing.
The problem, as noted in the New York Times piece, is that patients generally don’t think of screening tests this way, and sometimes take any positive as a true positive, and proceed to medical interventions that can have drastic effects on their lives. The 2014 study I cited above suggests that there are also probably plenty of healthcare practitioners who aren’t thinking about this issue in a way that will empower them to help patients interpret the results of screening tests.
At Some Level, You Know This Already
I want to share a silly, fictional example, just because I hope it will be intuitive and stick in your mind when you face a situation like this:
Disease Y is a terrible new disease that afflicts 1 out of every billion people on Earth. The cure for Disease Y is a new and ingenious technique that unfortunately involves that machine in The Princess Bride that suctions years of life away.
There’s also a screening test for Disease Y that detects every true positive, and has a mere 1% false positive rate.
Now, if you test positive for Disease Y, will you assume you have it and jump on the Princess Bride torture machine app to schedule an appointment as soon as you can? Probably you will not. Instead, you’ll probably think something like: “Huh, only 8 people on Earth have Disease Y. I should probably get more information before assuming I have it and scheduling the suction machine.”
That’s your intuition telling you that this disease is so incredibly rare that it’s just not likely you have it, and that — because the treatment is really rough — you’ll need overwhelming evidence to convince you otherwise.
And that’s exactly as it should be. If we screen all 8 billion people on Earth for Disease Y, 1% of them (or 80 million) will get false positives, and only 8 will get true positives. Your chances just went from one in a billion to about one in 10 million, but (assuming you don’t have symptoms or other indications) you still probably don’t have it.
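Another way to look at the same update, for anyone who likes to check the math: start with your odds before the test and multiply by how much more often a sick person tests positive than a healthy one. The sketch below is only an illustration under the Disease Y assumptions (one in a billion, perfect sensitivity, 1% false positive rate), and the function name is made up for this example.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Update a prior probability after a positive result (Bayes' rule, odds form)."""
    prior_odds = prior / (1 - prior)
    # Likelihood ratio: how much more often a sick person tests positive
    # than a healthy person does.
    likelihood_ratio = sensitivity / false_positive_rate
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1 + posterior_odds)


# Disease Y: 1 in a billion before the test, 1% false positive rate.
p = posterior_probability(prior=1e-9, sensitivity=1.0, false_positive_rate=0.01)
print(f"about 1 in {1 / p:,.0f}")
```

A positive result makes the disease about 100 times more likely than it was before, which still leaves you at roughly one in 10 million.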
The rarer the condition, as the Disease Y example conveys, the lower the false positive rate must be in order for false positives not to outnumber true positives. This is one reason why screening guidelines for prostate and breast cancer suggest that screening for most people should not begin until at least middle age. As people age, the prevalence of those cancers in the population increases, and so does the probability that a positive test is a true positive. In other words, the ratio of true positives to false positives increases as an illness gets more common, making the screening test a more reliable tool.
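To make that relationship concrete, here is a rough Python sketch that holds the test fixed (perfect sensitivity and a 5% false positive rate, as with Disease X) and varies how common the condition is. The prevalence values are invented for illustration and are not real cancer statistics. The second function, also my own naming, gives the false positive rate at which false positives exactly equal true positives.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Probability that a positive result is a true positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


def break_even_false_positive_rate(prevalence, sensitivity=1.0):
    """False positive rate at which false positives equal true positives."""
    return sensitivity * prevalence / (1 - prevalence)


# Hypothetical prevalences, from very rare to fairly common.
for prevalence in (1 / 10_000, 1 / 1_000, 1 / 100, 1 / 10):
    ppv = positive_predictive_value(prevalence, sensitivity=1.0, false_positive_rate=0.05)
    limit = break_even_false_positive_rate(prevalence)
    print(
        f"prevalence {prevalence:.4f}: "
        f"chance a positive is real {ppv:.1%}, "
        f"break-even false positive rate {limit:.4%}"
    )
```

The same test becomes steadily more trustworthy as the condition gets more common, and the rarer the condition, the tinier the false positive rate has to be just to break even.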
But again, even if a test produces more false than true positives, that doesn’t necessarily mean it’s terrible; it really depends on what happens after the results come in. If false positives lead a lot of people to pursue unnecessary interventions or adopt counterproductive behaviors, that’s very bad.
For example, early in the pandemic, Christie Aschwanden wrote about how Covid antibody tests faced this false-positive issue, and might have led a lot of people to assume they’d had Covid and developed immunity, when in fact they had not.
It’s Just One Tool
A big topic in the New York Times article is how some labs market their screening tests for rare diseases — making them seem definitive because of their low false positive rates. You really have to know how common an illness is in the first place (its “base rate”) in order to know the test’s “positive predictive value,” or how definitive a positive result really is.
I think this is a really important issue to keep in mind amid the proliferation of companies that promise quick and easy screening tests for rare (or extremely rare) conditions.
Screening tests are tools that must be understood in context. That context includes all the other information about the condition the test is looking for — like how common it is — and the other reasons why a patient might have the condition, like symptoms or exposures. Doctors I’ve interviewed about this have sometimes used the phrase “index of suspicion” to refer to how all of these individual pieces of information go into a decision-making stew that helps them decide how likely a patient is to have a condition. A screening test is just one tool.
And as Arnold Toynbee put it in A Study of History:
No tool is omnicompetent. There is no such thing as a master-key that will unlock all doors.
Yeah ok, I just love that quote and look for any excuse to share it. I’m sure it won’t be the last time.
Thank you for reading. And if you found this post valuable, please share it.
Until next week…
David
P.S. If you liked today’s post, you might like this one I did in October; it’s about how new aspirin guidelines highlight a health concept that most people have never heard of but that everyone should understand.