Post No.: 0526
Furrywisepuppy says:
Firstly some definitions – the base rate, prevalence or prior probability (which term gets used depends on the context) is the probability of some thing or event occurring, such as the percentage of the total population with a particular disease. The sensitivity is the proportion of people with the disease who’ll get a positive test result (the chances of a true positive). The specificity is the proportion of people without the disease who’ll get a negative test result (the chances of a true negative). The positive predictive value is the probability that you have the disease given a positive test result. And the negative predictive value is the probability that you don’t have the disease given a negative test result.
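To show roughly how these definitions fit together, here's a minimal Python sketch (the function name and the example figures at the bottom are purely illustrative, not taken from any particular test) that computes the positive and negative predictive values from a base rate, a sensitivity and a specificity:

```python
def predictive_values(base_rate, sensitivity, specificity):
    """Return (PPV, NPV) given the disease base rate and the
    test's sensitivity and specificity."""
    true_pos = sensitivity * base_rate               # diseased and tests positive
    false_pos = (1 - specificity) * (1 - base_rate)  # healthy but tests positive
    true_neg = specificity * (1 - base_rate)         # healthy and tests negative
    false_neg = (1 - sensitivity) * base_rate        # diseased but tests negative
    ppv = true_pos / (true_pos + false_pos)  # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(no disease | negative result)
    return ppv, npv

# Illustrative figures only: a 1% base rate with a 90%-sensitive, 95%-specific test
print(predictive_values(0.01, 0.90, 0.95))
```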
Bayesian thinking involves assigning probabilities to things or events based on one's prior understanding of the world (one's 'priors', which means things like base rates and prior odds) and then updating these probabilities, and thus our hypotheses, according to any conditions or new evidence gathered.
If we’re told that, within a city, 85% of taxis are green and 15% are blue, and a hit-and-run accident occurred where a witness identified a blue cab as being the perpetrator, and we know that this witness correctly identifies the colour 80% of the time and fails 20% of the time – most people would assume that a blue taxi was most likely the perpetrator.
If we’re told that, within a city, 85% of accidents involve green taxis and 15% involve blue, and again a hit-and-run accident occurred where a witness identified a blue cab as being the perpetrator, and again we know that this witness correctly identifies the colour 80% of the time and fails 20% of the time – most people would assume that a green taxi was most likely the perpetrator. However, both scenarios present statistically the same events! If 85% of the taxis are green then the prior probability is that 85% of accidents will be expected to involve green taxis.
Delving deeper, the probability of the witness being correct (that the taxi was blue when he/she thought it was blue) is 0.80 x 0.15 = 0.12. The probability of the witness being incorrect (that the taxi was actually green when he/she thought it was blue) is 0.20 x 0.85 = 0.17. Therefore the probability of the hit-and-run perpetrator actually being a blue taxi is 0.12 / (0.12 + 0.17) = 0.41. And therefore the probability of the perpetrator actually being a green taxi is 1 – 0.41 = 0.59. In other words, there’s statistically more chance that the perpetrator was actually a green taxi here.
When applying ‘Bayes’ theorem’ – the prior odds are the odds of the perpetrator being a blue cab calculated from the base rate (or the probability that the cab was blue, divided by the probability that the cab was green), and the likelihood ratio is the odds of the witness being correct from the sensitivity (or the probability of the witness correctly saying that the cab was blue, divided by the probability of the witness incorrectly saying that the cab was blue when it was green). The posterior odds equals the prior odds (0.15 / 0.85) multiplied by the likelihood ratio (0.80 / 0.20) = 0.71. And when we convert this into a resultant probability, the probability that the cab was blue equals 0.71 / 1.71 = 0.41.
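To make the arithmetic above concrete, here's a minimal Python sketch of both routes to the same answer, using only the figures already given in the example:

```python
# Taxi problem figures from the text
p_blue, p_green = 0.15, 0.85     # base rates of blue and green taxis
p_correct, p_wrong = 0.80, 0.20  # witness reliability

# Probability form: P(blue | witness says blue)
says_blue_and_is_blue = p_correct * p_blue   # 0.12
says_blue_but_is_green = p_wrong * p_green   # 0.17
p_blue_given_witness = says_blue_and_is_blue / (says_blue_and_is_blue + says_blue_but_is_green)
print(round(p_blue_given_witness, 2))  # ~0.41

# Odds form: posterior odds = prior odds x likelihood ratio
prior_odds = p_blue / p_green           # 0.15 / 0.85
likelihood_ratio = p_correct / p_wrong  # 0.80 / 0.20
posterior_odds = prior_odds * likelihood_ratio   # ~0.71
print(round(posterior_odds / (1 + posterior_odds), 2))  # ~0.41 again
```

Both forms are the same theorem, just rearranged – the odds form simply makes it explicit that the evidence (the likelihood ratio of 4) has to fight against the prior odds (of roughly 0.18) and doesn't manage to tip the balance past even odds here.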
This is a standard Bayes’ theorem problem, involving the base rate (15% of the taxis are blue) and the witness’s reliability or sensitivity (him/her being correct only 80% of the time). But in the first presentation of the problem above, we tend not to know how to use the base rate and so we often ignore it, and will therefore assume that the perpetrator was most likely a blue taxi based on the witness being correct 80% of the time. Meanwhile, in the second presentation, we tend to give the base rate considerable weight and so will more likely understand the true odds, and will therefore more correctly assume that the perpetrator was most likely a green taxi, because we were explicitly told that 85% of accidents involve green taxis.
This demonstrates ‘base rate neglect’ or the ‘base rate fallacy’. This is the tendency to ignore relevant base rates or general-population information in favour of information pertaining only to a specific case or individual, as if that case is somehow automatically assumed to be a ‘special case’. People like to think they (or their own children) are special; for instance, other people who drink heavily may have a 13% chance of getting cirrhosis but one will surely beat those odds based on some presumption (e.g. that one handles one’s drink better than others do). But unless you have specific evidence to show why you belong to a special sub-population then join the queue because it’s not special to think you’re special(!) It’s as if one doesn’t believe one also falls under the more general population group. People might even ignore base rates altogether when estimating probabilities, and use other methods, such as perceived similarities to a stereotype, to make estimates instead. (Probability neglect was explored in Post No.: 0404 too.)
In the taxi scenario, the first presentation doesn’t suggest a causal narrative to the accident (just that more taxis are green), whilst the second presentation does suggest a causal story (that more accidents involve green taxis) thus leading us to conclude and stereotype that individual green taxi drivers must be more reckless (especially when primed with the description ‘hit-and-run’).
But although our answer would be correct if we thought that the perpetrator was more likely to be a green taxi following the second presentation – the reasoning by causality and stereotype isn’t necessarily the correct way to think either, because here the green taxis are involved in more accidents simply because there are far more green taxis than blue. It’s like companies in China manufacture more cheap goods than any other country in the world in large part because China currently manufactures far more goods than any other country in the world full-stop. Or it’s like more fluffy coyotes are killed by people in North America than anywhere else in the world because that’s where they’re native. Therefore applying a causal stereotype about every known and unknown individual manufacturer in China, or a causal stereotype about every person in North America being cruel to coyotes, would be fallacious. Woof.
There are statistical base rates, which are facts about a population in general but aren’t necessarily facts about any individual case within that population – these base rates tend to be underweighted or even neglected if specific information about an individual case is available. And there are causal base rates, which are treated as information about how an individual case came to be – and if a causal story takes the form of a stereotype (e.g. that green taxi drivers appear reckless) then these kinds of base rates can be overweighted (to suggest that every green taxi driver is reckless).
Therefore, in our minds, causal reasons or causal stereotypes tend to outweigh statistics. ‘System one’ can deal with causal stories and individual cases well but is weak in logical statistical reasoning. Causal reasons may be attributed to individuals (e.g. ‘all the graduates are bright’) as well as to situations (e.g. ‘the exam was easy’). We tend to read too much into causal base rates or causal stories and not enough into statistical base rates or logical facts. Causal base rates should really be given no weight in most contexts, such as hiring or profiling, because everyone should be judged as an individual. However, in some contexts where it’s more efficient in time and/or money to be correct ‘most of the time’ rather than ‘all of the time’, such as regarding which group of people to vaccinate first, they can be useful.
So stereotypes can sometimes be useful for making efficient judgements that work most of the time, even if they’re not totally accurate judgements that work all of the time. And that’s why such instincts evolved in humans and other animal species – some individuals will die for making the wrong judgements but enough won’t, and will pass these instincts on to the next generation, and so forth. But – forgetting about the overall human population as a whole for the moment – if you want to increase your own odds of making better judgements and the right decisions then it’s best to learn to apply statistical reasoning.
Whenever a stereotype is perceived, one will also tend to look for causal stories to explain them. Mere statements like ‘an interest in surfing is widespread in Australia’ can evoke a causal relation in system one – that being Australian or being in Australia ‘somehow causes’ people to be interested in surfing – and thus an assumptive stereotype whenever one meets or thinks of a random individual Australian.
But saying something about a population as a whole doesn’t necessarily speak about any individual within that population. Yet we prefer simple causal relationships and generalisations to understand the world, or we naturally favour efficiency (or laziness) over accuracy, even though in reality the world is far more complex and nuanced than what we tend to perceive.
Bayes’ theorem can also be demonstrated with a drug testing example. If a drug test is 99% sensitive (i.e. it’ll produce true positive results 99% of the time) and 99% specific (i.e. it’ll produce true negative results 99% of the time), and if only 0.5% of the population are users of this drug that’s being tested for, then if a randomly-selected person tests positive for it, the probability that he/she is actually a true user of this drug is only 33.2%. This means that despite the apparent accuracy of the test and despite this person testing positive – it’s more likely that they do not use this drug than do.
The true positives equals 0.99 x 0.005 = 0.00495. The false positives equals 0.01 x 0.995 = 0.00995. The true negatives equals 0.99 x 0.995 = 0.98505. And the false negatives equals 0.01 x 0.005 = 0.00005. The surprising result is because the number of non-users is very large compared to the number of users i.e. the base rate is very low, thus the number of false positives (0.995%) outweighs the number of true positives (0.495%).
So if 1,000 random people are tested, we expect about 5 users, of whom ~5 will get (true) positive results; and about 995 non-users, of whom ~10 will get (false) positive results. And since, out of all of the positive results (both true and false), only ~5 out of ~15 are true positives, a positive test result is only expected to be correct about a third of the time.
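Here's a minimal Python sketch of that natural-frequency reasoning, using the figures above (the 1,000-person cohort is purely illustrative – any cohort size gives the same proportion):

```python
cohort = 1000
base_rate, sensitivity, specificity = 0.005, 0.99, 0.99

users = cohort * base_rate                       # ~5 users
non_users = cohort * (1 - base_rate)             # ~995 non-users
true_positives = users * sensitivity             # ~4.95 users test positive
false_positives = non_users * (1 - specificity)  # ~9.95 non-users test positive

ppv = true_positives / (true_positives + false_positives)
print(round(ppv, 3))  # ~0.332 - only about a third of positives are real users
```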
What will greatly improve our confidence in the result is repeating the test on someone who has tested positive on their first drug test, because the chances of two false positive results happening consecutively are far lower than the chances of one false positive result happening. One positive (pre) test result may be due to chance, but a repeated (post) test result that turns out positive too will most likely be a correct hit, depending on the base rate, sensitivity and specificity. Even a highly accurate test will have a low positive predictive value if the event tested for is extremely rare i.e. has a very low base rate. So the rarer an event or occurrence in a population, the less a single positive result from a given test for it can be trusted – and this is why repeating tests is crucial in these cases, and why one must not jump to conclusions based on a single test.
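As a rough sketch of how a repeat test updates the odds (assuming, for simplicity, that the second test's errors are independent of the first's – something real repeat tests may not perfectly satisfy), the posterior probability from the first positive result becomes the prior for the second:

```python
def update(prior, sensitivity, specificity):
    """Posterior probability of being a user after one positive result."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

prior = 0.005  # base rate of drug use in the population
after_first = update(prior, 0.99, 0.99)         # ~0.332 after one positive
after_second = update(after_first, 0.99, 0.99)  # ~0.980 after two positives
print(round(after_first, 3), round(after_second, 3))
```

One positive result lifts the probability from 0.5% to about 33%, and a second independent positive lifts it to about 98% – which is why confirmation tests matter so much when base rates are low.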
Woof! It’s really important to understand how statistics work and to apply Bayesian thinking and conditional probabilities in order to make better judgements and smarter decisions.