Furrywisepuppy - Reading Numbers and Statistics in Science and the Media

Post No.: 0167

Fluffystealthkitten says:

Always read statistics carefully – read what they’re saying exactly, not what you think they’re saying. For example, don’t read too much into outliers yet don’t completely ignore them either, and don’t over-extrapolate statistics to mean something they weren’t ever saying.

Even when they’re not lies (although sometimes they are!), numbers and statistics can be biased, cherry-picked, missing, over-extrapolated, twisted, stated out of context or misrepresented. A common tactic is ‘don’t lie, imply’ and let the readers or consumers jump to the implied conclusion themselves. But it’s not always fair to blame the readers or consumers because some implications are stated so unambiguously that they’re tantamount to lies.

Absolute frequencies (the total number of times a particular event occurred e.g. A happened 12 times) can be presented as relative frequencies (the absolute frequency but normalised by dividing it by the total number of events e.g. A happened 12 times out of 30 attempts), or vice-versa, to suit an author’s conclusions and agenda better – so try converting absolute values into relative percentages or ratios, and vice-versa, to get both perspectives. For instance, a low relative percentage decrease doesn’t necessarily mean a low useful real-world absolute value decrease (e.g. a 0.3% reduction in tuberculosis deaths per year will still result in thousands of lives saved). So check if the statistics are really that big/small or good/bad as the author tries to make them sound.
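As a rough sketch of flipping between the two views, the snippet below takes the '12 times out of 30 attempts' example and then applies a 0.3% reduction to a made-up round baseline of 1,000,000 deaths per year. That baseline is purely an assumption for illustration, not a real statistic, and the function names are hypothetical too.

```python
def relative_frequency(count, total):
    """Absolute count normalised by the total number of events."""
    return count / total


def absolute_change(baseline_count, relative_change):
    """Turn a relative change (0.003 means 0.3%) back into an absolute number."""
    return baseline_count * relative_change


# A happened 12 times out of 30 attempts.
share = relative_frequency(12, 30)
print(f"{share:.0%}")  # 40%

# Assumed, illustrative baseline only (not a real figure).
assumed_deaths_per_year = 1_000_000
lives_saved = absolute_change(assumed_deaths_per_year, 0.003)
print(round(lives_saved))  # 3000
```

A tiny-sounding percentage and a large-sounding count can describe exactly the same thing, so it helps to look at both.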

In the media, statistics can be cherry-picked to deliver the most impact, such as stating the relative risk increase when the base natural frequency is low e.g. a 1% base rate or prevalence that increases to 2% can be expressed as a ‘100% risk increase’ or a ‘doubling of risk’, which makes it sound like a huge threat, but the overall risk is still just a very low 2% in natural frequency terms. This sensationalises what is really not that sensational. There are likely more important things to worry about in the grand scheme of things i.e. things that affect more than 2% of the population in this example, although if a small problem is expected to keep rising then it might be wise to tackle it whilst it’s still small. Meow.
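The arithmetic behind the 1% to 2% example can be sketched as follows. The function name risk_summary is just a hypothetical helper for this illustration.

```python
def risk_summary(baseline_risk, new_risk):
    """Return (relative increase, absolute increase) for a change in risk."""
    absolute_increase = new_risk - baseline_risk
    relative_increase = absolute_increase / baseline_risk
    return relative_increase, absolute_increase


relative, absolute = risk_summary(0.01, 0.02)
print(f"relative increase: {relative:.0%}")                      # 100%
print(f"absolute increase: {absolute * 100:.0f} percentage point")  # 1
print(f"extra cases per 1,000 people: {absolute * 1000:.0f}")    # 10
```

The same change is a '100% increase' or '10 more people in every 1,000' depending on which of these numbers a headline picks.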

So always ask for absolute frequency or natural frequency figures (e.g. 7 out of 100 people, or 15 out of 1,000 cases), for these are generally more informative, rather than rely only on relative probabilities since these can sound big or small depending on what one is comparing the figures to. Or best use both statistics in order to help you make up your own mind (e.g. the absolute number of sales of product B, and the relative change of sales of product B over the last few quarters, in order to decide whether product B is still important to a company’s product portfolio or not).

A story with statistics could be made to sound more alarmist by e.g. stating that 50% of the world’s countries are C. But when you drill down into the details, that doesn’t mean that 50% of the world is C. Those countries could happen to be the smallest, and even together might not make up 10% of the world’s population or economic output. (Of course, the presentation could be reversed to make a story sound less severe than it really is too.) Countries come in a vast range of sizes yet each counts as just one country. So the details are always important to read and we must look beyond the method of presentation used or the headlines.
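Here is a minimal sketch of that 'share of countries' versus 'share of people' gap. The six countries and their populations are entirely invented for the illustration.

```python
# Invented populations (in millions) for six imaginary countries.
# The boolean says whether the country "is C".
countries = {
    "Smallia": (2, True),
    "Tinyland": (3, True),
    "Minoria": (5, True),
    "Bigland": (60, False),
    "Hugeia": (80, False),
    "Vastland": (50, False),
}

# Each country counts once here, whatever its size.
share_of_countries = sum(is_c for _, is_c in countries.values()) / len(countries)

# Here each country is weighted by its population instead.
total_population = sum(pop for pop, _ in countries.values())
population_in_c = sum(pop for pop, is_c in countries.values() if is_c)
share_of_population = population_in_c / total_population

print(f"{share_of_countries:.0%} of countries are C")    # 50%
print(f"{share_of_population:.0%} of people live in C")  # 5%
```

Both lines are true of the same made-up world, which is why it matters what unit is being counted.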

This demonstrates that quantitative data or numbers can be highly questionable too, and not just qualitative data or words (such as ‘strong effect’ or ‘reliably’). The statistics an author chooses to highlight may be true in themselves but could be one-sided, showing only a part of the story or one possible perspective, and therefore be biased and misleading. It’s not just about what is stated but also about what has possibly been left or edited out. So don’t just focus on the statistics the authors are trying to draw your attention to – look at all the data and take note of the data points they barely want to discuss too. Think of reasons why the authors, marketers, politicians or whoever chose to highlight a specific subset of data points rather than a simpler and more general set (e.g. it may be true that crime statistics were high during one or two particular months in a particular city compared to another city, but what about the more general picture of the entire year?) Also think of any potential confounds or alternative explanations for why a statistic is the way it seems to be.

$7 million sounds like a big number, but not after dividing it between 7 million people. Therefore even cold, hard and objectively truthful numbers can be manipulated or presented in certain ways to achieve news sensationalism or to unfairly serve certain self-interests. Other shady techniques with numbers include – rounding up or down arbitrarily, only using the figures for the latest few cases or arbitrarily recent time period, arbitrarily selecting one length of time period to study over another, comparing the worst of one time period to the best of the next, not disclosing selection biases, or simply fabricating figures completely!

When reading or reporting a risk, you should want to know – which group we are talking about (e.g. 40-50 year-olds), the baseline risk (e.g. 4 out of 100 women will have D over 10 years), the change in risk as a natural frequency (e.g. an additional 3 out of 100 women will have D over 10 years), and what exactly is causing that change in risk (e.g. 1 pill or 5 pills per day). Take into account a growing or changing total population size too if noting an absolute increase in the total number of cases of something. For example, if there are more A-grade pupils in school this decade compared to last decade then could this be at least partly down to the population, and therefore number of pupils going to school, rising during this time period?
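To make the A-grade pupils point concrete, here is a small sketch that adjusts for a changing population. The pupil counts are invented, and rate_per_100 is a hypothetical helper name.

```python
def rate_per_100(cases, population):
    """Express a count as a natural frequency: cases per 100 people."""
    return 100 * cases / population


# Invented figures: more A-grade pupils this decade, but more pupils overall too.
last_decade = rate_per_100(20_000, 100_000)
this_decade = rate_per_100(24_000, 120_000)

print(last_decade, this_decade)  # 20.0 20.0
```

In this made-up case there are 4,000 more A-grade pupils, yet the rate is still 20 in every 100, so the whole rise is explained by there being more pupils.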

Some furry weasels are cool, but watch out for weasel words such as ‘up to’ or ‘may’ (e.g. ‘up to 17Mbps broadband’ or ‘this may make you rich!’); although in some contexts such language cannot be avoided and isn’t intended to paint an unrepresentative picture (e.g. ‘may contain traces of nuts’). ‘May’ could legitimately mean more research is required to be surer of the conclusion, the conclusion only applies to certain people or certain contexts thus it depends, and/or it’s complicated because other exogenous factors play a role too so this thing plays a contributory but variable role (e.g. something may play either a key or imperceptible role in helping or harming you depending on your own set of individual circumstances) – so again read the details carefully.

‘Up to…’ though, just states an upper bound or maximum, so if I say, “I’m going to give you up to 3 blue diamond hairballs this month” then as long as I don’t give you any more than 3 this month then I’ve not lied. In fact, since it doesn’t state a lower bound or minimum, if I give you nothing then I’ve not lied either!

I also want to note here that, in mathematics, precision relates to how many significant digits or decimal places a figure goes to (e.g. the number 2.563 is more precise than the number 2.6), whereas accuracy relates to how close to the true value a figure is (e.g. if the true length of a road is 60 miles then the measurement 59 miles would be more accurate than 55 miles). But in other contexts, precision relates to the consistency of a grouping of values made for the same intended measurement (e.g. the group of values 3.4, 3.1 and 2.9 are more precise than the group of values 7.3, 1.3 and 4.8), and accuracy again relates to how close to the true value a figure is; and in statistics, the word variability is usually used instead of imprecision, and the word bias is usually used instead of inaccuracy. The main takeaway is that precision and accuracy are independent from each other and don’t mean the same thing in scientific contexts.
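A quick sketch of that independence, using the two groups of values mentioned above. The true value of 4.5 is an assumption picked only for the illustration, since no true value is given for those groups. Bias is measured here as the group mean minus the true value, and variability as the sample standard deviation.

```python
from statistics import mean, stdev

ASSUMED_TRUE_VALUE = 4.5  # hypothetical, chosen just for this example

tight_group = [3.4, 3.1, 2.9]
loose_group = [7.3, 1.3, 4.8]

for name, values in [("tight", tight_group), ("loose", loose_group)]:
    bias = mean(values) - ASSUMED_TRUE_VALUE  # inaccuracy
    spread = stdev(values)                    # imprecision
    print(f"{name}: bias {bias:.2f}, spread {spread:.2f}")
# tight: bias -1.37, spread 0.25
# loose: bias -0.03, spread 3.01
```

Under that assumed true value, the tight group is precise but inaccurate, whilst the loose group is accurate on average but imprecise.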

Very crucially too, in scientific studies, we cannot just look at a raw ‘n in every 1,000’ rate increase and then automatically deem that as down to the intervention – we must work out if that result was likely down to chance or not. Something being ‘statistically significant’ means that a result at least that large would be unlikely to arise from mere chance or luck if there were really no effect – and only rigorous statistical tests can truly determine if the ‘null hypothesis’ (the default position, which is that there is no relationship between two measured phenomena) can be rejected, especially with small differences or small sample sizes. (Post No.: 0141 looks more into the laws of large and small numbers, which concerns what can happen when we use different sample sizes.) Getting a bit more technical – typically a ‘p-value’ threshold of ≤0.05 is used i.e. if there were truly no effect and one did a certain experiment 100 times, one should expect a spurious, chance or coincidental positive result about 5 times on average. Note that this isn’t quite the same as being ≥95% confident that any particular positive result is real, since that also depends on how plausible the effect was in the first place.
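For a feel of what a p-value is, here is a minimal sketch using a coin-flip experiment, which is my own illustrative example rather than anything from a real study. The null hypothesis is that the coin is fair, and the p-value is the probability, assuming the coin really is fair, of seeing a head count at least as far from 50/50 as the one observed. The function name is hypothetical.

```python
from math import comb


def two_sided_binomial_p(successes, trials):
    """Exact two-sided p-value under a fair-coin (p = 0.5) null hypothesis."""
    expected = trials / 2
    observed_gap = abs(successes - expected)
    # Count every outcome at least as far from the expected count as observed.
    extreme_outcomes = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_gap
    )
    return extreme_outcomes / 2 ** trials


p_60 = two_sided_binomial_p(60, 100)  # just above the 0.05 threshold
p_61 = two_sided_binomial_p(61, 100)  # below the 0.05 threshold
print(f"60 heads in 100 flips: p = {p_60:.3f}")
print(f"61 heads in 100 flips: p = {p_61:.3f}")
```

One extra head tips the result from 'not significant' to 'significant', which shows how arbitrary a hard 0.05 cut-off can be.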

You’ve also got to check the actual values and not just the ordinal rankings of results. For example, group E may be greater than group F according to some aspect, but it might make a huge real-world difference whether that was by just the smallest possible unit or by a million units (e.g. the richest group has $5,318,008, the second-richest group has $7,734 and the third-richest group has $3,507). And what about the variance (or the size of the spread of results) within each group rather than just their collective average or mean? In another example, there may be a performance rank order to a particular group of individuals, but are all of them essentially (not) good enough when we assess their stats in their own right?
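The sketch below shows both points: the gaps hidden behind a rank order, using the three wealth figures above, and two groups with the same mean but very different spreads. The two score groups are invented for the illustration.

```python
from statistics import mean, pstdev

wealth = {"richest": 5_318_008, "second-richest": 7_734, "third-richest": 3_507}
ranked = sorted(wealth.items(), key=lambda item: item[1], reverse=True)

# Compare each group with the one ranked directly below it.
for (name_hi, value_hi), (name_lo, value_lo) in zip(ranked, ranked[1:]):
    print(f"{name_hi} leads {name_lo} by ${value_hi - value_lo:,}")
# richest leads second-richest by $5,310,274
# second-richest leads third-richest by $4,227

# Invented scores: identical means, very different spreads.
steady_group = [50, 50, 50]
mixed_group = [0, 50, 100]
print(mean(steady_group), pstdev(steady_group))
print(mean(mixed_group), round(pstdev(mixed_group), 1))
```

A ranking of first, second and third says nothing about the first gap being over a thousand times bigger than the second, and a mean says nothing about how spread out the group is.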

We must also pay attention to ‘effect sizes’. A news article could present the conclusion that G causes H and make a big fanfare out of it – but on actually delving into the source research material, it reports a statistically significant result that G contributes to H, but only by a tiny amount. So even though it seems true to say that G likely causes H and this result is statistically significant, the effect size could be so small that it has very little practical significance (e.g. something that cures baldness by a mere 1.5%!)
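To see how a tiny effect can still be highly statistically significant, here is a sketch of a standard two-proportion z-test (using the normal approximation) on invented data: two groups of a million people each, with outcome rates of 50.0% and 50.5%. The function name and all the figures are assumptions for the illustration.

```python
from math import erfc, sqrt


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return p_b - p_a, z, p_value


# Invented data: 500,000 vs 505,000 positive outcomes out of 1,000,000 each.
effect, z, p_value = two_proportion_z_test(500_000, 1_000_000, 505_000, 1_000_000)
print(f"effect size: {effect * 100:.1f} percentage points")  # 0.5
print(f"z = {z:.2f}, p = {p_value:.1e}")
```

The p-value here is far below 0.05, yet the effect is only half a percentage point, so statistical significance alone says little about whether a result matters in practice.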

And in many cases, it’s easier to improve a bad thing/situation than an already good thing/situation in relative terms (e.g. in contexts such as economic growth, exchange rates, product sales), so take note of the background and context too then ask ‘is that growth/decline rate figure really that great/bad in the big picture?’ When you’re at the top, staying there is seen as nothing but falling to second is seen as bad (and the media might make this situation sound like a crisis), and when you’re at the bottom, staying there is seen as nothing but climbing to second-from-bottom is seen as good (and the media might make this situation sound like a triumph) – but where would you rather be?
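A tiny sketch of why relative growth flatters a low starting point. The sales figures are invented and growth_rate is a hypothetical helper.

```python
def growth_rate(before, after):
    """Relative change from 'before' to 'after'."""
    return (after - before) / before


# Invented unit sales: a struggling product and a market leader.
small_seller = growth_rate(10, 15)      # gained 5 units
big_seller = growth_rate(1_000, 1_100)  # gained 100 units

print(f"{small_seller:.0%} vs {big_seller:.0%}")  # 50% vs 10%
```

The small seller 'grew five times faster' in relative terms whilst adding a twentieth of the units, so it's worth asking which figure a headline is quoting.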

…This is all why we cannot just rely on reading headlines or even full news articles unless they present all of the relevant information there, even if no falsehoods have been presented. We must search for and peruse the source research papers, look at the methodologies and raw figures in the results, and not just the summary statistics, and then come to our own conclusions about what it all means.

Meow!
