Furrywisepuppy - Question Even the Numbers

Post No.: 0559

Furrywisepuppy says:

We usually think that qualitative information is always subjective or vague (e.g. ‘the dog smells like a rower’s armpit’) whilst quantitative information, involving numbers, is always objective or definite (e.g. ‘the dog measures 89.2cm from nose to tail’).

Indeed, the word ‘most’ just means anything between 50.1% and 99.9%, which is quite a broad range; and the word ‘some’ can mean anything, say, between 0.1% and 50%. According to current UK law, an advertised APR or AER for loans or savings, respectively, only needs to be given to 51% of successful applicants for a bank to be able to call it ‘typical’ or ‘representative’ – the rest will get a worse rate. Using these sorts of fuzzy weasel words or ‘guarding terms’ isn’t untruthful – and is frequently the most succinct and fair conclusion we can state for the given evidence in a complex real world – but one must bear the above in mind.

However, we must question numerical or statistical data too. For example, the number of goals a striker scored one season doesn’t reveal whether most of them were due to, say, lucky deflections, fantastic assists or penalty kicks; so don’t value a striker merely on their goals statistic. The numbers – or at least the specific numbers gathered – may not tell the entire story. Likewise, a person’s BMI score, when we’ve not gathered any other numbers such as their fat or muscle ratio, might lead us to the wrong conclusion.

It’s not just about the reliability of the data or how those numbers were obtained but also about how one processes and interprets or infers conclusions from them i.e. what are those numbers saying? We must also understand the differences between different scientific terms (e.g. the risk difference versus the relative risk, odds ratio or hazard ratio).
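To see why these terms must not be confused, here is a minimal Python sketch with made-up figures: 2 cases per 1,000 people in one group versus 1 case per 1,000 in another. The function name and the counts are purely illustrative assumptions.

```python
def risk_measures(events_a, total_a, events_b, total_b):
    """Compare group A against group B using three common measures."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    odds_a = risk_a / (1 - risk_a)
    odds_b = risk_b / (1 - risk_b)
    return {
        "risk_difference": risk_a - risk_b,  # absolute gap between the two risks
        "relative_risk": risk_a / risk_b,    # how many times larger the risk is
        "odds_ratio": odds_a / odds_b,       # ratio of the odds, not of the risks
    }

# Hypothetical counts: 2 in 1,000 versus 1 in 1,000.
measures = risk_measures(2, 1000, 1, 1000)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With these invented counts the relative risk is 2 (‘double the risk!’) whilst the risk difference is only 0.001, or one extra case per 1,000 people. Both statements are true, yet they leave very different impressions.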

Is the number 5,346,012 a big or small number? We can only answer this by comparing it to some other number as context. So if you want €100M/year to seem large then simply present that number or state how many school meals that could fund. Or if you want it to seem small then present it like it’s ‘just’ ~€1.50/person/year perhaps.
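The two framings are the same number divided by different things. As a rough sketch, assuming a population of about 67 million (the figure implied by ~€1.50/person) and a hypothetical school meal price of €2.50:

```python
annual_cost_eur = 100_000_000

# Assumption: roughly 67 million people, as implied by ~EUR 1.50 per person.
population = 67_000_000
# Assumption: a purely hypothetical price for one school meal.
meal_price_eur = 2.50

cost_per_person = annual_cost_eur / population
meals_funded = annual_cost_eur / meal_price_eur

print(f"Per person per year: EUR {cost_per_person:.2f}")
print(f"School meals funded: {meals_funded:,.0f}")
```

The same €100M becomes either a trivial sum per person or tens of millions of meals, depending on which denominator the presenter picks.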

We need to see numbers in perspective, context or proportion – hence rates are typically more useful than bare absolute numbers (e.g. deaths/100 cases, carbon emissions/capita). Lots of politicians prefer to use absolute numbers when they want to express an improvement to some public service or scheme, but these figures may fail to account for the generally rising population during the same time period (e.g. 100,000 more jobs were created since last year… but the working-age population rose by 200,000 during this time!) An employment statistic may not reveal what types of jobs they were either, so it may be ‘the lowest unemployment rate for twenty years’ but are those jobs full-time, part-time or gig work? There are also different ways to operationally define things like ‘poverty’, or to decide what should be included in a market basket when calculating inflation. The politician (or business, other organisation or individual) isn’t lying per se but they’re not presenting the useful context or full picture.
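The jobs example can be checked with simple arithmetic. The baseline figures below (40 million working-age people, 30 million of them employed) are invented for illustration; only the 100,000 and 200,000 increases come from the example above.

```python
# Hypothetical baseline for last year.
working_age_before = 40_000_000
employed_before = 30_000_000

# Changes taken from the example in the text.
jobs_added = 100_000
working_age_growth = 200_000

employed_now = employed_before + jobs_added
working_age_now = working_age_before + working_age_growth

rate_before = employed_before / working_age_before
rate_now = employed_now / working_age_now

print(f"Employment rate last year: {rate_before:.2%}")
print(f"Employment rate this year: {rate_now:.2%}")
```

Under these assumptions the headline ‘100,000 more jobs’ is true, yet the employment rate has actually slipped, because the population grew faster than the jobs did.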

Authors, reporters or marketers may cherry-pick only certain parts of the data that are favourable to the conclusions they want to portray. For example, take a face cream that reportedly reduces the appearance of wrinkles, based on a measurement made 30 minutes after application – you might want to ask what the appearance looks like beyond those 30 minutes? Perhaps it starts to dissolve your flesh, but hey, you’ll have looked fantastic during the 30th minute so they didn’t actually lie in the advert(!) Maybe they did measure the appearance after 60 minutes, 90 minutes, etc. but chose not to disclose those results because they were not favourable when trying to promote their product? Who knows unless you’re given the full set of data.

Hence reporting a result that only occurred in very specific conditions may be truthful but not terribly useful to know – it’d be like declaring that one was the best singer… in this house… this morning… for a person who was wearing orange socks(!) The details always matter when reading statistics. Reading just the headlines is seldom informative enough.

How we present data or other information matters. In under-regulated markets, broadband providers usually only state the best-case-scenario speeds (e.g. up to 64Mbps download speed) but this seldom represents what the majority of their customers will get. Thus we should not (only) present the best- or worst-case (ex ante, i.e. predicted) figures but the full range of likely possibilities (where there is, by usual convention, a 95% chance that the (ex post, i.e. actual) result will fall within this range). This is why graphs that project predictions should present this range of uncertainty, as well as the most likely prediction, which would normally be somewhere near the centre of this range. Current measurements aren’t always precise either (e.g. trying to measure the average weight of the total human population when one has only the resources to measure a sample, or subset, of this population). This is all why graphs that indicate such predictions or measurements should include error bars that show one standard error of uncertainty or a particular confidence interval (usually a 95% CI).
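As a rough sketch of where such error bars come from, the Python below estimates a mean and an approximate 95% confidence interval from a small sample. The weights are invented, and the 1.96 multiplier assumes a normal approximation; for a sample this small a t-distribution multiplier would give a somewhat wider interval.

```python
import math
import statistics

def mean_with_ci95(sample):
    """Return (mean, lower, upper) using a normal-approximation 95% interval."""
    mean = statistics.mean(sample)
    # Standard error: sample standard deviation divided by sqrt(sample size).
    std_err = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * std_err
    return mean, mean - margin, mean + margin

# Hypothetical sample of body weights in kilograms.
weights_kg = [62.0, 71.5, 80.2, 68.4, 75.1, 90.3, 58.7, 66.0, 73.9, 84.6]
mean, lower, upper = mean_with_ci95(weights_kg)
print(f"Sample mean {mean:.1f} kg, 95% CI roughly {lower:.1f} to {upper:.1f} kg")
```

The sample mean on its own looks precise, but the interval shows how far the true population average could plausibly sit from it.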

In the social sciences at least, don’t take figures such as population averages or means as always absolutely accurate – as a rough rule (unless the data already presents error bars) allow a few percent plus or minus either way (e.g. if Country A has a literacy rate of 75.6% and Country B has a literacy rate of 77.4% then they’re essentially in the same ballpark as each other and it’d be silly for Country B to be that proud over Country A).

The way data is visually displayed can shape our conclusions too. The choice of a linear versus logarithmic graph scale, and/or what value the y-axis starts at (e.g. at 50 rather than 0), could make a 0.1% difference between two groups seem huge. Or presenting just group averages can hide the spread or variance of the data within those groups, where these groups could actually have hugely overlapping data points (e.g. the average height of men is greater than for women but the variance would show that there’s a tremendous amount of overlap i.e. some women are taller than some men). If there are averages then there’s probably a range or variance.
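The height example can be sketched with two bell curves. The means and standard deviations below are illustrative assumptions rather than measured figures, and heights are treated as normally distributed and independent.

```python
from statistics import NormalDist

# Hypothetical height distributions in centimetres.
men = NormalDist(mu=175, sigma=7)
women = NormalDist(mu=162, sigma=7)

# Distribution of (woman's height - man's height) for a random pair.
difference = women - men
p_woman_taller = 1 - difference.cdf(0)

# Shared area under the two curves (0 = no overlap, 1 = identical).
shared_area = men.overlap(women)

print(f"Chance a random woman is taller than a random man: {p_woman_taller:.1%}")
print(f"Overlap between the two distributions: {shared_area:.1%}")
```

Even with a 13cm gap between the assumed averages, a substantial share of the two curves overlaps, which a chart of averages alone would never show.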

Comparing two extreme examples (e.g. the heaviest man with the lightest woman) can also exaggerate the presence of a gap, when in reality there is a blend of individuals and the vast majority of these individuals sit somewhere in the middle of these two extremes. So understand that the average isn’t the range, and the majority doesn’t mean everybody.

Our perceptions of figures can also depend on what units are used (e.g. millimetres or kilometres). We’re not always good at converting units in our own heads.

Extrapolating trend lines on graphs is as natural an instinct as predicting the trajectory or path of a tennis ball that has been thrown in a game of fetch – but graph lines on paper aren’t objects that must follow the laws of motion for physical objects, thus they don’t always extrapolate into the future in a way that one may assume (e.g. linearly).
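To illustrate, suppose (hypothetically) a quantity grows by 50% each period but we only see the first five readings and extend the last straight-line step forwards. The growth rate and starting value are assumptions chosen for the sketch.

```python
# Hypothetical readings: start at 100 and grow by 50% per period.
observed = [100 * 1.5 ** t for t in range(5)]

# Naive straight-line extrapolation using the most recent step.
last_step = observed[-1] - observed[-2]
periods_ahead = 5
linear_guess = observed[-1] + last_step * periods_ahead

# What the same growth process actually gives at that point.
actual = 100 * 1.5 ** (len(observed) - 1 + periods_ahead)

print(f"Straight-line guess: {linear_guess:,.0f}")
print(f"Actual value:        {actual:,.0f}")
```

The straight-line guess falls far short of the actual value here; with a different underlying process it could just as easily overshoot.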

When we’re presented a plain or bare statistic, we must seek the reasons behind it rather than assume a reason that fits with our own agenda. For example, if one country has more recreational drug-related deaths/capita than another then is this because more people are taking these drugs there, is there less help for those who take drugs there, is it the laws (or lack of) there regarding tackling suppliers and criminalising drugs, is it a lack of enforcement despite what laws are present, is it possibly a fundamental problem of collecting accurate enough data or estimating the number of users in one country compared to another, and so on, or a combination of such factors? With just the summary statistic presented, we cannot be sure. Lots of numbers have had to be read very carefully during the COVID-19 pandemic (e.g. excess deaths, vaccine efficacy).

More deeply, maths – as in everything adding up correctly – is necessary but is not sufficient on its own as unequivocal proof of anything. Mathematical theories are correct because they can be logically deduced according to a set of accepted axioms. Yet many theories add up on paper but fail to match the observed evidence. Therefore something can be logical but not empirical/real. Not all mathematical models represent reality. As a simplified illustration, if one sees 0 apples on a table one moment, turns away, then sees 1 apple on the same table a moment later, it doesn’t necessarily mean that ‘plus 1,001 apples’ then ‘minus 1,000 apples’ happened, even though this adds up correctly. The numbers may tally correctly but it doesn’t necessarily mean that reality is/was that way.

So just because something adds up mathematically, it doesn’t necessarily mean it’s the right or only correct model – some things have multiple different descriptions (e.g. the many-worlds, de Broglie-Bohm and Copenhagen interpretations of quantum theory are all currently consistent with the experimental evidence). And which perspective we take will shape our own version of reality even though there are potentially other equally valid interpretations of reality. (There are also subjective perspectives on whether differing interpretations matter at all – some believe that if a model apparently predicts reality correctly then that’s all that matters.) So even if your own view is correct – there may still be other equally correct yet different views, and this is why we must remain open-minded. There are facts, but then there are also our limited understandings, perceptions and resultant interpretations.

Even in a subject that’s as objective as mathematics, the first of Gödel’s incompleteness theorems states that (simplified here) any consistent formal system rich enough to express basic arithmetic contains true statements that cannot be proved within that system!

We also in many instances forget or fail to realise that many things we accept without question in our day-to-day lives are in fact arbitrary – like why do we usually celebrate the top 3 in a competition and not the top 4, 5, etc. or just the top 1? Why do we use the decimal number system and not a hexadecimal or other base system? Some numbers can’t even be written out in full in a decimal system (e.g. 1/3, whose decimal digits recur forever, or irrational numbers like pi). The number system we select means we chase arbitrary round-number targets – like completing a 2-hour marathon, having €1M in the bank, being 6’ tall, and so on, without questioning why?! The length of a day, month or year is determined by ‘reasonably consistent’ natural events but why are there 60 seconds/minute, 60 minutes/hour or 24 hours/day? The day chosen to be the first day of the year is arbitrary because where’s the beginning of an elliptical orbit?
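Whether a fraction can be written out neatly depends on the base chosen, which is one way to see how arbitrary base 10 is. The helper below (an illustrative function, not anything standard) expands the fractional part of a number digit by digit in a given base and reports whether the expansion terminates within a fixed number of digits.

```python
from fractions import Fraction

def expand(value, base, max_digits=12):
    """Return (digits after the point, True if the expansion terminated)."""
    digits = []
    remainder = value - int(value)  # keep only the fractional part
    while remainder != 0 and len(digits) < max_digits:
        remainder *= base
        digit = int(remainder)      # next digit in this base
        digits.append(digit)
        remainder -= digit
    return digits, remainder == 0

third = Fraction(1, 3)
tenth = Fraction(1, 10)

print("1/3 in base 10:", expand(third, 10))
print("1/3 in base 3: ", expand(third, 3))
print("1/10 in base 10:", expand(tenth, 10))
print("1/10 in base 2: ", expand(tenth, 2))
```

One third never terminates in base 10 but is simply 0.1 in base 3, whilst one tenth is tidy in base 10 but recurs forever in binary.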

Not numerical but the Mercator projection and which way up a world map ‘should’ be aren’t objective. Infrared and ultraviolet images are false colour images. So many things we unquestioningly accept are conventions rather than objective facts, and these arbitrary presentations can affect how we literally view the world!

Sticking with traditions has merits but we must occasionally question them. For instance, neither nature nor logic dictates that the male lineage is more important than the female lineage – roughly 50% of our genes come from each parent, hence a female heir should be just as valuable as a male heir. It’s just the arbitrary choice of (only) taking the father’s and husband’s surnames in most cultures. So historical or conventional reasons don’t make things non-arbitrary – and we should be allowed to question arbitrary things.

In short, mathematics is the language of nature but even numbers mustn’t always be assumed to be objective or unquestionable. Our critical-thinking hats must stay on.

Woof!
