**Post No.: 0141**

Furrywisepuppy says:

**The law of truly large numbers** tells us that, with a large enough number of random events, one or two extreme outcomes become more likely than they would be with a small number of events, yet these extreme outcomes can be purely coincidental (e.g. if millions of people play the lottery, the chance that *at least somebody* wins increases, but that doesn’t mean there’s anything special about the winner; it’s just statistical chance. Or if there are 60 random people in a room, there’s actually over a 99% chance that *at least two of them* share the same birthday). This is currently a major problem in some fields of scientific research that use machine learning algorithms and ‘big data’ to look for patterns: when patterns are found within large sets of noisy data, they are sometimes merely chance artefacts of the data rather than reflections of anything remarkable happening in the real world. It also shows up in some superstitions (e.g. being ‘cursed’ and then finding that *something* goes wrong somewhere among the millions of things that could go wrong at any time in your life afterwards).
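Both figures above can be checked with a few lines of Python. This is a minimal sketch: it assumes 365 equally likely birthdays (ignoring leap years and real-world birthday clustering), and the lottery numbers (a 1-in-10-million chance per ticket and 10 million independent tickets) are hypothetical values chosen only for illustration.

```python
import math

def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday,
    assuming every day of the year is equally likely (an idealisation)."""
    # Probability that all birthdays are distinct: (365/365) * (364/365) * ...
    all_distinct = math.prod((days - i) / days for i in range(people))
    return 1 - all_distinct

def at_least_one_success(p: float, trials: int) -> float:
    """Chance of at least one 'hit' in `trials` independent attempts,
    each with success probability `p`."""
    return 1 - (1 - p) ** trials

print(f"{shared_birthday_probability(60):.3f}")  # 0.994
# Hypothetical lottery: 1-in-10-million odds per ticket, 10 million tickets
print(f"{at_least_one_success(1e-7, 10_000_000):.3f}")  # 0.632
```

Even though any single ticket almost certainly loses, the chance that *someone* wins ends up close to two in three under these made-up numbers.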

**Research hypotheses, and prophecies, therefore need to state upfront exactly what they’re testing for and hoping or predicting to see**, rather than being created post hoc once the data and results have come in. It would be far more impressive to predict that a specific person will slip on a rainbow trout at 1:13pm tomorrow, and then they do. Yet even this might not indicate that you’ve got clairvoyant powers, because it might have been a one-off fluke…
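Why post-hoc pattern hunting is risky can be put into numbers. Under the simplifying assumption that each test is independent and uses a 5% false-positive threshold (a common convention, used here only as an example), the chance that at least one of k tests looks ‘significant’ by luck alone grows quickly with k:

```python
def chance_of_false_alarm(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests of
    true null hypotheses comes out 'significant' purely by chance."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    print(k, round(chance_of_false_alarm(k), 3))
# 1 0.05
# 5 0.226
# 20 0.642
# 100 0.994
```

A prediction stated upfront is a single test; a pattern found after trawling through many possibilities is effectively one of many tests, so a ‘hit’ is far less surprising.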

**The law of small numbers** tells us that small sample sizes can also create the illusion of patterns, trends or ‘rules of thumb’ when something is actually random (e.g. rolling 4 sixes out of 4 rolls of a die isn’t really sufficient to indicate that the die is rigged, because it can happen by pure chance with a fair die; rolling 10 sixes in a row would be a far better indication). When we see a pattern, we assume a cause for it, which is erroneous if the pattern is *random*, or we may assume the wrong cause if there’s a *sampling bias* (a systematic skew in how the samples were selected, e.g. most of the participants in a study were university students, so their results might not generalise to the wider population).
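The gap between 4 and 10 sixes is easy to quantify. This sketch assumes a fair six-sided die and independent rolls, and uses exact fractions so nothing is lost to rounding:

```python
from fractions import Fraction

def chance_of_all_sixes(rolls: int) -> Fraction:
    """Exact probability that a fair six-sided die shows a six on every one of `rolls` rolls."""
    return Fraction(1, 6) ** rolls

print(chance_of_all_sixes(4))   # 1/1296
print(chance_of_all_sixes(10))  # 1/60466176
```

Around 1 in 1,300 is rare for one person but routine across a crowd of people rolling dice; around 1 in 60 million is a much stronger hint that something other than chance is at work.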

Extreme average outcomes, either high or low, are statistically more likely to be found in small samples than in large ones, creating artefacts in observations. It’s why neither the country with the tallest average height nor the one with the shortest is likely to be among the most populous countries in the world. It’s also why research requires large enough sample sizes to better cancel out random chance events, and why one must look at the true negatives as well as the true positives. For example, the *best*-performing schools may have small classes, but the *worst*-performing schools may have small classes too. If we only hear the first statistic, our fuzzy, coherence-seeking and causal story-seeking system one might say ‘it’s obviously because of the greater amount of teacher attention each pupil receives’; if we only hear the second statistic, it might say ‘it’s obviously because of the lack of other pupils to interact with’! **In other words, we’ll spontaneously infer causal explanations for any (seemingly idiosyncratically) co-occurring events even when a connection is completely spurious.** Woof!
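The schools example can be reproduced with a small simulation. Everything here is hypothetical: 500 made-up schools with between 10 and 500 pupils, and every pupil’s score drawn from the same distribution (mean 50, standard deviation 10) regardless of school size, so by construction size has no causal effect at all.

```python
import random
import statistics

def simulate_schools(num_schools: int = 500, seed: int = 1) -> list[tuple[float, int]]:
    """Return (average score, pupil count) for hypothetical schools in which
    every pupil's score comes from the same distribution, so size has no real effect."""
    rng = random.Random(seed)
    schools = []
    for _ in range(num_schools):
        size = rng.randint(10, 500)
        scores = [rng.gauss(50, 10) for _ in range(size)]
        schools.append((statistics.fmean(scores), size))
    return schools

def mean_size(group: list[tuple[float, int]]) -> float:
    """Average pupil count across a group of (average score, size) pairs."""
    return statistics.fmean(size for _, size in group)

schools = sorted(simulate_schools())  # ordered from lowest to highest average score
worst, best = schools[:10], schools[-10:]

print("average size, all schools:", round(mean_size(schools)))
print("average size, 10 best:    ", round(mean_size(best)))
print("average size, 10 worst:   ", round(mean_size(worst)))
```

Both the ‘best’ and the ‘worst’ lists come out dominated by small schools, simply because a small school’s average swings further from 50 by chance.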

The smaller the sample size, the greater the chance of seeing artefacts such as homogeneous streaks one way or the other (e.g. all heads or all tails is more likely when flipping only 3 coins at a time than when flipping 6 coins at a time), but such streaks are likely just due to chance, not causality. In sports like football, pundits frequently over-hype players, and clubs overvalue them, after they’ve had merely one amazing season, yet that season might be a freak one rather than a sign of a true pattern in their long-term performance.
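For the coin example, assuming fair and independent flips, the chance of an all-heads or all-tails run is 2 × (1/2)ⁿ:

```python
from fractions import Fraction

def chance_of_uniform_streak(coins: int) -> Fraction:
    """Exact probability that `coins` fair coin flips all land the same way
    (all heads or all tails)."""
    return 2 * Fraction(1, 2) ** coins

print(chance_of_uniform_streak(3))  # 1/4
print(chance_of_uniform_streak(6))  # 1/32
```

So a ‘perfect’ streak turns up in a quarter of 3-coin batches but only about 3% of 6-coin batches.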

When comparing samples with other samples, the means (averages) of small samples vary more from sample to sample than those of large samples. Hence one must consider the *variance* and not just focus on the mean, because a larger variance indicates greater unpredictability or uncertainty in any conclusions drawn from a set of data (the *error bars* will be wide). The chance of error is also much greater with something that has a 51-49 percentage split than with a 99-1 split, for instance, because a sample can easily tip a 51-49 split the wrong way.
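Both points can be sketched in Python, assuming simple random sampling. The first function is the standard error of a mean (population standard deviation divided by √n). The second uses the binomial distribution to estimate how often a sample of 101 people (an odd number, picked so there are no ties) shows the true majority option as a minority; the sample size and splits are illustrative only.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Typical spread of a sample mean: the population standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

def chance_sample_picks_wrong_side(p: float, n: int) -> float:
    """Probability that a random sample of `n` people (n odd, so no ties) shows a
    minority for the option that a fraction `p` (> 0.5) of the population really prefers."""
    # Sum the binomial probabilities of k = 0 .. (n - 1) / 2 supporters in the sample
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range((n + 1) // 2))

print(standard_error(10, 4))    # 5.0
print(standard_error(10, 100))  # 1.0
print(chance_sample_picks_wrong_side(0.51, 101))
print(chance_sample_picks_wrong_side(0.99, 101))
```

Quadrupling the sample only halves the standard error, and the wrong-side probability for the 51-49 split is substantial, while for the 99-1 split it is vanishingly small.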

**So with small samples, there’s nothing we can (confidently) explain in terms of causality.** One can calculate the recommended sample size for a survey or research experiment from the population size, together with the margin of error (confidence interval) and the confidence level one is willing to accept.
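The post doesn’t specify a formula, but one widely used approach for surveys that estimate a proportion is Cochran’s formula with a finite population correction. The sketch below assumes simple random sampling and the conservative guess p = 0.5; the function name and defaults are illustrative, not a standard API.

```python
import math
from statistics import NormalDist
from typing import Optional

def required_sample_size(confidence_level: float, margin_of_error: float,
                         population: Optional[int] = None, p: float = 0.5) -> int:
    """Sample size for estimating a proportion (Cochran's formula).
    p = 0.5 is the most conservative guess; `population` applies the
    finite population correction when the population is small."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / margin_of_error**2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(required_sample_size(0.95, 0.05))                     # 385
print(required_sample_size(0.95, 0.05, population=10_000))  # 370
```

Tightening the margin of error or raising the confidence level pushes the required sample size up quickly, since the margin enters the formula squared.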

**This all relates to ethnic stereotyping and racism.** For example, consider a young person who was bullied by a few members of a certain ethnic group and grew up with a generalised hatred towards that entire ethnic group. But do those *few* members truly represent the entire group? Then, as an adult, that person watches the news, hears about one or two members of that ethnic group committing a terrible crime, and so feels reconfirmed that they must all be like that. But out of the *millions* of members of that ethnic group, wouldn’t there be bound to be at least a few members who confirm that bias?

Woof. Basically, the chances of finding *something* seemingly amazing increase when there’s a large number of trials, and *flukes* one way or the other are more likely to happen when there’s only a small number of trials, **so take note of the sample sizes used in experiments (e.g. the number of participants, or the number of attempts each took at a task) and don’t read too much into results from small samples!**
