Furrywisepuppy - Operationalisation and Operational Definitions

Post No.: 0461

Furrywisepuppy says:

Many phenomena or attributes that we intuitively think exist cannot be directly measured, but their existence can be inferred via other phenomena that are measurable. Examples include ‘intelligence’, ‘health’, being ‘strong’, ‘beautiful’, ‘deadly’ and oftentimes what it means to be the ‘best’ or ‘worst’ at something. In science, we need to be far more precise about what we mean, hence ‘operationalisation’ is the process of defining such measures, or operational definitions.

For instance, suppose we want to determine which animal has the best hearing or eyesight in the animal kingdom. When deciding on the operational definition of ‘better hearing’ or ‘better eyesight’ – shall it be measured by the range of frequencies an animal can pick up, the signal sensitivity, directional accuracy or field of vision, or some other measure? And how should one rank or weight these factors in terms of importance against each other to create an overall score – is having a greater range better than having a greater sensitivity, or vice-versa? Shall one only use each animal’s greatest attribute, or use the average of all of their scores, to base our conclusions on? There are other potentially arbitrary decisions to make too.
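To see how much the choice of weighting alone can matter, here is a minimal Python sketch. Everything in it is an assumption made purely for illustration: the three animals, their scores on a 0 to 1 scale, the measure names and the weights are all made up and are not real measurements.

```python
# Hypothetical, made-up scores on a 0-1 scale; not real measurements.
SCORES = {
    "animal_a": {"frequency_range": 0.95, "sensitivity": 0.4, "directional_accuracy": 0.5},
    "animal_b": {"frequency_range": 0.5, "sensitivity": 0.9, "directional_accuracy": 0.6},
    "animal_c": {"frequency_range": 0.7, "sensitivity": 0.7, "directional_accuracy": 0.7},
}


def weighted_score(measures, weights):
    """Weighted average of one animal's measures."""
    total_weight = sum(weights.values())
    return sum(measures[name] * w for name, w in weights.items()) / total_weight


def best_hearing(scores, weights):
    """Return the animal with the highest weighted score."""
    return max(scores, key=lambda animal: weighted_score(scores[animal], weights))


# Three equally defensible operational definitions of 'better hearing'.
range_matters_most = {"frequency_range": 3, "sensitivity": 1, "directional_accuracy": 1}
sensitivity_matters_most = {"frequency_range": 1, "sensitivity": 3, "directional_accuracy": 1}
all_equal = {"frequency_range": 1, "sensitivity": 1, "directional_accuracy": 1}

for label, weights in [
    ("range matters most", range_matters_most),
    ("sensitivity matters most", sensitivity_matters_most),
    ("all measures equal", all_equal),
]:
    print(label, "->", best_hearing(SCORES, weights))
```

With these invented numbers, each of the three weightings crowns a different animal, even though the underlying scores never change.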

Should ‘the fastest car’ be defined by a quarter-mile race, 30 laps around a particular circuit or a 24-hour race, for instance? Using different operational definitions or measures for what it means to be ‘the fastest’ – even when it’s all proper science – can produce different results and therefore different winners (and marketing departments know this, which is why they’ll use whichever definitions suit them best when trying to advertise their products!).

Likewise, why is ‘the fastest person on Earth’ the fastest runner over 100m? Why not over 1,000m, or perhaps 10m or 1m, or a distance not based on whole numbers of the (itself arbitrary) decimal system? We could measure the fastest speed in kph though – yet how much wind and/or downhill assistance is objectively ‘acceptable’?

The ‘golden ratio’ (~1.62:1) is frequently claimed to be ‘the most beautiful proportion’ by some scientists, but this is an arbitrary claim. There are other ratios too, such as symmetry (1:1) – indeed, one side of a person’s body being 1.62x the size of the other would look more like something from Resident Evil(!) How we choose the points to start and end measurements from on organically fuzzy-edged, curved or moveable surfaces can be highly subjective or arbitrary too. And you can literally divide any number by another to get a ratio.

A selected measure might be able to generate a quantitative value or number (e.g. 100%), which can seem objective compared to a qualitative statement (e.g. ‘good’) – but this doesn’t necessarily make the claim completely objective. That’s why it’s vitally important to carefully read and take note of the details of a study rather than rely on the author’s own conclusions about their study or experiment, or over-extrapolate the results just ‘because science apparently came to the conclusion’. There are potentially many different ways to operationalise things, hence potentially many different conclusions that could be reached.

So whenever people say something like ‘the fittest’ or ‘the smartest’ or some other hypothetical construct, we must understand that the conclusion is based upon a specific measure or set of measures that is or are possibly arbitrary and pretty certainly incomplete. Hence one must not over-generalise conclusions to mean anything more than pertaining to what specific measure(s) were observed in a particular study.

Thus if a study only tested for colour depth then we can only comment on colour depth rather than over-generalise or over-extrapolate the result to mean that a particular furry animal has ‘better eyesight’ than the other animals that took part in the study, which likely won’t be a complete list of animals from the entire animal kingdom either – not least because new animals are continually being discovered. Therefore the most we can conclude from such a study would be ‘this animal can detect the greatest colour depth amongst the animals tested, assuming that those particular animals tested weren’t outliers (in the same way that different individual humans perform differently when it comes to eyesight hence it can depend on which animals as individuals took part in the study), and probably according to a whole host of other limitations of the study too’. The media, and in turn the public, though, might report the conclusion as ‘this animal has the greatest eyesight’(!) And then they might wonder why science seems to say one thing one day and then another thing the next – well the explanation might be in the details of each study, but they’ve been ignored in favour of simplistic conclusions and a simplistic view of the world.

It’s the same with IQ tests as a measure of ‘intelligence’. Post No.: 0459 explored this already. We must understand what specific tests were conducted, when, where, etc. and not over-extrapolate the results to mean anything that a study never actually tested for (whilst being sensible about it e.g. the shoes the participants wore probably didn’t affect their scores, but something like whether they were hungry or digesting a large meal during the test might?)

A statistic that claims that ‘x experience more abuse than y’, and then another statistic from a separate study that claims that ‘y experience more abuse than x’, could potentially both be true, depending on the definitions those studies used for ‘abuse’ – such as whether each and every incident of verbal abuse counted, and whether they were weighted the same as each and every physical slap or shove? This is again why we must carefully read the study papers to find out these things rather than simply rely on the conclusions provided. People will tend to automatically trust more whichever statistic fits their personal worldview or agenda though. Scientists, for being humans too, may also try to intentionally select definitions that’ll most elicit the conclusions they hope to find and want to present, in a kind of ‘creative accounting’ way. (Indeed, accounting is about numbers too, yet that doesn’t mean they cannot be organised and presented in a multitude of different ways to suit whatever purpose an organisation wants.)
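A small Python sketch can show how two such opposite-sounding statistics could both come from the same underlying data. The groups, incident counts and weightings below are entirely hypothetical, invented only to illustrate the point, and the function names are just for this example.

```python
# Hypothetical incident counts for two groups; invented for illustration only.
INCIDENTS = {
    "x": {"verbal": 40, "physical": 5},
    "y": {"verbal": 10, "physical": 12},
}


def abuse_score(counts, weights):
    """Total 'abuse' under a given definition; kinds missing from weights count as 0."""
    return sum(n * weights.get(kind, 0) for kind, n in counts.items())


def experiences_more_abuse(incidents, weights):
    """Return the group with the higher score under this definition."""
    return max(incidents, key=lambda group: abuse_score(incidents[group], weights))


# Study 1 (assumed): every incident counts, and each counts the same.
every_incident_equal = {"verbal": 1, "physical": 1}
# Study 2 (assumed): only physical incidents count as 'abuse'.
physical_only = {"physical": 1}
# Study 3 (assumed): both count, but a physical incident weighs five times as much.
physical_weighted = {"verbal": 1, "physical": 5}

print(experiences_more_abuse(INCIDENTS, every_incident_equal))  # x (45 vs 22)
print(experiences_more_abuse(INCIDENTS, physical_only))         # y (5 vs 12)
print(experiences_more_abuse(INCIDENTS, physical_weighted))     # y (65 vs 70)
```

Neither answer is a miscalculation. The numbers are the same each time and only the definition of ‘abuse’ has changed, which is exactly the detail a headline tends to leave out.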

We therefore mustn’t read scientific study conclusions as always being ‘a definitive, objective, indisputable or complete test or answer’ – the core of science is the process, not a book of purported facts. Science is the method – science isn’t a bible. We must therefore learn and understand how the various types of scientific research are conducted and their pros and cons (e.g. randomised-controlled trial, case-control observational study, longitudinal cohort study, meta-analysis); read the original papers in question and understand their contexts and critique their methodologies, not just read and unquestioningly accept the scientist’s (or journalist’s) own potentially arbitrary operational definitions or subjective interpretations of the data and conclusions; and come up with our own conclusions, while understanding that these will likely be subjective too unless they take into account the combined limitations of all of the studies that have ever looked at a particular phenomenon.

One scientific study alone won’t necessarily give us the full picture – a particular study’s results can depend on what methodology was used, the operational definitions or ‘measures of success’ used, and the authors’ own interpretations of the results (especially in the social sciences e.g. how difficult is ‘too difficult’?) The results may be factual for the given methodology but the conclusions we draw from them could be our own potentially subjective interpretations. Whether a number is a ‘high’ or ‘low’ one depends on what we compare it to, and even if everyone agrees that something is too high or low – what to do about it, if anything, will be up for debate too. In the case of relaxing a pandemic lockdown, for instance – what death toll over the next month is worse or better than what economic damage, poverty and state of mental health, that would cause suffering and in turn indirect deaths, over the following months? These are difficult political decisions without the benefit of hindsight.

Nevertheless, the scientific process is still the best process for discovering facts – the key is to read everything carefully, to be careful not to over-extrapolate findings, to rely on aggregating the findings of multiple independent studies in the same area, to understand that some piece of new evidence could possibly turn everything that’s currently accepted upside-down (although thinking ‘this thing could be proven wrong in the future’ isn’t a good reason to dismiss the best truth according to the preponderance of empirical evidence we have at the present time), and to therefore understand that science isn’t some kind of centralised bible that once appended to is set in stone or will ever likely be complete (because we logically cannot know whether we know everything there is to ever know hence something new could always unexpectedly pop up).

When we argue like lawyers and one side cherry-picks only evidence that supports their own case, and the other side cherry-picks only evidence that supports their own case, it usually ends up with an impasse. In court, there are judges and limited sets of jurors to break such impasses but in science there aren’t and nor should there be because no one can fill these positions. Even experts can disagree with each other. And although an overall consensus amongst relevant experts adds serious weight – finding the truth isn’t really a popularity contest or democratic matter. Woof.

The ‘video assistant referee’ (VAR) system used in professional football nowadays shows that merely using technology doesn’t automatically make something objective either – there’s still human interpretation involved.

Big data combined with machine learning techniques can somewhat remove the bias of cherry-picking so that we can find out where the data overall points towards according to the statistical patterns. But it’s still not perfect for testing every kind of hypothesis because it’s still dependent on what data is fed in (we would ideally analyse the entire dataset of something to remove sampling biases but this isn’t always available), the keywords chosen, the operational definitions and parameters used once again, and once more there are the human interpretations of the results too. Machine learning software needs diverse and large (preferably up-to-date and complete) datasets to reduce or overcome sampling errors when it comes to making fair generalised claims about something.

When analysing religious corpora, for instance, machine learning ‘distant reading’ techniques (non-linear reading, such as finding word associations or patterns from a ‘bird’s-eye’ perspective) can provide a broader perspective on texts compared to ‘close reading’ techniques (traditional linear word-by-word, line-by-line, reading), and can reveal a lot of quantitative data rapidly, efficiently and without cherry-picking. But this data still requires qualitative interpretation by human experts (who have done a lot of close reading on the subject before). And there’s still also the issue of poor data in, poor results out (e.g. if the dataset is incomplete or full of biased data – in which case the only thing that can be practically done is cross-referencing the results with other independent evidence, and lowering the confidence levels for one’s conclusions).

In summary, even if something’s ‘according to science’ or ‘scientifically researched’ and it’s proper, well-conducted science – the conclusions that a particular study suggests can still be potentially debatable depending on the operational definitions and methodology selected by the scientist(s), and other factors. It’s usually straightforward to define something like ‘height’, ‘weight’ or ‘temperature’, for any unit of measure we select can be reliably converted into another (e.g. centimetres to inches), thus it’s these sorts of measures we’ll need to use in order to operationalise something like ‘athleticism’. Perhaps ‘how high people can jump’ shall be how we define ‘athleticism’? But will this, or any limited number of selected definitions, be a complete test for measuring who’s ‘most athletic’? When we cannot fully measure something then – although we should still be sensible and reasonable – there’ll be a space left for opinions or guesses to fill the gaps.

Woof.
