
Post No.: 0821

 

Furrywisepuppy says:

 

If we want to ask whether a certain drug is more effective than a placebo for relieving joint pain, we should at least define the population (who's being studied, e.g. elderly people with joint pain), the intervention or treatment (what's being studied, e.g. an anti-inflammatory), the comparison or control (how the non-intervention group is being treated, e.g. with a placebo), and the outcome (the important thing you're trying to change, e.g. joint pain) – together, the acronym PICO.
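
As a purely illustrative sketch, a PICO question can be thought of as a small structured record – the field names and example values here are hypothetical, not from any particular study:

```python
from dataclasses import dataclass

@dataclass
class PICOQuestion:
    """A research question structured along the PICO framework."""
    population: str    # P - who is being studied
    intervention: str  # I - what is being studied
    comparison: str    # C - what the control group receives
    outcome: str       # O - what we are trying to change

# Hypothetical example matching the joint-pain scenario above
question = PICOQuestion(
    population="elderly people with joint pain",
    intervention="an anti-inflammatory drug",
    comparison="a placebo",
    outcome="self-reported joint pain",
)
print(question)
```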

 

Independent measures experimental studies are also known as ‘between groups’ experiments because different participants are used for the experimental group (who’ll receive the intervention or independent variable) and for the control group (who’ll receive nothing or something else). Participants must be randomly assigned to each group though, to minimise the risk of systematic errors – e.g. gender differences between the groups, where gender might be a confounding variable because one’s gender could affect one’s response to an intervention or treatment.
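
As a minimal sketch of what random assignment might look like in practice (the participant IDs and group names here are hypothetical):

```python
import random

def randomly_assign(participants, seed=None):
    """Randomly split participants into an intervention and a control group
    (a 'between groups' / independent measures design)."""
    rng = random.Random(seed)
    shuffled = participants[:]      # copy so the original order is untouched
    rng.shuffle(shuffled)           # random order breaks any systematic pattern
    half = len(shuffled) // 2
    return {"intervention": shuffled[:half], "control": shuffled[half:]}

# Hypothetical participant IDs
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)], seed=42)
print(groups["intervention"])
print(groups["control"])
```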

 

Repeated measures experimental studies are also known as ‘within groups’ experiments because the same participants are used for the intervention and control conditions. The order in which the intervention and control conditions are given to these participants must be alternated though, to counterbalance any order effects – e.g. in a cognitive test, participants might do better during the second test simply because they’ve done it before and know what to expect (a practice effect), or they might alternatively do worse during the second test simply because they’re tired or bored from having done it before (a fatigue effect).
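
A minimal sketch of that counterbalancing, assuming a simple two-condition design where half the participants do the intervention first and half do the control first (the names are hypothetical):

```python
def counterbalance(participants):
    """Alternate the order of conditions across participants (a 'within groups' /
    repeated measures design) so practice and fatigue effects roughly cancel out."""
    orders = []
    for i, person in enumerate(participants):
        if i % 2 == 0:
            orders.append((person, ["intervention", "control"]))
        else:
            orders.append((person, ["control", "intervention"]))
    return orders

# Hypothetical participant IDs
for person, order in counterbalance([f"P{i:02d}" for i in range(1, 7)]):
    print(person, "->", " then ".join(order))
```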

 

An ‘idealised’ study would have equal expectations for both intervention and control groups (i.e. they are perfectly randomised), and would treat both groups exactly the same except for the intervention, so that the only difference is the intervention. However, real-world experiments seldom conform to this ideal because not everyone behaves exactly as they’re told to behave, there’s attrition (dropouts), misreporting, the experimenters may inadvertently behave differently towards different participants or not do what they said they’d do, each outcome may be detected or determined differently, etc. There’s a plethora of internal and external biases that can make a study prone to errors. When processing the data, scientists can also inadvertently read or input the data incorrectly, for instance. And when forming their conclusions, one must question how they calculated the comparisons to other studies or historical evidence, and ask what the study really tested for.

 

A scientific conclusion is based on a significance level, or the probability of wrongly rejecting the null hypothesis when it is actually true; along with a confidence interval, or the range of results that would be expected to contain the true value of the parameter of interest; and in conjunction with a confidence level, or the proportion of times that, if the experiment or survey were repeated again and again, the intervals calculated in this way would contain that true value. The null hypothesis is the commonly accepted fact that one is trying to nullify or invalidate, because one thinks that one’s own alternative hypothesis – the one that one is trying to prove – truly explains a phenomenon instead.
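
As a hedged illustration of how these quantities fit together, here’s a small sketch that computes a 95% confidence interval for a sample mean using a t-distribution – the sample data are made up:

```python
import math
from statistics import mean, stdev
from scipy.stats import t

# Hypothetical sample of body weights (kg)
sample = [71.2, 74.8, 69.5, 77.1, 73.3, 70.9, 76.4, 72.0, 75.5, 74.1]

m = mean(sample)
se = stdev(sample) / math.sqrt(len(sample))   # standard error of the mean
t_crit = t.ppf(0.975, df=len(sample) - 1)     # two-sided 95% critical value
lower, upper = m - t_crit * se, m + t_crit * se

print(f"sample mean = {m:.1f}")
print(f"95% confidence interval = ({lower:.1f}, {upper:.1f})")
```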

 

For example, a conclusion may be reported as ‘75 with a ±5 confidence interval, and a 95% confidence level’ – which would mean that the range of 70 to 80 is expected to contain the true value (perhaps the true average weight of a population), and that if the experiment or survey were repeated again and again (weighing another random sample of the population each time), around 95% of the intervals calculated in this way would capture that true value.
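
To make that ‘repeated again and again’ interpretation concrete, here’s a small simulation with made-up population parameters: we draw many random samples from a known population, compute a 95% interval from each, and count how often those intervals capture the true mean – it should come out at roughly 95%:

```python
import math
import random
from statistics import mean, stdev

random.seed(1)
TRUE_MEAN, TRUE_SD = 75.0, 10.0   # hypothetical 'true' population values
N, TRIALS = 30, 2000
Z = 1.96                          # approximate two-sided 95% critical value

captured = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = mean(sample)
    half_width = Z * stdev(sample) / math.sqrt(N)
    if m - half_width <= TRUE_MEAN <= m + half_width:
        captured += 1

print(f"{captured / TRIALS:.1%} of intervals captured the true mean")
```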

 

So it could be said that science is not about absolute certainties but about trying to disprove hypotheses with different levels of certainty. Scientific conclusions come with varying degrees of uncertainty – even though the media (such as when it comes to forensic science portrayed in TV dramas) can give us a false impression of how absolutely certain scientific conclusions always are.

 

We can never actually be perfectly 100% certain of anything unless we measure 100% of what we’re trying to measure. That’s why predictions are never 100% certain because we logically haven’t measured any future events yet. One could be 100% certain of the average weight of the total population of a town at a snapshot in time if one accurately weighs absolutely everybody in that population at that snapshot in time. But if one only samples a fraction of the total population and uses the average weight of this sample to infer what the average weight of the total population is, one cannot ever be 100% certain that one has got the value spot on.

 

It’s therefore important to pay attention to whether a conclusion is, say, 50%, 90%, 95% or 99% certain, and what interval or range of values this confidence level relates to. Not all scientific conclusions are equally certain. (Some things that scientists state are merely their opinions, some results come from initial studies, and some come from repeat studies too.) The general media may report just one single number, with the confidence level and confidence interval left hidden.

 

Facts in science therefore come in various shades of grey, hence we shouldn’t read them all as absolute black-or-white. It’s a mistake to think that science – especially the social sciences – always gives absolutely certain answers and guidance. Social science issues are complex and often have multiple contributory causes for every effect, which makes it more difficult to rule out every potential lurking confound in real-world natural experiments.

 

We may end up misunderstanding and misrepresenting a scientific conclusion as indisputably black-or-white when a study never actually expressed that at all. For instance, we’d be wrong to argue that a species always needs at least a certain minimum number of individuals above a mating pair to sustain or revive its population according to its ‘minimum viable population’ – when the study being relied upon actually states that this minimum population figure is for a 95% probability that a species will last for 100-1,000 years (and is based on some assumptions too). And really, any chance above 0% makes something technically possible, hence conservation efforts for near-extinct species can still be worth trying. (And indeed, life wouldn’t ever have thrived on Earth if it had needed a minimum population from the outset!)

 

Having said that, science may reveal that a torture technique works 6% of the time – which might be better than anything else that’s been tested before to elicit the truth from a terror suspect – but we cannot ignore the low odds of success.

 

People often say that something ‘isn’t an exact science’ when it isn’t always reliably true, or when an algorithm (a set of instructions) doesn’t always produce the right or best answer – but science, even in normal circumstances, doesn’t deal with absolute certainties but with things that have varying degrees of certainty; albeit high certainties are aimed for – most commonly a >95% confidence level that the reported interval will capture the true value of interest.

 

Yet that still means that an unusual, novel or rare original result may have legitimately been a fluke – after all, there’ll be up to a 5%, or 1-in-20, chance that one will find a false positive or miss the true value of a parameter if one aims for a 95% confidence level.
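
A quick sketch of that 1-in-20 fluke rate, using made-up data: if we run many experiments in which the null hypothesis is actually true (no real difference between two groups) and test at the 5% significance level, roughly 5% of them will still come out ‘significant’:

```python
import random
from scipy.stats import ttest_ind

random.seed(2)
N, EXPERIMENTS, ALPHA = 30, 2000, 0.05
false_positives = 0

for _ in range(EXPERIMENTS):
    # Both groups drawn from the same distribution, so any 'effect' is a fluke
    group_a = [random.gauss(0, 1) for _ in range(N)]
    group_b = [random.gauss(0, 1) for _ in range(N)]
    _, p_value = ttest_ind(group_a, group_b)
    if p_value < ALPHA:
        false_positives += 1

print(f"'Significant' results under a true null: {false_positives / EXPERIMENTS:.1%}")
```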

 

This is where replicating studies improves our confidence in a result. We’re essentially trying to show that the original result wasn’t a mere fluke. The replication of studies is vital for our more complete and accurate understanding of the world. However, many non-significant, non-confirmatory ‘failed’ results, and replication or failure-of-replication studies in general, don’t get published because they’re considered ‘uninteresting’ – journals aren’t as eager to publish them compared to studies that find novel results. (Novel results grab readers’ attention more, and this translates into the journal selling more subscriptions and making more revenue.)

 

And if the (top) journals don’t want to publish such findings, then scientists also won’t want to bother wasting time writing them up or submitting them either (the file drawer problem), which creates a bias in the scientific literature. But finding out that ‘x did not help z’ is often just as important as finding out ‘y did help z’.

 

A different problem is the notion of ‘publish or perish’ – see Post No.: 0800.

 

Not only does science yield predictions that aren’t ever 100% certain – what we choose to do with the information is also a separate matter. Science is neutral – what we choose to do with the results we find is subjective. Science informs us of the odds of a causal relationship or the predictive power of a formula or model, but we still have to choose what to do with that information. Sensibly, one should go with the option that is likely to pan out >95%, rather than <5%, of the time. But still, science seeks the facts – what we decide to do with these facts is more a socio-political question. For example, science may tell us about ‘the survival of the fittest’, but that doesn’t tell us whether we ought to intervene to help the disadvantaged, do nothing at all, or actively wipe out those we consider ‘weak’ in a negative eugenics sense. (These choices roughly correspond to the range of views expressed across a leftwing-to-rightwing political spectrum, and this shows us why these debates will never ever be resolved via science.) So objective information can still have incredibly subjective interpretations and takeaways.

 

‘This research reveals that we should…’ usually means someone’s own interpretation of the data. How scientists deal with an incomplete or noisy dataset (e.g. kriging, Kalman filters), and how aggressively, can lead to different results and therefore different conclusions too. And does the past always predict the future when we rely on historical datasets to make furry forecasts? Predictions inherently come with uncertainty, although some predictions are more certain than others and that’s down to the statistics.
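
To illustrate how the choice and tuning of a smoothing method can shift the numbers one ends up reporting, here’s a minimal one-dimensional Kalman filter sketch on made-up noisy measurements; changing the assumed measurement noise R (i.e. how aggressively we smooth) changes the filtered estimates and therefore, potentially, the conclusions drawn from them:

```python
import random

def kalman_1d(measurements, R, Q=1e-3, x0=0.0, P0=1.0):
    """Minimal 1-D Kalman filter for a roughly constant signal.
    R = assumed measurement noise variance, Q = assumed process noise variance."""
    x, P = x0, P0
    estimates = []
    for z in measurements:
        P = P + Q               # predict: uncertainty grows a little each step
        K = P / (P + R)         # Kalman gain: how much to trust the new measurement
        x = x + K * (z - x)     # update the estimate towards the measurement
        P = (1 - K) * P         # uncertainty shrinks after the update
        estimates.append(x)
    return estimates

# Hypothetical noisy readings of a true value of 10
random.seed(3)
readings = [10 + random.gauss(0, 2) for _ in range(50)]

light_smoothing = kalman_1d(readings, R=1.0)
heavy_smoothing = kalman_1d(readings, R=20.0)
print(f"final estimate, light smoothing: {light_smoothing[-1]:.2f}")
print(f"final estimate, heavy smoothing: {heavy_smoothing[-1]:.2f}")
```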

 

Quantum mechanics experiments are held to some of the most stringent standards of required proof in science, yet these experiments and results are still open to various different and currently valid interpretations of what’s happening (e.g. the Copenhagen, many worlds and ensemble interpretations).

 

It’s best to treat scientific conclusions as only ever provisional. They’re never unchallengeable because we don’t know what new evidence we might discover in the future that may contradict our previous results. Yet they’re nevertheless the best we currently know. (The best practice would be to say, “As far as I/we currently know…” and to recognise the difference between opinions and facts.) We collectively still have much to learn, but every robust scientific study gains us further knowledge and contributes to shifting our confidence in a given conclusion one way or the other.

 

If we cannot admit to being wrong, or even to the possibility of being wrong, then we’re unenlightened about the nature of knowledge itself – which is worrisome when we notice how many of us sound utterly confident about what we think on social media, naïvely believing that complex and nuanced issues are simple and clear!

 

So notice the scientific conclusions or ideas that aren’t generally being talked about anymore because their truth has since been cast into serious doubt or has been completely overturned. And notice the scientific conclusions that continue to stand the test of scrutiny and time to this very day. The ‘boring’, long-standing and unchanging scientific conclusions are likely (although not guaranteed) to be the absolute truths.

 

Regarding objective matters – this all doesn’t mean that ‘any truth is possible’ and that we should be tolerant of everybody’s individual subjective beliefs. There are absolute truths – it’s just that we cannot always be absolutely certain about whether we’ve found them. So we need to hold our own present views less dogmatically, yet understand that there is only one truth regarding objective matters.

 

Woof. The language of science is probabilities, not certainties.

 

Comment on this post by replying to this tweet:

 
