Furrywisepuppy - Ecological, External and Internal Validity

Post No.: 0479

Furrywisepuppy says:

In science, ‘ecological validity’ is the degree to which the settings, materials, procedures, timescales, etc. of an experiment approximate the real world it’s trying to emulate. So when people criticise a laboratory-based experiment that’s, say, studying sleep, for not really representing the conditions people would face in the real world – perhaps because the lab is full of beeping machines and feels too cold and sterile unlike a normal bedroom where people normally sleep – then they’re criticising the experiment’s ecological validity. They’re contending that any findings gleaned from such a study might not be useful for drawing any real-world conclusions from.

But a usually far more important question than a study’s ecological validity is its generalisability, or the extent to which the conclusions of a study can be applied to other settings such as outside the context of that study – this is known as ‘external validity’. So the settings, etc. of an experiment don’t have to absolutely emulate real-world conditions yet they must do so enough so that we can be confident that the results will apply across other situations, times, places, people and so on. External validity is a crucial objective of almost every study because we’re trying to draw general conclusions (e.g. that a drug that works on the subjects in a trial will also work on everybody else wherever they are in the world as a whole).

Meanwhile, ‘internal validity’ refers to the validity of the conclusions drawn within the context of a particular study. Did those ‘causes’ really cause those ‘effects’ observed or was the relationship between those two variables merely spurious? Such inferences carry internal validity if a causal relationship between them can be properly demonstrated. At a minimum, all causes must precede their effects, and causes and effects must co-occur (covariance). There should also be no plausible alternative explanation for their covariance. Generalisability or external validity is irrelevant if a study doesn’t even possess internal validity. The ‘gold standard’ for finding internally valid results is a randomised, controlled, double/triple-blinded trial.

Internal validity problems can include: ambiguity about the temporal precedence of the alleged cause and its effect; confounds, or a ‘third variable’ that influences both the independent and dependent variables (i.e. instead of A causing B, or B causing A, it’s an exogenous factor C that’s causing both A and B); biases such as selection biases; testing effects from repeated testing (e.g. someone getting better at performing a task not because of some drug they took between a pre-test and a post-test but simply because they got some practice during the pre-test); maturation effects or the natural passage of time (people may noticeably age or mature over the duration of a study, especially children, and it could be this that creates the effects observed, and we also often just naturally get better from an illness over time without needing an intervention like a drug); the accuracy and consistency of the instruments used; subject attrition/mortality rates (the subjects who drop out before a study’s completion may or may not be telling us something); and anything else that means the results might not really be telling us what we’re hoping they tell us.
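To make the ‘third variable’ problem concrete, here is a minimal Python sketch. It is only a toy simulation under stated assumptions: the function names, sample size and noise levels are all made up for illustration, and A and B are built so that neither influences the other while a hidden factor C drives both. A and B still end up clearly correlated, and that correlation all but vanishes once C’s contribution is removed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_confound(n=5000, seed=0):
    """Toy data (hypothetical): C causes both A and B; A and B never touch."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    a = [ci + rng.gauss(0, 1) for ci in c]  # A = C + independent noise
    b = [ci + rng.gauss(0, 1) for ci in c]  # B = C + independent noise

    raw = pearson(a, b)

    # Subtract C's known contribution (coefficient 1 in this toy set-up).
    # In a real study that coefficient would have to be estimated.
    a_rest = [ai - ci for ai, ci in zip(a, c)]
    b_rest = [bi - ci for bi, ci in zip(b, c)]
    adjusted = pearson(a_rest, b_rest)
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_confound()
    print(f"correlation of A and B, ignoring C:     {raw:.2f}")
    print(f"correlation of A and B, C accounted for: {adjusted:.2f}")
```

In a real study the confound usually isn’t measured so neatly (or at all), which is one reason random assignment to groups is valued: it tends to balance such hidden factors across the groups instead of requiring us to know about each one.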

External validity problems can include: whether the results are valid for another time and place; whether the results are valid for other people and ages; reactivity effects; and anything else that calls into question the generalisability of the results obtained from a study to the outside/real world and/or to other situations and persons.

‘Reactivity effects’ include the placebo effect. Observer effects occur when subjects who know they’re being watched and recorded try harder or respond more socially acceptably than they would in reality, because people are more likely to say the ‘right things’ when they know a camera is on them – but would their usual actions off-camera match these words? Pygmalion or Golem effects can alter the performances of subjects too, due to respectively high or low expectations being placed on them, if the experiments aren’t double-blinded.

Because people are prone to consciously or subconsciously lie, exaggerate or otherwise modify their behaviours from their normal behaviours when they know they’re being watched or assessed for a particular behaviour, such as their racial biases or level of gluttony, subjects are seldom told what they’re really going to be tested on during psychology experiments. They’re typically told that they’re being tested for one thing but really the experimenters are testing them on another thing. (Some fluffy scientists are going to be annoyed that I told you that now in case you’re going to try to second-guess them if you get to participate in a study of theirs in the future!) In some cases this can pose ethical questions of deception, but in most cases there is no other way to get usefully valid data.

Broadly, field studies are more contextualised and ecologically valid, while lab studies are more systematic and can better isolate and control the independent variables so that we can find out what specifically causes what effects. Really, these study methods complement each other, and can in some cases be somewhat hybridised. Good experimental methods will blend field (in context) and laboratory (controlled elements) studies.

All types of research will generate data and can legitimately be called ‘science’ if the scientific method has been rigorously applied, but this doesn’t mean they’ll all necessarily generate equally usefully valid data. All data must therefore be read and utilised according to the context and limitations of how it was gathered. And that’s why just knowing the data is never enough – we must always know the methodology that was used to gather that data too, and always talk about data and methodology concurrently. Therefore if a scientist just presents you with some results or conclusions – ask about the methodology they used to obtain that data too, and critique that.

So when reading science news – ask questions such as those related to validity. Validation and verification essentially mean ‘are you asking the right questions?’ and ‘are you answering them correctly/accurately?’ Can the findings be generalised? For example, does ‘being good at Fortnite’ mean ‘being good at playing videogames in general’? Does an experiment that highlights the dangers of conformity mean that all conformity in society is bad or should the conclusions remain only applicable to the specific scenario studied because conformity can be good or bad depending on the context?

When most readers come across a science article in the general media, they don’t check out the references and the actual scientific papers (although these aren’t always accessible without crossing a paywall, but one could try directly asking the author for a copy). Most just rely on the article provided or even just take the headline at face value without reading much more. Peer reviewers for reputable scientific journals, combined with specialist science journalists for reputable news outlets, do tend to be reliable in making sure we’re fed good, critically-scrutinised information; but some journalists may misunderstand base rates, over-generalise findings or over-state the confidence of findings to amplify sensationalism (e.g. how almost everything causes cancer!).

Thus if an area of research is important to your life, such as when it’s going to inform your treatment options – seek the original sources of information, delve into the data found deep in the references rather than rely on the headlines or summaries (as the results may e.g. show a difference between young and old people, or between males and females), and check for the amount of disconfirming data too (e.g. if something happened to 28% of the subjects then it didn’t happen to the other 72% of them).

Do crucially understand that a scientific journal publishing a paper is not a stamp to say that the results of that study are correct – that’s the valuable yet undervalued job of those who try to replicate studies i.e. try to entirely repeat someone else’s work themselves to check its results. Peer reviewers basically assess the editorial standards of a paper to make sure it meets certain standards of scientific quality (e.g. whether the study was apparently well-designed, the claims are supported by the purported evidence, logical reasoning was applied and whether it acknowledges the work of those the research builds upon).

Hence the decision to publish a paper isn’t an act of confirming the factualness of its findings. And even if it is factual, it still must be taken in the context of all of the other existing scientific findings in the field that are factual too. Or perhaps further research is still required? This is where ‘meta-analyses’ (if conducted properly) are more powerful than single studies – these analyse the results of scientific studies in the context of other past similar or related studies that are asking the same research question.

In a ‘systematic review’ – one openly describes the search strategy and search terms used in the information search, then ideally whilst blind to the results sections (to try to avoid cherry-picking studies that confirm one’s biases), assesses and rates the methodological quality and characteristics of all of the studies one will rely upon, compares the alternatives, then gives a critical weighted summary of the data. (So always review the quality of the methodology of a study first, before even glancing at its results or conclusions, to determine whether the results will likely be reliable, then look at the results.)

Whereas a meta-analysis uses statistical methods to summarise the numerical data of a collection of studies in order to get a combined picture of the literature pertaining to a particular research question – a systematic review answers a well-defined research question by collecting and summarising all of the empirical evidence that fits pre-specified eligibility criteria in a detailed, systematic and transparent way. A meta-analysis should only ever be conducted in the context of a systematic review.
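As a rough illustration of what ‘statistical methods to summarise the numerical data’ can look like, here is a minimal Python sketch of one common approach: fixed-effect, inverse-variance pooling. This is only one technique among several (random-effects models are another), the post itself doesn’t commit to any particular method, and the function name and the three study results in the example are entirely hypothetical. The idea is that more precise studies (those with smaller standard errors) get more weight in the combined estimate.

```python
import math


def fixed_effect_pool(effects, std_errors):
    """Combine study effect sizes with inverse-variance weights.

    effects:    one effect estimate per study
    std_errors: the standard error of each estimate (must be > 0)
    Returns (pooled_effect, pooled_standard_error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need one standard error per effect estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Weight each study by 1 / variance, so precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Three made-up studies asking the same research question.
    effects = [0.2, 0.5, 0.3]
    std_errors = [0.1, 0.2, 0.1]
    pooled, pooled_se = fixed_effect_pool(effects, std_errors)
    print(f"pooled effect: {pooled:.3f} (standard error {pooled_se:.3f})")
```

Notice that the least precise study (0.5, with the largest standard error) pulls the combined estimate the least. The arithmetic is only as good as the studies fed into it though, which is why the selection and quality rating done in the systematic review matters so much.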

Ultimately, one must learn how to read, and to carefully read, science news and any given conclusions (which could be just the authors’ own interpretations of the results), then weigh them in the context of other studies that explored a similar research question, and not over-extrapolate any findings or one’s own conclusions. Do not over-extrapolate a conclusion to other conditions if a study’s external validity doesn’t justify it – the results may only be applicable to the specific conditions set out in the research method. For instance, one lab experiment using the prisoner’s dilemma game revealed that people are far more cooperative if they must make their choice to cooperate or defect within 10 seconds. This might be because people (must) rely on their fast ‘system one’ instincts in order to make a rapid decision rather than (over)think the decision by using their slower ‘system two’, but this is just one possible interpretation of the results. Nor did it reveal anything about how people generally behave (kinder, less kind or the same?) if they could take as long as they wanted to decide how much to share, hence one must be careful if one wants to extrapolate the results of this particular study to include all conditions.

Most laypeople consider ‘science news’ as indisputably black-or-white but the whole point of the scientific process and scientific community is systematic scrutiny by using critical-thinking techniques and statistical and logical thinking. With critical thinking skills and a better understanding of the scientific process, you don’t have to passively accept the facts or ‘facts’ that you consume anymore, which means that you’ll suffer far less from thinking that ‘science says one thing one day then another thing another day’, for instance.

Philosophy isn’t about black-or-white answers either – it’s about critical thinking, reasoning and justification. The answers here may not be objective but they’ll at least be reasoned. Sound logic is for philosophy what empirical evidence is for science.

Woof! Please use the Twitter comment button below to tell us whether you think more people should better appreciate how science actually works.
