Post No.: 0724
Furrywisepuppy says:
With the benefit of hindsight – after you’ve seen the correct answer/outcome – many things appear like they should’ve been ‘obvious’ before, when they weren’t. Also, if you lack the knowledge of, or imagination for, alternative hypotheses then you’re more likely to think that the one plausible hypothesis that you can personally think of must be the ‘obvious’ one, when it mightn’t be. There are many ways that coincidences can happen too. And there’s so much chaos that we don’t normally account for (e.g. there was a ~50:50 chance you could’ve been born with a different sex instead, and all that might’ve followed as a consequence in your life). We can therefore fail to imagine how things could’ve ended up differently at every single fork in the road i.e. what ended up happening wasn’t inevitable (at least from a human-level rather than a deterministic physics viewpoint).
Some events, like pandemics or terrorist attacks, are shocking and usually surprising when they happen. But most events, with the benefit of hindsight, we assume were inevitable – so we generate a causal story of how they ‘of course’ must’ve happened and presume they ‘should’ve been’ predicted, because we fail to consider what could’ve counterfactually happened instead. Hence most things appear predictable… with hindsight. But they weren’t really predictable with foresight.
We apply ‘causal thinking’ when we assess events in isolation – we assume inevitability because A must’ve led to B, which must’ve led to C. Meanwhile, ‘statistical thinking’ – or the outside view – involves seeing an outcome as only one of many permutations that could’ve happened instead, each with its own probability. We need to consider the alternative possibilities and narratives to understand the bigger picture. (This perhaps could be called ‘multiverse thinking’ – thinking about all the permutations that could’ve happened, and which of them, by chance, did or didn’t in this particular furry universe (that you and I are living in).)
Causal thinking is more natural and intuitive, but it’s prone to error and to the illusion of understanding the past, which can lead to overconfident predictions of the future. A student achieves top grades. The assumption for the future is that they’ll therefore get a top job… But in reality they might not, for various reasons. They might fall in with the wrong crowd? They might develop a mental health problem? Things might’ve been different had they received more support? Likewise, when making assumptions about the past, if we notice a person with a top job, we might, with causal thinking, assume that they must’ve gotten it by pure merit and determination.
So we need to see a case as only one of many cases with varied outcomes. So instead of assuming ‘of course top grades lead to top jobs’, you should consider cases like ‘top grades, low-paid job’, ‘low grades, high-paid job’ and ‘low grades, low-paid job’ too – each with their own different probabilities of happening. Maybe a culture of racism or sexism intervenes, or family connections come to the rescue, and so forth?
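As a rough sketch of what this looks like with numbers, here are the four cases laid out side by side. The probabilities are entirely made up for illustration (they are not real data), and the names `joint` and `conditional` are just hypothetical labels for this example.

```python
# Hypothetical joint probabilities for the four cases (invented for
# illustration only; they are NOT real statistics). They sum to 1.
joint = {
    ("top grades", "high-paid job"): 0.20,
    ("top grades", "low-paid job"): 0.10,
    ("low grades", "high-paid job"): 0.15,
    ("low grades", "low-paid job"): 0.55,
}


def conditional(joint, grades):
    """Return P(job outcome | grades) for each job outcome."""
    # Total probability of everyone who got these grades.
    total = sum(p for (g, _), p in joint.items() if g == grades)
    # Share of that group ending up in each job outcome.
    return {job: p / total for (g, job), p in joint.items() if g == grades}


top = conditional(joint, "top grades")
low = conditional(joint, "low grades")

for job, p in top.items():
    print(f"P({job} | top grades) = {p:.2f}")
for job, p in low.items():
    print(f"P({job} | low grades) = {p:.2f}")
```

Under these invented numbers, top grades make a high-paid job more likely but far from certain, which is the point: the single case you happen to notice is only one cell of the table.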
Causal thinking leads to downplaying the existence of noisy, inconsistent judgements because we assume, for lots of queries, they could only have one clear, agreeable answer.
…We can easily discern up to 7 (give or take a couple) distinct categories along an intensity scale. Any more and we’ll likely make errors. As an illustration, imagine there are a couple of points, which are placed at 2” and 6” from the edge of a piece of paper – now if additional points are placed at 3 equally-spaced intervals in between those two initial points but we are shown only one of those new points, we’ll find it easy to call whether it’s at the first, second or third interval position by eye without error. These could instead be different levels of perceived brightness, audio tones, etc. We find far better success when tasked to compare, say, lines directly side-by-side and to say which one is longer than the other. We do worse when looking at and assessing each item in isolation.
A real-world example of where this is relevant is that a teacher can reduce unwanted variation when grading essays by first ranking them from best to worst and then grading them; instead of marking them one-by-one and each time individually asking ‘is this a C or D?’ or ‘is this a B or A?’
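Here is a minimal sketch of the ‘rank first, grade second’ idea. It assumes the teacher has already ranked the essays by comparing them against each other, and then decides where the grade boundaries fall along that ranking. The function name `grade_by_rank`, the essay labels and the boundary counts are all hypothetical, and choosing boundaries as counts is only one possible way of doing the second step.

```python
def grade_by_rank(ranked, boundaries):
    """Assign grades to essays that are already ranked best to worst.

    ranked:     list of essay ids, best first (the teacher's ranking).
    boundaries: list of (grade, count) pairs, best grade first, where
                count is how many consecutive essays receive that grade.
    """
    if sum(count for _, count in boundaries) != len(ranked):
        raise ValueError("boundary counts must cover every essay exactly once")

    grades = {}
    start = 0
    for grade, count in boundaries:
        # Every essay in this slice of the ranking gets the same grade.
        for essay in ranked[start:start + count]:
            grades[essay] = grade
        start += count
    return grades


# Hypothetical example: five essays, ranked by direct comparison.
ranked = ["essay_d", "essay_a", "essay_c", "essay_b", "essay_e"]
boundaries = [("A", 1), ("B", 2), ("C", 2)]

grades = grade_by_rank(ranked, boundaries)
for essay in ranked:
    print(essay, grades[essay])
```

Because grades follow the ranking, a better-ranked essay can never end up with a worse grade than one ranked below it, which is exactly the kind of inconsistency that one-by-one marking can produce.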
In civil cases – on top of compensatory damages that make a claimant whole again – punitive damages can be awarded if the wrongdoing was particularly egregious. And in some jurisdictions and for some cases, juries decide these amounts. And to decide them, we intensity match the figure we pick to the level of emotional outrage we feel about the wrongdoing at the time. Our emotional outrage is a bigger deciding factor than the actual financial harm that was caused to the claimant, thus this amount can be sky-high, especially if it’s not bounded by a cap.
Should our level of emotional outrage be a fair gauge of what’s immoral? If so, what if we exaggerate it for effect? What if we’re just being too sensitive? However, we might justify the amount as a deterrent against the civil wrong being repeated. (A chosen amount might be ultimately rejected or reduced yet some have been wild e.g. $2M was awarded to a customer who wasn’t told by a dealership that a BMW car that was sold to them had actually been repainted!)
If you’ve been anchored to a figure of $50 to buy a computer keyboard then even if you don’t buy it, you’ll be more willing to pay a higher price for a computer mouse than if you were anchored to a figure of $25 for that keyboard instead. So legal cases that set a precedent really do set one that all subsequent damages are compared against. We might think this makes these decisions objective – but that first case had set its decision subjectively.
Likewise, a teacher may grade subsequent essays relative to the grade given to the first essay that he/she had marked, and this grade could’ve been different to what another teacher would’ve given. What could help here is having all teachers agree on the same reference graded essay from the very start to basically calibrate them all to the same anchor.
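To make the shared-anchor idea concrete, here is a deliberately simple sketch. It assumes essays get numeric marks and that every teacher has also marked the same reference essay. Each teacher’s marks are then shifted so that their mark for the reference essay lands on the agreed value. The function name `calibrate`, the marks and the use of a plain shift are all assumptions made for illustration; real calibration would more likely happen through training and discussion than through arithmetic.

```python
def calibrate(marks, reference_id, agreed_reference_mark):
    """Shift one teacher's marks so the reference essay gets the agreed mark."""
    # How far this teacher's anchor sits from the agreed anchor.
    offset = agreed_reference_mark - marks[reference_id]
    return {essay: mark + offset for essay, mark in marks.items()}


# Hypothetical marks: teacher B is consistently 10 marks more generous.
teacher_a = {"reference": 60, "essay_1": 70}
teacher_b = {"reference": 70, "essay_1": 80}

agreed = 65  # the mark all teachers agree the reference essay deserves

calibrated_a = calibrate(teacher_a, "reference", agreed)
calibrated_b = calibrate(teacher_b, "reference", agreed)

print(calibrated_a)
print(calibrated_b)
```

After the shift, both teachers give ‘essay_1’ the same mark, because the only difference between them in this toy example was where their anchor sat.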
Now indeed different cases do require an individualised approach. But similar cases shouldn’t – these should be treated similarly. Rather than just make decisions on a case-by-case basis in the sense that each case is considered in isolation – we must also consider them within the context of all similar cases to ensure an equitably consistent evaluation between them. And a reference case can be used to calibrate everybody’s judgements.
You might personally think that someone needs to do something more exceptional to be rated as a ‘5 out of 5’ in language ability compared to what I think when I’m rating the same person (e.g. I think a ‘5 out of 5’ means being able to speak 3+ languages fluently while you may think it means being able to speak 6+ fluently). You might weight different attributes differently too (e.g. I rate kindness over physique while you think the opposite when rating a potential date).
Even if you and I agree that a movie is ‘bad’ – your idea or definition of ‘bad’ may be different to mine? Does the word ‘unlikely’ mean a <10% chance or a <50% chance? When people say ‘most deadly disease’, some might mean in absolute numbers while others might mean relative to the global population size at the time in history. So different assessors might use the same adjectives but mean different scales. Phrases like ‘appropriate’, ‘disproportionate’, ‘beyond reasonable doubt’ and ‘substantial’ can therefore lead to noisy decisions.
To reduce inconsistencies in professional judgements, we therefore need a common frame of reference so that everybody’s ‘3’ means the same thing on a scale of ‘1 to 5’. This requires training. But this can be time-consuming i.e. costly.
Managers also want to preserve their discretion to make executive or unilateral decisions. They want veto or discretionary powers to override decisions suggested by even algorithms – which (reading between the lines) means ‘I want the discretion to favour those I personally like, such as that sexy female worker I want to get closer to, or my own cronies or family members’! An AI cannot be bribed or pressured by the crowd to ‘exercise its discretion’ to give one player a red card but not one to another from the opposing team who commits an identical foul.
Bureaucracy usually gets a bad press – but many checklists and procedures have evolved over time in institutions and organisations precisely to try to tackle bias and noise. Critics of bureaucracy argue that ‘all you need is to apply common sense and we can do away with lots of red tape’ – but then without that ‘red tape’, some people will use their free discretion to commit fraud! Bureaucracy can indeed gradually grow cumbersome, though, and thus needs to be periodically assessed to see if it can be streamlined.
Sometimes some noise is desirable – or at least mechanical algorithms can sometimes preclude a fair hearing because, without explanation, ‘the computer says no’ and those affected must accept that answer. People mightn’t feel like they’re being treated with respect and dignity if a dispassionate machine, rather than an individualised hearing, makes any big decisions that affect their lives. People don’t want to be treated as dehumanised objects. (Is there therefore noise regarding the issue of tackling noise?(!)) Discretion is also required to consider the ‘complex particulars of each case at paw’ and the ‘decisive factors’ mentioned in Post No.: 0714.
In sport, the referee’s discretion does introduce an element of, arguably exciting, luck. In business, noisy decisions can mean overpriced contracts that lose business and underpriced ones that lose money hence it can affect profits – yet eliminating noise to near zero can be financially expensive and overall not worth the trouble. (Badly written or trained) algorithms or AIs might even introduce biases if all decisions are based on the same erroneous assumptions.
Noise may also allow moral and political evolution to occur because people disagreeing on issues may be telling us that the status quo should be reassessed. Diversity and disagreements shouldn’t be stifled for the sake of homogeneous consistency. Inflexible rules are solidified but (in certain situations anyway) they need to be challengeable – a noise-free system might freeze existing values. Moral values and norms can and do evolve over time thus we need to avoid overly rigid rules and allow some discretion.
Yet it must be clarified that the courts (judiciary) don’t exist to pass new laws but to interpret and apply them – it’s down to the legislative branch of government to pass, amend and thus evolve laws. Therefore judges should still aim to be consistent with themselves and each other. (Or perhaps a bit of discretion and fuzzy randomness when it comes to punishments could act as a greater deterrence, at least for the risk-averse? Would a punishment lottery be fair for those it fails to deter though?) It’s about having ways to evolve the rules themselves, rather than having ways to (re)interpret existing rules with inconstant discretion. For teachers, their teaching methods should have latitude for creativity, but the grading of works should aim to be consistent. For companies, they should try to be creative with their daily operations rather than creative with their accounts!
These objections don’t deny the existence, and generally major problem, of noise but criticise the side-effects of the specific strategies that may be implemented to try to reduce it.
Even if noise-reduction can be expensive because it’s not always easy to audit and resolve – the social cost of permitting power-tripping discretion can be demonstrably unjust. The social, as well as monetary, cost-benefit calculations need to therefore be made.
Woof! There is, I’d strongly argue, more dignity and equity in having consistent rather than arbitrary decisions.