
Post No.: 0714

 

Furrywisepuppy says:

 

Following Post No.: 0702 and our quest to reduce noise – interviews, reviews or assessments should be structured rather than informal or ad hoc, so that each subject receives the exact same set of non-overlapping questions, delivered in the exact same way. The answers to each question should then be scored against a predetermined rubric to arrive at a final combined or aggregate score.

 

Break queries up into smaller questions or criteria (e.g. the punctuality, qualifications, work history, etc. of a job applicant); answer each one separately; then aggregate their scores at the end. This method can work when making strategic decisions too.
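
To make the method concrete, here’s a toy sketch in Python – the criteria, rating scale and applicants are all invented for illustration; the point is only that each criterion gets scored on its own, against the same rubric, before anything is combined:

```python
# A toy sketch of structured assessment: score each criterion separately
# against a predetermined rubric, then aggregate at the end. The criteria,
# 0-10 scale and scores below are all made up for illustration.

CRITERIA = ("punctuality", "qualifications", "work_history")

def aggregate_score(scores):
    """Combine the per-criterion scores (each rated 0-10) into one mean score."""
    assert set(scores) == set(CRITERIA), "score every criterion, and nothing else"
    return sum(scores.values()) / len(CRITERIA)

# Each applicant is scored criterion by criterion, never holistically first.
applicant_a = {"punctuality": 7, "qualifications": 9, "work_history": 6}
applicant_b = {"punctuality": 8, "qualifications": 6, "work_history": 7}

print(aggregate_score(applicant_a))  # 7.33...
print(aggregate_score(applicant_b))  # 7.0
```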

 

The aggregate opinion of multiple judges, interviewers or reviewers – gathered independently first before coming together to cogitate holistically on the data, perhaps in a committee – will also improve the quality of the final decision.
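
A quick simulated back-of-the-envelope check shows why averaging independent judgements helps. The numbers here are made up, and it assumes each judge is unbiased but noisy, with errors independent of each other – under those assumptions, the error of the average shrinks roughly as one over the square root of the number of judges:

```python
import random

random.seed(42)
TRUE_VALUE = 100.0  # the 'right answer' in this made-up exercise

def one_judge():
    """One judge: unbiased on average, but noisy."""
    return TRUE_VALUE + random.gauss(0, 20)

# The mean absolute error of the averaged judgement should shrink
# roughly in proportion to 1/sqrt(number of judges).
for n in (1, 4, 16, 64):
    trials = 5_000
    total_err = sum(
        abs(sum(one_judge() for _ in range(n)) / n - TRUE_VALUE)
        for _ in range(trials)
    )
    print(f"{n:2d} independent judges -> mean absolute error ~ {total_err / trials:.1f}")
```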

 

In group discussions where opinions aren’t gathered independently first – who speaks first, who’s sitting next to whom, which verdict is initially popular, and other should-be-irrelevant factors may affect the final outcome of the discussion. A small initial difference, like whose idea was explained first, can produce a very different consensus. Separate yet similar groups can thus come to different conclusions, and the same group can come to different conclusions on different days.

 

Groupthink, the echo chamber effect, and peer pressure to conform can reduce noise (or variability in thought), but the outcome will be biased. Group polarisation occurs between groups when members within their respective groups reinforce each other’s internal views. If you think x, and another person in your group also thinks x – your confidence that x is correct will increase, and so will theirs. Subjective confidence feels emotionally rewarding, hence we seek it – in this case by choosing to mainly hang around those who already think like us. But feeling confident doesn’t reliably track being correct, accurate, unbiased or consistent with one’s own past judgements of similar cases. You’d think that group deliberation would always reduce noise, but it can amplify it when we compare decisions made between different groups. Overall, this demonstrates how groups can either reduce or amplify bias and/or noise.

 

Individuals in group discussions will likely fail to give independent opinions because they’ll subconsciously adjust their own views based on what they hear from others. This would happen with a simple ‘guess the weight of a cow’ or ‘guess the number of marbles in a jar’ task too – the ‘wisdom of the crowd’ fails if the guessers can see what those before them have guessed, because then the guesses are no longer independent of each other. You may think that listening to others always helps a group come to better decisions, but it fails when ‘the blind lead the blind’.
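
Here’s a deliberately crude simulation of that failure – the jar size, crowd size and anchoring weights are all assumptions for the sake of the demo. When later guessers anchor on the first announced guess, the crowd’s average drifts towards wherever that first guess happened to land, instead of the errors cancelling out:

```python
import random

random.seed(7)
TRUE_COUNT = 850   # marbles in the jar (a made-up figure)
CROWD_SIZE = 200

def private_guess():
    """Each guesser alone is noisy but unbiased."""
    return TRUE_COUNT + random.gauss(0, 200)

# Independent crowd: nobody sees anybody else's guess, so errors cancel.
independent = [private_guess() for _ in range(CROWD_SIZE)]

# Herding crowd: whoever speaks first becomes an anchor, and every later
# guesser blends their own private guess with that first guess - so the
# first speaker's error propagates instead of cancelling.
first = private_guess()
herded = [first] + [
    0.3 * private_guess() + 0.7 * first
    for _ in range(CROWD_SIZE - 1)
]

print("true count:            ", TRUE_COUNT)
print("first announced guess: ", round(first))
print("independent crowd mean:", round(sum(independent) / CROWD_SIZE))
print("herded crowd mean:     ", round(sum(herded) / CROWD_SIZE))
```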

 

Herding will occur based on who spoke first. This leads to a cascade of followers, especially if the subsequent speakers were initially uncertain themselves, don’t want to sound dim-witted by disagreeing, or defer to or highly respect the first speaker (perhaps because it was their boss who gave his/her view first, and he/she is hard to speak up against even when he/she’s wrong). This creates a bias cascade. The mood of the initial speaker can thus get amplified. Overconfident people can tell convincing stories to justify their predictions, but this won’t make them any more likely to be correct. We must be aware of all these hazards in group deliberations.

 

Listening to others and aggregating all views is still critical – the issue is when this should occur. Getting together for a group discussion or brainstorming session should happen only after every individual has independently come up with and written down their own views or ideas first. A better decision will then result once all these various views or ideas are taken into account.

 

But what typically happens is that meetings are set up to reach a consensus as quickly as possible. It’s better to explore everybody’s different independent views fully (i.e. explore that noise) before discussing everything together. A good leader lets all others speak before sharing his/her own views. Allow everyone a chance to speak, and to speak their mind. And again, assess separate criteria separately before aggregating them.

 

So you shouldn’t allow eyewitnesses to discuss their versions of events together before they’ve each given their testimony individually, in case they contaminate each other’s memories – keep them apart until the end.

 

Onto algorithms. A good algorithm will typically beat a human professional. People might prefer to trust their own intuitions over mechanically-derived decisions, but formulas or models are consistent, i.e. noiseless – they don’t express whims. Algorithms range from simple rules written as a checklist, through equal-weight ‘improper’ linear models and linear or multiple regression models, to machine-learning AI models.
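
As a flavour of the second of those, here’s a minimal sketch of an equal-weight ‘improper’ linear model in the spirit of Dawes – the predictors and data are invented; the idea is just to standardise each predictor and add them up with equal weights, rather than fitting regression coefficients:

```python
from statistics import mean, stdev

# Equal-weight 'improper' linear model: convert each predictor to z-scores,
# then simply sum them with equal weights instead of fitted coefficients.
# The predictors and candidate data below are invented for illustration.

def zscores(xs):
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

# One list per predictor; one entry per candidate (four candidates here).
interview_rating = [6.0, 8.0, 5.0, 9.0]
test_score       = [70.0, 85.0, 60.0, 90.0]
years_experience = [2.0, 5.0, 1.0, 7.0]

predictors = [zscores(p) for p in (interview_rating, test_score, years_experience)]

# Equal weights: a candidate's total is simply the sum of their z-scores.
totals = [sum(p[i] for p in predictors) for i in range(4)]
for i in sorted(range(4), key=lambda i: -totals[i]):
    print(f"candidate {i}: combined z-score {totals[i]:+.2f}")
```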

 

Noise can still exist between different algorithms though (e.g. between different credit-scoring formulas). And computers follow their programs faithfully – even when those programs contain bugs – so a flawed algorithm will be consistently flawed. There are therefore good and bad algorithms.

 

Users might assume that computer programs can never be biased – but a bad one can actually amplify biases calamitously! Poorly-designed (or, in the case of machine learning, poorly-trained) ones will end up being consistently wrong or biased. Overly-crude algorithms may reduce noise but increase bias or other unwanted side-effects. For instance, if a model-scouting algorithm screens people based on height then we mightn’t think that’s problematic – but with a deeper understanding, we’ll realise that this would indirectly discriminate against women in general. A poverty or racial bias could indirectly result if, say, a machine-learning AI finds a correlation between the number of past arrests and certain postcodes or ethnicities.

 

Yet it’s not that humans can’t be biased too, whether implicitly or consciously! And at least a simple algorithm (so maybe not a deep-learning AI) will be transparent in how it arrives at its outputs compared to the opaque minds of humans who rely on ineffable ‘gut feelings’. (Algorithms should arguably be inspectable by those affected by them. Although when they are, it potentially allows people to game them, like SEO for webpages.) The benchmark to beat is humans, not perfection. The same with autonomous vehicles – the benchmark isn’t zero deaths (although this should nevertheless be the target because what else should be?!) but however many deaths human drivers cause on the roads. Woof.

 

The unknown unknowns limit the chances of ever producing perfect predictions, so even the best algorithms won’t be perfect (sometimes barely better than intuitive guesses), but they’re still overall better than what humans can hope for. Humans can forget relevant variables and consider irrelevant ones from one case to the next, based on either their intuitions or biases. We might presume this variability in methodology reflects the variability in the individual cases being assessed, but it’s usually unwanted variation, i.e. unfair noise. We might argue that this isn’t noise but just our ability to tailor assessments to the idiosyncratic specifics of individual cases – but customisation or complexity in a thought process isn’t the same thing as a fairer or more reliable thought process. A simple algorithm can suffice in many situations. Humans tend to make rules up as they go along!

 

If a human knows about some ‘decisive factor’ that hasn’t been accounted for in a simple model though then a human would know better, like if a simple model for predicting the winner of a race cannot account for the fact that a contestant has broken his/her leg! (Ideally, a model or checklist of rules should continually update to account for such decisive factors for the sake of subsequent cases.)

 

Individual personality and complexity are celebrated. Humans believe they’re better able to account for the uniqueness of individual cases. And humans don’t want to be treated like mere cogs in an impersonal machine either. Every person believes they’re special – an outlier requiring bespoke rather than cookie-cutter consideration. Yet evidence highlights that this is seldom true, so these beliefs produce noisy inconsistencies. (That’s why some futurists predict a future where computers will make all the decisions and forecasts while humans are left to do the fiddly manual-dexterity jobs.)

 

But because humans are (biasedly!) making the decisions about whether or not to entrust computers with big decisions, we need to somehow reconcile the above concerns with the need to make less noisy decisions. The challenge is in discerning which are the real outlier or edge cases that involve decisive factors, and reserving human input for where it’s truly needed. However, machine-learning AIs, if trained on the right data, can possibly learn about rare correlations too – including ones that humans would miss.

 

Regardless of their personal level of skill or technique, human professionals – such as CEOs, physicians and talent scouts – usually, biasedly, believe they’re better judges than AIs because they want to protect their jobs and pay packets. They may rationalise that a mechanical decision is ‘dehumanising’. They may believe they have instinctive insight that no computer can ever rival. It might be down to a lack of education about artificial intelligences, or an over-generalisation after hearing a news story about an AI making a mistake, leading to a distrust of all AIs – itself, indeed, a skewed stereotype. But humans cause accidents and make other mistakes too – yet these are forgiven because ‘to err is human’(!)

 

However, if every autonomous vehicle said ‘after you’ whenever they met at narrow junctions then no vehicle would ever proceed; and if every autonomous vehicle didn’t care about other road users then it’d be chaos. Perhaps that’s where noise or ‘noise’ is useful with human drivers? Some drivers are more aggressive or cautious than others, or a particular driver feels aggressive on some days and cautious on others, and so traffic flows as different drivers with different moods pass each other without impasses (although if two highly aggressive drivers meet, there can still be chaos). No manufacturer of autonomous vehicles would likely want their own cars to be consistently the ‘timid’ type though – so perhaps some wanted variation in ‘aggression’ levels is necessary?

 

Anyway, intuition can be described as ‘knowing without knowing why’ – but those hidden ‘whys’ could be the outputs of unconscious or implicit biases like confirmation bias, overconfidence, sexism, the effects of priming, the halo effect, etc. There’s also the ignorance, or denial, of ignorance itself, even when informed of how poor intuitions can be. This is the human world – we can educate people about the risks of over-consumption, but this won’t mean that people will no longer find junk food delicious. Likewise, telling people that their intuitions are fallible won’t stop them from finding trust in those intuitions emotionally rewarding.

 

The combination of an algorithm with human decision-makers actually typically produces worse decisions than relying on the algorithm alone! This is because the human will tend to override the algorithm whenever he/she disagrees with it – such as dismissing a baseball player who looks too small during the draft despite his/her playing stats being right up there.
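
A toy simulation makes the arithmetic plain. The accuracy and override figures below are pure assumptions (and real overrides aren’t random either), but whenever the overriding hunch is less accurate than the algorithm, each override drags the combined accuracy down:

```python
import random

random.seed(1)
TRIALS = 100_000
ALGO_ACCURACY = 0.70    # assumed: the algorithm alone is right 70% of the time
OVERRIDE_RATE = 0.30    # assumed: the human overrides 30% of decisions
HUMAN_ACCURACY = 0.55   # assumed: the overriding hunch is right only 55% of the time

algo_alone = sum(random.random() < ALGO_ACCURACY for _ in range(TRIALS))

combined = 0
for _ in range(TRIALS):
    if random.random() < OVERRIDE_RATE:
        combined += random.random() < HUMAN_ACCURACY   # the human's call stands
    else:
        combined += random.random() < ALGO_ACCURACY    # the algorithm's call stands

print(f"algorithm alone:       {algo_alone / TRIALS:.1%} correct")
print(f"algorithm + overrides: {combined / TRIALS:.1%} correct")  # lower on average
```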

 

Yet some allowance for human intuition and discretion should perhaps be granted, because one problem is that those who know that a mechanical or pure average of aggregate scores will determine the final verdict may game the system by inputting artificial scores that lead to the desired end result! (Double damn human biases!) A more acceptable reason is that this final discretion allows assessors to account for any decisive factors that weren’t accounted for during the questioning phase, i.e. unanticipated deal-breakers or deal-makers. But delay your intuition on the final verdict (i.e. don’t jump to conclusions) until you’ve waited for, or sought out, more information. Intuition should only enter a final decision after all of the data has been collected and analysed, to avoid rushed verdicts.

 

Woof.

 
