Furrywisepuppy - Finding the Right Candidate for the Role

Post No.: 0725

Furrywisepuppy says:

Traditional unstructured or unguided job interviews are quite inconsistent and thus noisy – different interviewers might accept/reject different candidates for various personally partial reasons, or accept/reject a specific individual candidate depending on their particular mood on a given day.

We also have a tendency to over-weight our first impressions, which are unreliable because they’re based largely on shallow judgements, since we don’t yet know someone on a deeper level. And what’s troubling is that plenty of people don’t see a problem whatsoever with making inferences like ‘a firm handshake means a strong candidate’(!) They think that being able to make instinctive inferences like this is evidence of their own incredible, insightful and perhaps even prophetic intuitions and judgement of character! Some instincts or intuitions will turn out to be correct, yet much of the time they’ll be guided by unconscious or implicit biases like ingroup biases. And we won’t know because intuitions are precisely ‘knowing without knowing why’. Woof!

And if all you knew about two candidates was that one was rated higher than the other after an interview – the chances that the higher-rated candidate will perform better in the actual job will be little greater than flipping a coin! This can be due to biases from the interviewer like homophily, gender and racial stereotypes, the halo effect and judging by irrelevant factors like appearance; along with the unknowns of life such as a candidate later going through a divorce that impacts upon their work performance. It could also be the nervousness of a candidate with one interviewer but not with another, or the fact that someone can be nervous during interviews but fine when doing the actual job they’ll be asked to do. We can get people who interview badly but work well, and vice-versa – and of course it’s their ability when doing the job that matters, not really their interviewing skills.

By the way – how often do interviewers blame themselves instead of the employee if they hire someone who turns out to be a poor employee? (How often do teachers blame themselves for not being able to get a pupil to pass a subject? How often do designers blame themselves if anyone misuses their product or can’t figure out how it works?) So shouldn’t the interviewer (who might be the boss of the organisation) be sacked too?!

Unstructured interviews get steered by first impressions in multiple ways e.g. if someone appears introverted at first then the interviewer will ask tougher questions to probe how they’ll work in teams; whereas if someone appears extroverted then they’ll already be assumed to fit fine in a team. Or a good first impression will lead the interviewer to focus more on selling the company to the candidate; and vice-versa. A response like, “I left my old job due to strategic disagreements with the CEO” can be interpreted as a sign of intransigence or alternatively integrity. So interviewers will ask different questions and treat the candidates in different ways, which will logically produce noisy judgements.

In one experiment, interviewers didn’t twig that the interviewees had been instructed to give meaningless ‘yes/no’ responses to closed-ended questions based on the first letters of the questions asked(!) Nevertheless, the interviewers had no trouble forming a coherent narrative about each interviewee – they claimed they could infer so much about each interviewee’s character during their short times together!

We can perceive coherent patterns even in totally random noise, like faces in the clouds. In totally fictional videogames – fluffy fan theories can flesh out coherent backstories to peripheral characters e.g. regarding a mysterious shopkeeper who periodically materialises along the protagonist’s journey… when they probably pop up during these times just to ensure that you have enough upgrades, ammo and health items to complete the game in a practical sense, and the game developers considered that! Some characters in fictional creations are deliberately left with vague backstories so that it creates conversations and intriguing fan theories. Also, like lies (which are fictional creations too), the less that’s said, the less likely one will introduce contradictory information that confuses or breaks believability.

Our subjective interpretations of objective facts are coloured by our own prior attitudes and worldviews. We over-weight our own personal experiences and intuitions compared to information that could be more predictive of performance, like standard test scores. We’ll rationalise that such tests don’t capture all of the relevant metrics – which may sometimes be true, but our own judgements will fail to factor in all of the relevant metrics and will factor in several irrelevant ones too e.g. a candidate’s age or attractiveness. And whenever we’d rather trust in our intuitions, we’re most of the time really saying that we’re too lazy to apply critical thinking!

We can’t expect consistently fair judgements if the interview process isn’t consistently applied for each and every candidate i.e. when it’s made easier for some than others. So a structured interview involves consistently asking the same comprehensive set of relevant, non-overlapping questions in the same manner for each candidate, and then only reviewing the combined scores or overall picture at the end. The answers should be scored against a predetermined rubric that’s shared amongst all the interviewers, who should train to calibrate themselves to the rubric beforehand. It’s less chatty but it gets to the point without irrelevant extraneous stuff like flirting or whether you share the same love of cricket.
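For the spreadsheet-minded pups, here is a minimal Python sketch of how the scoring described above might be tallied. The question names, the 1 to 5 scale, the made-up scores and the `combined_score` function are all hypothetical illustrations rather than any standard method: each interviewer scores every question on their own, and the numbers are only combined at the end.

```python
from statistics import fmean

# Hypothetical rubric: every candidate gets the same questions, each scored 1-5.
QUESTIONS = ("problem solving", "teamwork", "domain knowledge")

def combined_score(scores_by_interviewer):
    """Average each question across interviewers, then average the questions."""
    # Refuse to combine anything unless every interviewer scored every question.
    for name, scores in scores_by_interviewer.items():
        if set(scores) != set(QUESTIONS):
            raise ValueError(f"{name} did not score exactly the rubric questions")
    per_question = {
        q: fmean(scores[q] for scores in scores_by_interviewer.values())
        for q in QUESTIONS
    }
    return per_question, fmean(per_question.values())

# Made-up scores from two interviewers who did not confer beforehand.
candidate = {
    "interviewer_a": {"problem solving": 4, "teamwork": 3, "domain knowledge": 5},
    "interviewer_b": {"problem solving": 2, "teamwork": 3, "domain knowledge": 4},
}
per_question, overall = combined_score(candidate)
print(per_question, overall)
```

Rejecting incomplete score sheets is one possible way to enforce the 'same questions for everyone' rule; a real process might handle missing scores differently.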

Still, most employers place great, if not greater, value in informal, face-to-face interviews. (Many biases remain stubborn despite scientific research!) People love to trust their guts because of cognitive ease, and love the feeling of power that having the final say gives!

What would improve the decision quality further is aggregating the views of multiple independent interviewers; although this would add time and cost.
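Why does aggregating help? A rough Python simulation can show the idea, under the strong assumption that each interviewer's error is independent random noise around the candidate's true quality. The `panel_score_spread` function and all the numbers are hypothetical. If the assumption holds, the spread of a panel's average score should shrink roughly with the square root of the panel size, so four interviewers would be about half as noisy as one.

```python
import random
import statistics

def panel_score_spread(true_quality, noise_sd, panel_size, trials=20000, seed=0):
    """Spread (standard deviation) of a panel's average score over many simulated interviews."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    averages = []
    for _ in range(trials):
        # Each interviewer sees the true quality plus their own independent noise.
        scores = [rng.gauss(true_quality, noise_sd) for _ in range(panel_size)]
        averages.append(statistics.fmean(scores))
    return statistics.pstdev(averages)

# Hypothetical numbers: true quality 6 out of 10, interviewer noise of 2 points.
spread_one = panel_score_spread(true_quality=6.0, noise_sd=2.0, panel_size=1)
spread_four = panel_score_spread(true_quality=6.0, noise_sd=2.0, panel_size=4)
print(round(spread_one, 2), round(spread_four, 2))
```

If the interviewers confer first, or share the same biases, their errors are no longer independent and the averaging helps much less.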

Work sample tests are the best predictor of work performance because they logically most directly simulate the job the candidates will be asked to do. (However, if they’ll first need some training then these sample tasks might need to be simplified.) The best work references aren’t those provided by the candidate but by other employees that the candidate will cross paths with during their time work sampling too. (Regarding probationary periods though, these can be exploited by employees who’ll just behave well for this period and that’s it, and by employers who could just dismiss an employee without warning.)

Likewise during performance reviews at work – different reviewers might give the same employee different evaluations too. Knowledge workers are harder to review than factory workers who produce consistent piece-rate work. So one reviewer might judge you differently from another because they’re biased towards people of your nationality, or the same reviewer might judge you differently on another day because they had a superb lunch just a moment ago. Once again, there’s so much noise or unwanted variability.

Reviewers sometimes even deliberately and strategically give more/less favourable reviews to certain employees, perhaps to get someone promoted/demoted in order to get them off their team for personal reasons!

To avoid ratings inflation (giving ever higher scores) in employee appraisals – we could use forced rankings. However, this can harm team morale. And what if every employee – even if there’s a clear rank you could order them – actually met excellent work standards and you should really therefore be satisfied with every one of them?

Related criticisms concerning individual performance reviews include that performance is a team game more than an individual game. And pitting colleagues against each other reduces teamwork and collaboration. Motivating through fear and greed might not bring the best out of people.

To minimise the noise though, we could again aggregate the opinions of multiple independent people (like the boss, other colleagues and customers); and create complex, comprehensive guidelines and questionnaires. But the workload mightn’t be worth it. So unfair inconsistencies in how employees are treated may be tolerated because tolerating them simply maximises profits.

Primarily due to cost-cutting reasons (with other considerations secondary) – automated algorithms are thus increasingly being used in multiple stages of the recruitment process. As extolled before, algorithms can be fairer in the sense that they’re far less noisy (i.e. more consistent) than human interviewers. But they can still be biased – and if a single biased algorithm is used for all recruitment decisions then its biases will systematically skew every recruitment decision of an organisation in one particular direction, in a more amplified way than if a diverse group of human interviewers were used.
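The difference between noise and bias can be sketched with a toy Python simulation. Everything here is a hypothetical assumption made purely for illustration: the `score_errors` function, the bias and noise figures, and above all the idea that a diverse group of humans has individual biases that pull in different directions and roughly cancel out. Under those assumptions, the single algorithm is very consistent but consistently wrong in one direction, whereas the humans are scattered but centred near the truth.

```python
import random
import statistics

def score_errors(biases, noise_sd, n_candidates=10000, seed=1):
    """Errors (score minus true quality) when each candidate is judged by one randomly chosen rater."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [rng.choice(biases) + rng.gauss(0.0, noise_sd) for _ in range(n_candidates)]

# One algorithm: a single fixed bias against some group of candidates, very little noise.
algorithm_errors = score_errors(biases=[-1.0], noise_sd=0.2)
# Five humans: different personal biases (assumed to average zero), much more noise.
human_errors = score_errors(biases=[-1.5, -0.5, 0.0, 0.5, 1.5], noise_sd=1.0)

algorithm_mean = statistics.fmean(algorithm_errors)   # systematic skew
algorithm_spread = statistics.pstdev(algorithm_errors)  # inconsistency
human_mean = statistics.fmean(human_errors)
human_spread = statistics.pstdev(human_errors)
print(round(algorithm_mean, 2), round(algorithm_spread, 2))
print(round(human_mean, 2), round(human_spread, 2))
```

If the human interviewers all shared the same prejudice then their average error wouldn't sit near zero either, so the comparison depends entirely on how diverse the group really is.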

In recruitment, algorithms might be used to, for instance, assess CVs/résumés, to conduct automated video interviews, or to conduct interactive games-based assessments (these might e.g. evaluate a candidate’s risk preferences, how they follow orders or break the rules, handle pressure and persist, or carry out a job-specific task).

But if an algorithm isn’t transparent then we’ll not know how it came to the decisions it did in order to assess whether it was being unfairly biased or not. Yet if it were transparent then people could learn to game the system. (This type of problem is present in other contexts too, like SEO.) We’ll likely find out that its decisions are based on oversimplified metrics like how many times the words ‘we’ or ‘us’ were used relative to ‘I’, ‘me’ or ‘umm’ (whereas a real human would prefer a candidate to give a well-considered answer despite ‘umming’ a lot, over a candidate who can smoothly BS their way with a woolly answer). Shifting a few paragraphs around on a CV/résumé or increasing the size of some text a smidge mightn’t matter to a human but it could trick an algorithm into giving a higher score. Even totally cheating like using a white font on a white background to tell a few fibs might fool an algorithm, even though a human would never see this text and so couldn’t be fooled by it!
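To see how crude such a metric could be, here is a deliberately naive Python sketch. It isn't how any real recruitment product is known to work; the `naive_keyword_score` function and the two sample answers are invented for illustration. It simply counts ‘we’ and ‘us’ as good and ‘I’, ‘me’ and ‘umm’ as bad.

```python
import re

TEAM_WORDS = {"we", "us"}
PENALISED_WORDS = {"i", "me", "umm"}

def naive_keyword_score(transcript: str) -> int:
    """Count team words minus penalised words, ignoring what was actually said."""
    words = re.findall(r"[a-z']+", transcript.lower())
    team = sum(word in TEAM_WORDS for word in words)
    penalised = sum(word in PENALISED_WORDS for word in words)
    return team - penalised

# A concrete, well-considered answer delivered with a few 'umms'.
considered = ("Umm, I traced the fault to a bad sensor, umm, "
              "and then I rewrote the calibration routine.")
# A smooth but woolly answer that says nothing specific.
woolly = "We leverage synergies, and we empower us to deliver value for us."

print(naive_keyword_score(considered))  # -4
print(naive_keyword_score(woolly))      # 4
```

The woolly answer wins comfortably, and a candidate who knew the rule could game it just by sprinkling in more ‘we’s.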

An algorithm can be biased through the programmer who programmed it. A machine-learning AI can be biased because of what it has been taught via its training data. These biases could reflect and thus reinforce pre-existing wider-society prejudices e.g. racial biases against those with darker skin, accent biases when transcribing speech to text before assessing what’s said (you’ve probably seen how automated captions for videos occasionally get the words and thus intended meanings completely incorrect), and neurodiversity biases (like mistakenly judging an autistic person who fidgets or avoids eye contact as distracted or uninterested). A human, meanwhile, could ask extra questions for clarification. An organisation could adjust the parameters of an algorithm or adjust the score afterwards (arbitrarily?) if a candidate discloses their condition – but a candidate might not wish to disclose it in case they face direct discrimination for it. Some neurodivergent people will prefer an AI over a human, and some won’t. Learn more about algorithms in Post No.: 0552.

Algorithms are ultimately made by and implemented by humans – thus they reflect all the faults and prejudices of humans. But then the alternative is human interviewers – and they definitely reflect all the faults and prejudices of humans(!)

But many people feel better being judged by another human (whether they really ought to feel this way or not because algorithms are far less whimsical and thus less arbitrary) and would like the reasons for why they failed an application to be given by a human (whether these reasons given by a human will be the true reasons or not because someone could be sexist but express that the reason for your rejection was your lack of experience).

Woof. So there are pros and cons, and plenty of experts are against using this technology in this area, at present at least. It’s a rapidly evolving space regarding both the technology and the ethics. Perhaps the problem isn’t with the whole idea of using algorithms in recruitment but that the technology, its implementation and the solutions to the bias problems aren’t yet refined enough. (A bit like facial recognition technologies in law enforcement.) And until that point, its use can produce plenty of real or perceived unfairness.
