Automated Speech Recognition Bias in Personnel Selection: The Case of Automatically Scored Job Interviews
Type:
Article
Authors:
Hickman, Louis; Langer, Markus; Saef, Rachel M.; Tay, Louis
Affiliations:
Virginia Polytechnic Institute & State University; University of Freiburg; Northern Illinois University; Purdue University System; Purdue University
Journal:
JOURNAL OF APPLIED PSYCHOLOGY
ISSN:
0021-9010
DOI:
10.1037/apl0001247
Publication date:
2025
Pages:
846-858
Keywords:
justice
artificial intelligence
Whisper
adverse impact
experiment
Abstract:
Organizations, researchers, and software increasingly use automatic speech recognition (ASR) to transcribe speech to text. However, ASR can be less accurate for (i.e., biased against) certain demographic subgroups. This is concerning, given that the machine-learning (ML) models used to automatically score video interviews use ASR transcriptions of interviewee responses as inputs. To address these concerns, we investigate the extent of ASR bias and its effects in automatically scored interviews. Specifically, we compare the accuracy of ASR transcription for English as a second language (ESL) versus non-ESL interviewees, people of color (and Black interviewees separately) versus White interviewees, and male versus female interviewees. Then, we test whether ASR bias causes bias in ML model scores, both in terms of differential convergent correlations (i.e., subgroup differences in correlations between observed and ML scores) and differential means (i.e., shifts in subgroup differences from observed to ML scores). To do so, we apply one human and four ASR transcription methods to two samples of mock video interviews (Ns = 1,014 and 414), and then we train and test models using these different transcripts to score multiple constructs. We observed significant bias in the commercial ASR services across nearly all comparisons, with the magnitude of bias differing across the ASR services. However, the transcription bias did not translate into meaningful measurement bias for the ML interview scores, whether in terms of differential convergent correlations or means. We discuss what these results mean for the nature of bias, fairness, and validity of ML models for scoring verbal open-ended responses.
Source URL: