Are Automated Video Interviews Smart Enough? Behavioral Modes, Reliability, Validity, and Bias of Machine Learning Cognitive Ability Assessments
Publication Type:
Article
Authors:
Hickman, Louis; Tay, Louis; Woo, Sang Eun
Affiliations:
Virginia Polytechnic Institute & State University; Purdue University System; Purdue University
Journal:
JOURNAL OF APPLIED PSYCHOLOGY
ISSN/ISBN:
0021-9010
DOI:
10.1037/apl0001236
Publication Date:
2025
Pages:
314-335
Keywords:
Artificial intelligence
intelligence
natural language processing
verbal behavior
linguistic inquiry and word count
Abstract:
Automated video interviews (AVIs) that use machine learning (ML) algorithms to assess interviewees are increasingly popular. Extending prior AVI research focusing on noncognitive constructs, the present study critically evaluates the possibility of assessing cognitive ability with AVIs. By developing and examining AVI ML models trained to predict measures of three cognitive ability constructs (i.e., general mental ability, verbal ability, and intellect [as observed at zero acquaintance]), this research contributes to the literature in several ways. First, it advances our understanding of how cognitive abilities relate to interviewee behavior. Specifically, we found that verbal behaviors best predicted interviewee cognitive abilities, while neither paraverbal nor nonverbal behaviors provided incremental validity, suggesting that only verbal behaviors should be used to assess cognitive abilities. Second, across two samples of mock video interviews, we extensively evaluated the psychometric properties of the verbal behavior AVI ML model scores, including their reliability (internal consistency across interview questions and test-retest), validity (relationships with other variables and content), and fairness and bias (measurement and predictive). Overall, the general mental ability, verbal ability, and intellect AVI models captured similar behavioral manifestations of cognitive ability. Validity evidence results were mixed: For example, AVIs trained on observer-rated intellect exhibited superior convergent and criterion relationships (compared to the observer ratings they were trained to model) but had limited discriminant validity evidence. Our findings illustrate the importance of examining psychometric properties beyond convergence with the test that ML algorithms are trained to model. We provide recommendations for enhancing discriminant validity evidence in future AVIs.
Source URL: