A Manager and an AI Walk into a Bar: Does ChatGPT Make Biased Decisions Like We Do?

Document type:
Article
Authors:
Chen, Yang; Kirshner, Samuel N.; Ovchinnikov, Anton; Andiappan, Meena; Jenkin, Tracy
Affiliations:
Western University (University of Western Ontario); University of New South Wales Sydney; Queen's University, Canada; INSEAD Business School; McMaster University; University of Toronto; Vector Institute for Artificial Intelligence
Journal:
Manufacturing & Service Operations Management (M&SOM)
ISSN:
1523-4614
DOI:
10.1287/msom.2023.0279
Publication date:
2025
Keywords:
large language models; decision biases; ChatGPT; behavioral operations management
Abstract:
Problem definition: Large language models (LLMs) are increasingly leveraged in business and consumer decision-making processes. Because LLMs learn from human data and feedback, which can be biased, it is crucial to determine whether LLMs exhibit human-like behavioral decision biases (e.g., base-rate neglect, risk aversion, confirmation bias) before implementing them in decision-making contexts and workflows. To understand this, we examine 18 common human biases that are important in operations management (OM) using the dominant LLM, ChatGPT. Methodology/results: We perform experiments in which GPT-3.5 and GPT-4 act as participants to test these biases, using vignettes adapted from the literature (standard context) and variants reframed in inventory and general OM contexts. In almost half of the experiments, the Generative Pre-trained Transformer (GPT) mirrors human biases; in the remaining experiments, it diverges from prototypical human responses. We also observe that GPT models show a notable level of consistency between the standard and OM-specific experiments, as well as across temporal versions of the GPT-3.5 model. Our comparative analysis of GPT-3.5 and GPT-4 reveals a double-edged progression in GPT's decision making: GPT-4 improves decision-making accuracy on problems with well-defined mathematical solutions while simultaneously displaying increased behavioral biases on preference-based problems. Managerial implications: First, our results highlight that managers will obtain the greatest benefits from deploying GPT in workflows that rely on established formulas. Second, the high level of response consistency GPT displayed across the standard, inventory, and non-inventory operational contexts gives reason for optimism that LLMs can offer reliable support even when the details of the decision and problem contexts change.
Third, although choosing between models such as GPT-3.5 and GPT-4 involves a trade-off between cost and performance, our results suggest that managers should invest in higher-performing models, particularly for solving problems with objective solutions.
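The abstract reports "consistency between the standard and OM-specific experiments." One simple way to read such a claim is as an agreement rate: for each bias experiment run in both framings, check whether the model's outcome (mirrors the human bias or not) is the same. The sketch below assumes each experiment is reduced to a single boolean per context; the function name `consistency_rate`, the placeholder experiment keys, and the outcome values are all hypothetical and illustrative, not the paper's actual measure or results.

```python
"""Hypothetical sketch: agreement rate between bias outcomes in two framings.

Assumption: each experiment is summarized by one boolean per context
(True = GPT's responses match the prototypical human bias). This is an
illustrative simplification, not the study's actual consistency measure.
"""
from typing import Mapping


def consistency_rate(standard: Mapping[str, bool], om_context: Mapping[str, bool]) -> float:
    """Return the share of shared experiments whose outcome matches in both contexts."""
    shared = standard.keys() & om_context.keys()  # experiments run in both framings
    if not shared:
        raise ValueError("no experiments in common")
    agree = sum(standard[name] == om_context[name] for name in shared)
    return agree / len(shared)


# Made-up outcomes with placeholder names; these are NOT results from the study.
standard = {"bias_a": True, "bias_b": False, "bias_c": True, "bias_d": False}
inventory = {"bias_a": True, "bias_b": False, "bias_c": False, "bias_d": False}

print(consistency_rate(standard, inventory))  # 0.75
```

Reducing each experiment to a boolean keeps the comparison easy to audit, but it discards the magnitude of any bias; a fuller analysis would compare response distributions rather than binary labels.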
Source URL: