Measuring diversity in Hollywood through the large-scale computational analysis of film
Publication type:
Article
Authors:
Bamman, David; Samberg, Rachael; So, Richard Jean; Zhou, Naitian
Affiliations:
University of California System; University of California Berkeley; University of California System; University of California Berkeley; McGill University
Journal:
Proceedings of the National Academy of Sciences of the United States of America
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2409770121
Publication date:
2024-11-12
Keywords:
sexual violence
media
Abstract:
Movies are a massively popular and influential form of media, but their computational study at scale has largely been off-limits to researchers in the United States due to the Digital Millennium Copyright Act. In this work, we illustrate use of a new regulatory framework to enable computational research on film that permits circumvention of technological protection measures on digital video discs (DVDs). We use this exemption to legally digitize a collection of 2,307 films representing the top 50 movies by U.S. box office over the period 1980 to 2022, along with award nominees. We design a computational pipeline for measuring the representation of gender and race/ethnicity in film, drawing on computer vision models for recognizing actors and human perceptions of gender and race/ethnicity. Doing so allows us to learn substantive facts about representation and diversity in Hollywood over this period, confirming earlier studies that see an increase in diversity over the past decade, while allowing us to use computational methods to uncover a range of ad hoc analytical findings. Our work illustrates the affordances of the data-driven analysis of film at a large scale.