STATISTICAL INFERENCE FOR REGRESSION WITH IMPUTED BINARY COVARIATES WITH APPLICATION TO EMOTION RECOGNITION

Publication type:
Article
Authors:
Lin, Ziqian; Huang, Danyang; Xiong, Ziyu; Wang, Hansheng
Affiliations:
Peking University; Renmin University of China; Renmin University of China
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1961
Publication date:
2025
Pages:
329-350
Keywords:
multiple imputation likelihood selection
Abstract:
In the flourishing live streaming industry, accurate recognition of streamers' emotions has become a critical research focus with profound implications for audience engagement and content optimization. However, precise emotion coding typically requires manual annotation by trained experts, making it extremely expensive and time-consuming to obtain complete observational data for large-scale studies. Motivated by this challenge in streamer emotion recognition, we develop here a novel imputation method together with a principled statistical inference procedure for analyzing partially observed binary data. Specifically, we assume for each observation an auxiliary feature vector, which is sufficiently cheap to be fully collected for the whole sample. We next assume a small pilot sample with both the target binary covariates (i.e., the emotion status) and the auxiliary features fully observed, of which the size could be considerably smaller than that of the whole sample. Thereafter, a regression model can be constructed for the target binary covariates and the auxiliary features. This enables us to impute the missing binary features using the fully observed auxiliary features for the entire sample. We establish the associated asymptotic theory for principled statistical inference and present extensive simulation experiments, demonstrating the effectiveness and theoretical soundness of our proposed method. Furthermore, we validate our approach using a comprehensive dataset on emotion recognition in live streaming, demonstrating that our imputation method yields smaller standard errors and is more statistically efficient than using pilot data only. Our findings have significant implications for enhancing user experience and optimizing engagement on streaming platforms.
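The workflow described in the abstract (fit a model for the binary covariate on a small pilot sample, impute it for the whole sample from the auxiliary features, then run the downstream regression) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' estimator or inference procedure. Everything in it is an assumption made for the example: the simulated data, the logistic imputation model, the linear outcome model, the plug-in use of imputed probabilities, the condition that the auxiliary features affect the outcome only through the binary covariate, and all function and variable names. It produces point estimates only; valid standard errors would require the asymptotic theory developed in the paper.

```python
import numpy as np

def fit_logistic(X, z, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (intercept added here)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (z - p)                        # score vector
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict_prob(X, beta):
    """Predicted P(Z = 1 | X) under the fitted logistic model."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

def ols(design, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(design, y, rcond=None)[0]

# Simulated data (hypothetical): X = cheap auxiliary features,
# Z = expensive binary covariate, Y = response of interest.
rng = np.random.default_rng(0)
N, n_pilot = 20000, 500
X = rng.normal(size=(N, 2))
true_gamma = np.array([-0.3, 1.0, -1.0])   # assumed imputation-model truth
Z = rng.binomial(1, predict_prob(X, true_gamma))
Y = 1.0 + 2.0 * Z + rng.normal(size=N)     # assumed outcome model, slope 2

# Step 1: Z is treated as observed only on the pilot sample.
pilot = np.arange(n_pilot)
gamma_hat = fit_logistic(X[pilot], Z[pilot])

# Step 2: impute P(Z = 1 | X) for the whole sample from the auxiliary features.
Z_imputed = predict_prob(X, gamma_hat)

# Step 3: downstream regression on the imputed covariate (whole sample),
# compared with a regression that uses the pilot sample only.
coef_imputed = ols(np.column_stack([np.ones(N), Z_imputed]), Y)
coef_pilot = ols(np.column_stack([np.ones(n_pilot), Z[pilot]]), Y[pilot])

print("slope, imputed whole sample:", coef_imputed[1])
print("slope, pilot sample only:   ", coef_pilot[1])
```

In this sketch the imputed probability stands in for the unobserved binary covariate, which works here only because the simulated outcome depends on the auxiliary features solely through Z. The uncertainty in gamma_hat carries over into the downstream estimate, which is why a naive standard error from the final regression would not be trustworthy.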