MODEL SELECTION UNCERTAINTY AND STABILITY IN BETA REGRESSION MODELS: A STUDY OF BOOTSTRAP-BASED MODEL AVERAGING WITH AN EMPIRICAL APPLICATION TO CLICKSTREAM DATA
Publication type:
Article
Authors:
Allenbrand, Corban; Sherwood, Ben
Affiliation:
University of Kansas
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/22-AOAS1647
Publication date:
2023
Pages:
680-710
Keywords:
influence diagnostics
online
prediction
Abstract:
Statistical model development is a central feature of many scientific investigations, with a vast methodological landscape. However, uncertainty in the model development process has received less attention and is frequently resolved nonrigorously through beliefs about generalizability, practical usefulness, and computational ease. This is particularly problematic in settings of abundant data, such as clickstream data, because model selection routinely admits multiple models and imposes a source of uncertainty, unacknowledged and unknown by many, on all postselection conclusions. Regression models based on the beta distribution are a class of nonlinear models that are attractive because of their great flexibility and potential explanatory power, but they have not been investigated from the standpoint of multimodel uncertainty and model averaging. For this reason, this work presents a formalized tool that combines model selection uncertainty with beta regression modeling. The tool combines bootstrap model averaging, model selection, and asymptotic theory to yield a procedure that can jointly model the mean and precision parameters, capture sources of variability in the data, and support more accurate claims about estimate precision, variable importance, generalization performance, and model stability. The practical utility of the tool is demonstrated through a study of model selection consistency and variable importance in statistical models of average exit and bounce rates. This work emphasizes the need to depart from the all-too-common practice of ignoring model selection uncertainty and introduces an accessible technique for handling frequently neglected aspects of the modeling pipeline.
Source URL:
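The abstract describes combining bootstrap resampling, model selection, and beta regression with separate mean and precision submodels. The block below is a minimal Python sketch of one common bootstrap model-averaging scheme. It is not the authors' procedure and it omits their asymptotic theory. It rests on several assumptions: a logit link for the mean and a log link for the precision; a candidate set made of every covariate subset in the mean submodel, with precision held constant; AIC as the selection criterion on each case-resampled bootstrap sample; and a model-averaged prediction that weights each selected model by its bootstrap selection frequency. The function names (`beta_negloglik`, `fit_beta`, `design`, `bootstrap_model_average`) are hypothetical, and the code assumes NumPy and SciPy are installed.

```python
import itertools

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln


def beta_negloglik(params, X, Z, y):
    """Negative log-likelihood of y ~ Beta(mu*phi, (1-mu)*phi) with
    logit(mu) = X @ beta and log(phi) = Z @ gamma."""
    p = X.shape[1]
    mu = expit(X @ params[:p])       # mean submodel, values in (0, 1)
    phi = np.exp(Z @ params[p:])     # precision submodel, values > 0
    a, b = mu * phi, (1.0 - mu) * phi
    ll = (gammaln(phi) - gammaln(a) - gammaln(b)
          + (a - 1.0) * np.log(y) + (b - 1.0) * np.log1p(-y))
    return -np.sum(ll)


def fit_beta(X, Z, y):
    """Maximum-likelihood fit by BFGS; returns (estimates, AIC)."""
    start = np.zeros(X.shape[1] + Z.shape[1])
    res = minimize(beta_negloglik, start, args=(X, Z, y), method="BFGS")
    return res.x, 2.0 * res.fun + 2.0 * start.size


def design(x, cols):
    """Intercept column plus the covariate columns listed in cols (may be empty)."""
    return np.column_stack([np.ones(x.shape[0]), x[:, np.asarray(cols, dtype=int)]])


def bootstrap_model_average(x, y, x_new, n_boot=200, seed=0):
    """On each bootstrap resample, pick the AIC-best mean submodel; return the
    selection frequency of every candidate and the averaged predicted means."""
    rng = np.random.default_rng(seed)
    n, k = x.shape
    # Hypothetical candidate set: every covariate subset in the mean submodel.
    candidates = [c for r in range(k + 1) for c in itertools.combinations(range(k), r)]
    counts = dict.fromkeys(candidates, 0)
    preds = np.empty((n_boot, x_new.shape[0]))
    Z = np.ones((n, 1))  # assumption: constant precision (intercept-only log link)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # nonparametric (case) resample
        xb, yb = x[idx], y[idx]
        fits = {c: fit_beta(design(xb, c), Z, yb) for c in candidates}
        best = min(fits, key=lambda c: fits[c][1])  # smallest AIC wins
        counts[best] += 1
        X_new = design(x_new, best)
        preds[b] = expit(X_new @ fits[best][0][:X_new.shape[1]])
    # Averaging over resamples weights each model by its selection frequency.
    freqs = {c: m / n_boot for c, m in counts.items()}
    return freqs, preds.mean(axis=0)
```

In this sketch the selection frequencies act as a rough stability and variable-importance summary. When one candidate is chosen on nearly every resample, the selection is stable. When the frequencies are spread across several candidates, that spread is the model selection uncertainty a single "best" model would hide.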