On inference in high-dimensional logistic regression models with separated data
成果类型:
Article
署名作者:
Lewis, R. M.; Battey, H. S.
署名单位:
Imperial College London
刊物名称:
BIOMETRIKA
ISSN/ISSBN:
0006-3444
DOI:
10.1093/biomet/asad065
发表日期:
2024
关键词:
nonconcave penalized likelihood
confidence-regions
asymptotics
EXISTENCE
selection
binary
tests
摘要:
Direct use of the likelihood function typically produces severely biased estimates when the dimension of the parameter vector is large relative to the effective sample size. With linearly separable data generated from a logistic regression model, the loglikelihood function asymptotes and the maximum likelihood estimator does not exist. We show that an exact analysis for each regression coefficient produces half-infinite confidence sets for some parameters when the data are separable. Such conclusions are not vacuous, but an honest portrayal of the limitations of the data. Finite confidence sets are only achievable when additional, perhaps implicit, assumptions are made. Under a notional double-asymptotic regime in which the dimension of the logistic coefficient vector increases with the sample size, the present paper considers the implications of enforcing a natural constraint on the vector of logistic transformed probabilities. We derive a relationship between the logistic coefficients and a notional parameter obtained as a probability limit of an ordinary least-squares estimator. The latter exists even when the data are separable. Consistency is ascertained under weak conditions on the design matrix.