INCORPORATING AUXILIARY INFORMATION FOR IMPROVED STATISTICAL INFERENCE AND ITS EXTENSIONS TO DISTRIBUTED ALGORITHMS WITH AN APPLICATION TO PERSONAL CREDIT

成果类型:
Article
署名作者:
Yu, Miaomiao; Jiang, Zhongfeng; Li, Jiaxuan; Zhou, Yong
署名单位:
East China Normal University; East China Normal University; Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS
刊物名称:
ANNALS OF APPLIED STATISTICS
ISSN/ISSBN:
1932-6157
DOI:
10.1214/24-AOAS1909
发表日期:
2024
页码:
2863-2886
关键词:
metaanalysis likelihood models
摘要:
Personal credits have always been a hot topic in the society. Among all of them, the evaluation of default risk is particularly concerned since robust estimation, based on personal information, can both help needy individuals to get loans and financial institutions to avoid losses. So far, there have been no good solutions due to limited data, especially default information. With the advent of the era of big data, it is possible to improve the effectiveness of estimates by using auxiliary information from external studies or public domains. However, the individual-level data can not be gained directly because of the emphasis on data privacy; that is, only some summarized statistics with auxiliary information are allowed to be shared. To effectively utilize external integrated auxiliary information to improve the accuracy of default risk estimation, this paper introduces a unified auxiliary information framework, which is referred as enhanced GEE method, to effectively incorporate various external summary results by employing the generalized estimating equations (GEE) approach and augmenting a weighted logarithm of confidence density on GEE function. We establish asymptotic properties for the new method and prove that it can achieve the gain of statistical efficiency compared to the study-specific estimator without any auxiliary information. Besides, a low-cost Map-Reduce procedure for the distributed statistical inference of enhanced GEE method in big data is developed that can achieve the same efficiency as the oracle enhanced GEE approach under mild condition. This method is demonstrated by an application to predict the loan default risk of bank customers in Shanghai and shown to be more effective and reliable compared with the method based on the own data only. Furthermore, the superiorities of our approach, especially the construction of the tighter confidence intervals, are also illustrated with extensive simulation studies and a real personal default risk case.
来源URL: