Data Fusion Using Weakly Aligned Sources
Publication type:
Article; Early Access
Authors:
Li, Sijia; Gilbert, Peter B.; Duan, Rui; Luedtke, Alex
Affiliations:
University of Washington; University of Washington Seattle; Fred Hutchinson Cancer Center; Harvard University; Harvard T.H. Chan School of Public Health; University of Washington; University of Washington Seattle
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2025.2476780
Publication date:
2025
Keywords:
large-sample theory
nonparametric-estimation
empirical distributions
randomized-trial
tool
Abstract:
We introduce a new data fusion method that uses multiple data sources to estimate a smooth, finite-dimensional parameter. Most existing methods only make use of fully aligned data sources that share common conditional distributions of one or more variables of interest. However, in many settings, the scarcity of fully aligned sources can make existing methods require unduly large sample sizes to be useful. Our approach enables the incorporation of weakly aligned data sources that are not perfectly aligned, provided their degree of misalignment is known up to finite-dimensional parameters. We quantify the additional efficiency gains achieved through the integration of these weakly aligned sources. We characterize the semiparametric efficiency bound and provide a general means to construct estimators achieving these efficiency gains. We illustrate our results by fusing data from two harmonized HIV monoclonal antibody prevention efficacy trials to study how a neutralizing antibody biomarker associates with HIV genotype. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.