Iterative automated record linkage using mixture models

Publication type:
Article
Authors:
Larsen, MD; Rubin, DB
Affiliations:
University of Chicago; Harvard University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/016214501750332956
Publication date:
2001
Pages:
32-41
Keywords:
linking personal records maximum-likelihood tables algorithm names
Abstract:
The goal of record linkage is to link quickly and accurately records that correspond to the same person or entity. Whereas certain patterns of agreements and disagreements on variables are more likely among records pertaining to a single person than among records for different people, the observed patterns for pairs of records can be viewed as arising from a mixture of matches and nonmatches. Mixture model estimates can be used to partition record pairs into two or more groups that can be labeled as probable matches (links) and probable nonmatches (nonlinks). A method is proposed and illustrated that uses marginal information in the database to select mixture models, identifies sets of records for clerks to review based on the models and marginal information, incorporates clerically reviewed data, as they become available, into estimates of model parameters, and classifies pairs as links, nonlinks, or in need of further clerical review. The procedure is illustrated with five datasets from the U.S. Bureau of the Census. It appears to be robust to variations in record-linkage sites. The clerical review corrects classifications of some pairs directly and leads to changes in classification of others through reestimation of mixture models.
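
The mixture idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes the simplest case: two classes (match, nonmatch), binary agree/disagree comparisons that are conditionally independent given the class, and parameters estimated by EM. The function names, the synthetic data, the starting values, and the 0.9 / 0.1 posterior cutoffs are all hypothetical choices made for illustration. The sketch omits the model selection from marginal information and the feedback from clerical review.

```python
import numpy as np

def e_step(gamma, p, m, u):
    """Posterior match probability per pair, plus the mixture log-likelihood."""
    log_match = np.log(p) + gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
    log_non = np.log(1 - p) + gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
    log_mix = np.logaddexp(log_match, log_non)
    return np.exp(log_match - log_mix), log_mix.sum()

def fit_mixture(gamma, n_iter=500, tol=1e-9, eps=1e-6):
    """EM for a two-class mixture over binary agreement patterns.

    gamma: (n_pairs, n_fields) array, 1 = fields agree, 0 = disagree.
    Returns p = share of matches, m = P(agree | match),
    u = P(agree | nonmatch), w = posterior match probability per pair.
    """
    gamma = np.asarray(gamma, dtype=float)
    k = gamma.shape[1]
    # Starting values are assumptions: matches are rare and mostly agree.
    p, m, u = 0.1, np.full(k, 0.9), np.full(k, 0.1)
    prev = -np.inf
    for _ in range(n_iter):
        w, loglik = e_step(gamma, p, m, u)
        # M-step: weighted proportions, clipped away from 0 and 1.
        p = np.clip(w.mean(), eps, 1 - eps)
        m = np.clip(w @ gamma / w.sum(), eps, 1 - eps)
        u = np.clip((1 - w) @ gamma / (1 - w).sum(), eps, 1 - eps)
        if loglik - prev < tol:
            break
        prev = loglik
    w, _ = e_step(gamma, p, m, u)
    return p, m, u, w

# Synthetic record pairs (illustrative only, not Census data).
rng = np.random.default_rng(0)
n = 2000
is_match = rng.random(n) < 0.2
m_true = np.array([0.95, 0.90, 0.85, 0.90])
u_true = np.array([0.10, 0.05, 0.20, 0.10])
agree_prob = np.where(is_match[:, None], m_true, u_true)
gamma = (rng.random((n, 4)) < agree_prob).astype(int)

p_hat, m_hat, u_hat, w = fit_mixture(gamma)

# Three-way decision; the cutoffs are hypothetical.
labels = np.where(w >= 0.9, "link", np.where(w <= 0.1, "nonlink", "review"))
```

Pairs labeled `review` are the ones a clerk would inspect. In the paper's iterative scheme, the clerically determined statuses would then be fed back into the parameter estimates, a step this sketch leaves out.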