Local Density Estimation in High Dimensions
成果类型:
Article; Early Access
署名作者:
Wu, Xian; Charikar, Moses; Natchu, Vishnu
署名单位:
University of California System; University of California Berkeley; Stanford University
刊物名称:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISSBN:
0364-765X
DOI:
10.1287/moor.2021.1221
发表日期:
2022
关键词:
摘要:
An important question that arises in the study of high-dimensional vector representations learned from data are, given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. We develop two estimators, LSH count and multiprobe count that use locality-sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and sample complexity of our schemes and demonstrate their effectiveness in experiments on a standard word embedding data set.
来源URL: