TESTING FOR THE RANK OF A COVARIANCE OPERATOR
Document type:
Article
Authors:
Chakraborty, Anirvan; Panaretos, Victor M.
Affiliations:
Indian Institute of Science Education & Research (IISER) - Kolkata; Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/22-AOS2238
Publication date:
2022
Pages:
3510-3537
Keywords:
PRINCIPAL COMPONENTS
finite dimensionality
number
Abstract:
How can we discern whether the covariance operator of a stochastic process is of reduced rank, and if so, what its precise rank is? And how can we do so at a given level of confidence? This question is central to a great many methods for functional data, which require low-dimensional representations, whether obtained by functional PCA or by other methods. The difficulty is that the determination must be made on the basis of i.i.d. replications of the process, observed discretely and contaminated by measurement error. This adds a ridge to the empirical covariance, obfuscating the underlying dimension. We build a test statistic, inspired by matrix completion, that circumvents this issue by measuring the best possible least-squares fit of the empirical covariance's off-diagonal elements, optimised over covariances of a given finite rank. For a fixed grid of sufficiently large size, we determine the statistic's asymptotic null distribution as the number of replications grows. We then use it to construct a bootstrap implementation of a stepwise testing procedure that controls the familywise error rate associated with the collection of hypotheses formalising the question at hand. Under minimal regularity assumptions, we prove that the procedure is consistent and that its bootstrap implementation is valid. The procedure avoids smoothing and the associated smoothing parameters, is indifferent to measurement error heteroskedasticity, and does not assume a low-noise regime. An extensive simulation study reveals excellent practical performance, stable across a wide range of settings, and the procedure is further illustrated by means of two data analyses.
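The core idea of the statistic (fit only the off-diagonal entries of the empirical covariance with a positive semidefinite matrix of rank at most r, so that the measurement-error ridge on the diagonal never enters the loss) can be illustrated with a short Python sketch. This is not the authors' implementation, and several pieces are assumptions: NumPy is assumed as a dependency; the minimisation uses a hard-impute style iteration, a swapped-in solver that only guarantees a non-increasing loss and a local optimum; the scaling by n in `rank_statistic` is an assumed normalisation; and the stepwise loop takes a hypothetical caller-supplied `critical_value(r)` because the paper's bootstrap is not reproduced here. All function names are hypothetical.

```python
from typing import Callable

import numpy as np


def project_psd_rank(z: np.ndarray, r: int) -> np.ndarray:
    """Frobenius-nearest positive semidefinite matrix of rank <= r to symmetric z."""
    if r == 0:
        return np.zeros_like(z)
    vals, vecs = np.linalg.eigh(z)            # eigenvalues in ascending order
    top_vals = np.clip(vals[-r:], 0.0, None)  # keep the r largest, zero out negatives
    top_vecs = vecs[:, -r:]
    return (top_vecs * top_vals) @ top_vecs.T


def offdiag_rank_fit(s: np.ndarray, r: int, n_iter: int = 5000, tol: float = 1e-14) -> float:
    """Off-diagonal residual sum of squares of s, minimised over PSD matrices of rank <= r.

    Hard-impute style solver (an assumption, not necessarily the authors' algorithm):
    the diagonal of s never enters the loss, so it is replaced by the diagonal of the
    current fit before each re-projection. The loss is non-increasing, but the problem
    is non-convex, so only a local optimum is guaranteed.
    """
    s = (s + s.T) / 2.0
    off = ~np.eye(s.shape[0], dtype=bool)     # mask selecting off-diagonal entries
    m = project_psd_rank(s, r)
    loss = float(np.sum((s - m)[off] ** 2))
    for _ in range(n_iter):
        z = np.where(off, s, m)               # off-diagonal from s, diagonal from the fit
        m = project_psd_rank(z, r)
        new_loss = float(np.sum((s - m)[off] ** 2))
        converged = loss - new_loss <= tol * max(loss, 1.0)
        loss = new_loss
        if converged:
            break
    return loss


def rank_statistic(x: np.ndarray, r: int) -> float:
    """Hypothetical statistic: n times the off-diagonal fit of the sample covariance.

    x holds one discretely observed curve per row (n replications, p grid points).
    The factor n is an assumed normalisation, not taken from the paper.
    """
    n = x.shape[0]
    s = np.cov(x, rowvar=False)
    return n * offdiag_rank_fit(s, r)


def stepwise_rank(
    x: np.ndarray, critical_value: Callable[[int], float], max_rank: int
) -> int:
    """Test H_r: rank <= r for r = 1, 2, ... and stop at the first non-rejection.

    critical_value(r) is a hypothetical caller-supplied function; the paper
    obtains its critical values by a bootstrap that is not reproduced here.
    """
    for r in range(1, max_rank + 1):
        if rank_statistic(x, r) <= critical_value(r):
            return r                          # first hypothesis not rejected
    return max_rank + 1                       # every H_r rejected: rank exceeds max_rank


if __name__ == "__main__":
    # Simulated rank-2 process on a grid with heteroskedastic measurement error.
    rng = np.random.default_rng(0)
    n, p = 500, 20
    t = np.linspace(0.0, 1.0, p)
    basis = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    scores = rng.normal(size=(n, 2)) * np.array([2.0, 1.0])
    noise = rng.normal(size=(n, p)) * (0.2 + 0.8 * t)
    x = scores @ basis + noise
    for r in (1, 2, 3):
        print(r, rank_statistic(x, r))
```

Because only off-diagonal entries enter the loss, adding any diagonal noise variance, homoskedastic or not, leaves the objective unchanged; only the solver's starting point moves. The stop-at-first-non-rejection loop reflects the nested structure of the hypotheses: a false rejection can only occur by rejecting the first true H_r.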