-
Authors: Fan, Jianqing; Fan, Yingying
Affiliations: Princeton University; University of Southern California; Harvard University
Abstract: Classification using high-dimensional features arises frequently in many contemporary statistical studies, such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina [Bernoulli 10 (2004) 989-1010] show that the Fisher discriminant performs poorly due to diverging spectra, and they propose to use the independence rule to overcome the problem. We first demonstrate that even...
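The independence rule replaces the full pooled covariance in Fisher's discriminant with its diagonal D, so correlations between features are ignored. The sketch below is a minimal illustration under stated assumptions: two Gaussian classes, simulated data with 10 informative features out of 200, and hypothetical helper names `independence_rule_fit` and `independence_rule_predict`. It is not the authors' code.

```python
import numpy as np


def independence_rule_fit(X1, X2):
    """Estimate the two class means and the pooled per-feature variances.

    X1, X2: arrays of shape (n_k, p), one observation per row.
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Only the diagonal of the pooled covariance is kept; correlations are ignored.
    ss = ((X1 - mu1) ** 2).sum(axis=0) + ((X2 - mu2) ** 2).sum(axis=0)
    return mu1, mu2, ss / (n1 + n2 - 2)


def independence_rule_predict(X, mu1, mu2, var):
    """Return 1 where (x - (mu1 + mu2)/2)' D^{-1} (mu1 - mu2) > 0, else 2."""
    score = ((X - (mu1 + mu2) / 2) * (mu1 - mu2) / var).sum(axis=1)
    return np.where(score > 0, 1, 2)


# Usage on simulated data: p = 200 features, only the first 10 carry signal.
rng = np.random.default_rng(0)
p = 200
shift = np.zeros(p)
shift[:10] = 1.5
X1 = rng.standard_normal((50, p)) + shift
X2 = rng.standard_normal((50, p))
params = independence_rule_fit(X1, X2)

X_test = np.vstack([rng.standard_normal((500, p)) + shift, rng.standard_normal((500, p))])
y_test = np.repeat([1, 2], 500)
err = np.mean(independence_rule_predict(X_test, *params) != y_test)
print(f"test error: {err:.3f}")
```

Dropping the off-diagonal terms avoids inverting a p × p sample covariance that is singular or badly conditioned when p exceeds the sample size.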
-
Author: Johnstone, Iain M.
Affiliation: Stanford University
Abstract: Let A and B be independent, central Wishart matrices in p variables with common covariance and having m and n degrees of freedom, respectively. The distribution of the largest eigenvalue of (A + B)^{-1} B has numerous applications in multivariate statistics, but is difficult to calculate exactly. Suppose that m and n grow in proportion to p. We show that after centering and scaling, the distribution is approximated to second order, O(p^{-2/3}), by the Tracy-Widom law. The results are obtained ...
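The statistic itself is easy to simulate. The sketch below (Python with NumPy; the helper names `wishart` and `largest_root` and the sizes p = 20, m = 40, n = 30 are hypothetical) draws A and B with identity common covariance. This loses no generality, because a common linear change of variables leaves the eigenvalues of (A + B)^{-1} B unchanged. The largest root is computed through a Cholesky reduction to a symmetric matrix. The paper's Tracy-Widom centering and scaling constants are not implemented here.

```python
import numpy as np


def wishart(df, p, rng):
    """Draw a central Wishart(df, I_p) matrix as Z'Z with Z of shape (df, p)."""
    Z = rng.standard_normal((df, p))
    return Z.T @ Z


def largest_root(A, B):
    """Largest eigenvalue of (A + B)^{-1} B via a symmetric reduction.

    With A + B = L L' (Cholesky), (A + B)^{-1} B is similar to L^{-1} B L^{-T},
    which is symmetric, so eigvalsh applies.
    """
    L = np.linalg.cholesky(A + B)
    C = np.linalg.solve(L, B)      # L^{-1} B
    C = np.linalg.solve(L, C.T)    # L^{-1} B L^{-T}
    return np.linalg.eigvalsh((C + C.T) / 2)[-1]


rng = np.random.default_rng(1)
p, m, n = 20, 40, 30
samples = np.array([largest_root(wishart(m, p, rng), wishart(n, p, rng)) for _ in range(200)])
print(f"mean {samples.mean():.3f}, sd {samples.std():.3f}")
```

All roots lie in [0, 1], since they solve B v = θ (A + B) v with A and B positive semidefinite.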
-
Author: El Karoui, Noureddine
Affiliations: University of California System; University of California Berkeley
Abstract: Estimating covariance matrices is a problem of fundamental importance in multivariate statistics. In practice it is increasingly frequent to work with data matrices X of dimension n × p, where p and n are both large. Results from random matrix theory show very clearly that in this setting, standard estimators like the sample covariance matrix perform, in general, very poorly. In this large-n, large-p setting, it is sometimes the case that practitioners are willing to assume that many elements o...
-
Author: El Karoui, Noureddine
Affiliations: University of California System; University of California Berkeley
Abstract: Estimating the eigenvalues of a population covariance matrix from a sample covariance matrix is a problem of fundamental importance in multivariate statistics; the eigenvalues of covariance matrices play a key role in many widely used techniques, in particular in principal component analysis (PCA). In many modern data analysis problems, statisticians are faced with large datasets where the sample size, n, is of the same order of magnitude as the number of variables p. Random matrix theory pred...
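The difficulty when n and p are comparable shows up in a small simulation. The sketch below (Python with NumPy; n = 1000 and p = 500 are arbitrary choices) uses data whose population covariance is the identity, so every population eigenvalue equals 1. It compares the range of the sample eigenvalues with the Marchenko-Pastur support edges (1 ± sqrt(p/n))^2. It illustrates the problem described in the abstract, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 500
gamma = p / n
X = rng.standard_normal((n, p))   # population covariance is the identity
S = X.T @ X / n                   # sample covariance (mean known to be zero)
eig = np.linalg.eigvalsh(S)       # ascending order

# Marchenko-Pastur support edges for an identity population covariance.
lower, upper = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
print(f"sample eigenvalues in [{eig[0]:.3f}, {eig[-1]:.3f}]")
print(f"Marchenko-Pastur edges [{lower:.3f}, {upper:.3f}]")
```

Although all population eigenvalues are 1, the sample eigenvalues spread from about 0.09 to about 2.9, so reading them off directly badly misestimates the population spectrum.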
-
Authors: Rajaratnam, Bala; Massam, Helene; Carvalho, Carlos M.
Affiliations: Stanford University; York University - Canada; University of Chicago
Abstract: In this paper, we propose a class of Bayes estimators for the covariance matrix of graphical Gaussian models Markov with respect to a decomposable graph G. Working with the W_{P_G} family defined by Letac and Massam [Ann. Statist. 35 (2007) 1278-1323], we derive closed-form expressions for Bayes estimators under the entropy and squared-error losses. The W_{P_G} family includes the classical inverse of the hyper inverse Wishart but has many more shape parameters, thus allowing for flexibility in diffe...
-
Author: Nadler, Boaz
Affiliation: Weizmann Institute of Science
Abstract: Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n → ∞. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the...
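The closeness of sample and population PCA can be checked empirically. The sketch below (Python with NumPy) assumes a hypothetical spiked population covariance I + 5 v v' with p = 10 and n = 5000, then compares the leading sample eigenpair with the population one. It is a simulation only; the paper's perturbation bounds are not computed.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, spike = 10, 5000, 5.0
v = np.zeros(p)
v[0] = 1.0                                     # population leading eigenvector
Sigma = np.eye(p) + spike * np.outer(v, v)     # eigenvalues: 1 + spike, then 1 (p - 1 times)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)                    # sample covariance, mean estimated
vals, vecs = np.linalg.eigh(S)                 # ascending order
lam_hat, v_hat = vals[-1], vecs[:, -1]

print("eigenvalue error:", abs(lam_hat - (1 + spike)))
print("|<v_hat, v>|:", abs(v_hat @ v))
```

The absolute value is taken because an eigenvector is defined only up to sign.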
-
Authors: Bickel, Peter J.; Levina, Elizaveta
Affiliations: University of California System; University of California Berkeley; University of Michigan System; University of Michigan
Abstract: This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resamp...
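Hard thresholding keeps each entry s_ij of the sample covariance with |s_ij| ≥ t and zeroes the rest. The sketch below (Python with NumPy) uses a threshold proportional to sqrt(log(p)/n), in line with the (log p)/n → 0 condition. The constant M = 2, the tridiagonal true covariance, and the helper names `hard_threshold` and `threshold_level` are assumptions for illustration. The resampling scheme for choosing t is not implemented.

```python
import numpy as np


def hard_threshold(S, t):
    """Keep entries of S with |s_ij| >= t and set the rest to zero."""
    return S * (np.abs(S) >= t)


def threshold_level(n, p, M=2.0):
    """Threshold of the form M * sqrt(log(p) / n); M = 2.0 is a hypothetical default."""
    return M * np.sqrt(np.log(p) / n)


# Usage: a sparse (tridiagonal) true covariance, chosen for illustration.
rng = np.random.default_rng(5)
n, p = 200, 100
Sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)
S_thr = hard_threshold(S, threshold_level(n, p))

err_sample = np.linalg.norm(S - Sigma, 2)   # operator (spectral) norm
err_thr = np.linalg.norm(S_thr - Sigma, 2)
print(f"sample: {err_sample:.3f}  thresholded: {err_thr:.3f}")
```

Because most true entries are zero, thresholding removes the many small noisy off-diagonal estimates that inflate the operator-norm error of the raw sample covariance.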
-
Author: Bickel, Peter
Affiliations: University of California System; University of California Berkeley
-
Authors: Rao, N. Raj; Mingo, James A.; Speicher, Roland; Edelman, Alan
Affiliations: Massachusetts Institute of Technology (MIT); Massachusetts Institute of Technology (MIT); Queens University - Canada; Massachusetts Institute of Technology (MIT)
Abstract: We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., eigen-inference). Results found in the literature establish the asymptotic normality of the fluctuation in the trace of po...
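Statistics based on the sample eigenvalues alone include normalized traces of powers of the sample covariance. The sketch below (Python with NumPy; `trace_moments` is a hypothetical helper) computes tr(S^k)/p for k = 1, 2, 3. It assumes real Gaussian data with identity population covariance and compares the results with the Marchenko-Pastur moments 1, 1 + c and 1 + 3c + c^2, where c = p/n. It is not the paper's inference procedure.

```python
import numpy as np


def trace_moments(S, kmax=3):
    """Normalized traces tr(S^k) / p for k = 1..kmax."""
    p = S.shape[0]
    out, Sk = [], np.eye(p)
    for _ in range(kmax):
        Sk = Sk @ S
        out.append(np.trace(Sk) / p)
    return np.array(out)


rng = np.random.default_rng(4)
n, p = 400, 200
X = rng.standard_normal((n, p))   # zero mean, identity population covariance
S = X.T @ X / n
c = p / n
m = trace_moments(S)
mp_limits = np.array([1.0, 1.0 + c, 1.0 + 3.0 * c + c**2])
print("sample:", m)
print("limits:", mp_limits)
```

These moments depend on the population covariance only through its eigenvalues, which is what makes them usable when the eigenvectors are unknown.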
-
Authors: Anderson, Greg W.; Zeitouni, Ofer
Affiliations: University of Minnesota System; University of Minnesota Twin Cities
Abstract: We consider the spectral properties of a class of regularized estimators of (large) empirical covariance matrices corresponding to stationary (but not necessarily Gaussian) sequences, obtained by banding. We prove a law of large numbers (similar to that proved in the Gaussian case by Bickel and Levina), which implies that the spectrum of a banded empirical covariance matrix is an efficient estimator. Our main result is a central limit theorem in the same regime, which to our knowledge is new, ...
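Banding keeps the entries s_ij of the empirical covariance with |i - j| ≤ k and sets the rest to zero. The sketch below (Python with NumPy) is a minimal illustration. The sampling scheme (n independent stretches of a stationary AR(1) sequence with phi = 0.5), the band width k = 5, and the helper name `band` are assumptions. The paper's exact setup and its central limit theorem are not reproduced.

```python
import numpy as np


def band(S, k):
    """Banding operator: keep s_ij with |i - j| <= k, set every other entry to zero."""
    i, j = np.indices(S.shape)
    return np.where(np.abs(i - j) <= k, S, 0.0)


# Usage: n independent length-p stretches of a stationary AR(1) sequence.
rng = np.random.default_rng(6)
n, p, phi, k = 200, 100, 0.5, 5
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n) / np.sqrt(1 - phi**2)   # stationary start
for t in range(1, p):
    X[:, t] = phi * X[:, t - 1] + rng.standard_normal(n)

lags = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Sigma = phi**lags / (1 - phi**2)                          # true AR(1) covariance
S = X.T @ X / n                                           # zero-mean sample covariance

err_raw = np.linalg.norm(S - Sigma, 2)
err_band = np.linalg.norm(band(S, k) - Sigma, 2)
print(f"raw: {err_raw:.3f}  banded: {err_band:.3f}")
```

For a stationary sequence the covariance decays with the lag |i - j|, so entries far from the diagonal are mostly noise and banding discards them.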