-
Authors: Chen, Elynn Y.; Fan, Jianqing
Affiliations: University of California System; University of California Berkeley; Princeton University
Abstract: This article considers the estimation and inference of the low-rank components in high-dimensional matrix-variate factor models, where each dimension of the matrix-variates (p x q) is comparable to or greater than the number of observations (T). We propose an estimation method called alpha-PCA that preserves the matrix structure and aggregates mean and contemporary covariance through a hyper-parameter alpha. We develop an inferential theory, establishing consistency, the rate of convergence, a...
-
Authors: Law, Michael; Buehlmann, Peter
Affiliations: Swiss Federal Institutes of Technology Domain; ETH Zurich
-
Authors: Zheng, Jiayin; Dong, Xinyuan; Newton, Christina C.; Hsu, Li
Affiliations: Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle; American Cancer Society
Abstract: Cancer is a heterogeneous disease, and rapid progress in sequencing and -omics technologies has enabled researchers to characterize tumors comprehensively. This has stimulated an intensive interest in studying how risk factors are associated with various tumor heterogeneous features. The Cancer Prevention Study-II (CPS-II) cohort is one of the largest prospective studies, particularly valuable for elucidating associations between cancer and risk factors. In this article, we investigate the ass...
-
Authors: Chevallier, Augustin; Fearnhead, Paul; Sutton, Matthew
Affiliations: Lancaster University; Queensland University of Technology (QUT)
Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are nonreversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selecti...
-
Authors: Li, Zhu; Su, Weijie J.; Sejdinovic, Dino
Affiliations: University of London; University College London; University of Pennsylvania; University of Oxford
Abstract: Modern machine learning models often exhibit the benign overfitting phenomenon, which has recently been characterized using double descent curves. In addition to the classical U-shaped learning curve, the learning risk undergoes another descent as we increase the number of parameters beyond a certain threshold. In this article, we examine the conditions under which benign overfitting occurs in the random feature (RF) models, that is, in a two-layer neural network with fixed first layer wei...
-
Authors: Chen, Yinyin; He, Shishuang; Yang, Yun; Liang, Feng
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this article, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood that is naturally connected to the...
-
Authors: Deng, Hang; Han, Qiyang; Sen, Bodhisattva
Affiliations: Rutgers University System; Rutgers University New Brunswick; Columbia University
Abstract: In this article, we develop automated inference methods for local parameters in a collection of convexity constrained models based on the natural constrained tuning-free estimators. A canonical example is given by the univariate convex regression model, in which automated inference is drawn for the function value, the function derivative at a fixed interior point, and the anti-mode of the convex regression function, based on the widely used tuning-free, piecewise linear convex least squares es...
-
Authors: Nishimura, Akihiko; Suchard, Marc A.
Affiliations: Johns Hopkins University; University of California System; University of California Los Angeles
Abstract: In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of 10^5-10^6 and of 10^4-10^5, respectively. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the large n and large p setting, however, the required poster...
-
Authors: Miao, Rui; Zhang, Xiaoke; Wong, Raymond K. W.
Affiliations: George Washington University; Texas A&M University System; Texas A&M University College Station
Abstract: Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, a model-based method relies on a model which is subject to the risk of model misspecification, while a model-free method only provides a correlation measure which is inadequate to test independence. In this paper, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) to measure the dependency between two random functions. We develop a two-step p...
-
Authors: Kong, Xin-Bing; Lin, Jin-Guan; Liu, Cheng; Liu, Guang-Ying
Affiliations: Nanjing Audit University; Wuhan University
Abstract: In this article, we study the discrepancy between the global principal component analysis (GPCA) and local principal component analysis (LPCA) in recovering the common components of large-panel high-frequency data. We measure the discrepancy by the total sum of squared differences between common components reconstructed from GPCA and LPCA. The asymptotic distribution of the discrepancy measure is provided when the factor space is time invariant as the dimension p and sample size n tend to in...