-
Authors: Han, Rungang; Willett, Rebecca; Zhang, Anru R.
Affiliations: University of Wisconsin System; University of Wisconsin Madison; University of Chicago
Abstract: This paper describes a flexible framework for generalized low-rank tensor estimation problems that includes many important instances arising from applications in computational imaging, genomics, and network analysis. The proposed estimator consists of finding a low-rank tensor fit to the data under generalized parametric models. To overcome the difficulty of nonconvexity in these problems, we introduce a unified approach of projected gradient descent that adapts to the underlying low-rank stru...
-
Authors: Gao, Chao; Ma, Zongming
Affiliations: University of Chicago; University of Pennsylvania
Abstract: In this paper, we test whether two data sets measured on the same set of subjects share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two deterministic two-component mixtures of multivariate Gaussian distributions. Mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the p...
-
Authors: Yuan, Mingao; Liu, Ruiqi; Feng, Yang; Shang, Zuofeng
Affiliations: North Dakota State University Fargo; Texas Tech University System; Texas Tech University; New York University; New Jersey Institute of Technology
Abstract: Many complex networks in the real world can be formulated as hypergraphs where community detection has been widely used. However, the fundamental question of whether communities exist or not in an observed hypergraph remains unclear. This work aims to tackle this important problem. Specifically, we systematically study when a hypergraph with community structure can be successfully distinguished from its Erdős-Rényi counterpart, and propose concrete test statistics when the models are distingui...
-
Authors: Silin, Igor; Fan, Jianqing
Affiliations: Princeton University
Abstract: We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regr...
-
Authors: Depersin, Jules; Lecue, Guillaume
Affiliations: Institut Polytechnique de Paris; ENSAE Paris
Abstract: We construct an algorithm for estimating the mean of a heavy-tailed random variable when given an adversarially corrupted sample of $N$ independent observations. The only assumption we make on the distribution of the non-corrupted (or informative) data is the existence of a covariance matrix $\Sigma$, unknown to the statistician. Our algorithm outputs $\hat{\mu}$, which is robust to the presence of $|O|$ adversarial outliers and satisfies $\|\hat{\mu} - \mu\|$...
-
Authors: Einmahl, John H. J.; Ferreira, Ana; de Haan, Laurens; Neves, Claudia; Zhou, Chen
Affiliations: Tilburg University; Universidade de Lisboa; Erasmus University Rotterdam - Excl Erasmus MC; Erasmus University Rotterdam; University of Reading
Abstract: The statistical theory of extremes is extended to independent multivariate observations that are non-stationary both over time and across space. The non-stationarity over time and space is controlled via the scedasis (tail scale) in the marginal distributions. Spatial dependence stems from multivariate extreme value theory. We establish asymptotic theory for both the weighted sequential tail empirical process and the weighted tail quantile process based on all observations, taken over time and...
-
Authors: Wong, Kin Yau; Zeng, Donglin; Lin, D. Y.
Affiliations: Hong Kong Polytechnic University; University of North Carolina; University of North Carolina Chapel Hill
Abstract: In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve e...
-
Authors: Vovk, Vladimir; Wang, Bin; Wang, Ruodu
Affiliations: University of London; Royal Holloway University London; Chinese Academy of Sciences; University of Waterloo
Abstract: Methods of merging several p-values into a single p-value are important in their own right and widely used in multiple hypothesis testing. This paper is the first to systematically study the admissibility (in Wald's sense) of p-merging functions and their domination structure, without any information on the dependence structure of the input p-values. As a technical tool, we use the notion of e-values, which are alternatives to p-values recently promoted by several authors. We obtain several re...
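To make the notion of a p-merging function concrete, here is a minimal Python sketch of two classical merging rules that remain valid under arbitrary dependence between the input p-values: the Bonferroni rule and twice the arithmetic mean. These are standard textbook examples chosen for illustration; the visible part of the abstract does not say which functions the paper analyzes, and the function names below are hypothetical.

```python
def bonferroni_merge(p_values):
    """Bonferroni rule: K times the smallest p-value, capped at 1."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    return min(1.0, k * min(p_values))


def twice_mean_merge(p_values):
    """Twice the arithmetic mean of the p-values, capped at 1."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    return min(1.0, 2.0 * sum(p_values) / k)


if __name__ == "__main__":
    ps = [0.01, 0.02, 0.03]
    print(bonferroni_merge(ps), twice_mean_merge(ps))
```

Neither rule uses any information about how the p-values depend on one another, which is the setting the abstract describes.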
-
Authors: Zhang, Anru R.; Cai, T. Tony; Wu, Yihong
Affiliations: University of Wisconsin System; University of Wisconsin Madison; Duke University; University of Pennsylvania; Yale University
Abstract: A general framework for principal component analysis (PCA) in the presence of heteroskedastic noise is introduced. We propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries of the sample covariance matrix to remove estimation bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspa...
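The diagonal-imputation idea in this abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly described form of the iteration: zero out the diagonal of the sample covariance, then repeatedly overwrite the diagonal with that of the best rank-r approximation. The function name `hetero_pca`, its parameters, and the fixed iteration count are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np


def hetero_pca(sigma_hat, rank, n_iter=50):
    """Sketch of iterative diagonal imputation for PCA (assumed form)."""
    n = np.array(sigma_hat, dtype=float, copy=True)
    # Start by discarding the diagonal, which carries the noise bias.
    np.fill_diagonal(n, 0.0)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(n)
        # Best rank-`rank` approximation of the current matrix.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep the off-diagonal entries, impute only the diagonal.
        np.fill_diagonal(n, np.diag(low_rank))
    u, _, _ = np.linalg.svd(n)
    return u[:, :rank]


if __name__ == "__main__":
    p = 20
    direction = np.ones(p) / np.sqrt(p)   # true principal direction
    noise_var = np.zeros(p)
    noise_var[0] = 100.0                  # one very noisy coordinate
    sigma = 10.0 * np.outer(direction, direction) + np.diag(noise_var)
    u_hat = hetero_pca(sigma, rank=1, n_iter=20)
    print(abs(u_hat[:, 0] @ direction))
```

In this toy example the estimate aligns with the true direction, whereas the top eigenvector of the unmodified matrix is pulled toward the noisy coordinate.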
-
Authors: Celentano, Michael; Montanari, Andrea
Affiliations: Stanford University
Abstract: In high-dimensional regression, we attempt to estimate a parameter vector $\beta_0 \in \mathbb{R}^p$ from $n \lesssim p$ observations $\{(y_i, x_i)\}_{i \le n}$, where $x_i \in \mathbb{R}^p$ is a vector of predictors and $y_i$ is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g., sparsity) of the estimate $\hat{\beta}$ while allowing for practical algorithms. Theoretical analysis implies that convex penalization sche...