-
Authors: Zhang, Zhixiang; Zheng, Shurong; Pan, Guangming; Zhong, Ping-Shou
Affiliations: Nanyang Technological University; Northeast Normal University - China; University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital
Abstract: We consider general high-dimensional spiked sample covariance models and show that their leading sample spiked eigenvalues and their linear spectral statistics are asymptotically independent when the sample size and dimension are proportional to each other. As a byproduct, we also establish the central limit theorem of the leading sample spiked eigenvalues by removing the block-diagonal assumption on the population covariance matrix, which is commonly needed in the literature. Moreover, we pro...
-
Authors: Feng, Huijie; Ning, Yang; Zhao, Jiwei
Affiliations: Cornell University; University of Wisconsin System; University of Wisconsin Madison
Abstract: Given a large number of covariates Z, we consider the estimation of a high-dimensional parameter theta in an individualized linear threshold theta^T Z for a continuous variable X, which minimizes the disagreement between sign(X - theta^T Z) and a binary response Y. While the problem can be formulated within the M-estimation framework, minimizing the corresponding empirical risk function is computationally intractable due to the discontinuity of the sign function. Moreover, estimating theta even in...
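To make the objective concrete, here is a minimal Python sketch of the empirical disagreement risk described above. The function name `empirical_disagreement`, the coding of Y as -1/+1, and the convention sign(0) = +1 are illustrative assumptions, not taken from the paper. The sketch only shows why the risk is a step function of theta; it is not the authors' estimator.

```python
import numpy as np

def empirical_disagreement(theta, X, Z, Y):
    """Fraction of samples where sign(X - theta^T Z) disagrees with Y.

    Assumptions (illustrative): Y takes values in {-1, +1}, Z has shape (n, p),
    and a score of exactly 0 is mapped to +1.
    """
    score = X - Z @ theta                 # X_i - theta^T Z_i for each sample
    pred = np.where(score >= 0, 1, -1)    # sign with the sign(0) = +1 convention
    return float(np.mean(pred != Y))      # empirical 0-1 disagreement risk

# Small example: the risk stays constant under small changes of theta.
X = np.array([1.0, -1.0, 0.5])
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([1, -1, -1])
print(empirical_disagreement(np.zeros(2), X, Z, Y))
print(empirical_disagreement(np.array([0.01, 0.01]), X, Z, Y))
```

The risk changes only when some X_i - theta^T Z_i crosses zero, so its gradient is zero almost everywhere. This is one way to see why direct minimization is hard.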
-
Author: Ndaoud, Mohamed
Affiliation: ESSEC Business School
Abstract: In this paper, we study the problem of clustering in the two-component Gaussian mixture model when the centers are separated by some Delta > 0. We present a nonasymptotic lower bound for the corresponding minimax Hamming risk, improving on existing results. We also propose an optimal, efficient and adaptive procedure that is minimax rate optimal. The rate optimality is moreover sharp in the asymptotics when the sample size goes to infinity. Our procedure is based on a variant of Lloyd's iterati...
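The following is a rough illustration of Lloyd-type iterations for two clusters. The sketch assumes a symmetric model Y_i = eta_i * theta + noise with eta_i in {-1, +1}, and it uses a spectral initialization. It is plain Lloyd iteration, not the variant proposed in the paper. The function name, the symmetric model, and the initialization are hypothetical choices made for this example.

```python
import numpy as np

def lloyd_two_component(Y, n_iter=10):
    """Plain Lloyd-type iterations for a symmetric two-component mixture.

    Assumed model (for illustration only): Y_i = eta_i * theta + noise,
    eta_i in {-1, +1}. Y has shape (n, p). Returns labels in {-1, +1},
    identified only up to a global sign flip.
    """
    # Spectral initialization: project onto the leading right singular vector.
    _, _, vt = np.linalg.svd(Y, full_matrices=False)
    labels = np.where(Y @ vt[0] >= 0, 1, -1)
    for _ in range(n_iter):
        # Center estimate given the current labels.
        theta_hat = (labels[:, None] * Y).mean(axis=0)
        # Assign each point to the nearer of +theta_hat and -theta_hat,
        # which is equivalent to the sign of <Y_i, theta_hat>.
        new_labels = np.where(Y @ theta_hat >= 0, 1, -1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Because labels are recovered only up to a global sign flip, the Hamming error is computed against both eta and -eta, for example `min(np.mean(labels != eta), np.mean(labels != -eta))`.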
-
Authors: Cai, T. Tony; Wei, Hongji
Affiliation: University of Pennsylvania
Abstract: Distributed estimation of a Gaussian mean with unknown variance under communication constraints is studied. Necessary and sufficient communication costs under different types of distributed protocols are derived for any estimator that is adaptively rate-optimal over a range of possible values for the variance. Communication-efficient and statistically optimal procedures are developed. The analysis reveals an interesting and important distinction among different types of distributed protocols: ...
-
Authors: Chzhen, Evgenii; Schreuder, Nicolas
Affiliations: Universite Paris Saclay; Centre National de la Recherche Scientifique (CNRS); University of Genoa
Abstract: We propose a theoretical framework for the problem of learning a real-valued function which meets fairness requirements. This framework is built upon the notion of alpha-relative (fairness) improvement of the regression function which we introduce using the theory of optimal transport. Setting alpha = 0 corresponds to the regression problem under the Demographic Parity constraint, while alpha = 1 corresponds to the classical regression problem without any constraints. For alpha is an element o...
-
Authors: Banerjee, Debapratim; Ma, Zongming
Affiliation: University of Pennsylvania
Abstract: We study signal detection by likelihood ratio tests in a number of spiked random matrix models, including but not limited to Gaussian mixtures and spiked Wishart covariance matrices. We work directly with multi-spiked cases in these models and with flexible priors on signal components that allow dependence across spikes. We derive asymptotic normality for the log-likelihood ratios when the signal-to-noise ratios are below certain bounds. In addition, the log-likelihood ratios can be asymptotic...
-
Authors: Stoltenberg, Emil A.; Mykland, Per A.; Zhang, Lan
Affiliations: BI Norwegian Business School; University of Chicago; University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital
Abstract: In this paper, we introduce a general method for estimating the quadratic covariation of one or more spot parameter processes associated with continuous time semimartingales, and present a central limit theorem that has this class of estimators as one of its applications. The class of estimators we introduce, which we call Two-Scales Quadratic Covariation (TSQC) estimators, is based on sums of increments of second differences of the observed processes, and the intervals over which the differenc...
-
Authors: Chinot, Geoffrey; Loeffler, Matthias; van de Geer, Sara
Affiliations: Swiss Federal Institutes of Technology Domain; ETH Zurich
Abstract: This article develops a general theory for minimum norm interpolating estimators and regularized empirical risk minimizers (RERM) in linear models in the presence of additive, potentially adversarial, errors. In particular, no conditions on the errors are imposed. A quantitative bound for the prediction error is given, relating it to the Rademacher complexity of the covariates, the norm of the minimum norm interpolator of the errors and the size of the subdifferential around the true parameter...
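The article covers general norms. For the special case of the Euclidean norm, which is an assumption made here, the minimum norm interpolator has a closed form through the Moore-Penrose pseudoinverse. The hypothetical helper below computes it with NumPy in an overparameterized setting (more covariates than samples).

```python
import numpy as np

def min_l2_norm_interpolator(X, y):
    """Minimum Euclidean-norm solution of X @ beta = y when p > n.

    Assumes X (shape (n, p)) has full row rank, so an exact interpolating
    solution exists. The pseudoinverse solution lies in the row space of X,
    hence it is orthogonal to every other interpolator's difference from it.
    """
    return np.linalg.pinv(X) @ y
```

If the same call is applied to an error vector in place of y, it gives the Euclidean-norm version of the "minimum norm interpolator of the errors" that appears in the bound.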
-
Author: Dobriban, Edgar
Affiliation: University of Pennsylvania
Abstract: Invariance-based randomization tests, such as permutation tests, rotation tests, or sign changes, are an important and widely used class of statistical methods. They allow drawing inferences under weak assumptions on the data distribution. Most work focuses on their type I error control properties, while their consistency properties are much less understood. We develop a general framework to study the consistency of invariance-based randomization tests, assuming the data is drawn from a signal-p...
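A sign-change test is one of the invariance-based randomization tests mentioned above. The sketch below is a generic Monte Carlo version for H0: the data are symmetric about zero, with |mean| as the test statistic. The statistic, the number of draws, and the function name are illustrative assumptions. The sketch does not reproduce the paper's framework.

```python
import numpy as np

def sign_flip_test(x, n_draws=999, seed=0):
    """Monte Carlo sign-flip randomization test of H0: x is symmetric about 0.

    Statistic: |mean(x)|. Under H0, random sign flips leave the distribution
    of x invariant, so the flipped statistics form a reference distribution.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    t_obs = abs(x.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_draws, x.size))
    t_null = np.abs((signs * x).mean(axis=1))
    # Count the observed statistic as one of the draws so the p-value is valid.
    return (1 + np.sum(t_null >= t_obs)) / (n_draws + 1)
```

When the data have a clearly nonzero mean, the p-value should be small. Consistency of such a test means that its power tends to one as the sample size grows.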
-
Authors: Mao, Cheng; Wu, Yihong
Affiliations: University System of Georgia; Georgia Institute of Technology; Yale University
Abstract: In applications such as rank aggregation, mixture models for permutations are frequently used when the population exhibits heterogeneity. In this work, we study the widely used Mallows mixture model. In the high-dimensional setting, we propose a polynomial-time algorithm that learns a Mallows mixture of permutations on n elements with the optimal sample complexity that is proportional to log n, improving upon previous results that scale polynomially with n. In the high-noise regime, we charact...