-
Authors: Cai, T. Tony; Zhang, Linjun
Affiliations: University of Pennsylvania; Rutgers University System; Rutgers University New Brunswick
Abstract: In this paper, we study high-dimensional sparse Quadratic Discriminant Analysis (QDA) and aim to establish the optimal convergence rates for the classification error. Minimax lower bounds are established to demonstrate the necessity of structural assumptions such as sparsity conditions on the discriminating direction and differential graph for the possible construction of consistent high-dimensional QDA rules. We then propose a classification algorithm called SDAR using constrained convex opti...
-
Authors: Vovk, Vladimir; Wang, Ruodu
Affiliations: University of London; Royal Holloway University London; University of Waterloo
Abstract: Multiple testing of a single hypothesis and testing multiple hypotheses are usually done in terms of p-values. In this paper, we replace p-values with their natural competitor, e-values, which are closely related to betting, Bayes factors and likelihood ratios. We demonstrate that e-values are often mathematically more tractable; in particular, in multiple testing of a single hypothesis, e-values can be merged simply by averaging them. This allows us to develop efficient procedures using e-val...
-
Authors: Dobriban, Edgar; Sheng, Yue
Affiliations: University of Pennsylvania; University of Pennsylvania
Abstract: Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck. In this paper, we study one-step and iterative weighted parameter averaging in statistical linear models under data parallelism. We do linear regression on each machine, send the results to a central server and take a weighted average of the parameters. Opti...
-
Authors: Kim, Ilmun; Ramdas, Aaditya; Singh, Aarti; Wasserman, Larry
Affiliations: Carnegie Mellon University
Abstract: When data analysts train a classifier and check if its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for all classifiers in any dimensions: if its true error remains epsilon-better than chance for some epsilon > 0 as d, n -> infinity, then (a) the permutation-based test is consistent (has power approaching to on...
-
Authors: Loh, Wei-Liem; Sun, Saifei; Wen, Jun
Affiliations: National University of Singapore
Abstract: A method is proposed for estimating the microergodic parameters (including the smoothness parameter) of stationary Gaussian random fields on R^d with isotropic Matérn covariance functions using irregularly spaced data. This approach uses higher-order quadratic variations and is applied to three designs, namely stratified sampling design, randomized sampling design and deformed lattice design. Microergodic parameter estimators are constructed for each of the designs. Under mild conditions, thes...
-
Authors: Jacod, Jean; Li, Jia; Liao, Zhipeng
Affiliations: Universite Paris Cite; Sorbonne Universite; Duke University; University of California System; University of California Los Angeles
Abstract: This paper provides a strong approximation, or coupling, theory for spot volatility estimators formed using high-frequency data. We show that the t-statistic process associated with the nonparametric spot volatility estimator can be strongly approximated by a growing-dimensional vector of independent variables defined as functions of Brownian increments. We use this coupling theory to study the uniform inference for the volatility process in an infill asymptotic setting. Specifically, we propo...
-
Authors: Ghosh, Satyajit; Khare, Kshitij; Michailidis, George
Affiliations: Rutgers University System; Rutgers University New Brunswick; State University System of Florida; University of Florida; State University System of Florida; University of Florida
Abstract: Vector autoregressive (VAR) models aim to capture linear temporal interdependencies among multiple time series. They have been widely used in macroeconomics and financial econometrics and more recently have found novel applications in functional genomics and neuroscience. These applications have also accentuated the need to investigate the behavior of the VAR model in a high-dimensional regime, which will provide novel insights into the role of temporal dependence for regularized estimates of ...
-
Authors: Kleijn, B. J. K.
Affiliations: University of Amsterdam
Abstract: To the frequentist who computes posteriors, not all priors are useful asymptotically: in this paper, a Bayesian perspective on test sequences is proposed and Schwartz's Kullback-Leibler condition is generalised to widen the range of frequentist applications of posterior convergence. With Bayesian tests and a weakened form of contiguity termed remote contiguity, we prove simple and fully general frequentist theorems, for posterior consistency and rates of convergence, for consistency of posteri...
-
Authors: Lugosi, Gabor; Mendelson, Shahar
Affiliations: ICREA; Pompeu Fabra University; Australian National University
Abstract: We consider the problem of estimating the mean of a random vector based on i.i.d. observations and adversarial contamination. We introduce a multivariate extension of the trimmed-mean estimator and show its optimal performance under minimal conditions.
-
Authors: Cai, T. Tony; Wang, Yichen; Zhang, Linjun
Affiliations: University of Pennsylvania; Rutgers University System; Rutgers University New Brunswick
Abstract: Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the (ε, δ)-differential privacy...