-
Authors: Butucea, Cristina; Rohde, Angelika; Steinberger, Lukas
Affiliations: Institut Polytechnique de Paris; ENSAE Paris; University of Freiburg; University of Vienna
Abstract: Local differential privacy has recently received increasing attention from the statistics community as a valuable tool to protect the privacy of individual data owners without the need of a trusted third party. Similar to the classical notion of randomized response, the idea is that data owners randomize their true information locally and only release the perturbed data. Many different protocols for such local perturbation procedures can be designed. In most estimation problems studied in the ...
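The classical randomized response idea mentioned above can be sketched for a single binary attribute. This is the textbook Warner-style mechanism under epsilon-local differential privacy, not the protocol studied in the paper; the function names `randomized_response` and `debiased_proportion` are hypothetical.

```python
import math
import random


def randomized_response(bit: int, epsilon: float, rng: random.Random) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it.

    Textbook mechanism for one binary attribute; satisfies epsilon-local DP.
    """
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_keep else 1 - bit


def debiased_proportion(reports: list[int], epsilon: float) -> float:
    """Unbiased estimate of the true proportion of ones from perturbed reports."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_keep) + pi * (2 * p_keep - 1); solve for pi.
    return (observed - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    reports = [randomized_response(b, 1.0, rng) for b in true_bits]
    print(debiased_proportion(reports, 1.0))  # should be near 0.3
```

Only the perturbed reports leave each data owner; the analyst inverts the known flipping probability to remove the bias, at the price of a variance inflated by the factor 1/(2p_keep - 1)^2.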
-
Authors: Einmahl, John H. J.; He, Yi
Affiliations: Tilburg University; University of Amsterdam
Abstract: We extend extreme value statistics to independent data with possibly very different distributions. In particular, we present novel asymptotic normality results for the Hill estimator, which now estimates the extreme value index of the average distribution. Due to the heterogeneity, the asymptotic variance can be substantially smaller than that in the i.i.d. case. As a special case, we consider a heterogeneous scales model where the asymptotic variance can be calculated explicitly. The primary ...
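For reference, the Hill estimator itself is the standard statistic $\hat\gamma = \frac{1}{k}\sum_{i=1}^{k} \log X_{n-i+1,n} - \log X_{n-k,n}$ built from the $k$ largest order statistics. The sketch below computes it; the heterogeneous-data asymptotics that are the paper's contribution are not reproduced, and the helper name `hill_estimator` is hypothetical.

```python
import math
import random


def hill_estimator(sample: list[float], k: int) -> float:
    """Hill estimator of a positive extreme value index from the k largest values.

    gamma_hat = (1/k) * sum of log of the top k order statistics
                - log X_(n-k), where X_(1) <= ... <= X_(n).
    """
    n = len(sample)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    xs = sorted(sample)
    threshold = xs[n - k - 1]  # X_(n-k), the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    return sum(math.log(x) for x in xs[n - k:]) / k - math.log(threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    # Pareto sample with P(X > x) = x^(-2), so the true index is 0.5.
    sample = [(1.0 - rng.random()) ** -0.5 for _ in range(100_000)]
    print(hill_estimator(sample, 2000))
```

Pooling observations from heterogeneous distributions and applying the same formula is, per the abstract, what yields an estimate of the index of the average distribution.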
-
Authors: Han, Qiyang; Shen, Yandi
Affiliations: Rutgers University System; Rutgers University New Brunswick; University of Chicago
Abstract: The Convex Gaussian Min-Max Theorem (CGMT) has emerged as a prominent theoretical tool for analyzing the precise stochastic behavior of various statistical estimators in the so-called high-dimensional proportional regime, where the sample size and the signal dimension are of the same order. However, a well-recognized limitation of the existing CGMT machinery rests in its stringent requirement on the exact Gaussianity of the design matrix, therefore rendering the obtained precise high-dimension...
-
Authors: Szabo, Botond; Vuursteen, Lasse; van Zanten, Harry
Affiliations: Bocconi University; Delft University of Technology; Vrije Universiteit Amsterdam
Abstract: We derive minimax testing errors in a distributed framework where the data is split over multiple machines and their communication to a central machine is limited to b bits. We investigate both the d- and infinite-dimensional signal detection problem under Gaussian white noise. We also derive distributed testing algorithms reaching the theoretical lower bounds. Our results show that distributed testing is subject to fundamentally different phenomena that are not observed in distributed esti...
-
Authors: Barber, Rina Foygel; Candes, Emmanuel J.; Ramdas, Aaditya; Tibshirani, Ryan J.
Affiliations: University of Chicago; Stanford University; Carnegie Mellon University
Abstract: Conformal prediction is a popular, modern technique for providing valid predictive inference for arbitrary machine learning models. Its validity relies on the assumptions of exchangeability of the data, and symmetry of the given model fitting algorithm as a function of the data. However, exchangeability is often violated when predictive models are deployed in practice. For example, if the data distribution drifts over time, then the data points are no longer exchangeable; moreover, in such se...
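As a baseline for the exchangeability assumption discussed above, here is a minimal sketch of standard split conformal prediction with absolute-residual scores. It is the exchangeable version that the paper relaxes, not the paper's method; `split_conformal_interval` is a hypothetical name, and the fitted model's prediction and calibration residuals are assumed to be given.

```python
import math


def split_conformal_interval(
    cal_residuals: list[float], prediction: float, alpha: float
) -> tuple[float, float]:
    """Standard split conformal interval prediction +/- q under exchangeability.

    q is the ceil((n + 1) * (1 - alpha))-th smallest absolute calibration
    residual; if that rank exceeds n, the interval is unbounded.
    """
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for the requested coverage level.
        return (-math.inf, math.inf)
    q = sorted(abs(r) for r in cal_residuals)[rank - 1]
    return (prediction - q, prediction + q)
```

If calibration and test points are exchangeable, the interval covers the new response with probability at least 1 - alpha; under drift this guarantee can fail, which is the gap the abstract addresses.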
-
Authors: Bellec, Pierre C.; Zhang, Cun-Hui
Affiliations: Rutgers University System; Rutgers University New Brunswick
Abstract: New upper bounds are developed for the $L_2$ distance between $\xi/\mathrm{Var}[\xi]^{1/2}$ and linear and quadratic functions of $z \sim N(0, I_n)$ for random variables of the form $\xi = z^\top f(z) - \operatorname{div} f(z)$. The linear approximation yields a central limit theorem when the squared norm of $f(z)$ dominates the squared Frobenius norm of $\nabla f(z)$ in expectation. Applications of this normal approximation are given for the asymptotic normality of debiased estimators in linear regression with...
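To make the random variable $\xi = z^\top f(z) - \operatorname{div} f(z)$ concrete, the sketch below evaluates it for one illustrative choice, $f(z)_i = \tanh(z_i)$ (an assumption, not taken from the paper), and checks by Monte Carlo that its mean is near zero, as Stein's lemma $E[z_i g(z_i)] = E[g'(z_i)]$ implies. The helper names are hypothetical.

```python
import math
import random


def stein_xi(z: list[float]) -> float:
    """xi = z^T f(z) - div f(z) for the illustrative choice f(z)_i = tanh(z_i).

    The divergence of elementwise tanh is sum_i (1 - tanh(z_i)^2).
    """
    dot = sum(zi * math.tanh(zi) for zi in z)
    div = sum(1.0 - math.tanh(zi) ** 2 for zi in z)
    return dot - div


def monte_carlo_mean(n_dim: int, reps: int, seed: int = 0) -> float:
    """Average of xi over reps independent draws of z ~ N(0, I_n)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        total += stein_xi(z)
    return total / reps


if __name__ == "__main__":
    print(monte_carlo_mean(10, 20_000))  # close to 0 by Stein's lemma
```

The normal-approximation bounds in the abstract concern the fluctuations of this centered quantity after scaling by its standard deviation, which the sketch does not attempt to quantify.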
-
Authors: Panigrahi, Snigdha
Affiliations: University of Michigan System; University of Michigan
Abstract: Complex studies involve many steps. Selecting promising findings based on pilot data is a first step. As more observations are collected, the investigator must decide how to combine the new data with the pilot data to construct valid selective inference. Carving, introduced by Fithian, Sun and Taylor (2014), enables the reuse of pilot data during selective inference and accounts for overoptimism from the selection process. However, currently, carving is only justified for parametric models suc...
-
Authors: Zhang, Linfan; Amini, Arash A.
Affiliations: University of California System; University of California Los Angeles
Abstract: We propose a goodness-of-fit test for degree-corrected stochastic block models (DCSBM). The test is based on an adjusted chi-square statistic for measuring equality of means among groups of n multinomial distributions with $d_1, \dots, d_n$ observations. In the context of network models, the number of multinomials, n, grows much faster than the number of observations, $d_i$, corresponding to the degree of node i, hence the setting deviates from classical asymptotics. We show that a simple adjustm...
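For orientation, the classical (unadjusted) Pearson chi-square statistic for equality of multinomial probability vectors across rows can be sketched as follows, with row i holding the $d_i$ observations of the i-th multinomial. This is the textbook homogeneity statistic, not the paper's adjusted version; `pearson_homogeneity_statistic` is a hypothetical name.

```python
def pearson_homogeneity_statistic(counts: list[list[int]]) -> float:
    """Classical Pearson chi-square statistic for equality of multinomial
    probability vectors across the rows of an n x K table of counts.

    Row i is one multinomial with d_i = sum(counts[i]) observations.
    Cells with zero expected count (an all-zero column) are skipped.
    """
    row_totals = [sum(row) for row in counts]
    n_cols = len(counts[0])
    col_totals = [sum(row[k] for row in counts) for k in range(n_cols)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(counts):
        for k, observed in enumerate(row):
            # Expected count under a common probability vector for all rows.
            expected = row_totals[i] * col_totals[k] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat
```

The usual chi-square reference distribution for this statistic relies on large cell counts; when the number of rows grows much faster than the $d_i$, as the abstract notes for network degrees, that classical approximation no longer applies.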
-
Authors: Avella-Medina, Marco; Bradshaw, Casey; Loh, Po-ling
Affiliations: Columbia University; University of Cambridge
Abstract: We propose a general optimization-based framework for computing differentially private M-estimators and a new method for constructing differentially private confidence regions. First, we show that robust statistics can be used in conjunction with noisy gradient descent or noisy Newton methods in order to obtain optimal private estimators with global linear or quadratic convergence, respectively. We establish local and global convergence guarantees, under both local strong convexity and self-co...
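The pairing of robust statistics with noisy gradient descent can be illustrated with a generic sketch: a Huber location M-estimator whose bounded score function limits each observation's influence on the averaged gradient, plus Gaussian noise at every step. This is not the authors' algorithm; the names are hypothetical, and `noise_sd` is left as a free parameter because calibrating it to a privacy budget is not shown.

```python
import random


def huber_psi(r: float, c: float) -> float:
    """Derivative of the Huber loss, clipped to [-c, c] to bound influence."""
    return max(-c, min(c, r))


def noisy_gd_location(
    data: list[float], c: float, step: float, noise_sd: float, n_iter: int, seed: int = 0
) -> float:
    """Noisy gradient descent for a Huber location M-estimator.

    The averaged gradient -(1/n) * sum psi_c(x_i - theta) changes by at most
    2c/n when one observation is replaced, so Gaussian noise of a suitable scale
    can privatize each step. Choosing noise_sd for a target (epsilon, delta)
    over n_iter steps is NOT done here.
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    for _ in range(n_iter):
        grad = -sum(huber_psi(x - theta, c) for x in data) / n
        theta -= step * (grad + rng.gauss(0.0, noise_sd))
    return theta
```

With `noise_sd=0.0` this reduces to ordinary gradient descent on the Huber objective, which is a convenient way to check the iteration before adding noise.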
-
Authors: Bu, Zhiqi; Klusowski, Jason M.; Rush, Cynthia; Su, Weijie J.
Affiliations: University of Pennsylvania; Princeton University; Columbia University
Abstract: Sorted $\ell_1$ regularization has been incorporated into many methods for solving high-dimensional statistical estimation problems, including the SLOPE estimator in linear regression. In this paper, we study how this relatively new regularization technique improves variable selection by characterizing the optimal SLOPE trade-off between the false discovery proportion (FDP) and true positive proportion (TPP) or, equivalently, between measures of type I error and power. Assuming a regime of linea...
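The two ingredients named above can be sketched directly: the sorted $\ell_1$ penalty $J_\lambda(\beta) = \sum_i \lambda_i |\beta|_{(i)}$ with $\lambda_1 \ge \lambda_2 \ge \dots$ and absolute coefficients sorted in decreasing order, and the FDP/TPP of a selected set. The helper names are hypothetical; this does not fit SLOPE or reproduce the paper's trade-off analysis.

```python
def sorted_l1_norm(beta: list[float], lam: list[float]) -> float:
    """SLOPE penalty sum_i lam_i * |beta|_(i).

    lam must be non-increasing; |beta|_(1) >= |beta|_(2) >= ... are the
    absolute entries of beta sorted in decreasing order.
    """
    if len(beta) != len(lam):
        raise ValueError("beta and lam must have the same length")
    if any(a < b for a, b in zip(lam, lam[1:])):
        raise ValueError("lam must be non-increasing")
    mags = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lam, mags))


def fdp_tpp(selected: set[int], true_support: set[int]) -> tuple[float, float]:
    """False discovery proportion and true positive proportion of a selection.

    By convention FDP is 0 when nothing is selected, and TPP is 0 when the
    true support is empty.
    """
    fdp = len(selected - true_support) / max(len(selected), 1)
    tpp = len(selected & true_support) / max(len(true_support), 1)
    return fdp, tpp
```

With a constant sequence lam = (t, ..., t) the penalty reduces to the Lasso penalty t times the $\ell_1$ norm; a decreasing sequence penalizes larger coefficients more heavily, which is the extra flexibility whose effect on the FDP/TPP trade-off the paper characterizes.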