-
Authors: Arias-Castro, Ery; Qiao, Wanli
Affiliations: University of California System; University of California San Diego; George Mason University
Abstract: We adapt concepts, methodology, and theory originally developed in the areas of multidimensional scaling and dimensionality reduction for Euclidean data to be applicable to distributional data. We focus on classical scaling and Isomap (prototypical methods that have played important roles in these areas) and showcase their use in the context of distributional data analysis. In the process, we highlight the crucial role that the ambient metric plays.
-
Authors: Dufour, Jean-Marie; Renault, Eric; Zinde-Walsh, Victoria
Affiliations: McGill University; University of Warwick
Abstract: This paper provides an exhaustive characterization of the asymptotic null distribution of Wald-type statistics for testing restrictions given by polynomial functions (which may involve singularities) when the limiting distribution of the parameter estimator is absolutely continuous (e.g., Gaussian). In addition to the well-known finite-sample noninvariance, there is also an asymptotic noninvariance (nonpivotality): with standard critical values, the test may either under-reject or over-reject, a...
-
Authors: Van Delft, Anne; Blumberg, Andrew J.
Affiliations: Columbia University
Abstract: We introduce a new framework to analyze shape descriptors that capture the geometric features of an ensemble of point clouds. At the core of our approach is the point of view that the data arise as sampled recordings from a metric space-valued stochastic process, possibly of nonstationary nature, thereby integrating geometric data analysis into the realm of functional time series analysis. Our framework allows for natural incorporation of spatial-temporal dynamics, heterogeneous sampling, and...
-
Authors: Bellec, Pierre C.
Affiliations: State University System of Florida; Florida State University
Abstract: We consider observations (X, y) from single index models with unknown link function, Gaussian covariates, and a regularized M-estimator beta constructed from a convex loss function and regularizer. In the regime where sample size n and dimension p are both increasing such that p/n has a finite limit, the behavior of the empirical distribution of beta and the predicted values X beta has been previously characterized in a number of models: the empirical distributions are known to converge to proxim...
-
Authors: Jiang, Kuanhao; Mukherjee, Rajarshi; Sen, Subhabrata; Sur, Pragya
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health
Abstract: Estimation of the average treatment effect (ATE) is a central problem in causal inference. Recently, inference for the ATE in the presence of high-dimensional covariates has been extensively studied. Among the diverse approaches that have been proposed, augmented inverse propensity weighting (AIPW) with cross-fitting has emerged as a popular choice in practice. In this work, we study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dim...
-
Authors: Kunisky, Dmitriy
Affiliations: Johns Hopkins University
Abstract: We study when low coordinate degree functions (LCDF), that is, linear combinations of functions depending on small subsets of entries of a vector, can hypothesis test between high-dimensional probability measures. These functions are a generalization, proposed in Hopkins' 2018 thesis but seldom studied since, of low degree polynomials (LDP), a class widely used in recent literature as a proxy for all efficient algorithms for tasks in statistics and optimization. Instead of the orthogonal polynomial decom...
-
Authors: Stoepker, Ivo V.; Castro, Rui M.; Arias-Castro, Ery
Affiliations: Eindhoven University of Technology; University of California System; University of California San Diego
Abstract: Detecting anomalies in large sets of observations is crucial in various applications, such as epidemiological studies, gene expression studies, and systems monitoring. We consider settings where the units of interest result in multiple independent observations from potentially distinct referentials. Scan statistics and related methods are commonly used in such settings, but rely on stringent modeling assumptions for proper calibration. We instead propose a rank-based variant of the higher crit...