-
Authors: Song, Shanshan; Lin, Yuanyuan; Zhou, Yong
Affiliations: Chinese University of Hong Kong; East China Normal University; East China Normal University
Abstract: We study a class of general M-estimators in the semi-supervised setting, wherein the data are typically a combination of a relatively small labeled dataset and large amounts of unlabeled data. A new estimator, which efficiently uses the useful information contained in the unlabeled data, is proposed via a projection technique. We prove consistency and asymptotic normality, and provide an inference procedure based on K-fold cross-validation. The optimal weights are derived to balance the contr...
-
Authors: Han, Rungang; Shi, Pixu; Zhang, Anru R.
Affiliations: Duke University; Duke University; Duke University; Duke University
Abstract: This article introduces the functional tensor singular value decomposition (FTSVD), a novel dimension reduction framework for tensors with one functional mode and several tabular modes. The problem is motivated by high-order longitudinal data analysis. Our model assumes the observed data to be a random realization of an approximate CP low-rank functional tensor measured on a discrete time grid. Incorporating tensor algebra and the theory of reproducing kernel Hilbert spaces (RKHS), we propose a...
-
Authors: He, Yi
Affiliations: University of Amsterdam
Abstract: This article establishes a comprehensive theory of the optimality, robustness, and cross-validation selection consistency for the ridge regression under factor-augmented models with possibly dense idiosyncratic information. Using spectral analysis for random matrices, we show that the ridge regression is asymptotically efficient in capturing both factor and idiosyncratic information by minimizing the limiting predictive loss among the entire class of spectral regularized estimators under large...
-
Authors: Westling, Ted; Luedtke, Alex; Gilbert, Peter B.; Carone, Marco
Affiliations: University of Massachusetts System; University of Massachusetts Amherst; University of Washington; University of Washington Seattle; Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle
Abstract: In the absence of data from a randomized trial, researchers may aim to use observational data to draw causal inference about the effect of a treatment on a time-to-event outcome. In this context, interest often focuses on the treatment-specific survival curves, that is, the survival curves were the population under study to be assigned to receive the treatment or not. Under certain conditions, including that all confounders of the treatment-outcome relationship are observed, the treatment-spec...
-
Authors: Katzfuss, Matthias; Schafer, Florian
Affiliations: Texas A&M University System; Texas A&M University College Station; University System of Georgia; Georgia Institute of Technology
Abstract: A multivariate distribution can be described by a triangular transport map from the target distribution to a simple reference distribution. We propose Bayesian nonparametric inference on the transport map by modeling its components using Gaussian processes. This enables regularization and uncertainty quantification of the map estimation, while resulting in a closed-form and invertible posterior map. We then focus on inferring the distribution of a nonstationary spatial field from a small numbe...
-
Authors: Mao, Huiying; Martin, Ryan; Reich, Brian J.
Affiliations: North Carolina State University
Abstract: Predicting the response at an unobserved location is a fundamental problem in spatial statistics. Given the difficulty in modeling spatial dependence, especially in nonstationary cases, model-based prediction intervals are at risk of misspecification bias that can negatively affect their validity. Here we present a new approach for model-free nonparametric spatial prediction based on the conformal prediction machinery. Our key observation is that spatial data can be treated as exactly or appro...
-
Authors: Almendra-Hernandez, Felix; De Loera, Jesus A.; Petrovic, Sonja
Affiliations: University of California System; University of California Davis; Illinois Institute of Technology
Abstract: In this article, we evaluate the challenges and best practices associated with the Markov bases approach to sampling from conditional distributions. We provide insights and clarifications 25 years after the publication of the Fundamental Theorem for Markov bases by Diaconis and Sturmfels. In addition to a literature review, we prove three new results on the complexity of Markov bases in hierarchical models, relaxations of the fibers in log-linear models, and limitations of partial sets of m...
-
Authors: Fan, Jianqing; Lou, Zhipeng; Yu, Mengxin
Affiliations: Fudan University; Princeton University; Princeton University
Abstract: We propose the Factor Augmented (sparse linear) Regression Model (FARM), which not only admits both latent factor regression and sparse linear regression as special cases but also bridges dimension reduction and sparse regression. We provide theoretical guarantees for the estimation of our model under sub-Gaussian and heavy-tailed noises (with bounded (1+theta)th moment, for all theta > 0), respectively. In addition, the existing works on supervised learning often ...
-
Authors: Das, Manjari; Kennedy, Edward H.; Jewell, Nicholas P.
Affiliations: Carnegie Mellon University; University of London; London School of Hygiene & Tropical Medicine; University of California System; University of California Berkeley; Carnegie Mellon University
Abstract: Estimation of population size using incomplete lists has a long history across many biological and social sciences. For example, human rights groups often construct partial lists of victims of armed conflicts, to estimate the total number of victims. Earlier statistical methods for this setup often use parametric assumptions, or rely on suboptimal plug-in-type nonparametric estimators; but both approaches can lead to substantial bias, the former via model misspecification and the latter via sm...
-
Authors: Wei, Zeyu; Chen, Yen-Chi
Affiliations: University of Washington; University of Washington Seattle
Abstract: We introduce a density-aided clustering method called Skeleton Clustering that can detect clusters in multivariate and even high-dimensional data with irregular shapes. To bypass the curse of dimensionality, we propose surrogate density measures that are less dependent on the dimension but have intuitive geometric interpretations. The clustering framework constructs a concise representation of the given data as an intermediate step and can be thought of as a combination of prototype methods, d...