-
Authors: Schweinberger, Michael; Fritz, Cornelius
Affiliations: Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park
-
Authors: Bertanha, Marinho; Chung, Eunyi
Affiliations: University of Notre Dame; University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Classical two-sample permutation tests for equality of distributions have exact size in finite samples, but they fail to control size for testing equality of parameters that summarize each distribution. This article proposes permutation tests for equality of parameters that are estimated at root-n or slower rates. Our general framework applies to both parametric and nonparametric models, with two samples or one sample split into two subsamples. Our tests have correct size asymptotically while ...
-
Authors: Fan, Jianqing; Yang, Zhuoran; Yu, Mengxin
Affiliations: Princeton University; Yale University
Abstract: In this article, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of...
-
Authors: Krivitsky, Pavel N.; Coletti, Pietro; Hens, Niel
Affiliations: University of New South Wales Sydney; University of New South Wales Sydney; Hasselt University; University of Antwerp; University of New South Wales Sydney
Abstract: The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We...
-
Authors: Wang, Shulei
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Self-supervised metric learning has been a successful approach for learning a distance from an unlabeled dataset. The resulting distance is broadly useful for improving various distance-based downstream tasks, even when no information from downstream tasks is used in the metric learning stage. To gain insights into this approach, we develop a statistical framework to theoretically study how self-supervised metric learning can benefit downstream tasks in the context of multi-view data. Under th...
-
Authors: Tian, Ye; Feng, Yang
Affiliations: Columbia University; New York University
Abstract: In this work, we study the transfer learning problem under high-dimensional generalized linear models (GLMs), which aims to improve the fit on target data by borrowing information from useful source data. Given which sources to transfer, we propose a transfer learning algorithm on GLM, and derive its $\ell_1/\ell_2$-estimation error bounds as well as a bound for a prediction error measure. The theoretical analysis shows that when the target and sources are sufficiently close to each other, these boun...
-
Authors: Kramlinger, Peter; Krivobokova, Tatyana; Sperlich, Stefan
Affiliations: University of Vienna; University of Geneva
Abstract: In spite of its high practical relevance, cluster specific multiple inference for linear mixed model predictors has hardly been addressed so far. While marginal inference for population parameters is well understood, conditional inference for the cluster specific predictors is more intricate. This work introduces a general framework for multiple inference in linear mixed models for cluster specific predictors. Consistent confidence sets for multiple inference are constructed under both, the mar...
-
Authors: Chandra, Noirrit Kiran; Sarkar, Abhra; de Groot, John F.; Yuan, Ying; Mueller, Peter
Affiliations: University of Texas System; University of Texas Dallas; University of Texas System; University of Texas Austin; University of California System; University of California San Francisco; University of Texas System; UTMD Anderson Cancer Center; University of Texas System; University of Texas Austin
Abstract: The availability of electronic health records (EHR) has opened opportunities to supplement increasingly expensive and difficult to carry out randomized controlled trials (RCT) with evidence from readily available real world data. In this paper, we use EHR data to construct synthetic control arms for treatment-only single arm trials. We propose a novel nonparametric Bayesian common atoms mixture model that allows us to find equivalent population strata in the EHR and the treatment arm and then ...
-
Authors: Henderson, Nicholas C.; Varadhan, Ravi; Louis, Thomas A.
Affiliations: University of Michigan System; University of Michigan; Johns Hopkins University; Johns Hopkins Medicine; Johns Hopkins University; Johns Hopkins Bloomberg School of Public Health
Abstract: Shrinkage estimates of small domain parameters typically use a combination of a noisy direct estimate that only uses data from a specific small domain and a more stable regression estimate. When the regression model is misspecified, estimation performance for the noisier domains can suffer due to substantial shrinkage toward a poorly estimated regression surface. In this article, we introduce a new class of robust, empirically-driven regression weights that target estimation of the small domai...
-
Authors: Stensrud, Mats J.; Robins, James M.; Sarveta, Aaron; Tchetgen, Eric J. Tchetgen; Young, Jessica G.
Affiliations: Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University; Harvard T.H. Chan School of Public Health; University of Pennsylvania; Harvard University; Harvard Medical School; Harvard Pilgrim Health Care
Abstract: Researchers are often interested in treatment effects on outcomes that are only defined conditional on posttreatment events. For example, in a study of the effect of different cancer treatments on quality of life at end of follow-up, the quality of life of individuals who die during the study is undefined. In these settings, naive contrasts of outcomes conditional on posttreatment events are not average causal effects, even in randomized experiments. Therefore, the effect in the principal stra...