-
Authors: Han, Ruijian; Xu, Yiming; Chen, Kani
Affiliations: Chinese University of Hong Kong; Utah System of Higher Education; University of Utah; Hong Kong University of Science & Technology
Abstract: Statistical estimation using pairwise comparison data is an effective approach to analyzing large-scale sparse networks. In this article, we propose a general framework to model the mutual interactions in a network, which enjoys ample flexibility in terms of model parameterization. Under this setup, we show that the maximum likelihood estimator for the latent score vector of the subjects is uniformly consistent under a near-minimal condition on network sparsity. This condition is sharp in term...
-
Authors: Zeng, Yanyan; Pang, Daolin; Zhao, Hongyu; Wang, Tao
Affiliations: Shanghai Jiao Tong University; Yale University; Shanghai Jiao Tong University; Shanghai Jiao Tong University
Abstract: High throughput sequencing data collected to study the microbiome provide information in the form of relative abundances and should be treated as compositions. Although many approaches including scaling and rarefaction have been proposed for converting raw count data into microbial compositions, most of these methods simply return zero values for zero counts. However, zeros can distort downstream analyses, and they can also pose problems for composition-aware methods. This problem is exacerbat...
-
Authors: Mai, Qing; He, Di; Zou, Hui
Affiliations: State University System of Florida; Florida State University; Nanjing University; University of Minnesota System; University of Minnesota Twin Cities
Abstract: In statistical analysis, researchers often perform coordinatewise Gaussianization such that each variable is marginally normal. The normal score transformation is a method for coordinatewise Gaussianization and is widely used in statistics, econometrics, genetics and other areas. However, few studies exist on the theoretical properties of the normal score transformation, especially in high-dimensional problems where the dimension p diverges with the sample size n. In this article, we show that...
-
Authors: Molstad, Aaron J.; Rothman, Adam J.
Affiliations: State University System of Florida; University of Florida; University of Minnesota System; University of Minnesota Twin Cities
Abstract: We propose a penalized likelihood method to fit the bivariate categorical response regression model. Our method allows practitioners to estimate which predictors are irrelevant, which predictors only affect the marginal distributions of the bivariate response, and which predictors affect both the marginal distributions and log odds ratios. To compute our estimator, we propose an efficient algorithm which we extend to settings where some subjects have only one response variable measured, that i...
-
Authors: Gang, Bowen; Sun, Wenguang; Wang, Weinan
Affiliations: Fudan University; University of Southern California
Abstract: Consider the online testing of a stream of hypotheses where a real-time decision must be made before the next data point arrives. The error rate is required to be controlled at all decision points. Conventional simultaneous testing rules are no longer applicable due to the more stringent error constraints and absence of future data. Moreover, the online decision-making process may come to a halt when the total error budget, or alpha-wealth, is exhausted. This work develops a new class of struc...
-
Authors: He, Baihua; Ma, Shuangge; Zhang, Xinyu; Zhu, Li-Xing
Affiliations: Chinese Academy of Sciences; University of Science & Technology of China, CAS; Yale University; Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; Beijing Normal University; Beijing Normal University Zhuhai; Hong Kong Baptist University
Abstract: Model averaging is an effective way to enhance prediction accuracy. However, most previous works focus on low-dimensional settings with completely observed responses. To attain an accurate prediction for the risk effect of survival data with high-dimensional predictors, we propose a novel method: rank-based greedy (RG) model averaging. Specifically, adopting the transformation model with splitting predictors as working models, we doubly use the smooth concordance index function to derive the c...
-
Authors: Dunn, Robin; Wasserman, Larry; Ramdas, Aaditya
Affiliations: Novartis; Novartis USA; Carnegie Mellon University; Carnegie Mellon University
Abstract: We consider the problem of constructing distribution-free prediction sets for data from two-layer hierarchical distributions. For iid data, prediction sets can be constructed using the method of conformal prediction. The validity of conformal prediction hinges on the exchangeability of the data, which does not hold when groups of observations come from distinct distributions, such as multiple observations on each patient in a medical database. We extend conformal methods to a hierarchical sett...
-
Authors: Ye, Ting; Shao, Jun; Yi, Yanyao; Zhao, Qingyuan
Affiliations: University of Washington; University of Washington Seattle; East China Normal University; University of Wisconsin System; University of Wisconsin Madison; Eli Lilly; University of Cambridge
Abstract: In randomized clinical trials, adjustments for baseline covariates at both design and analysis stages are highly encouraged by regulatory agencies. A recent trend is to use a model-assisted approach for covariate adjustment to gain credibility and efficiency while producing asymptotically valid inference even when the model is incorrect. In this article we present three considerations for better practice when model-assisted inference is applied to adjust for covariates under simple or covariate...
-
Authors: Ionides, Edward L.; Asfaw, Kidus; Park, Joonha; King, Aaron A.
Affiliations: University of Michigan System; University of Michigan; University of Kansas; University of Michigan System; University of Michigan; University of Michigan System; University of Michigan
Abstract: Bagging (i.e., bootstrap aggregating) involves combining an ensemble of bootstrap estimators. We consider bagging for inference from noisy or incomplete measurements on a collection of interacting stochastic dynamic systems. Each system is called a unit, and each unit is associated with a spatial location. A motivating example arises in epidemiology, where each unit is a city: the majority of transmission occurs within a city, with smaller yet epidemiologically important interactions arising f...
-
Authors: Chen, Yaqing; Lin, Zhenhua; Muller, Hans-Georg
Affiliations: University of California System; University of California Davis; National University of Singapore
Abstract: The analysis of samples of random objects that do not lie in a vector space is gaining increasing attention in statistics. An important class of such object data is univariate probability measures defined on the real line. Adopting the Wasserstein metric, we develop a class of regression models for such data, where random distributions serve as predictors and the responses are either also distributions or scalars. To define this regression model, we use the geometry of tangent bundles of the s...