-
Authors: Liu, Dandan; Cai, Tianxi; Lok, Anna; Zheng, Yingye
Affiliations: Vanderbilt University; Harvard University; Harvard T.H. Chan School of Public Health; University of Michigan System; University of Michigan; Fred Hutchinson Cancer Center
Abstract: Large prospective cohort studies of rare chronic diseases require thoughtful planning of study designs, especially for biomarker studies when measurements are based on stored tissue or blood specimens. Two-phase designs, including nested case-control and case-cohort sampling designs, provide cost-effective strategies for conducting biomarker evaluation studies. Existing literature for biomarker assessment under two-phase designs largely focuses on simple inverse probability weighting (IPW) esti...
-
Authors: Wang, HaiYing; Zhu, Rong; Ma, Ping
Affiliations: University System Of New Hampshire; University of New Hampshire; University of Connecticut; Chinese Academy of Sciences; Academy of Mathematics & System Sciences, CAS; University System of Georgia; University of Georgia
Abstract: For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least-square estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this article, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic nor...
-
Authors: Linero, Antonio R.
Affiliations: State University System of Florida; Florida State University
Abstract: Decision tree ensembles are an extremely popular tool for obtaining high-quality predictions in nonparametric regression problems. Unmodified, however, many commonly used decision tree ensemble methods do not adapt to sparsity in the regime in which the number of predictors is larger than the number of observations. A recent stream of research concerns the construction of decision tree ensembles that are motivated by a generative probabilistic model, the most influential method being the Bayes...
-
Authors: Crawford, Forrest W.; Wu, Jiacheng; Heimer, Robert
Affiliations: Yale University
Abstract: Estimating the size of stigmatized, hidden, or hard-to-reach populations is a major problem in epidemiology, demography, and public health research. Capture-recapture and multiplier methods are standard tools for inference of hidden population sizes, but they require random sampling of target population members, which is rarely possible. Respondent-driven sampling (RDS) is a survey method for hidden populations that relies on social link tracing. The RDS recruitment process is designed to spre...
-
Authors: Hao, Ning; Feng, Yang; Zhang, Hao Helen
Affiliations: University of Arizona; Columbia University
Abstract: Quadratic regression (QR) models naturally extend linear models by considering interaction effects between the covariates. To conduct model selection in QR, it is important to maintain the hierarchical model structure between main effects and interaction effects. Existing regularization methods generally achieve this goal by solving complex optimization problems, which usually demand high computational cost and hence are not feasible for high-dimensional data. This article focuses on scalable...
-
Authors: He, Kejun; Lian, Heng; Ma, Shujie; Huang, Jianhua Z.
Affiliations: Renmin University of China; City University of Hong Kong; University of California System; University of California Riverside; Texas A&M University System; Texas A&M University College Station
Abstract: Motivated by the study of gene and environment interactions, we consider a multivariate response varying-coefficient model with a large number of covariates. The need to nonparametrically estimate a large number of coefficient functions given relatively limited data poses a big challenge for fitting such a model. To overcome the challenge, we develop a method that incorporates three ideas: (i) reduce the number of unknown functions to be estimated by using (noncentered) principal components;...
-
Authors: Dette, Holger; Moellenhoff, Kathrin; Volgushev, Stanislav; Bretz, Frank
Affiliations: Ruhr University Bochum; University of Toronto; University Toronto Mississauga; Novartis; Shanghai University of Finance & Economics
Abstract: This article investigates the problem of whether the difference between two parametric models $m_1$, $m_2$ describing the relation between a response variable and several covariates in two different groups is practically irrelevant, such that inference can be performed on the basis of the pooled sample. Statistical methodology is developed to test the hypotheses $H_0: d(m_1, m_2) \geq \epsilon$ versus $H_1: d(m_1, m_2) < \epsilon$ to demonstrate equivalence between the two regression curves $m_1$, $m_2$...
-
Authors: Shirani-Mehr, Houshmand; Rothschild, David; Goel, Sharad; Gelman, Andrew
Affiliations: Stanford University; Microsoft; Columbia University
Abstract: It is well known among researchers and practitioners that election polls suffer from a variety of sampling and nonsampling errors, often collectively referred to as total survey error. Reported margins of error typically only capture sampling variability, and in particular, generally ignore nonsampling errors in defining the target population (e.g., errors due to uncertainty in who will vote). Here, we empirically analyze 4221 polls for 608 state-level presidential, senatorial, and gubernatori...
-
Authors: Xu, Yuhang; Li, Yehua; Nettleton, Dan
Affiliations: University of Nebraska System; University of Nebraska Lincoln; University of California System; University of California Riverside; Iowa State University
Abstract: In a plant science Root Image Study, the process of seedling roots bending in response to gravity is recorded using digital cameras, and the bending rates are modeled as functional plant phenotype data. The functional phenotypes are collected from seeds representing a large variety of genotypes and have a three-level nested hierarchical structure, with seeds nested in groups nested in genotypes. The seeds are imaged on different days of the lunar cycle, and an important scientific question is ...
-
Authors: Luedtke, Alexander R.; van der Laan, Mark J.
Affiliations: University of California System; University of California Berkeley
Abstract: Suppose one has a collection of parameters indexed by a (possibly infinite dimensional) set. Given data generated from some distribution, the objective is to estimate the maximal parameter in this collection evaluated at the distribution that generated the data. This estimation problem is typically nonregular when the maximizing parameter is nonunique, and as a result standard asymptotic techniques generally fail in this case. We present a technique for developing parametric-rate confidence in...