-
Authors: George, EI
Affiliations: University of Texas System; University of Texas Austin
Abstract: The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments that have led to the wide variety of approaches for this problem.
-
Authors: Genovese, CR
-
Authors: Gustafson, P
Affiliations: University of British Columbia
Abstract: There have been many recent suggestions as to how to build and estimate flexible Bayesian regression models, using constructs such as trees, neural networks, and Gaussian processes. Although there is much to commend these methods, their implementation and interpretation can be daunting for practitioners. This article presents a spline-based methodology for flexible Bayesian regression that is quite simple in terms of computation and interpretation. Smooth bivariate interactions are modeled in ...
-
Authors: Duncan, GT; Mukherjee, S
Affiliations: Carnegie Mellon University; Carnegie Mellon University; Nova Southeastern University
Abstract: Disclosure limitation methods transform statistical databases to protect confidentiality, a practical concern of statistical agencies. A statistical database responds to queries with aggregate statistics. The database administrator should maximize legitimate data access while keeping the risk of disclosure below an acceptable level. Legitimate users seek statistical information, generally in aggregate form; malicious users (the data snoopers) attempt to infer confidential information about an in...
-
Authors: Brown, CH; Indurkhya, A; Kellam, SG
Affiliations: State University System of Florida; University of South Florida; Michigan State University; Maryland Department of Health & Mental Hygiene; Johns Hopkins University
Abstract: Longitudinal designs often change at critical times based on available funding, staffing, scientific opportunities, and subjects. This article presents three levels of investigation into missingness by design in a partially completed longitudinal study: missingness that is completely at random (MCAR), at random (MAR), and nonignorable (MN). We first derive new expressions for the asymptotic variance and power based on multivariate normal data that are either MCAR or missing by design (MAR). Th...
-
Authors: Dawid, AP
Affiliations: University of London; University College London
Abstract: A popular approach to the framing and answering of causal questions relies on the idea of counterfactuals: outcomes that would have been observed had the world developed differently; for example, if the patient had received a different treatment. By definition one can never observe such quantities, nor assess empirically the validity of any modeling assumptions made about them, even though one's conclusions may be sensitive to these assumptions. Here I argue that for making inference about the...
-
Authors: Robins, JM; van der Vaart, A; Ventura, V
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; Vrije Universiteit Amsterdam
Abstract: We investigate the compatibility of a null model H_0 with the data by calculating a p value; that is, the probability, under H_0, that a given test statistic T exceeds its observed value. When the null model consists of a single distribution, the p value is readily obtained, and it has a uniform distribution under H_0. On the other hand, when the null model depends on an unknown nuisance parameter theta, one must somehow get rid of theta (e.g., by estimating it) to calculate a p value. Variou...
-
Authors: Murphy, SA; Van der Vaart, AW
Affiliations: University of Michigan System; University of Michigan; Vrije Universiteit Amsterdam
Abstract: We show that semiparametric profile likelihoods, where the nuisance parameter has been profiled out, behave like ordinary likelihoods in that they have a quadratic expansion. In this expansion the score function and the Fisher information are replaced by the efficient score function and efficient Fisher information. The expansion may be used, among others, to prove the asymptotic normality of the maximum likelihood estimator, to derive the asymptotic chi-squared distribution of the log-likelih...
-
Authors: Xie, Y
Affiliations: University of Michigan System; University of Michigan
-
Authors: Kvam, PH; Tiwari, RC; Zalkikar, JN
Affiliations: University System of Georgia; Georgia Institute of Technology; University of North Carolina; University of North Carolina Charlotte; State University System of Florida; Florida International University
Abstract: Data on contamination concentrations for chromium from one of the EPA's toxic waste sites consist of independent and identically distributed (iid) measurements along with additional observations from the residual distribution. The residual sample is obtained by sampling from hot spots, where contamination concentrations are assumed to be above a given threshold value. The data are modeled using a nonparametric Bayes estimator of the distribution function. The Dirichlet process is used to fo...