-
Authors: George, EI
Affiliations: University of Texas System; University of Texas Austin
Abstract: The problem of variable selection is one of the most pervasive model selection problems in statistical applications. Often referred to as the problem of subset selection, it arises when one wants to model the relationship between a variable of interest and a subset of potential explanatory variables or predictors, but there is uncertainty about which subset to use. This vignette reviews some of the key developments that have led to the wide variety of approaches for this problem.
-
Authors: Robins, JM; van der Vaart, A; Ventura, V
Affiliations: Harvard University; Harvard T.H. Chan School of Public Health; Vrije Universiteit Amsterdam
Abstract: We investigate the compatibility of a null model H_0 with the data by calculating a p value; that is, the probability, under H_0, that a given test statistic T exceeds its observed value. When the null model consists of a single distribution, the p value is readily obtained, and it has a uniform distribution under H_0. On the other hand, when the null model depends on an unknown nuisance parameter theta, one must somehow get rid of theta (e.g., by estimating it) to calculate a p value. Variou...
-
Authors: Kvam, PH; Tiwari, RC; Zalkikar, JN
Affiliations: University System of Georgia; Georgia Institute of Technology; University of North Carolina; University of North Carolina Charlotte; State University System of Florida; Florida International University
Abstract: Data on contamination concentrations for chromium from one of the EPA's toxic waste sites consist of independent and identically distributed (iid) measurements along with additional observations from the residual distribution. The residual sample is obtained by sampling from hot spots, where contamination concentrations are assumed to be above a given threshold value. The data are modeled using a nonparametric Bayes estimator of the distribution function. The Dirichlet process is used to fo...
-
Authors: Sansó, B; Guenni, L
Affiliations: Simon Bolivar University; Duke University
Abstract: Estimation and prediction of the amount of rainfall in time and space is a problem of fundamental importance in many applications in agriculture, hydrology, and ecology. Stochastic simulation of rainfall data is also an important step in the development of stochastic downscaling methods, where large-scale climate information is considered as an additional explanatory variable of rainfall behavior at the local scale. Simulated rainfall has also been used as input data for many agricultural, hyd...
-
Authors: Piccinato, L
Affiliations: Sapienza University Rome
-
Authors: Poole, D; Raftery, AE
Affiliations: University of Washington; University of Washington Seattle
Abstract: Deterministic simulation models are used in many areas of science, engineering, and policy making. Typically, these are complex models that attempt to capture underlying mechanisms in considerable detail, and they have many user-specified inputs. The inputs are often specified by some form of trial-and-error approach in which plausible values are postulated, the corresponding outputs inspected, and the inputs modified until plausible outputs are obtained. Here we address the issue of more forma...
-
Authors: Cappé, O; Robert, CP
Affiliations: Universite PSL; Universite Paris-Dauphine; Institut Polytechnique de Paris; ENSAE Paris
-
Authors: Best, NG; Ickstadt, K; Wolpert, RL
Affiliations: Imperial College London; Duke University
Abstract: Ecological regression studies are widely used to examine relationships between disease rates for small geographical areas and exposure to environmental risk factors. The raw data for such studies, including disease cases, environmental pollution concentrations, and the reference population at risk, are typically measured at various levels of spatial aggregation but are accumulated to a common geographical scale to facilitate statistical analysis. In this traditional approach, heterogeneous exp...
-
Authors: Bayarri, MJ; Berger, JO
Affiliations: University of Valencia; Duke University
Abstract: The problem of investigating compatibility of an assumed model with the data is investigated in the situation when the assumed model has unknown parameters. The most frequently used measures of compatibility are p values, based on statistics T for which large values are deemed to indicate incompatibility of the data and the model. When the null model has unknown parameters, p values are not uniquely defined. The proposals for computing a p value in such a situation include the plug-in and sim...
-
Authors: Preisser, JS; Galecki, AT; Lohman, KK; Wagenknecht, LE
Affiliations: University of North Carolina; University of North Carolina Chapel Hill; University of Michigan System; University of Michigan; Wake Forest University; Wake Forest Baptist Medical Center
Abstract: The generalized estimating equations (GEE) procedure widely applied in the analysis of correlated binary data requires that missing data depend only on remote covariates or that they be missing completely at random (MCAR); otherwise GEE regression parameter estimates are biased. A weighted generalized estimating equations (WGEE) approach that accounts for dropouts under the less stringent assumption of missing at random (MAR) through dependence on observed responses gives unbiased estimation o...