-
Authors: Hokayem, Charles; Bollinger, Christopher; Ziliak, James P.
Affiliations: University of Kentucky
Abstract: The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) serves as the data source for official income, poverty, and inequality statistics in the United States. There is a concern that the rise in nonresponse to earnings questions could deteriorate data quality and distort estimates of these important metrics. We use a dataset of internal ASEC records matched to Social Security Detailed Earnings Records (DER) to study the impact of earnings nonresponse on estimates of pov...
-
Authors: Cui, Hengjian; Li, Runze; Zhong, Wei
Affiliations: Capital Normal University; Pennsylvania Commonwealth System of Higher Education (PCSHE); Pennsylvania State University; Pennsylvania State University - University Park; Xiamen University
Abstract: This work is concerned with marginal sure independence feature screening for ultrahigh dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use the conditional distribution function to construct a new index for feature screening. In this article, we propose a marginal feature screening procedure based on empirical conditional distribution function. We establish the sure screening and ranking consistency properties for the proposed...
-
Authors: Lai, Randy C. S.; Hannig, Jan; Lee, Thomas C. M.
Affiliations: University of California System; University of California Davis; University of North Carolina; University of North Carolina Chapel Hill
Abstract: In recent years, the ultrahigh-dimensional linear regression problem has attracted enormous attention from the research community. Under the sparsity assumption, most of the published work is devoted to the selection and estimation of the predictor variables with nonzero coefficients. This article studies a different but fundamentally important aspect of this problem: uncertainty quantification for parameter estimates and model choices. To be more specific, this article proposes methods for de...
-
Authors: Linero, Antonio R.; Daniels, Michael J.
Affiliations: State University System of Florida; University of Florida; University of Texas System; University of Texas Austin
Abstract: We develop a Bayesian nonparametric model for a longitudinal response in the presence of nonignorable missing data. Our general approach is to first specify a working model that flexibly models the missingness and full outcome processes jointly. We specify a Dirichlet process mixture of missing at random (MAR) models as a prior on the joint distribution of the working model. This aspect of the model governs the fit of the observed data by modeling the observed data distribution as the marginal...
-
Authors: Rosenbaum, Paul R.
Affiliations: University of Pennsylvania
Abstract: An observational study draws inferences about treatment effects when treatments are not randomly assigned, as they would be in a randomized experiment. The naive analysis of an observational study assumes that adjustments for measured covariates suffice to remove bias from nonrandom treatment assignment. A sensitivity analysis in an observational study determines the magnitude of bias from nonrandom treatment assignment that would need to be present to alter the qualitative conclusions of the ...
-
Authors: Jiang, Wenxin; Zhao, Yu
Affiliations: Shandong University; Northwestern University; Amazon.com
Abstract: A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. In this article, we study how to construct confidence intervals of the LIFT measures. We point out the subtlety of this task and explain how simple binomial confidence intervals can have incorrect coverage probabilities, due to omitting variation from the sample percenti...
-
Authors: Liang, Faming; Song, Qifan; Qiu, Peihua
Affiliations: State University System of Florida; University of Florida; Purdue University System; Purdue University
Abstract: Gaussian graphical models (GGMs) are frequently used to explore networks, such as gene regulatory networks, among a set of variables. Under the classical theory of GGMs, the construction of Gaussian graphical networks amounts to finding the pairs of variables with nonzero partial correlation coefficients. However, this is infeasible for high-dimensional problems for which the number of variables is larger than the sample size. In this article, we propose a new measure of partial correlation co...
-
Authors: Villa, C.; Walker, S. G.
Affiliations: University of Kent; University of Texas System; University of Texas Austin
Abstract: We present a novel approach to constructing objective prior distributions for discrete parameter spaces. These types of parameter spaces are particularly problematic, as it appears that common objective procedures to design prior distributions are problem specific. We propose an objective criterion, based on loss functions, instead of trying to define objective probabilities directly. We systematically apply this criterion to a series of discrete scenarios, previously considered in the literat...
-
Authors: Chen, Yunxiao; Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang
Affiliations: Columbia University; University of Minnesota System; University of Minnesota Twin Cities
Abstract: Diagnostic classification models (DCMs) have recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. Central to the model specification is the so-called Q-matrix that provides a qualitative specification of the item-attribute relationship. In this article, we develop theories on the identifiability for the Q-matrix under the DINA and the DINO models. We further propose an estimation procedure for the Q-matrix through the regularized maximum lik...
-
Authors: Vallejos, Catalina A.; Steel, Mark F. J.
Affiliations: University of Warwick
Abstract: Survival models such as the Weibull or log-normal lead to inference that is not robust to the presence of outliers. They also assume that all heterogeneity between individuals can be modeled through covariates. This article considers the use of infinite mixtures of lifetime distributions as a solution for these two issues. This can be interpreted as the introduction of a random effect in the survival distribution. We introduce the family of shape mixtures of log-normal distributions, which cov...