-
Authors: Boente, Graciela; Salibian-Barrera, Matias
Affiliations: University of Buenos Aires; Consejo Nacional de Investigaciones Cientificas y Tecnicas (CONICET); University of British Columbia
Abstract: Principal component analysis is a widely used technique that provides an optimal lower-dimensional approximation to multivariate or functional datasets. These approximations can be very useful in identifying potential outliers among high-dimensional or functional observations. In this article, we propose a new class of estimators for principal components based on robust scale estimators. For a fixed dimension q, we robustly estimate the q-dimensional linear space that provides the best predict...
-
Authors: Kim, Hang J.; Cox, Lawrence H.; Karr, Alan F.; Reiter, Jerome P.; Wang, Quanli
Affiliations: University System of Ohio; University of Cincinnati; Duke University; Research Triangle Institute; Duke University
Abstract: Many statistical organizations collect data that are expected to satisfy linear constraints; as examples, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods t...
-
Authors: Zubizarreta, Jose R.
Affiliations: Columbia University; Columbia University
Abstract: Weighting methods that adjust for observed covariates, such as inverse probability weighting, are widely used for causal inference and estimation with incomplete outcome data. Part of the appeal of such methods is that one set of weights can be used to estimate a range of treatment effects based on different outcomes, or a variety of population means for several variables. However, this appeal can be diminished in practice by the instability of the estimated weights and by the difficulty of ad...
-
Authors: Chien, Li-Chu; Wu, Yuh-Jenn; Hsiung, Chao A.; Wang, Lu-Hai; Chang, I-Shou
Affiliations: National Health Research Institutes - Taiwan; Chung Yuan Christian University; National Health Research Institutes - Taiwan; National Health Research Institutes - Taiwan; National Health Research Institutes - Taiwan
Abstract: Cancer surveillance research often begins with a rate matrix, also called a Lexis diagram, of cancer incidence derived from cancer registry and census data. Lexis diagrams with 3- or 5-year intervals for age group and for calendar year of diagnosis are often considered. This simple smoothing approach suffers from a significant limitation; important details useful in studying time trends may be lost in the averaging process involved in generating a summary rate. This article constructs a smooth...
-
Authors: McElroy, Tucker; Monsell, Brian
Abstract: An important practical problem for statistical agencies and central banks that publish economic data is the seasonal adjustment of mixed frequency stock and flow time series. This may arise in practice due to changes in funding of a particular survey. Mathematically, the problem can be reduced to the need to compute imputations, forecasts, and backcasts from a given model of the highest available frequency data. The nonstationarity of the economic time series coupled with the alteration of sam...
-
Authors: Qiu, Yumou; Chen, Song Xi
Affiliations: University of Nebraska System; University of Nebraska Lincoln; Peking University; Peking University; Iowa State University
Abstract: The banding estimator of Bickel and Levina and its tapering version of Cai, Zhang, and Zhou are important high-dimensional covariance estimators. Both estimators require a bandwidth parameter. We propose a bandwidth selector for the banding estimator by minimizing an empirical estimate of the expected squared Frobenius norms of the estimation error matrix. The ratio consistency of the bandwidth selector is established. We provide a lower bound for the coverage probability of the underlying ban...
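Note: the banding operation mentioned in this abstract keeps the entries of a sample covariance matrix within a fixed distance of the diagonal and sets the rest to zero. The following is a minimal sketch of that operation only, not the bandwidth selector proposed in the article; the function name `band_covariance`, the simulated data, and the fixed bandwidth of 1 are illustrative assumptions.

```python
import numpy as np


def band_covariance(sample_cov, bandwidth):
    """Keep entries with |i - j| <= bandwidth and set all others to zero.

    `band_covariance` is a hypothetical helper name used for illustration.
    """
    sample_cov = np.asarray(sample_cov, dtype=float)
    p = sample_cov.shape[0]
    idx = np.arange(p)
    # Boolean mask that is True on the diagonal band of half-width `bandwidth`.
    mask = np.abs(idx[:, None] - idx[None, :]) <= bandwidth
    return np.where(mask, sample_cov, 0.0)


# Illustrative data: 50 observations of 6 variables (assumed, not from the article).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
S = np.cov(X, rowvar=False)

# The bandwidth is fixed by hand here; the article concerns choosing it from data.
banded = band_covariance(S, bandwidth=1)
```

With `bandwidth=0` only the diagonal survives, and with `bandwidth >= p - 1` the sample covariance matrix is returned unchanged.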
-
Authors: Hokayem, Charles; Bollinger, Christopher; Ziliak, James P.
Affiliations: University of Kentucky; University of Kentucky
Abstract: The Current Population Survey Annual Social and Economic Supplement (CPS ASEC) serves as the data source for official income, poverty, and inequality statistics in the United States. There is a concern that the rise in nonresponse to earnings questions could deteriorate data quality and distort estimates of these important metrics. We use a dataset of internal ASEC records matched to Social Security Detailed Earnings Records (DER) to study the impact of earnings nonresponse on estimates of pov...
-
Authors: Liang, Faming; Song, Qifan; Qiu, Peihua
Affiliations: State University System of Florida; University of Florida; Purdue University System; Purdue University
Abstract: Gaussian graphical models (GGMs) are frequently used to explore networks, such as gene regulatory networks, among a set of variables. Under the classical theory of GGMs, the construction of Gaussian graphical networks amounts to finding the pairs of variables with nonzero partial correlation coefficients. However, this is infeasible for high-dimensional problems for which the number of variables is larger than the sample size. In this article, we propose a new measure of partial correlation co...
-
Authors: Villa, C.; Walker, S. G.
Affiliations: University of Kent; University of Texas System; University of Texas Austin; University of Texas System; University of Texas Austin
Abstract: We present a novel approach to constructing objective prior distributions for discrete parameter spaces. These types of parameter spaces are particularly problematic, as it appears that common objective procedures to design prior distributions are problem specific. We propose an objective criterion, based on loss functions, instead of trying to define objective probabilities directly. We systematically apply this criterion to a series of discrete scenarios, previously considered in the literat...
-
Authors: Chang, Lo-Bin; Geman, Donald
Affiliations: University System of Ohio; Ohio State University; Johns Hopkins University
Abstract: In recent years, reproducibility has emerged as a key factor in evaluating applications of statistics to the biomedical sciences, for example, learning predictors of disease phenotypes from high-throughput omics data. In particular, validation is undermined when error rates on newly acquired data are sharply higher than those originally reported. More precisely, when data are collected from m studies representing possibly different subphenotypes, more generally different mixtures of subpheno...