-
Authors: Da Silva, Damiao Nobrega; Skinner, Chris; Kim, Jae Kwang
Affiliations: Universidade Federal do Rio Grande do Norte
Abstract: Paradata refers here to data at unit level on an observed auxiliary variable, not usually of direct scientific interest, which may be informative about the quality of the survey data for the unit. There is increasing interest among survey researchers in how to use such data. Its use to reduce bias from nonresponse has received more attention so far than its use to correct for measurement error. This article considers the latter with a focus on binary paradata indicating the presence of measure...
-
Authors: Agostinelli, Claudio; Yohai, Victor J.
Affiliations: University of Trento; Universita Ca Foscari Venezia; University of Buenos Aires
Abstract: The classical Tukey-Huber contamination model (CCM) is a commonly adopted framework to describe the mechanism of outlier generation in robust statistics. Given a dataset with n observations and p variables, under the CCM, an outlier is a unit, even if only one or a few values are corrupted. Classical robust procedures were designed to cope with this type of outlier. Recently, a new mechanism of outlier generation was introduced, namely, the independent contamination model (ICM), where the occ...
-
Authors: Narisetty, Naveen N.; Nair, Vijayan N.
Affiliations: University of Michigan System; University of Michigan
Abstract: We propose a new notion called extremal depth (ED) for functional data, discuss its properties, and compare its performance with existing concepts. The proposed notion is based on a measure of extreme outlyingness. ED has several desirable properties that are not shared by other notions and is especially well suited for obtaining central regions of functional data and function spaces. In particular: (a) the central region achieves the nominal (desired) simultaneous coverage probability; (b) t...
-
Authors: Barut, Emre; Wang, Huixia Judy
Affiliations: George Washington University
-
Authors: Jobe, J. Marcus; Pokojovy, Michael
Affiliations: University System of Ohio; Miami University; University of Konstanz
Abstract: The detection power of the squared Mahalanobis distance statistic is significantly reduced when several outliers exist within a multivariate dataset of interest. To overcome this masking effect, we propose a computer-intensive cluster-based approach that incorporates a reweighted version of Rousseeuw's minimum covariance determinant method with a multi-step cluster-based algorithm that initially filters out potential masking points. Compared to the most robust procedures, simulation studies show t...
-
Authors: Martin, Ryan; Liu, Chuanhai
Affiliations: University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital; Purdue University System; Purdue University
Abstract: The inferential models (IM) framework provides prior-free, frequency-calibrated, and posterior probabilistic inference. The key is the use of random sets to predict unobservable auxiliary variables connected to the observable data and unknown parameters. When nuisance parameters are present, a marginalization step can reduce the dimension of the auxiliary variable, which, in turn, leads to more efficient inference. For regular problems, exact marginalization can be achieved, and we give conditi...
-
Authors: Boente, Graciela; Salibian-Barrera, Matias
Affiliations: University of Buenos Aires; Consejo Nacional de Investigaciones Cientificas y Tecnicas (CONICET); University of British Columbia
Abstract: Principal component analysis is a widely used technique that provides an optimal lower-dimensional approximation to multivariate or functional datasets. These approximations can be very useful in identifying potential outliers among high-dimensional or functional observations. In this article, we propose a new class of estimators for principal components based on robust scale estimators. For a fixed dimension q, we robustly estimate the q-dimensional linear space that provides the best predict...
-
Authors: Kim, Hang J.; Cox, Lawrence H.; Karr, Alan F.; Reiter, Jerome P.; Wang, Quanli
Affiliations: University System of Ohio; University of Cincinnati; Duke University; Research Triangle Institute; Duke University
Abstract: Many statistical organizations collect data that are expected to satisfy linear constraints; for example, component variables should sum to total variables, and ratios of pairs of variables should be bounded by expert-specified constants. When reported data violate constraints, organizations identify and replace values potentially in error in a process known as edit-imputation. To date, most approaches separate the error localization and imputation steps, typically using optimization methods t...
-
Authors: Zubizarreta, Jose R.
Affiliations: Columbia University
Abstract: Weighting methods that adjust for observed covariates, such as inverse probability weighting, are widely used for causal inference and estimation with incomplete outcome data. Part of the appeal of such methods is that one set of weights can be used to estimate a range of treatment effects based on different outcomes, or a variety of population means for several variables. However, this appeal can be diminished in practice by the instability of the estimated weights and by the difficulty of ad...
-
Authors: Chien, Li-Chu; Wu, Yuh-Jenn; Hsiung, Chao A.; Wang, Lu-Hai; Chang, I-Shou
Affiliations: National Health Research Institutes - Taiwan; Chung Yuan Christian University; National Health Research Institutes - Taiwan; National Health Research Institutes - Taiwan; National Health Research Institutes - Taiwan
Abstract: Cancer surveillance research often begins with a rate matrix, also called a Lexis diagram, of cancer incidence derived from cancer registry and census data. Lexis diagrams with 3- or 5-year intervals for age group and for calendar year of diagnosis are often considered. This simple smoothing approach suffers from a significant limitation; important details useful in studying time trends may be lost in the averaging process involved in generating a summary rate. This article constructs a smooth...