-
Authors: Hall, Peter; Titterington, D. M.; Xue, Jing-Hao
Affiliations: University of London; University College London; University of Melbourne; University of Glasgow
Abstract: Many contemporary classifiers are constructed to provide good performance for very high dimensional data. However, an issue that is at least as important as good classification is determining which of the many potential variables provide key information for good decisions. Responding to this issue can help us to determine which aspects of the data-generating mechanism (e.g. which genes in a genomic study) are of greatest importance in terms of distinguishing between populations. We introduce ti...
-
Authors: Drost, Feike C.; van den Akker, Ramon; Werker, Bas J. M.
Affiliations: Tilburg University
Abstract: Integer-valued auto-regressive (INAR) processes have been introduced to model non-negative integer-valued phenomena that evolve over time. The distribution of an INAR(p) process is essentially described by two parameters: a vector of auto-regression coefficients and a probability distribution on the non-negative integers, called an immigration or innovation distribution. Traditionally, parametric models are considered where the innovation distribution is assumed to belong to a parametric famil...
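Illustration (not part of the original record): the two ingredients named in the abstract, a vector of auto-regression coefficients and an innovation distribution on the non-negative integers, can be made concrete with a small simulation. The sketch below assumes the usual binomial-thinning construction of an INAR(p) process and, purely as an example, Poisson innovations. The function names `simulate_inar` and `poisson_draw` are hypothetical and do not come from the paper.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_inar(coeffs, innovation, n_steps, rng, initial=None):
    """Simulate X_t = sum_i coeffs[i] o X_{t-i} + eps_t, with 'o' binomial thinning.

    coeffs     -- auto-regression coefficients, each in [0, 1] (lag 1 first)
    innovation -- callable rng -> non-negative int (assumed innovation sampler)
    initial    -- optional starting values, oldest first; defaults to zeros
    """
    p = len(coeffs)
    history = list(initial) if initial is not None else [0] * p
    path = []
    for _ in range(n_steps):
        value = innovation(rng)
        for lag, alpha in enumerate(coeffs, start=1):
            past = history[-lag]
            # Binomial thinning: each of the `past` counts survives with prob. alpha.
            value += sum(rng.random() < alpha for _ in range(past))
        history.append(value)
        path.append(value)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # INAR(2) with assumed Poisson(1.5) innovations.
    print(simulate_inar([0.3, 0.2], lambda r: poisson_draw(r, 1.5), 20, rng))
```

Passing the innovation sampler as a callable keeps the innovation distribution unrestricted, so any distribution on the non-negative integers can be substituted for the Poisson example.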
-
Authors: Garcia-Escudero, L. A.; Gordaliza, A.; San Martin, R.; Van Aelst, S.; Zamar, R.
Affiliations: Universidad de Valladolid; Ghent University; University of British Columbia
Abstract: Non-hierarchical clustering methods are frequently based on the idea of forming groups around 'objects'. The main exponent of this class of methods is the k-means method, where these objects are points. However, clusters in a data set may often be due to certain relationships between the measured variables. For instance, we can find linear structures such as straight lines and planes, around which the observations are grouped in a natural way. These structures are not well represented by point...
-
Authors: Guindani, Michele; Mueller, Peter; Zhang, Song
Affiliations: University of New Mexico; University of Texas System; UTMD Anderson Cancer Center; University of Texas System; University of Texas Southwestern Medical Center
Abstract: We discuss a Bayesian discovery procedure for multiple-comparison problems. We show that, under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule that is based on a threshold of the posterior probability of the alternative. Under a semiparametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure, which was recently introduced by Storey. Improving the approximat...
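Illustration (not part of the original record): the thresholding result in the abstract can be checked on a toy example. The sketch assumes one particular loss, `fp_cost * FP - TP`, which may differ from the loss used in the paper. Under that assumption, flagging comparison i changes the posterior expected loss by `fp_cost * (1 - v_i) - v_i`, so the expected loss is minimized by flagging exactly those comparisons whose posterior probability `v_i` of the alternative exceeds `fp_cost / (1 + fp_cost)`. The function names and the example probabilities are hypothetical.

```python
def bayes_discoveries(post_probs, fp_cost):
    """Flag comparisons whose posterior probability of the alternative
    exceeds fp_cost / (1 + fp_cost), the threshold implied by the assumed loss."""
    threshold = fp_cost / (1.0 + fp_cost)
    return [v > threshold for v in post_probs]


def expected_loss(post_probs, decisions, fp_cost):
    """Posterior expected value of the assumed loss fp_cost * FP - TP."""
    return sum(
        fp_cost * (1.0 - v) - v
        for v, flagged in zip(post_probs, decisions)
        if flagged
    )


if __name__ == "__main__":
    probs = [0.95, 0.4, 0.7, 0.1]  # made-up posterior probabilities
    flags = bayes_discoveries(probs, fp_cost=2.0)
    print(flags, expected_loss(probs, flags, fp_cost=2.0))
```

Because the expected loss is a sum of per-comparison terms, each decision can be made separately, which is why a single threshold on the posterior probability suffices under this loss.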
-
Authors: Hall, Peter; Maiti, Tapabrata
Affiliations: Michigan State University; University of Melbourne
Abstract: We develop a general non-parametric approach to the analysis of clustered data via random effects. Assuming only that the link function is known, the regression functions and the distributions of both cluster means and observation errors are treated non-parametrically. Our argument proceeds by viewing the observation error at the cluster mean level as though it were a measurement error in an errors-in-variables problem, and using a deconvolution argument to access the distribution of the clust...
-
Authors: Commenges, Daniel; Gegout-Petit, Anne
Affiliations: Institut National de la Sante et de la Recherche Medicale (Inserm); Universite de Bordeaux; Institut National de la Sante et de la Recherche Medicale (Inserm); Universite de Bordeaux; Centre National de la Recherche Scientifique (CNRS); Inria; Universite de Bordeaux
Abstract: We develop a general dynamical model as a framework for causal interpretation. We first state a criterion of local independence in terms of measurability of processes that are involved in the Doob-Meyer decomposition of stochastic processes; then we define direct and indirect influence. We propose a definition of causal influence using the concepts of a 'physical system'. This framework makes it possible to link descriptive and explicative statistical models, and encompasses quantitative proce...
-
Authors: Wiens, Douglas P.
Affiliations: University of Alberta
Abstract: We study the construction of experimental designs, the purpose of which is to aid in the discrimination between two possibly non-linear regression models, each of which might be only approximately specified. A rough description of our approach is that we impose neighbourhood structures on each regression response and determine the members of these neighbourhoods which are least favourable in the sense of minimizing the Kullback-Leibler divergence. Designs are obtained which maximize this minim...
-
Authors: Ramos, Alexandra; Ledford, Anthony
Affiliations: Universidade do Porto; University of Oxford
Abstract: A fundamental issue in applied multivariate extreme value analysis is modelling dependence within joint tail regions. The primary focus of this work is to extend the classical pseudopolar treatment of multivariate extremes to develop an asymptotically motivated representation of extremal dependence that also encompasses asymptotic independence. Starting with the usual mild bivariate regular variation assumptions that underpin the coefficient of tail dependence as a measure of extremal dependen...
-
Authors: Lin, Fengchang; Fine, Jason P.
Affiliations: University of Wisconsin System; University of Wisconsin Madison
Abstract: We adapt martingale estimating equations based on gap time information to a general intensity model for a single realization of a modulated renewal process. The consistency and asymptotic normality of the estimators are proved under ergodicity conditions. Previous work has considered either parametric likelihood analysis or semiparametric multiplicative models using partial likelihood. The framework is generally applicable to semiparametric and parametric models, including additive and multipli...
-
Authors: Wang, Weiwei; Scharfstein, Daniel; Tan, Zhiqiang; MacKenzie, Ellen J.
Affiliations: Princeton University; Johns Hopkins University; Johns Hopkins Bloomberg School of Public Health; Rutgers University System; Rutgers University New Brunswick
Abstract: We consider estimation of the causal effect of a treatment on an outcome from observational data collected in two phases. In the first phase, a simple random sample of individuals is drawn from a population. On these individuals, information is obtained on treatment, outcome and a few low dimensional covariates. These individuals are then stratified according to these factors. In the second phase, a random subsample of individuals is drawn from each stratum, with known stratum-specific selecti...