-
Authors: Berk, Richard; Brown, Lawrence; Buja, Andreas; Zhang, Kai; Zhao, Linda
Affiliations: University of Pennsylvania
Abstract: It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid post-selection inference by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention interv...
-
Authors: Bacallado, Sergio; Favaro, Stefano; Trippa, Lorenzo
Affiliations: Stanford University; University of Turin; Collegio Carlo Alberto; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University; Harvard University Medical Affiliates; Dana-Farber Cancer Institute
Abstract: We introduce a three-parameter random walk with reinforcement, called the (theta, alpha, beta) scheme, which generalizes the linearly edge reinforced random walk to uncountable spaces. The parameter beta smoothly tunes the (theta, alpha, beta) scheme between this edge reinforced random walk and the classical exchangeable two-parameter Hoppe urn scheme, while the parameters alpha and theta modulate how many states are typically visited. Resorting to de Finetti's theorem for Markov chains, we use th...
-
Authors: Chung, EunYi; Romano, Joseph P.
Affiliations: Stanford University; Stanford University
Abstract: Given independent samples from P and Q, two-sample permutation tests allow one to construct exact level tests when the null hypothesis is P = Q. On the other hand, when comparing or testing particular parameters theta of P and Q, such as their means or medians, permutation tests need not be level alpha, or even approximately level alpha in large samples. Under very weak assumptions for comparing estimators, we provide a general test procedure whereby the asymptotic validity of the permutation test...
-
Authors: Koltchinskii, Vladimir; Rangel, Pedro
Affiliations: University System of Georgia; Georgia Institute of Technology
Abstract: Let $(V, A)$ be a weighted graph with a finite vertex set $V$, with a symmetric matrix of nonnegative weights $A$ and with Laplacian $\Delta$. Let $S_* : V \times V \mapsto \mathbb{R}$ be a symmetric kernel defined on the vertex set $V$. Consider $n$ i.i.d. observations $(X_j, X'_j, Y_j)$, $j = 1, \dots, n$, where $X_j, X'_j$ are independent random vertices sampled from the uniform distribution in $V$ and $Y_j \in \mathbb{R}$ is a real valued response variable such that $E(Y_j \mid X_j, X'_j) = S_*(X_j, X'_j)$, j...
-
Authors: Ma, Zongming
Affiliations: University of Pennsylvania
Abstract: Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features p is comparable to, or even much larger than, the sample size n. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, w...
-
Authors: Bigot, Jeremie; Gendre, Xavier
Affiliations: Universite de Toulouse; Institut Superieur de l'Aeronautique et de l'Espace (ISAE-SUPAERO); Universite de Toulouse; Universite Toulouse III - Paul Sabatier
Abstract: We study the problem of estimating a mean pattern from a set of similar curves in the setting where the variability in the data is due to random geometric deformations and additive noise. We propose an estimator based on the notion of Fréchet mean, which generalizes the standard notion of averaging to non-Euclidean spaces. We derive a minimax rate for this estimation problem, and we show that our estimator achieves this optimal rate under the asymptotics where both the number of curve...
-
Authors: Cai, T. Tony; Low, Mark G.; Xia, Yin
Affiliations: University of Pennsylvania
Abstract: Adaptive confidence intervals for regression functions are constructed under shape constraints of monotonicity and convexity. A natural benchmark is established for the minimum expected length of confidence intervals at a given function in terms of an analytic quantity, the local modulus of continuity. This bound depends not only on the function but also the assumed function class. These benchmarks show that the constructed confidence intervals have near minimum expected length for each indivi...
-
Authors: Shalizi, Cosma Rohilla; Rinaldo, Alessandro
Affiliations: Carnegie Mellon University
Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stocha...
-
Authors: Van de Geer, Sara; Buehlmann, Peter
Affiliations: Swiss Federal Institutes of Technology Domain; ETH Zurich
Abstract: We consider the problem of regularized maximum likelihood estimation for the structure and parameters of a high-dimensional, sparse directed acyclic graphical (DAG) model with Gaussian distribution, or equivalently, of a Gaussian structural equation model. We show that the $\ell_0$-penalized maximum likelihood estimator of a DAG has about the same number of edges as the minimal-edge I-MAP (a DAG with minimal number of edges representing the distribution), and that it converges in Frobenius norm. We ...
-
Authors: Xie, Yao; Siegmund, David
Affiliations: Duke University; Stanford University
Abstract: We develop a mixture procedure to monitor parallel streams of data for a change-point that affects only a subset of them, without assuming a spatial structure relating the data streams to one another. Observations are assumed initially to be independent standard normal random variables. After a change-point the observations in a subset of the streams of data have nonzero mean values. The subset and the post-change means are unknown. The procedure we study uses stream specific generalized likeli...