-
Authors: Roverato, Alberto; Castelo, Robert
Affiliations: University of Padua; Pompeu Fabra University
Abstract: A graphical model provides a compact and efficient representation of the association structure in a multivariate distribution by means of a graph. Relevant features of the distribution are represented by vertices, edges and higher-order graphical structures such as cliques or paths. Typically, paths play a central role in these models because they determine the dependence relationships between variables. However, while a theory of path coefficients is available for directed graph models, littl...
-
Authors: Green, A. K. B.; McCormick, T. H.; Raftery, A. E.
Affiliations: University of Washington; University of Washington Seattle
Abstract: Respondent-driven sampling is an approach for estimating features of populations that are difficult to access using standard survey tools, e.g., the fraction of injection drug users who are HIV positive. Baraff et al. (2016) introduced an approach to estimating uncertainty in population proportion estimates from respondent-driven sampling using the tree bootstrap method. In this paper we establish the consistency of this tree bootstrap approach in the case of m-trees.
-
Authors: Payne, R. D.; Guha, N.; Ding, Y.; Mallick, B. K.
Affiliations: Eli Lilly; Lilly Research Laboratories; University of Massachusetts System; University of Massachusetts Lowell; Texas A&M University System; Texas A&M University College Station; Texas A&M University System; Texas A&M University College Station
Abstract: Conditional density estimation seeks to model the distribution of a response variable conditional on covariates. We propose a Bayesian partition model using logistic Gaussian processes to perform conditional density estimation. The partition takes the form of a Voronoi tessellation and is learned from the data using a reversible jump Markov chain Monte Carlo algorithm. The methodology models data in which the density changes sharply throughout the covariate space, and can be used to determine ...
-
Authors: Heng, J.; Jacob, P. E.
-
Authors: Chakraborty, Antik; Bhattacharya, Anirban; Mallick, Bani K.
Affiliations: Texas A&M University System; Texas A&M University College Station
Abstract: We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support to the proposed methodology by proving minimax optimality of the posterior me...
-
Authors: Lei, Jing; Chen, Kehui; Lynch, Brian
Affiliations: Carnegie Mellon University; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh
Abstract: We consider multi-layer network data where the relationships between pairs of elements are reflected in multiple modalities, and may be described by multivariate or even high-dimensional vectors. Under the multi-layer stochastic block model framework we derive consistency results for a least squares estimation of memberships. Our theorems show that, as compared to single-layer community detection, a multi-layer network provides much richer information that allows for consistent community detec...
-
Authors: Bachoc, Francois; Genton, Marc G.; Nordhausen, Klaus; Ruiz-Gazen, Anne; Virta, Joni
Affiliations: Universite de Toulouse; Universite Toulouse III - Paul Sabatier; King Abdullah University of Science & Technology; Technische Universitat Wien; Universite de Toulouse; Universite Toulouse 1 Capitole; Toulouse School of Economics; University of Turku
Abstract: Recently a blind source separation model was suggested for spatial data, along with an estimator based on the simultaneous diagonalization of two scatter matrices. The asymptotic properties of this estimator are derived here, and a new estimator based on the joint diagonalization of more than two scatter matrices is proposed. The asymptotic properties and merits of the novel estimator are verified in simulation studies. A real-data example illustrates application of the method.
-
Authors: Li, Tianxi; Levina, Elizaveta; Zhu, Ji
Affiliations: University of Virginia; University of Michigan System; University of Michigan
Abstract: While many statistical models and methods are now available for network analysis, resampling of network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but it is not directly applicable to networks since splitting network nodes into groups requires deleting edges and destroys some of the network structure. In this paper we propose a new network resampling strategy, based on splitting node pairs rather than nodes, that is a...
-
Authors: Shin, Sunyoung; Liu, Yufeng; Cole, Stephen R.; Fine, Jason P.
Affiliations: University of Texas System; University of Texas Dallas; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina; University of North Carolina Chapel Hill
Abstract: We consider scenarios in which the likelihood function for a semiparametric regression model factors into separate components, with an efficient estimator of the regression parameter available for each component. An optimal weighted combination of the component estimators, named an ensemble estimator, may be employed as an overall estimate of the regression parameter, and may be fully efficient under uncorrelatedness conditions. This approach is useful when the full likelihood function may be ...
-
Authors: Yoon, Grace; Carroll, Raymond J.; Gaynanova, Irina
Affiliations: Texas A&M University System; Texas A&M University College Station
Abstract: Canonical correlation analysis investigates linear relationships between two sets of variables, but it often works poorly on modern datasets because of high dimensionality and mixed data types such as continuous, binary and zero-inflated. To overcome these challenges, we propose a semiparametric approach to sparse canonical correlation analysis based on the Gaussian copula. The main result of this paper is a truncated latent Gaussian copula model for data with excess zeros, which allows us to ...