-
Authors: Wang, Tianhao; Ratcliffe, Sarah J.; Guo, Wensheng
Affiliations: Rush University; University of Virginia; University of Pennsylvania
Abstract: In observational studies, the time origin of interest for time-to-event analysis, such as the time of disease onset, is often unknown. Existing approaches to estimating the time origins are commonly built on extrapolating a parametric longitudinal model and rely on rigid assumptions that can lead to biased inferences. In this paper, we introduce a flexible semiparametric curve registration model. It assumes the longitudinal trajectories follow a flexible common shape function with person-spe...
-
Authors: Wu, Ruijia; Zhang, Linjun; Cai, T. Tony
Affiliations: University of Pennsylvania; Rutgers University System; Rutgers University New Brunswick
Abstract: Sparse topic modeling under the probabilistic latent semantic indexing (pLSI) model is studied. Novel and computationally fast algorithms for estimation and inference of both the word-topic matrix and the topic-document matrix are proposed, and their theoretical properties are investigated. Both minimax upper and lower bounds are established, and the results show that the proposed algorithms are rate-optimal up to a logarithmic factor. Moreover, a refitting algorithm is proposed to establish as...
-
Author: Duan, Leo L.
Affiliations: State University System of Florida; University of Florida
Abstract: In Bayesian applications, there is great interest in rapid and accurate estimation of the posterior distribution, particularly for high-dimensional or hierarchical models. In this article, we propose to use optimization to solve for a joint distribution (random transport plan) between two random variables, theta from the posterior distribution and beta from the simple multivariate uniform. Specifically, we obtain an approximate estimate of the conditional distribution Pi(beta | the...
-
Authors: Liu, Wei; Lin, Huazhen; Zheng, Shurong; Liu, Jin
Affiliations: Southwestern University of Finance & Economics - China; Northeast Normal University - China; National University of Singapore
Abstract: As high-dimensional data measured with mixed-type variables become increasingly prevalent, it is particularly appealing to represent such data using a much smaller set of so-called factors. Because existing methods for factor analysis deal only with continuous variables, in this article we develop a generalized factor model, a corresponding algorithm, and theory for ultra-high-dimensional mixed types of variables where both the sample size n ...
-
Authors: Hung, Ying; Lin, Li-Hsiang; Wu, C. F. Jeff
Affiliations: Rutgers University System; Rutgers University Newark; Rutgers University New Brunswick; Louisiana State University System; Louisiana State University; University System of Georgia; Georgia Institute of Technology
Abstract: Computer simulators are widely used for the study of complex systems. In many applications, multiple simulators are available with different scientific interpretations of the underlying mechanism, and the goal is to identify an optimal simulator based on the observed physical experiments. To achieve this goal, we propose a selection criterion based on leave-one-out cross-validation. This criterion consists of a goodness-of-fit measure and a generalized degrees of freedom penalizing the si...
-
Authors: Gu, Yuqi; Xu, Gongjun
Affiliations: Columbia University; University of Michigan System; University of Michigan
Abstract: Structured latent attribute models (SLAMs) are a family of discrete latent variable models widely used in education, psychology, and epidemiology to model multivariate categorical data. A SLAM assumes that multiple discrete latent attributes explain the dependence of observed variables in a highly structured fashion. Usually, the maximum marginal likelihood estimation approach is adopted for SLAMs, treating the latent attributes as random effects. The increasing scope of modern assessment data...
-
Authors: Han, Xiao; Tong, Xin; Fan, Yingying
Affiliations: Chinese Academy of Sciences; University of Science & Technology of China, CAS; University of Southern California
Abstract: Based on a Gaussian mixture type model of K components, we derive eigen selection procedures that improve the usual spectral clustering algorithms in high-dimensional settings, which typically act on the top few eigenvectors of an affinity matrix (e.g., XX^T) derived from the data matrix X. Our selection principle formalizes two intuitions: (i) eigenvectors should be dropped when they have no clustering power; (ii) some eigenvectors corresponding to smaller spiked eigenvalues should be dro...
-
Author: Vega Yon, George G.
Affiliations: Utah System of Higher Education; University of Utah
-
Authors: Keret, Nir; Gorfine, Malka
Affiliation: Tel Aviv University
Abstract: Massive survival datasets are becoming increasingly prevalent with the development of the healthcare industry and pose computational challenges unprecedented in traditional survival analysis use cases. In this work, we analyze the UK Biobank colorectal cancer data with genetic and environmental risk factors, including a time-dependent coefficient, which transforms the dataset into pseudo-observation form, thus critically inflating its size. A popular way for coping with massive datasets is do...
-
Authors: Perreault, Samuel; Neslehova, Johanna G.; Duchesne, Thierry
Affiliations: University of Toronto; McGill University; Laval University
Abstract: Joint modeling of a large number of variables often requires dimension reduction strategies that lead to structural assumptions on the underlying correlation matrix, such as equal pairwise correlations within subsets of variables. The underlying correlation matrix is thus of interest for both model specification and model validation. In this article, we develop tests of the hypothesis that the entries of the Kendall rank correlation matrix are linear combinations of a smaller number of parame...
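The abstract above concerns structural hypotheses on the entries of the Kendall rank correlation matrix, such as equal pairwise correlations within subsets of variables. As a minimal illustration only (not the authors' test procedure), the Python sketch below computes an empirical Kendall tau matrix (tau-a, with no tie correction) and averages its off-diagonal entries within user-specified blocks. The helpers `kendall_tau`, `kendall_matrix`, and `block_average` are hypothetical names introduced here, and the plain within-block average is an assumed, naive estimate of a shared block parameter.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant (no tie correction).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            score += 1  # concordant pair
        elif sign < 0:
            score -= 1  # discordant pair
    return 2 * score / (n * (n - 1))


def kendall_matrix(columns):
    """Symmetric d x d matrix of pairwise Kendall taus for d variables."""
    d = len(columns)
    tau = [[1.0] * d for _ in range(d)]
    for a, b in combinations(range(d), 2):
        tau[a][b] = tau[b][a] = kendall_tau(columns[a], columns[b])
    return tau


def block_average(tau, blocks):
    """Hypothetical helper: mean off-diagonal tau within each block.

    Each block lists variable indices and must contain at least two of them.
    Under an equal-correlation-within-block structure, all entries of a block
    share one parameter; this naive mean is one plug-in estimate of it.
    """
    averages = []
    for block in blocks:
        pairs = list(combinations(block, 2))
        averages.append(sum(tau[a][b] for a, b in pairs) / len(pairs))
    return averages
```

For example, `kendall_matrix([[1, 2, 3, 4], [1, 2, 3, 4], [4, 3, 2, 1]])` pairs two identically ordered variables (tau 1) with a reversed one (tau -1). A formal test of such a structure would also need the sampling distribution of the tau matrix, which this sketch does not attempt.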