-
Authors: Zheng, Jiayin; Dong, Xinyuan; Newton, Christina C.; Hsu, Li
Affiliations: Fred Hutchinson Cancer Center; University of Washington; University of Washington Seattle; American Cancer Society
Abstract: Cancer is a heterogeneous disease, and rapid progress in sequencing and -omics technologies has enabled researchers to characterize tumors comprehensively. This has stimulated an intensive interest in studying how risk factors are associated with various tumor heterogeneous features. The Cancer Prevention Study-II (CPS-II) cohort is one of the largest prospective studies, particularly valuable for elucidating associations between cancer and risk factors. In this article, we investigate the ass...
-
Authors: Chevallier, Augustin; Fearnhead, Paul; Sutton, Matthew
Affiliations: Lancaster University; Queensland University of Technology (QUT)
Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are nonreversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selecti...
-
Authors: Li, Zhu; Su, Weijie J.; Sejdinovic, Dino
Affiliations: University of London; University College London; University of Pennsylvania; University of Oxford
Abstract: Modern machine learning models often exhibit the benign overfitting phenomenon, which has recently been characterized using the double descent curves. In addition to the classical U-shaped learning curve, the learning risk undergoes another descent as we increase the number of parameters beyond a certain threshold. In this article, we examine the conditions under which benign overfitting occurs in the random feature (RF) models, that is, in a two-layer neural network with fixed first layer wei...
-
Authors: Chen, Yinyin; He, Shishuang; Yang, Yun; Liang, Feng
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Topic models provide a useful text-mining tool for learning, extracting, and discovering latent structures in large text corpora. Although a plethora of methods have been proposed for topic modeling, lacking in the literature is a formal theoretical investigation of the statistical identifiability and accuracy of latent topic estimation. In this article, we propose a maximum likelihood estimator (MLE) of latent topics based on a specific integrated likelihood that is naturally connected to the...
-
Authors: Deng, Hang; Han, Qiyang; Sen, Bodhisattva
Affiliations: Rutgers University System; Rutgers University New Brunswick; Columbia University
Abstract: In this article, we develop automated inference methods for local parameters in a collection of convexity constrained models based on the natural constrained tuning-free estimators. A canonical example is given by the univariate convex regression model, in which automated inference is drawn for the function value, the function derivative at a fixed interior point, and the anti-mode of the convex regression function, based on the widely used tuning-free, piecewise linear convex least squares es...
-
Authors: Nishimura, Akihiko; Suchard, Marc A.
Affiliations: Johns Hopkins University; University of California System; University of California Los Angeles
Abstract: In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of 10^5-10^6 and of 10^4-10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the large n and large p setting, however, the required poster...
-
Authors: Miao, Rui; Zhang, Xiaoke; Wong, Raymond K. W.
Affiliations: George Washington University; Texas A&M University System; Texas A&M University College Station
Abstract: Measuring and testing the dependency between multiple random functions is often an important task in functional data analysis. In the literature, a model-based method relies on a model which is subject to the risk of model misspecification, while a model-free method only provides a correlation measure which is inadequate to test independence. In this paper, we adopt the Hilbert-Schmidt Independence Criterion (HSIC) to measure the dependency between two random functions. We develop a two-step p...
-
Authors: Kong, Xin-Bing; Lin, Jin-Guan; Liu, Cheng; Liu, Guang-Ying
Affiliations: Nanjing Audit University; Wuhan University
Abstract: In this article, we study the discrepancy between the global principal component analysis (GPCA) and local principal component analysis (LPCA) in recovering the common components of large-panel high-frequency data. We measure the discrepancy by the total sum of squared differences between common components reconstructed from GPCA and LPCA. The asymptotic distribution of the discrepancy measure is provided when the factor space is time invariant as the dimension p and sample size n tend to in...
-
Authors: Paparoditis, Efstathios; Shang, Han Lin
Affiliations: University of Cyprus; Macquarie University
Abstract: A bootstrap procedure for constructing prediction bands for a stationary functional time series is proposed. The procedure exploits a general vector autoregressive representation of the time-reversed series of Fourier coefficients appearing in the Karhunen-Loève representation of the functional process. It generates backward-in-time functional replicates that adequately mimic the dependence structure of the underlying process in a model-free way and have the same conditionally fixed curves at ...
-
Authors: Zhu, Yunzhang; Shen, Xiaotong; Jiang, Hui; Wong, Wing Hung
Affiliations: University System of Ohio; Ohio State University; University of Minnesota System; University of Minnesota Twin Cities; University of Michigan System; University of Michigan; Stanford University
Abstract: In multilabel classification, strong label dependence is present for exploiting, particularly for word-to-word dependence defined by semantic labels. In such a situation, we develop a collaborative-learning framework to predict class labels based on label-predictor pairs and label-only data. For example, in image categorization and recognition, language expressions describe the content of an image together with a large number of words and phrases without associated images. This article propose...