-
Authors: Matechou, Eleni; Argiento, Raffaele
Affiliations: University of Kent; University of Bergamo
Abstract: We propose a novel approach for modeling capture-recapture (CR) data on open populations that exhibit temporary emigration, while also accounting for individual heterogeneity to allow for differences in visit patterns and capture probabilities between individuals. Our modeling approach combines changepoint processes (fitted using an adaptive approach) for inferring individual visits with Bayesian mixture modeling (fitted using a nonparametric approach) for identifying clusters of individuals with ...
-
Authors: Fan, Jianqing; Yang, Zhuoran; Yu, Mengxin
Affiliations: Princeton University; Yale University
Abstract: In this article, we leverage over-parameterization to design regularization-free algorithms for the high-dimensional single index model and provide theoretical guarantees for the induced implicit regularization phenomenon. Specifically, we study both vector and matrix single index models where the link function is nonlinear and unknown, the signal parameter is either a sparse vector or a low-rank symmetric matrix, and the response variable can be heavy-tailed. To gain a better understanding of...
-
Authors: Chen, Yu-Ting; Chiou, Jeng-Min; Huang, Tzee-Ming
Affiliations: National Chengchi University; Academia Sinica - Taiwan
Abstract: We present a new approach known as greedy segmentation (GS) to identify multiple changepoints for a functional data sequence. The proposed multiple changepoint detection criterion links detectability with the projection onto a suitably chosen subspace and the changepoint locations. The changepoint estimator identifies the true changepoints for any predetermined number of changepoint candidates, whether that number over-reports or under-reports the true count. This theoretical finding supports the proposed GS estimato...
-
Authors: McCulloch, Charles E.; Neuhaus, John M.
Affiliations: University of California System; University of California San Francisco
Abstract: Statistical models that generate predicted random effects are widely used to evaluate the performance of and rank patients, physicians, hospitals and health plans from longitudinal and clustered data. Predicted random effects have been proven to outperform, on average, treating clusters as fixed effects (essentially a categorical predictor variable) and using standard regression models. These predicted random effects are often used to identify extreme or outlying values, such as poorly performi...
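To make the contrast between predicted random effects and fixed-effect estimates concrete, here is a minimal sketch for a one-way random-intercept model, y_ij = mu + b_i + e_ij. It is not the authors' analysis: it assumes the variance components sigma_b2 and sigma_e2 are known, plugs in the overall sample mean for mu (a simplification), and the function name `cluster_effects` is hypothetical. The predicted effect multiplies each cluster's raw deviation by the standard shrinkage factor n_i * sigma_b2 / (n_i * sigma_b2 + sigma_e2).

```python
import numpy as np

def cluster_effects(groups, y, sigma_b2, sigma_e2):
    """Fixed-effect deviations and shrunken (BLUP-style) predictions per cluster.

    Assumed model: y_ij = mu + b_i + e_ij, b_i ~ N(0, sigma_b2), e_ij ~ N(0, sigma_e2),
    with variance components treated as known and mu replaced by the overall mean.
    """
    mu = y.mean()
    fixed, predicted = {}, {}
    for g in np.unique(groups):
        yg = y[groups == g]
        n = yg.size
        dev = yg.mean() - mu  # cluster treated as a fixed effect
        shrink = n * sigma_b2 / (n * sigma_b2 + sigma_e2)  # in (0, 1), grows with n
        fixed[g.item()] = dev
        predicted[g.item()] = shrink * dev  # pulled toward zero
    return fixed, predicted

groups = np.array([0, 0, 0, 1, 1, 2])
y = np.array([1.0, 2.0, 3.0, 10.0, 12.0, 5.0])
fixed, predicted = cluster_effects(groups, y, sigma_b2=1.0, sigma_e2=1.0)
```

With equal variance components, cluster 0 (three observations) keeps 3/4 of its raw deviation while cluster 2 (one observation) keeps only half, which is why small clusters are less likely to be flagged as extreme by predicted random effects than by fixed-effect estimates.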
-
Authors: Zhong, Wenxuan; Liu, Yiwen; Zeng, Peng
Affiliations: University System of Georgia; University of Georgia; University of Arizona; Auburn University System; Auburn University
Abstract: With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is often extracted from these data through statistical learning or model fitting. In massive datasets, both the sample size and the number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores via singu...
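The general idea of leverage-score subsampling for linear regression can be illustrated with a minimal NumPy sketch. It assumes the basic scheme of drawing rows with probability proportional to their leverage scores (computed here from a thin SVD, assuming X has full column rank) and solving an importance-reweighted least-squares problem on the subsample; it is not the specific estimator studied in this article, and the function names `leverage_scores` and `leverage_subsample_ols` are hypothetical.

```python
import numpy as np

def leverage_scores(X):
    """Row leverage scores of X, computed from the thin SVD."""
    # Thin SVD X = U S V^T; for full-column-rank X, h_i = ||U[i, :]||^2.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_subsample_ols(X, y, r, rng):
    """Least squares on r rows drawn with probability proportional to leverage."""
    h = leverage_scores(X)
    p = h / h.sum()  # sampling probabilities
    idx = rng.choice(X.shape[0], size=r, replace=True, p=p)
    # Reweight sampled rows by 1/sqrt(r * p_i), the usual importance weighting.
    w = 1.0 / np.sqrt(r * p[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true  # noiseless, so any full-rank subsample recovers beta_true
beta_hat = leverage_subsample_ols(X, y, r=200, rng=rng)
```

The leverage scores of a full-column-rank matrix sum to its number of columns, so high-leverage rows, which most influence the fit, are the ones most likely to be kept; only r of the n rows enter the final least-squares solve.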
-
Authors: Krivitsky, Pavel N.; Coletti, Pietro; Hens, Niel
Affiliations: University of New South Wales Sydney; Hasselt University; University of Antwerp
Abstract: The last two decades have seen considerable progress in foundational aspects of statistical network analysis, but the path from theory to application is not straightforward. Two large, heterogeneous samples of small networks of within-household contacts in Belgium were collected using two different but complementary sampling designs: one smaller but with all contacts in each household observed, the other larger and more representative but recording contacts of only one person per household. We...
-
Authors: Xia, Yin; Cai, T. Tony
Affiliations: Fudan University; University of Pennsylvania
-
Authors: Jiang, Roulan; Zhan, Xiang; Wang, Tianying
Affiliations: Tsinghua University; Peking University
Abstract: In microbiome studies, it is of interest to use a sample from a population of microbes, such as the gut microbiota community, to estimate the population proportions of its taxa. However, due to biases introduced in the sampling and preprocessing steps, the observed taxa abundances may not reflect the true taxa abundance patterns in the ecosystem. Repeated measures, including longitudinal study designs, may be potential solutions to mitigate the discrepancy between observed abundances and true under...
-
Authors: Wang, Shulei
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Self-supervised metric learning has been a successful approach for learning a distance from an unlabeled dataset. The resulting distance is broadly useful for improving various distance-based downstream tasks, even when no information from downstream tasks is used in the metric learning stage. To gain insights into this approach, we develop a statistical framework to theoretically study how self-supervised metric learning can benefit downstream tasks in the context of multi-view data. Under th...
-
Authors: Xu, Ganggang; Liang, Chen; Waagepetersen, Rasmus; Guan, Yongtao
Affiliations: University of Miami; State University of New York (SUNY) System; Binghamton University, SUNY; Amazon.com; Aalborg University
Abstract: Specification of a parametric model for the intensity function is a fundamental task in statistics for spatial point processes. It is, therefore, crucial to be able to assess the appropriateness of a suggested model for a given point pattern dataset. For this purpose, we develop a new class of semiparametric goodness-of-fit tests for the specified parametric first-order intensity, without assuming the full data-generating mechanism that is needed for the existing popular Monte Carlo tests. The p...