-
Authors: Li, Xiang; Ruan, Feng; Wang, Huiyuan; Long, Qi; Su, Weijie J.
Affiliations: University of Pennsylvania; Northwestern University; University of Pennsylvania
Abstract: Watermarking is an effective approach to distinguishing text generated by large language models (LLMs) from human-written text. However, the pervasive presence of human edits on LLM-generated text dilutes watermark signals, thereby significantly degrading the detection performance of existing methods. In this paper, by modelling human edits through mixture model detection, we introduce a new method, a truncated goodness-of-fit test (Tr-GoF), for detecting watermarked text under human edits. We prove...
-
Authors: Pishchagina, Liudmila; Romano, Gaetano; Fearnhead, Paul; Runge, Vincent; Rigaill, Guillem
Affiliations: Centre National de la Recherche Scientifique (CNRS); Universite Paris Saclay; Lancaster University; Universite Paris Saclay; INRAE; Centre National de la Recherche Scientifique (CNRS); Universite Paris Cite; Universite Paris Saclay; AgroParisTech; INRAE
Abstract: The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but a naive sequential implementation becomes impractical online due to high computational costs. We develop an online algorithm that exactly calculates the likelihood ratio test for a single changepoint in p-dimensional data streams by leveraging a fascinating connection with computational geometry. This connection straightforwardly allows...
-
Authors: Frostig, Tzviel; Benjamini, Yoav
Affiliations: Tel Aviv University
Abstract: This study addresses the challenges of inference following selection in fields like clinical trials, genome-wide association studies, and functional magnetic resonance imaging, where traditional methods like simultaneous confidence intervals (CIs) might be too conservative. We introduce improved CIs that control the false coverage-statement rate when selection is done by passing a threshold in a certain direction. The CIs for the selected parameters are similar to those proposed by Benjamin...
-
Authors: Liu, Yang; Goudie, Robert J. B.
Affiliations: University of Cambridge; MRC Biostatistics Unit
Abstract: Standard Bayesian inference enables building models that combine information from various sources, but this inference may not be reliable if components of the model are misspecified. Cut inference, a particular type of modularized Bayesian inference, is an alternative that splits a model into modules and cuts the feedback from any suspect module. Previous studies have focused on the two-module case, but a more general definition of a 'module' remains unclear. We present a formal definition of a ...
-
Authors: Yeon, Hyemin; Dai, Xiongtao; Lopez-Pintado, Sara
Affiliations: University System of Ohio; Kent State University; Kent State University Salem; Kent State University Kent; University of California System; University of California Berkeley; Northeastern University
Abstract: Data depth is a powerful tool originally proposed to rank multivariate data from the centre outward. In this context, one of the most archetypical depth notions is Tukey's halfspace depth. In the last few decades, notions of depth have also been proposed for functional data. However, a naive extension of Tukey's depth cannot handle functional data because of its degeneracy. Here, we propose a new halfspace depth for functional data, which avoids degeneracy by regularization. The halfspace projecti...
-
Authors: Gazin, Ulysse; Heller, Ruth; Marandon, Ariane; Roquain, Etienne
Affiliations: Universite Paris Cite; Centre National de la Recherche Scientifique (CNRS); Sorbonne Universite; Universite Paris Cite; Tel Aviv University; Alan Turing Institute; Sorbonne Universite; Centre National de la Recherche Scientifique (CNRS); Universite Paris Cite
Abstract: In supervised learning, including regression and classification, conformal methods provide prediction sets for the outcome/label with finite-sample coverage for any machine learning predictor. We consider here the case where such prediction sets come after a selection process. The selection process requires that the selected prediction sets be 'informative' in a well-defined sense. We consider both the classification and regression settings where the analyst may consider as informative only th...
-
Authors: Wang, Shulei
Affiliations: University of Illinois System; University of Illinois Urbana-Champaign
Abstract: Data augmentation is a widely used technique and an essential ingredient in recent advances in self-supervised representation learning. By preserving the similarity between augmented data, the resulting data representation can improve various downstream analyses and achieve state-of-the-art performance in many applications. Despite their empirical effectiveness, most existing methods lack theoretical understanding under a general nonlinear setting. To fill this gap, we develop a statistical f...
-
Authors: Oates, Chris J.; Karvonen, Toni; Teckentrup, Aretha L.; Strocchi, Marina; Niederer, Steven A.
Affiliations: Newcastle University - UK; Lappeenranta-Lahti University of Technology LUT; University of Helsinki; University of Edinburgh; Heriot Watt University; University of Edinburgh; Imperial College London; University of London; King's College London
Abstract: For over a century, extrapolation methods have provided a powerful tool to improve the convergence order of a numerical method. However, these tools are not well-suited to modern computer codes, where multiple continua are discretized and convergence orders are not easily analysed. To address this challenge, we present a probabilistic perspective on Richardson extrapolation, a point of view that unifies classical extrapolation methods with modern multi-fidelity modelling, and handles uncertain...
-
Authors: Zhang, Xinyu; Chan, Kung-Sik
Affiliations: East China Normal University; East China Normal University; University of Iowa
Abstract: Multivariate time series may be subject to partial structural changes over certain frequency bands, for instance, in neuroscience. We study the change point detection problem for high-dimensional time series within a frequency-domain framework. The overarching goal is to locate all change points and delineate which series are activated by each change, and over which frequencies. In practice, the number of activated series per change and frequency could span from a few to full participation. W...
-
Authors: Chang, Ming-Chung
Affiliations: Academia Sinica - Taiwan
Abstract: Multi-stratum factorial designs, such as block designs and row-column designs, are widely used for screening treatment factors in experiments involving complex structures of experimental units due to multiple sources of error. In this study, we propose a unified model-free approach, termed orthogonalized moment aberration, to compare the similarities between level combinations of treatment factors assigned to heterogeneous experimental units. The proposed approach, which uses kernel functions ...