Robust detection of watermarks for large language models under human edits
Document type:
Article; Early Access
Authors:
Li, Xiang; Ruan, Feng; Wang, Huiyuan; Long, Qi; Su, Weijie J.
Affiliations:
University of Pennsylvania; Northwestern University; University of Pennsylvania
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISBN:
1369-7412
DOI:
10.1093/jrsssb/qkaf056
Publication year:
2025
Keywords:
higher criticism
statistics
tests
Abstract:
Watermarking is an effective approach to distinguishing text generated by large language models (LLMs) from human-written text. However, the pervasive presence of human edits on LLM-generated text dilutes watermark signals, thereby significantly degrading the detection performance of existing methods. In this paper, by modelling human edits through mixture model detection, we introduce a new method, a truncated goodness-of-fit test (Tr-GoF), for detecting watermarked text under human edits. We prove that Tr-GoF achieves optimality in robust detection of the Gumbel-max watermark in a certain asymptotic regime of substantial text modifications and vanishing watermark signals. Importantly, Tr-GoF achieves this optimality adaptively, without requiring precise knowledge of human edit levels or probabilistic specifications of the LLMs, unlike the optimal but impractical Neyman-Pearson likelihood ratio test. Moreover, we establish that Tr-GoF attains the highest detection efficiency rate under moderate text modifications. In contrast, sum-based detection rules used by existing methods fail to achieve optimal robustness in both regimes, because the additive nature of their statistics is less resilient to edit-induced noise. We demonstrate Tr-GoF's competitive and sometimes superior performance on synthetic data and on open-source LLMs in the OPT and LLaMA families.
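To make the abstract's contrast concrete: the paper keys its test to the higher-criticism family of goodness-of-fit statistics (see the keywords), which scan the sorted per-token p-values rather than summing them. The sketch below is a minimal, hypothetical illustration of this idea, not the authors' Tr-GoF implementation: it compares a plain higher-criticism statistic on uniform p-values (unwatermarked/heavily edited text) against the same statistic on a sparse mixture in which only a few tokens retain a strong watermark signal, the regime where sum-based rules lose power. The function name, truncation level, and simulated data are illustrative choices.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """Truncated higher-criticism statistic over sorted p-values.

    Illustrative sketch only: Tr-GoF in the paper is a truncated
    goodness-of-fit family of which higher criticism is one member.
    """
    p = np.sort(np.asarray(pvals, dtype=float))
    n = p.size
    k = np.arange(1, n + 1)
    # standardized deviation of the empirical CDF from Uniform(0, 1)
    hc = np.sqrt(n) * (k / n - p) / np.sqrt(p * (1.0 - p) + 1e-12)
    mask = p <= alpha0  # truncate: only small p-values carry signal
    return float(hc[mask].max()) if mask.any() else 0.0

rng = np.random.default_rng(0)
# null: all 500 token p-values uniform (no surviving watermark signal)
null_stat = higher_criticism(rng.uniform(size=500))
# sparse mixture: 20 strongly watermarked tokens among 480 edited ones
mixed = np.concatenate([rng.uniform(size=480), rng.uniform(size=20) * 1e-4])
alt_stat = higher_criticism(mixed)
```

A sum-based statistic averages the 20 extreme p-values against 480 noise terms, while the scan statistic peaks exactly at the sorted rank where the watermarked tokens cluster, which is why `alt_stat` far exceeds `null_stat` here.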
Source URL: