
A DUAL-DICTIONARY MODEL FOR MINING DOMAIN-SPECIFIC CHINESE TEXTS

Publication type:

Article

Authors:

Xu, Jiaze; Pan, Changzai; Deng, Ke

Affiliations:

Tsinghua University; Tsinghua University; Tsinghua University

Journal:

ANNALS OF APPLIED STATISTICS

ISSN/ISBN:

1932-6157

DOI:

10.1214/25-AOAS2035

Publication date:

2025

Pages:

1147-1166

Keywords:

maximum-likelihood

Abstract:

Processing domain-specific Chinese texts is one of the most challenging tasks in natural language processing (NLP), owing both to the unique characteristics of the Chinese language and to the specific challenges posed by domain-specific texts. Existing NLP methods, whether based on supervised learning or large language models, often perform unstably on such texts. In this study, we propose a novel statistical approach called TopWORDS-MEPA (abbreviated as TWM) that simultaneously achieves high-quality meta-pattern discovery, named entity recognition, text segmentation, and relation extraction from unannotated target texts with little training information. Simulation studies and real data applications confirm that TopWORDS-MEPA is a powerful alternative to existing NLP methods for processing domain-specific Chinese texts, offering competitive performance, transparent interpretation, low training and computing costs, and efficient utilization of domain knowledge.
