Automatic tagging with existing and novel tags
Publication type:
Article
Authors:
Wang, Junhui; Shen, Xiaotong; Sun, Yiwen; Qu, Annie
Affiliations:
City University of Hong Kong; University of Minnesota System; University of Minnesota Twin Cities; University of Illinois System; University of Illinois Urbana-Champaign
Journal:
BIOMETRIKA
ISSN/ISBN:
0006-3444
DOI:
10.1093/biomet/asx016
Publication date:
2017
Pages:
273-290
Keywords:
Classification
Abstract:
Automatic tagging by key words and phrases is important in multi-label classification of a document. In this paper, we first introduce a tagging loss to measure the discrepancy between predicted and actual tag sets, which is expressed in terms of a sum of weighted pairwise margins between two tags by their degree of similarity. We then construct a regularized empirical loss to incorporate linguistic knowledge, and identify a tagger maximizing the separations between the pairwise margins. One salient feature of the proposed method is its capability to identify novel tags absent from a training sample by using their similarity to existing tags. Computationally, the proposed method is implemented by an alternating direction method of multipliers, integrated with a difference convex algorithm. This permits scalable computation. We show that the method achieves accurate tagging, and that it compares favourably with existing methods. Finally, we apply the proposed method to tagging a Reuters news dataset.