SPARSE CLUSTERING FOR CUSTOMER SEGMENTATION WITH HIGH-DIMENSIONAL MIXED-TYPE DATA
Publication type:
Article
Authors:
Wang, Feifei; Xu, Shaodong; Qin, Yichen; Shen, Ye; Li, Yang
Affiliations:
Renmin University of China; Renmin University of China; University System of Ohio; University of Cincinnati; University System of Georgia; University of Georgia
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1886
Publication date:
2024
Pages:
2382-2402
Keywords:
VARIABLE SELECTION
Abstract:
Customer segmentation has wide applications in business activities, such as personalized marketing and targeted product development. Clustering methods are commonly used to perform customer segmentation. However, modern customer data are often high-dimensional and contain mixed-type variables (i.e., a mixture of continuous and categorical variables). This poses a serious challenge, because most existing clustering methods are designed for data with a single type of variable. Furthermore, the presence of noise variables calls for simultaneous variable selection and data clustering. Motivated by these issues, we develop a Davies-Bouldin index based sparse clustering (DBI-SC) method for customer segmentation with high-dimensional mixed-type data. In this method, we define dissimilarity measures for continuous variables and categorical variables separately. An adjusted DBI criterion is then designed to measure the contribution of each variable to clustering. For variable selection, we apply the sparse clustering framework and introduce different penalty parameters for the mixed-type variables. The screening consistency property of the DBI-SC method is also investigated. Extensive simulation studies demonstrate the satisfactory performance of the DBI-SC method in both clustering and variable selection. Finally, a designated driving service dataset is analyzed for customer segmentation using the proposed method.
Source URL: