Credit Default Probability Prediction Based on the LightGBM Model

ASDS

Applied Statistics and Data Science

3066-8433, 3066-8441

Art and Design

10.61369/ASDS.2025040019

Article

https://artdesignp.com/journal/ASDS/1/4/10.61369/ASDS.2025040019

黄乐乐, 陈林

2025

2025-06-20

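Credit rating is at the core of the lending business, and a wide range of statistical modeling methods have been developed for it. With the arrival of the big-data era, the scope of data collection has expanded markedly, and the number of features available for credit rating has grown with it. This growth brings a risk of feature redundancy, so feature selection is a critical step in the modeling process. This paper proposes a two-stage credit scoring method. First, every feature is put through an independence test based on Mean Variance as a preliminary screen; then a LightGBM-based classification model is fitted to obtain the final default probability prediction model. In addition, we construct a dummy feature to detect whether redundant features remain in the model. Finally, the method is applied to real online lending data to evaluate its effectiveness.

Keywords: credit rating; feature redundancy; independence test; LightGBM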
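The screening stage described in the abstract can be illustrated with a minimal sketch. It computes the sample Mean Variance index between a numeric feature and a class label, then keeps the features that score above a pure-noise reference column. Everything here is an assumption made for illustration rather than the authors' procedure: the function names `mv_index` and `screen_features`, the use of uniform random noise as the dummy feature, and the simple "score above the dummy" rule in place of a formal test with a critical value. The abstract does not say how the dummy feature is built or at which stage it is compared.

```python
import bisect
import random


def mv_index(x, y):
    """Sample Mean Variance index between a numeric feature x and labels y.

    MV = (1/n) * sum_r sum_j p_r * (F_r(x_j) - F(x_j))**2, where F is the
    empirical CDF of x, F_r the empirical CDF within class r, and p_r = n_r/n.
    The index is 0 when every class has the same empirical distribution.
    """
    n = len(x)
    all_sorted = sorted(x)
    by_class = {}
    for xi, yi in zip(x, y):
        by_class.setdefault(yi, []).append(xi)
    for values in by_class.values():
        values.sort()

    total = 0.0
    for xj in x:
        # bisect_right counts the observations that are <= xj
        f_all = bisect.bisect_right(all_sorted, xj) / n
        for values in by_class.values():
            p_r = len(values) / n
            f_r = bisect.bisect_right(values, xj) / len(values)
            total += p_r * (f_r - f_all) ** 2
    return total / n


def screen_features(features, y, seed=0):
    """Keep features whose MV index exceeds that of a noise column.

    Assumption: the dummy feature is uniform noise generated independently
    of y, so its MV index shows the level reached by chance alone.
    """
    rng = random.Random(seed)
    dummy = [rng.random() for _ in y]
    threshold = mv_index(dummy, y)
    scores = {name: mv_index(col, y) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = [name for name in ranked if scores[name] > threshold]
    return kept, scores, threshold


if __name__ == "__main__":
    rng = random.Random(42)
    y = [rng.randint(0, 1) for _ in range(400)]  # 1 = default (synthetic)
    features = {
        "shifted_by_label": [rng.gauss(2.0 * yi, 1.0) for yi in y],
        "unrelated": [rng.gauss(0.0, 1.0) for _ in y],
    }
    kept, scores, threshold = screen_features(features, y)
    print("noise threshold:", threshold)
    print("scores:", scores)
    print("kept:", kept)
```

In a second stage, the retained columns would be passed to a gradient-boosting classifier. With the `lightgbm` Python package, one option is `lightgbm.LGBMClassifier`, whose `predict_proba` method returns class probabilities that can be read as default probabilities. A single noise draw is a rough benchmark, so repeating the comparison over several seeds would give a more stable cut-off.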
