<?xml version="1.1" encoding="utf-8"?>
<article xsi:noNamespaceSchemaLocation="http://jats.nlm.nih.gov/publishing/1.1/xsd/JATS-journalpublishing1-mathml3.xsd" dtd-version="1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><front><journal-meta><journal-id journal-id-type="publisher-id">ASDS</journal-id><journal-title-group><journal-title>Applied Statistics and Data Science</journal-title></journal-title-group><issn>3066-8433</issn><eissn>3066-8441</eissn><publisher><publisher-name>Art and Design</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.61369/ASDS.2025040019</article-id><article-categories><subj-group subj-group-type="heading"><subject>Article</subject></subj-group></article-categories><title>Credit Default Probability Prediction Based on the LightGBM Model</title><url>https://artdesignp.com/journal/ASDS/1/4/10.61369/ASDS.2025040019</url><author>黄乐乐,陈林</author><pub-date pub-type="publication-year"><year>2025</year></pub-date><volume>1</volume><issue>4</issue><history><date date-type="pub"><published-time>2025-06-20</published-time></date></history><abstract>Credit rating is at the core of the lending business, and a variety of statistical modeling methods have been developed for it. With the arrival of the big-data era, the scope of data collection has expanded significantly, and the number of features available for credit rating has grown accordingly. This brings a risk of feature redundancy, so feature selection is a crucial step in the modeling process. This paper proposes a two-stage credit scoring modeling method. First, a Mean Variance-based independence test is applied to all features for preliminary screening; then a LightGBM-based classification model is fitted to obtain the final default probability prediction model. In addition, we construct a dummy feature to detect whether redundant features remain in the model. Finally, the method is applied to real online lending data to evaluate its effectiveness.</abstract><keywords>credit rating, feature redundancy, independence test, LightGBM</keywords></article-meta></front><body/><back><ref-list><ref id="B1" content-type="article"><label>1</label><element-citation publication-type="journal"><p>[1] KE G, MENG Q, FINLEY T, et al. LightGBM: A highly efficient gradient boosting decision tree[J]. Advances in Neural Information Processing Systems, 2017, 30: 3146-3154.
[2] BANASIK J, CROOK J, THOMAS L. Sample selection bias in credit scoring models[J].
Journal of the Operational Research Society, 2003, 54(8): 822-832.
[3] CHEN G G, ÅSTEBRO T. Bound and collapse Bayesian reject inference for credit scoring[J]. Journal of the Operational Research Society, 2012, 63(10): 1374-1387.
[4] FENG X, XIAO Z, ZHONG B, et al. Dynamic ensemble classification for credit scoring using soft probability[J]. Applied Soft Computing, 2018, 65: 139-151.
[5] DIRICK L, CLAESKENS G, JERUSALEM G, et al. Macro-economic factors in credit risk calculations: including time-varying covariates in mixture cure models[J]. Journal of Business &amp; Economic Statistics, 2019, 37(1): 40-53.
[6] FANG F, CHEN Y. A new approach for credit scoring by directly maximizing the Kolmogorov-Smirnov statistic[J]. Computational Statistics &amp; Data Analysis, 2019, 133: 180-194.
[7] SHEN F, ZHAO X, KOU G. Three-stage reject inference learning framework for credit scoring using unsupervised transfer learning and three-way decision theory[J]. Decision Support Systems, 2020, 137: 113366.
[8] KOZODOI N, JACOB J, LESSMANN S. Fairness in credit scoring: Assessment, implementation and profit implications[J]. European Journal of Operational Research, 2022, 297(3): 1083-1094.
[9] MUSHAVA J, MURRAY M. A novel XGBoost extension for credit scoring class-imbalanced data combining a generalized extreme value link and a modified focal loss function[J]. Expert Systems with Applications, 2022, 202: 117233.
[10] HE H, ZHANG S, SHEN F, et al. A privacy-preserving decentralized credit scoring method based on multi-party information[J]. Decision Support Systems, 2023, 166: 113910.
[11] CHATTERJEE S, CORBAE D, NAKAJIMA M, et al. A quantitative theory of the credit score[J]. Econometrica, 2023, 91(5): 1803-1840.
[12] TIBSHIRANI R.
Regression shrinkage and selection via the lasso[J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 1996, 58(1): 267-288.
[13] FAN J, LI R. Variable selection via nonconcave penalized likelihood and its oracle properties[J]. Journal of the American Statistical Association, 2001, 96(456): 1348-1360.
[14] CUI H, LI R, ZHONG W. Model-free feature screening for ultrahigh dimensional discriminant analysis[J]. Journal of the American Statistical Association, 2015, 110(510): 630-641.
[15] 陈秋华, 杨慧荣, 崔恒建. 变量筛选后的个人信贷评分模型与统计学习[J]. 数理统计与管理, 2020, 39(2): 13.
[16] 王冠鹏, 秦双燕, 崔恒建. 员工流失的影响因素分析与预测[J]. 系统科学与数学, 2022, 42(6): 1616-1632.</p><pub-id pub-id-type="doi"/></element-citation></ref></ref-list></back></article>
