Part 1: Principles
Chapter 1 Data Science Thinking
1.1 The Working Paradigm of Data Science
1.2 Data Analysis Methods and Process
1.2.1 Identifying the Problem
1.2.2 Proximate Cause Analysis
1.2.3 Root Cause Analysis
1.2.4 Making Predictions
1.2.5 Formulating a Plan
1.2.6 Validating the Plan
1.3 Data Mining Methodologies
1.3.1 The CRISP-DM Methodology
1.3.2 The SEMMA Methodology
1.4 Data Mining Scenarios in the Financial Industry
Part 2: Techniques
Chapter 2 A Precision Marketing Model for a Bank's Loan Products
2.1 Data Overview
2.2 Business Analysis
2.2.1 Identifying the Problem
2.2.2 Diagnosing the Problem
2.2.3 Defining the Objective
2.2.4 Qualitative Analysis
2.3 Data Understanding
2.3.1 Building the Feature System
2.4 Data Preparation
2.4.1 Extracting the Dependent Variable
2.4.2 Extracting Static and Point-in-Time Features
2.4.3 Extracting Period Features
2.4.4 Extracting the Wide Table for Prediction
2.5 Modeling and Evaluation
2.5.1 Quantitative Customer Profiling and Data Cleaning
2.5.2 Building the Logistic Regression Model
2.5.3 Evaluating the Model
2.6 Preparing to Apply the Model
2.7 Process Review
Chapter 3 Customer Segmentation with Multidimensional Features
3.1 Customer Segmentation
3.1.1 Definition of Customer Segmentation
3.1.2 Types of Customer Segmentation
3.1.3 Case Study: Multidimensional Customer Profiling at a Bank
3.2 Preprocessing
3.2.1 Imputing Missing Values
3.2.2 Correcting Erroneous Values
3.2.3 Handling Discrete Variables
3.2.4 Normality Transformation and Standardization
3.3 Dimension Analysis
3.4 Clustering
3.5 Interpreting Cluster Characteristics
Chapter 4 Credit Risk Prediction Models
4.1 Risk Management Across the Full Credit Lifecycle
4.1.1 Pre-Loan Stage
4.1.2 Mid-Loan Stage
4.1.3 Post-Loan Stage
4.2 Introduction to the A, B, and C Cards
4.2.1 Introduction to Credit Scorecards
4.2.2 Applications of the A, B, and C Cards
Chapter 5 Pre-Loan Credit Risk Prediction Model (A Card)
5.1 Basic Framework for Intelligent Credit Approval
5.1.1 Applicant Identification
5.1.2 Credit Eligibility
5.1.3 Application Scorecard
5.1.4 Full-Sample Modeling and Sampled Modeling
5.2 Feature Engineering
5.2.1 Data Sources
5.2.2 Data Processing
5.3 Model Building and Evaluation
5.3.1 Logistic Regression Model
5.3.2 Score Scaling and Point Allocation
5.3.3 Model Evaluation
5.4 Model Monitoring
5.4.1 Front-End Monitoring
5.4.2 Back-End Monitoring
5.5 Reject Inference
5.5.1 Inference from External Data
5.5.2 Model-Based Inference
5.5.3 Validating Reject Inference Results
5.6 Case Study 1: Building an Application Scorecard for a Consumer Finance Company
5.6.1 Scenario Overview
5.6.2 Data Cleaning
5.6.3 Preliminary Feature Screening
5.6.4 Binning and WoE Encoding
5.6.5 Correlation Analysis and Feature Clustering
5.6.6 Stepwise Regression
5.6.7 Model Evaluation
5.6.8 Producing the Scorecard
5.6.9 Model Documentation
5.7 Case Study 2: Producing a Vintage Report
5.7.1 Vintage-Related Business Reports
5.7.2 Producing the Vintage Report
5.8 Applying the Application Scorecard
5.8.1 Models and Decision Flow
5.8.2 Risk Strategy
5.8.3 Credit Limit Strategy
Chapter 6 Mid-Loan Credit Risk Prediction Model (B Card)
6.1 Behavior Scorecard
6.1.1 Business Understanding
6.1.2 Data Understanding
6.1.3 Feature Engineering
6.1.4 Model Building and Evaluation
6.2 Case Study: Building a Behavior Scorecard for a Credit Card Business
6.2.1 Scenario Overview
6.2.2 Data Wrangling and Feature Engineering
6.2.3 Data Cleaning and Preliminary Feature Screening
6.2.4 Binning and WoE Encoding
6.2.5 Correlation-Based Screening
6.2.6 Stepwise Regression Modeling
6.2.7 Model Evaluation
6.3 Applications of the Behavior Scorecard
6.3.1 Credit Limit Management
6.3.2 Card Renewal or Loan Renewal Strategy
6.3.3 Customer Retention Analysis and Retention Efforts
6.3.4 Risk Monitoring
Chapter 7 Post-Loan Collection Model (C Card)
7.1 Collection Scorecard
7.1.1 Business Understanding
7.1.2 Data Understanding
7.1.3 Feature Engineering and Model Building
7.2 Applications of the Collection Scorecard
7.2.1 Pre-Collection Stage
7.2.2 Early Collection Stage
Chapter 8 Application Anti-Fraud Model
8.1 Business Understanding
8.1.1 Background of Application Fraud
8.1.2 Types of Application Fraud
8.1.3 Countering Application Fraud
8.2 Case Study: Application Anti-Fraud Model
8.2.1 Constructing Anomaly Features
8.2.2 Extracting Network Features
8.2.3 Building the Detection Model
Chapter 9 Algorithm Engineering
9.1 Building a Sound Project Structure
9.1.1 Why Build a Sound Project Structure
9.1.2 What Project Structure a Data Science Project Should Have
9.2 How to Write Well-Structured Data Engineering Code
9.2.1 Code Readability
9.2.2 Data Processing Performance