A Hybrid Deep Learning Model For Consumer Credit Scoring: Bing Zhu, Wenchuan Yang, Huaxuan Wang, Yuan Yuan
Abstract—Consumer credit scoring is an essential part of credit risk management in the fast-growing consumer finance industry, and various data mining techniques have been proposed and applied to it. Recently, deep learning techniques have gained significant popularity and shown excellent performance in many fields, such as image recognition and computer vision. In this paper, we take advantage of deep learning and introduce it into consumer credit scoring. We propose a hybrid model that combines the well-known convolutional neural network with the feature selection algorithm Relief. Experiments are carried out on a real-world dataset from a Chinese consumer finance company, and the results show that the proposed model outperforms benchmark models such as logistic regression and random forest.

Keywords: consumer credit scoring; hybrid model; deep learning; convolutional neural network; Relief algorithm

I. INTRODUCTION

Owing to the rapid development of information technology, the consumer credit business, especially online consumer credit, is growing vigorously worldwide. The scale of online consumer finance transactions in China grew from 6 billion in 2013 to 436.7 billion in 2016. Meanwhile, the amount of consumer credit held by banks in the US reached $1132 billion in 2013 [1]. The increasing demand for consumer credit brings great opportunities as well as risks, which has made credit scoring an indispensable part of credit risk management.

Credit scoring is often treated as a binary classification task. A credit scoring model uses characteristics of consumers, such as age, gender, saving amount and employment status, to determine whether customers are credit-worthy [2]. A broad range of techniques has been applied to the credit scoring problem. These methods fall into two groups: statistical methods (e.g., logistic regression and discriminant analysis) and machine learning techniques (e.g., support vector machine, k-nearest neighbor, decision tree and neural network) [3]. However, researchers have found no conclusive evidence that any one method is superior to the others. During the past few years, deep learning techniques have emerged along with the growth of computing power. Deep learning has achieved success in various areas, such as computer vision, pattern recognition, speech recognition, emotion recognition and natural language processing, where it has outperformed traditional machine learning and statistical techniques [4].

In this paper, we introduce the convolutional neural network (CNN), a representative deep learning technique, to consumer credit scoring. The convolutional neural network was popularized by the paper of Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner [5] and is designed to handle the variability of two-dimensional data. Since then, CNNs have been widely applied to image processing tasks. Given this strong performance, we apply CNN to consumer credit scoring to see whether it still works well. We propose a hybrid credit scoring model that combines the convolutional neural network with the well-known feature selection algorithm Relief. To our knowledge, this is the first attempt to apply the convolutional neural network to consumer credit scoring. In the hybrid model, the Relief algorithm is used to reduce the computational burden of the convolutional neural network. To verify the performance of the proposed hybrid model, we carry out an empirical experiment and compare it with logistic regression and random forest on a dataset collected from a Chinese consumer finance company. The results show that our model outperforms both logistic regression and random forest.

The rest of our paper is organized as follows. Section II briefly reviews existing credit scoring techniques. Section III introduces the proposed hybrid model. Section IV presents the experimental procedure and results. Section V gives the conclusion and directions for future work.

II. RELATED WORK

Credit scoring, proposed by Durand [6] about 70 years ago, has become an essential part of credit risk management. During the past few years, many classification models have been developed to tackle the credit scoring problem. Logistic regression [7] and decision trees [8] are the most widely used models in credit scoring, and both have proven simple and efficient. More complex machine learning techniques, such as support vector machine (SVM) and neural network, have also been broadly applied to credit scoring [1]. Moreover, ensemble methods, which combine the advantages of various single classifiers, have developed rapidly in recent years. For example, Maher Alaraj and Maysam F. Abbod [9] developed a multiple classifier system that employs neural networks, support vector machine, decision trees and
continuous variables into categorical ones with k values. Then we reshape every variable into a binary value vector $\{b_1, b_2, \cdots, b_k\}$, where

$$
b_j = \begin{cases} 1, & \text{if the variable falls into the } j\text{th category} \\ 0, & \text{otherwise} \end{cases} \tag{2}
$$

After the transformation, every observation corresponds to a labeled gray image of dimension k×s:

$$
y + \begin{bmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{bmatrix}_{k \times s} \tag{3}
$$

C. Convolutional Neural Networks

The convolutional neural network is a powerful deep learning architecture for processing two-dimensional image data. Its basic structure is shown in Figure 2. A CNN contains two special types of layers: the convolutional layer and the pooling layer. The convolutional layer is the core building block of the CNN. It consists of a set of learnable filters that slide over the image to extract features. The pooling layer reduces the spatial size of the representation. This also reduces the number of parameters and the amount of computation in the network, which improves model efficiency and helps control overfitting.

Compared with a traditional neural network, a convolutional neural network replaces general matrix multiplication with convolution. This reduces the number of weights in the network and allows the image to be input directly. Another important characteristic of CNN is parameter sharing. Rather than learning a separate set of parameters at each location, the network learns one set of parameters (a filter) that is reused at every location of the input. This property improves the efficiency of the whole network. In our paper, we use a CNN with the structure proposed by Krizhevsky et al. [16], which includes the dropout technique to reduce complex co-adaptations of neurons and prevent overfitting.
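The data transformation of Equations (2) and (3) and the convolution and pooling operations described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes equal-width binning for the discretization step, because the rule used in the paper is not given in this section. The helper names `discretize`, `to_image`, `conv2d` and `max_pool` are hypothetical, and the 2×2 kernel weights are arbitrary rather than learned. Each column of the resulting k×s image is the one-hot vector of one variable. A single kernel is reused at every position, which is the parameter sharing described above.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def discretize(value: float, lo: float, hi: float, k: int) -> int:
    """Map a continuous value to one of k equal-width bins (0..k-1).

    Equal-width binning is an ASSUMPTION; the paper's rule is not shown here.
    """
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * k)
    return min(max(idx, 0), k - 1)  # clamp so value == hi lands in the last bin


def to_image(categories: Sequence[int], k: int) -> Matrix:
    """Build the k x s binary image of Eq. (3).

    Column j is the one-hot vector {b_1, ..., b_k} of Eq. (2) for variable j.
    """
    s = len(categories)
    image = [[0.0] * s for _ in range(k)]
    for j, c in enumerate(categories):
        image[c][j] = 1.0  # b_c = 1 for the category this variable falls into
    return image


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D convolution (cross-correlation, as in most CNN libraries) + ReLU.

    The same kernel weights are applied at every location: parameter sharing.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = max(acc, 0.0)  # ReLU activation
    return out


def max_pool(fmap: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    out_h, out_w = len(fmap) // size, len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


if __name__ == "__main__":
    # Hypothetical applicant with three variables (age, savings, years employed)
    ranges = [(18, 70), (0, 100_000), (0, 40)]
    record = [35, 20_000, 3]
    cats = [discretize(v, lo, hi, k=4) for v, (lo, hi) in zip(record, ranges)]
    image = to_image(cats, k=4)                        # 4 x 3 binary image
    fmap = conv2d(image, [[1.0, -1.0], [-1.0, 1.0]])   # arbitrary, untrained kernel
    print(cats, max_pool(fmap))                        # [1, 0, 0] [[1.0]]
```

A real network would learn many such kernels by backpropagation. In the structure of [16], dropout is applied to the fully connected layers that follow the convolution and pooling stages. The sketch omits training, the label y and dropout.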
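The AUC and K-S statistic used in the evaluation that follows can both be computed from a model's predicted scores and the true labels. The sketch below is a minimal pure-Python illustration, not the evaluation code used in the paper. It assumes that label 1 marks the positive class and that a higher score means the positive class is more likely. The function names are hypothetical. AUC is taken as the trapezoidal area under the ROC curve. The K-S statistic is taken as the largest gap between the true positive and false positive rates over all thresholds.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the threshold from high to low."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold: emit a point only after the last of them
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    pts = roc_points(scores, labels)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Maximum distance between the TPR and FPR curves over all thresholds."""
    return max(abs(tpr - fpr) for fpr, tpr in roc_points(scores, labels))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]   # toy model outputs
    labels = [1, 1, 0, 1, 0, 0]
    print(round(auc(scores, labels), 4), round(ks_statistic(scores, labels), 4))  # 0.8889 0.6667
```

Both metrics are read off the same threshold sweep, which is why they share `roc_points` here. For production use, scikit-learn's `roc_curve` and `roc_auc_score` cover the same computations.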
Characteristics curve, and it measures the discriminative ability of the classification model. The K-S statistic is the maximum vertical distance between the true positive rate and false positive rate curves over all classification thresholds. It is a commonly used metric of classifier performance in the credit scoring domain.

From the results, our hybrid model Relief-CNN achieves a much better AUC (0.6989) than the other two models, random forest (0.601) and logistic regression (0.5221). For the K-S statistic, Relief-CNN obtains 0.312, followed by random forest (0.235) and logistic regression (0.064). This indicates that the hybrid deep learning model better distinguishes credit-worthy instances from non-credit-worthy instances. The accuracy of Relief-CNN (91.6%) is slightly higher than that of random forest (91.4%) and well above that of logistic regression (85.8%). Across all three measures, we conclude that Relief-CNN yields better results than the other two benchmark classification techniques.

Figure 3. The results of AUC, K-S statistic and accuracy.

V. CONCLUSION

In the consumer finance industry, managing credit risk is now a necessary factor for success, and credit scoring plays an important role in it. In this paper, we develop a new credit scoring tool by proposing a hybrid deep learning model that combines the convolutional neural network (CNN) with the Relief algorithm. Our experiments compared the hybrid model with logistic regression and random forest on a real-world dataset from a Chinese consumer finance company. The results show that our hybrid model Relief-CNN is superior to the benchmark algorithms. We believe that deep learning techniques can provide strong support for credit scoring. In future research, we will explore other ways to reshape the data and work on optimizing the structure of the convolutional neural network.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (Grant No. 71401115) and funded by Sichuan University (Grant No. skqy201742).

REFERENCES

[1] Lessmann, S., Baesens, B., Seow, H.V., and Thomas, L.C., 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247, 1, 124-136.
[2] Crook, J.N., Edelman, D.B., and Thomas, L.C., 2007. Recent developments in consumer credit risk assessment. European Journal of Operational Research 183, 3, 1447-1465.
[3] Hooman, A., Marthandan, G., Wan, F.W.Y., Omid, M., and Karamizadeh, S., 2016. Statistical and data mining methods in credit scoring. Journal of Developing Areas 50, 5, 371-381.
[4] Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., and Alsaadi, F.E., 2016. A survey of deep neural network architectures and their applications. Neurocomputing 234, 11-26.
[5] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11, 2278-2324.
[6] Durand, D., 1941. Risk Elements in Consumer Instalment Financing. NBER Books.
[7] Bensic, M., Sarlija, N., and Zekic-Susac, M., 2005. Modelling small-business credit scoring by using logistic regression, neural networks and decision trees. Intelligent Systems in Accounting, Finance & Management 13, 3, 133-150.
[8] Nie, G., Wei, R., Zhang, L., Tian, Y., and Shi, Y., 2011. Credit card churn forecasting by logistic regression and decision tree. Expert Systems with Applications 38, 12, 15273-15285.
[9] Alaraj, M. and Abbod, M.F., 2016. Classifiers consensus system approach for credit scoring. Knowledge-Based Systems 104, 89-105.
[10] West, D., 2000. Neural network credit scoring models. Computers & Operations Research 27, 11, 1131-1152.
[11] Bellotti, T. and Crook, J., 2009. Support vector machines for credit scoring and discovery of significant features. Expert Systems with Applications 36, 2, 3302-3308.
[12] Niimi, A., 2015. Deep learning for credit card data analysis. In Proceedings of World Congress on Internet Security, 73-77.
[13] Luo, C., Wu, D., and Wu, D., 2016. A deep learning approach for credit scoring using credit default swaps. Engineering Applications of Artificial Intelligence 65, 465-470.
[14] Tran, K., Duong, T., and Ho, Q., 2017. Credit scoring model: A combination of genetic programming and deep learning. In Proceedings of Future Technologies Conference, 145-149.
[15] Kira, K. and Rendell, L.A., 1992. A practical approach to feature selection. In Proceedings of the Ninth International Conference on Machine Learning, 249-256.
[16] Krizhevsky, A., Sutskever, I., and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Proceedings of International Conference on Neural Information Processing Systems, 1097-1105.