Claims Fraud Predictive Model
Abstract
The insurance industry is growing rapidly in terms of the volume of data it
handles. The most critical issue in the insurance industry is fraudulent
claims. Fraud is a wrongful or criminal deception intended to result in
financial or personal gain. As the size of the data increases, traditional
approaches no longer work, and identifying fraudulent claims becomes a
tedious job. Moreover, new types of claims will emerge, making fraudulent
claims harder to predict. This paper presents an overview of fraud analytics,
prediction, and data science algorithm based prediction in the insurance
industry.
International Journal of Pure and Applied Mathematics Special Issue
1. Introduction
Fraud analytics is a type of data analytics in which fraudulent behaviour is
analysed. Fraud occurs in several domains, such as credit card fraud,
telecommunication fraud, insurance fraud, healthcare fraud, and tax evasion.
Credit card fraud is one of the most widely surveyed fraud types in the domain
of fraud detection [34], [35], [36]. Because credit cards are a popular mode of
payment, both online and offline, the fraud associated with them is also
increasing. There are multiple techniques to detect credit card fraud, such as
Neural Network [10-11], Group Method of Data Handling [4-5], and Bagging [6].
Other popular models for credit card fraud detection are Hidden Markov
Model [2-3], Bayesian learning [7-9], and K-means Clustering [1]. Credit card
fraud has been categorised [35] into two categories, namely behavioural frauds
and application frauds. Application frauds happen whenever fraudsters acquire
new cards by providing false data to issuing companies [33]. Behavioural frauds
include mail theft, fake cards, and stolen/lost cards. Several algorithms for
credit card fraud prediction were compared in [43], which concluded that the
Bagging ensemble classifier is the best method.
ii) Diagnostic – Based on previous data, this type of analysis examines why
something has happened. Identifying and analysing the reason for poor sales in
the previous year is an example of diagnostic data analysis.
iii) Predictive – This type of analysis suggests [27] what will happen in the
future. It predicts a future scenario based on historical data. For example,
identifying the area that is likely to achieve better sales in the current
year based on past data.
iv) Prescriptive – This type of analysis suggests what action should be taken,
that is, how a desired outcome can be made to happen. It gives recommendations
on what needs to be done. For example, how to achieve the best outcome in
sales, or a strategy to retain key customers.
3. Predictive Analytics
This paper discusses predictive analytics and the techniques used for
prediction. Supervised learning and unsupervised learning are the techniques
[28] used for predictive analytics. Supervised learning has a target variable,
which is the output predicted from the other relevant features. Unsupervised
learning does not have a target variable. Because fraudulent claim prediction
has a target variable, the following supervised learning techniques are used:
Decision tree
Random Forest
Support Vector Machines
Neural Network
XGBoost
A decision tree gives a visual representation in the form of a graph. The
sample set is divided into subsets along the branches of the tree, which
represent choices and their results. Each node of the tree represents a choice
and the edges represent the decisions. The sample dataset is split into a
training dataset and a test dataset. A model is built on the training dataset
and then applied to the test dataset to validate its prediction accuracy.
Given the predictor variables, the model decides the category (Yes/No,
Spam/Not spam) of each record. Decision trees can deal with continuous data
through methods such as C4.5, an extension of ID3.
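The train/test workflow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any cited paper: it assumes numeric features, uses Gini impurity to choose thresholds (a CART-style choice rather than ID3 or C4.5), and the toy claims data, feature meanings, and function names are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; internal nodes are (feature, threshold, left, right) tuples."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the decisions from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical toy claims: (claim amount in thousands, days since policy start).
train_rows = [(80, 10), (90, 5), (70, 20), (60, 15), (10, 200), (20, 300),
              (15, 100), (30, 400), (75, 365), (85, 500), (5, 12), (12, 25)]
train_labels = ["Yes"] * 4 + ["No"] * 8  # "Yes" marks a fraudulent claim
test_rows = [(95, 8), (8, 250), (65, 18), (40, 600)]
test_labels = ["Yes", "No", "Yes", "No"]

tree = build_tree(train_rows, train_labels)           # model built on training data
predictions = [predict(tree, r) for r in test_rows]   # model applied to test data
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
print("test accuracy:", accuracy)
```

The accuracy on such a tiny, cleanly separable sample says nothing about real claims data; the point is only the sequence of building on the training set and validating on the held-out test set.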
Random Forest
In the random forest technique, multiple decision trees are created. A random
subset of the training data is used to create each decision tree [16]. A new
observation is fed into all the trees, and the majority vote across the trees
is taken as the final classification. The random forest model also copes with
missing values and outliers.
A predictive algorithm that uses this technique tries to model the
relationship between the input and output variables. It provides high accuracy
and runs efficiently on large datasets. It is widely used when the number of
input variables is large [14]. Moreover, it has methods for balancing error in
unbalanced datasets.
It has been found that for the aggregated model random forest gives better
results than Naïve Bayes, whereas for the personalized models Naïve Bayes gives
better results. In online shopping [15], the announcement of a large number of
discounts paves the way for unusual activity in the purchase of products and
services. The random forest algorithm, implemented in the R language, has been
used to detect such activity. The random forest technique can also be used to
predict customers' preferences among insurance policy options [12].
The algorithm is as follows:
Input: Training dataset
Output: A forest of “n” trees
Step 1: Randomly select “k” features from the total “m” features, where k << m.
Step 2: Among the “k” features, calculate the node “d” using the best split point.
Step 3: Split the node into daughter nodes using the best split.
Step 4: Repeat steps 1 to 3 until “l” number of nodes has been reached.
Step 5: Build the forest by repeating steps 1 to 4 “n” times to create “n” trees.
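The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: each tree is grown on a bootstrap sample of the training data, Gini impurity is assumed as the "best split" criterion, a maximum depth stands in for the "l" node limit, and the toy claims data and all function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(data, features):
    """Step 2: best (feature, threshold) among the candidate features, or None."""
    labels = [y for _, y in data]
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({x[f] for x, _ in data}):
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(data, k, rng, depth=0, max_depth=4):
    """Steps 1 to 4: grow one tree; max_depth is an assumed stopping rule."""
    m = len(data[0][0])
    features = rng.sample(range(m), k)              # Step 1: k of the m features
    split = best_split(data, features) if depth < max_depth else None
    if split is None:
        return Counter(y for _, y in data).most_common(1)[0][0]  # leaf label
    f, t = split
    left = [(x, y) for x, y in data if x[f] <= t]   # Step 3: daughter nodes
    right = [(x, y) for x, y in data if x[f] > t]
    return (f, t,
            grow_tree(left, k, rng, depth + 1, max_depth),    # Step 4: repeat
            grow_tree(right, k, rng, depth + 1, max_depth))

def build_forest(data, n_trees, k, seed=0):
    """Step 5: repeat tree growing n_trees times, each on a bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]   # random subset with replacement
        forest.append(grow_tree(sample, k, rng))
    return forest

def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def predict_forest(forest, x):
    """Majority vote over the predictions of all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

# Hypothetical toy claims: ((claim amount in thousands, days since policy start), label).
data = [((80, 10), "Yes"), ((90, 5), "Yes"), ((70, 20), "Yes"), ((60, 15), "Yes"),
        ((75, 8), "Yes"), ((10, 200), "No"), ((20, 300), "No"), ((15, 100), "No"),
        ((30, 400), "No"), ((5, 500), "No")]

forest = build_forest(data, n_trees=15, k=1)
print(predict_forest(forest, (100, 1)), predict_forest(forest, (1, 1000)))
```

Choosing k much smaller than m decorrelates the trees, which is what lets the majority vote reduce the variance of a single decision tree.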
Neural Networks
taking unit.
n-dimensional space to classify the data based on the target class. The SVM
separates the data into different classes through one or more hyperplanes.
A hyperplane separates the data points, although it is sometimes difficult to
separate them with a single hyperplane. The distance between the hyperplane
and the nearest data points is called the margin.
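The relationship between a hyperplane, the side a point falls on, and the margin can be illustrated with a short Python sketch. The hyperplane here (weights `w` and offset `b`) is hand-picked for illustration rather than learned by an SVM solver, and the points and function names are hypothetical.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the hyperplane."""
    return 1 if signed_distance(w, b, x) >= 0 else -1

def margin(w, b, points):
    """Distance from the hyperplane to the nearest data point."""
    return min(abs(signed_distance(w, b, x)) for x in points)

# Hand-picked separating hyperplane in 2-dimensional space (an assumption).
w, b = (3.0, 4.0), -25.0
points = [(7.0, 6.0), (5.0, 5.0), (1.0, 2.0), (3.0, 2.0)]

for p in points:
    print(p, classify(w, b, p), round(signed_distance(w, b, p), 2))
print("margin:", round(margin(w, b, points), 2))
```

An SVM training procedure searches for the `w` and `b` that make this margin as large as possible; the sketch only evaluates the margin for one fixed choice.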
4. Discussion
A comparative study was done on the supervised techniques. Each technique has
its own merits and demerits. A technique can be chosen based on the application
area and the data, and the analytics can then be carried out with it. The
merits and demerits are discussed below:
As a first step, the business problem must be clearly identified. The next step
is to identify the data source, which is a very important task in a data
analysis model [29]. Subsequently, all the data is gathered in a single area,
which could be a data mart or a data warehouse. Then the data is cleaned up re, inconsistent,
6. Conclusion
Like insurance fraud, several other kinds of fraudulent behaviour exist, such
as intrusion, credit card fraud, and telecommunication fraud. Health insurance
fraud [21] is especially prominent since it brings heavy losses overall. By
integrating big data technology, fraudulent claims can be predicted for large
volumes of data as well as for a wide variety of data.
References
[1] Srivastava A., Kundu A., Sural S., Majumdar A., Credit Card
Fraud Detection Using Hidden Markov Model, IEEE Transactions
On Dependable And Secure Computing 5(1) (2008), 37-48.
[2] Bhusari V., Patil S., Study of Hidden Markov Model in Credit
Card Fraudulent Detection, International Journal of Computer
Applications 20(5) (2011).
[3] Ivakhnenko A.G., The group method of data handling in
prediction problems, Sov Autom Control 9(6) (1976), 21–30.
[4] Mueller J.A., Lemke F., Self-organising data mining: an intelligent
approach to extract knowledge from data, Script Software
International, Berlin (2009).
[5] Singh S.P., Shukla S.S.P., Rakesh N., Tyagi V., Problem
Reduction In Online Payment System Using Hybrid Model,
International Journal of Managing Information Technology 3(3)
(2011).
[6] Zareapoor M., Shamsolmoali P., Application of Credit Card Fraud
Detection: Based on Bagging Ensemble Classifier, International
Conference on Computer, Communication and Convergence
(2015).
[7] Benson Edwin Raj S., Annie Portia A., Analysis on Credit Card
Fraud Detection Methods, International Conference on
Computer, Communication and Electrical Technology (2011).
[8] Panigrahi S., Kundu A., Sural S., Majumdar A.K., Credit card
fraud detection: A fusion approach using Dempster-Shafer theory
and Bayesian learning, Special Issue on Information Fusion in
Computer Security 10(4) (2009), 354-363.
[9] Chang R.I., Lai L.B., Su W.D., Wang J.C., Kouh J.S., Intrusion
Detection by Backpropagation Neural Networks with Sample-
Query and Attribute-Query, Research India Publications (2006).
[10] Patidar R., Sharma L., Credit Card Fraud Detection Using Neural
Network, International Journal of Soft Computing and
Engineering 1 (2011).
[11] Guo T., Li G.Y., Neural Data Mining For Credit Card Fraud
Detection, Proceedings of the Seventh International Conference
on Machine Learning and Cybernetics (2006).
[12] Lata L.N., Koushika I.A., Hasan S.S., A Comprehensive Survey
of Fraud Detection Techniques, International Journal of Applied
Information Systems 10(2) (2015).
[13] Quinlan J., Learning with continuous classes, 5th Australian joint
conference on artificial intelligence 92 (1992).
[14] Alshamsi A.S., Predicting car insurance policies using random
forest, 10th International Conference on Innovations in
Information Technology (2014), 128-132.
[15] Viaene S., Auto claim fraud detection using Bayesian learning
neural networks, Elsevier (2005).
[16] Eesha Goel, Abhilasha, Ankit Agarwal, Fraud Detection Using
Random Forest Algorithm, International Journal of Computer
Science Engineering 5(05) (2016).
[17] Salama A.S., Omar A.A., A Back Propagation Artificial Neural
Network based Model for Detecting and Predicting Fraudulent
Financial Reporting, International Journal of Computer
Applications 106(2) (2014).
[18] Fanning K., Cogger K.O., Srivastava R., Detection of
management fraud: A neural network approach, Intelligent
Systems in Accounting, Finance and Management 4(2) (1995),
113-126.
[19] Kirlidog M., Asuk C., A fraud detection approach with data mining
in health insurance, Procedia-Social and Behavioral Sciences 62
(2012), 989-994.
[20] Pai P.F., A support vector machine-based model for detecting
top management fraud, Knowledge-Based Systems 24 (2011),
314–321.
[21] Rawte V., Anuradha G., Fraud Detection in Health Insurance
using Data Mining Techniques, Communication, Information &
Computing Technology (2015).
[22] Peng Y., Kou G., Sabatka A., Chen Z., Khazanchi D., Shi Y.,
Application of clustering methods to health insurance fraud
detection, International Conference on Service Systems and
Service Management 1 (2006), 116-120.
[23] Thornton D., Mueller R.M., Schoutsen P., van Hillegersberg J.,
Predicting healthcare fraud in medicaid: a multidimensional data
model and analysis techniques for fraud detection, Procedia
technology 9 (2013), 1252-1264.
[24] Lin F., Yeh C.C., Lee M.Y., The use of hybrid manifold learning
and support vector machines in the prediction of business failure,
Knowledge-Based Systems (2010), 95–101.
[25] Tang X., Zhuang L., Cai J., Li C., Multi-fault classification based
on support vector machine trained by chaos particle swarm
optimization, Knowledge-Based Systems 23(5) (2010), 486–490.
[26] Wan S., Lei T.C., A knowledge-based decision support system
to analyze the debris-flow problems at Chen-Yu-Lan River,
Taiwan, Knowledge-Based Systems 22(8) (2009), 580-588.
[27] Hafiz K.T., Aghili S., Zavarsky P., The use of predictive analytics
technology to detect credit card fraud in Canada, 11th Iberian
Conference on Information Systems and Technologies (2016),
1-6.
[28] Alfred R., The rise of machine learning for big data analytics, 2nd
International Conference on Science in Information Technology
(2016).
[29] Banarescu A., Detecting and Preventing Fraud with Data
Analytics, Elsevier (2015).
[30] Thornton D., Brinkhuis M., Amrit C., Aly R., Categorizing and
Describing the Types of Fraud in Healthcare, Procedia Computer
Science 64 (2015), 713-720.
[31] Lata L.N., Koushika I.A., Hasan S.S., A Comprehensive Survey
of Fraud Detection Techniques, International Journal of Applied
Information Systems (2015).
[32] Dal Pozzolo A., Caelen O., Le Borgne Y.A., Waterschoot S.,
Bontempi G., Learned lessons in credit card fraud detection from
a practitioner perspective, Expert systems with applications
41(10) (2014), 4915-4928.
[33] Mahmoudi N., Duman E., Detecting credit card fraud by Modified
Fisher Discriminant Analysis, Expert Systems with Applications
42(5) (2014), 2510-2516.
[34] Chan P.K., Fan W., Prodromidis A.L., Stolfo S.J., Distributed
data mining in credit card fraud detection, IEEE Intelligent
Systems and Their Applications 14(6) (1999), 67-74.
[35] Bolton R., Hand D., Unsupervised Profiling Methods for Fraud
Detection, Credit Scoring and Credit Control VII (2001).
[36] Brause R.W., Langsdorf T.S., Hepp H.M., Credit card fraud
detection by adaptive neural data mining, Internal Report 7/99 (J.
W. Goethe-University, Computer Science Department, Frankfurt,
Germany) (1999).
[37] Gill K.M., Woolley K.A., Gill M., Insurance fraud: The business
as a victim. In M. Gill (Ed.), Crime at work, Leicester: Perpetuity
Press (1994).
[38] Frieden J., Fraud Squads Target Suspect Claims, Business &
Health 9(4) (1991), 21-33.
[39] Guzzi R., Furious About Fraud, Best's Review-Life/Health
Insurance Edition (1989).
[40] Sahin Y., Bulkan S., Duman E., A cost-sensitive decision tree
approach for fraud detection, Expert Systems with Applications
40(15) (2013), 5916-5923.
[41] Vapnik V.N., Estimation of Dependences Based on Empirical
Data, Addendum 1, New York: Springer-Verlag (1982).
[42] Reilly D.L., Cooper L.N., Elbaum C., A neural model for category
learning, Biological Cybernetics 45(1) (1982), 35-41.
[43] Zareapoor M., Application of Credit Card Fraud Detection: Based
on Bagging classifier, Elsevier (2015).