Credit Cards Fraud Detection by Negative Selection Algorithm On Hadoop (To Reduce The Training Time)
All content following this page was uploaded by Elham Hormozi on 02 August 2018.
Abstract-This paper proposes a model for a credit card fraud detection system that aims to improve current risk management by adding an Artificial Immune System (AIS) algorithm to the fraud detection system. To achieve this goal, we parallelize the Negative Selection Algorithm (NSA) on a cloud platform, namely Apache Hadoop and MapReduce. The algorithm is executed with three detector sets. The experiments show that implementing our fraud detection system on the cloud significantly decreases the training time of the algorithm compared with the basic algorithm.
Keywords-credit card; fraud detection; artificial immune system; apache hadoop; mapreduce.
I. INTRODUCTION
Cloud computing is an up-and-coming architecture with strengths and room for improvement. Cloud vendors use a large number of identically configured, low-end servers to scale the computing supply. Cloud systems can provide [...] much of today’s commerce depends on plastic. People use their credit cards and debit cards to purchase products, get cash, and pay bills. Because of the quick development of electronic commerce technology, the use of credit cards has dramatically increased. Credit and debit cards are issued by banks. Fraud detection methods come into action when security approaches fail to stop fraud [1].
This paper addresses credit card fraud detection. Fraud detection is the act of recognizing illegal activity and stopping it as soon as possible, before the transaction is completed. A real-time fraud detection system is therefore necessary for financial institutions [1, 2]. The paper discusses the use of AIS in one aspect of security management, viz. the detection of credit card fraud, and AIS has similarities to a fraud detection system. We implement a fraud detection system based on AIS, for which the Negative Selection Algorithm was selected. Because AIS algorithms have a long training time, the proposed model has been implemented using Apache Hadoop and the MapReduce paradigm. With cloud computing, organizations can access a common database of several types of fraud. Cloud computing also provides the computing power to process a high volume of transactions. The remainder of this paper is structured as follows: Section 2 describes fraud and fraud detection. Section 3 provides an overview of Artificial Immune Systems. Section 4 presents the proposed model. Section 5 provides the results, and the paper closes with a conclusion.
II. CREDIT CARD FRAUD DETECTION
Credit card fraud is a major issue in the financial industry. It is responsible for billions of dollars in losses per annum globally. It is well known that credit cards as a method of payment increase the quantum of spending [2]. Credit card payment systems must therefore be supported by an efficient fraud detection capability to minimize unwanted activities by adversaries. Credit card fraud detection has drawn considerable interest from the research community, and a number of methods have been offered to counter fraud in this field [3].
Fraud detection is of interest to financial institutions. The appearance of new technologies such as the telephone, the internet, automated teller machines (ATMs), and credit card systems has amplified the amount of fraud loss for many banks. Analyzing whether each transaction is legitimate is very expensive. Confirming whether a transaction was made by a client or a fraudster by phoning card holders is cost prohibitive if it is done for all transactions [4]. Fraud detection should recognize fraud cases and stop them as soon as possible, before the transaction is completed.
Detector Function). Affinity is therefore calculated using the Euclidean distance. To match detectors against strings, the NSA needs a threshold. In the training phase, if the distance between a detector and a record is less than the threshold, the detector is removed because it detects a self record; otherwise, the detector detects nonself records and the system keeps it for fraud detection.
After that, the detector set is generated, and the NSA is used to ensure that no detector matches any self pattern. The reducer then shuffles and sorts the mapper output. Finally, in the testing phase, new data from the system is matched against these detectors to detect fraud. The testing phase is run serially. In the testing phase, if the distance is less than the threshold, the detector detects a nonself record.
V. EXPERIMENTS
a. Data set
We obtained our database from a large Brazilian bank, with registers within the time window from Jul/14/2004 through Sep/12/2004. The dataset consists of 300,000 records. Each register represents a credit card authorization; only approved transactions are included, and denied transactions are excluded. All data fields are considered in numerical form. In total, we use 70% of the dataset for training and 30% for testing.
b. Time (second)
We executed the NSA on the cloud with three threshold values. Running the NSA on the cloud platform with several Mappers significantly reduces the time. Also, as the threshold increases, the time of the training phase increases too.
- Threshold=1
As Figure 4 shows, the training time of the base algorithm is about 10,620 s, while the parallelized NSA running on Hadoop with several Mappers reduces it to 74 s.
Figure 4. Time with Threshold= 1
- Threshold=1.5
Figure 5 shows that the running time of the serial algorithm is about 23,820 s, while the parallelized NSA running on Hadoop with several Mappers reduces it to 78 s.
Figure 5. Time with Threshold= 1.5
- Threshold=2
As Figure 6 shows, the time is reduced to 3,084 s, compared with 80,760 s for the base algorithm. So whatever [...]
Figure 6. Time with Threshold= 2
CONCLUSION
The purpose of this paper is to decrease the time of the fraud detection system for credit card fraud. To achieve this aim, we implemented one of the AIS algorithms on Hadoop using the MapReduce programming model. Because AIS algorithms have a long training time, we parallelized the NSA on the cloud with several Mappers. Two classes, called the Map function and the Reduce function, were added to the NSA. The results show that the training time decreases dramatically. This means that our credit card fraud detection system can detect fraud quickly. As the results show, the running time of the algorithm also rises as the threshold increases.
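As an illustration of the training and testing rules described for the proposed model, the following is a minimal Python sketch, not the authors' Hadoop implementation. It assumes that candidate detectors are random numeric vectors drawn from a hypothetical feature range of [0, 10], that the candidates are split among the map tasks while every map task sees the full set of self records, and that the reduce step simply merges and sorts the surviving detectors. The map tasks are simulated sequentially here, and all function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_censor(candidates, self_records, threshold):
    """Map step (assumed): drop every candidate detector that lies closer
    than the threshold to any self (legitimate) record."""
    return [d for d in candidates
            if all(euclidean(d, s) >= threshold for s in self_records)]


def reduce_merge(mapper_outputs):
    """Reduce step (assumed): merge and sort the detectors kept by each mapper."""
    return sorted(d for output in mapper_outputs for d in output)


def train(self_records, n_candidates, threshold, n_mappers, dim, seed=0):
    """Generate random candidates, censor them in n_mappers chunks, merge."""
    rng = random.Random(seed)
    # Assumption: features are scaled to the range [0, 10].
    candidates = [tuple(rng.uniform(0.0, 10.0) for _ in range(dim))
                  for _ in range(n_candidates)]
    chunks = [candidates[i::n_mappers] for i in range(n_mappers)]
    return reduce_merge(map_censor(c, self_records, threshold) for c in chunks)


def is_fraud(record, detectors, threshold):
    """Testing phase: a record is nonself if some detector is within threshold."""
    return any(euclidean(record, d) < threshold for d in detectors)


# Toy usage with three made-up "legitimate" transactions in two dimensions.
self_records = [(1.0, 1.0), (1.5, 1.2), (2.0, 1.0)]
detectors = train(self_records, n_candidates=200, threshold=1.5,
                  n_mappers=4, dim=2)
print(len(detectors), is_fraud((9.0, 9.0), detectors, threshold=1.5))
```

Because each candidate detector is checked independently of the others, the censoring step can be divided among any number of map tasks without changing the resulting detector set, which is one plausible reason the training phase parallelizes well.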
ACKNOWLEDGMENT
The authors would like to thank all those who contributed to
this paper. Further to this, we gratefully acknowledge those in
the cloud computing team at the Department of Computer
Engineering and Information Technology, Amirkabir
University, Iran, and Mazandaran University of Science and
Technology, Babol, Iran.
REFERENCES
[1] A. Srivastava, A. Kundu, et al., "Credit Card Fraud Detection Using
Hidden Markov Model", IEEE Transactions on Dependable and Secure
Computing, Vol. 5, No. 1, pp. 37-48, 2008.
[2] R. A. Feinberg, “Credit Cards as Spending Facilitating Stimuli: A
Conditioning Interpretation,” J. Consumer Research, vol. 13, no. 3, pp.
348-356, 1986.
[3] S. Ghosh and D.L. Reilly, “Credit Card Fraud Detection with a Neural-
Network,” Proc. Int’l Conf. System Science, pp. 621-630, 1994.
[4] M. Gadi, X. Wang, A. Lago, “Credit Card Fraud Detection with
Artificial Immune System”, Springer, 2008.
[5] L. de Castro and J. Timmis. Artificial Immune Systems: A New
Computational Approach. Springer-Verlag, London. UK., September
2002.
[6] L. N. de Castro and F. J. Von Zuben. (1999) Artificial Immune Systems:
Part I—Basic Theory and Applications. FEEC/Univ. Campinas
Brazil. [Online]. Available: http://www.dca.fee.unicamp.br/~lnunes/immune.html.
[7] S. Forrest, A. Perelson, L. Allen, and R. Cherukuri. Self-nonself
discrimination in a computer. In Proc. of the IEEE Symposium on
Security and Privacy, pages 202–209, IEEE Computer Society, 1994.
[8] S. Forrest, S. Hofmeyr, A. Somayaji, and T. Longstaff. A sense of self
for unix processes. In Proc. of the IEEE Symposium on Research in
Security and Privacy, pages 120–128. IEEE Computer Society Press,
1996.
[9] Kim J, Bentley P (2001), Evaluating Negative Selection in an AIS for
Network Intrusion Detection, Genetic and Evolutionary Computation
Conference 2001, 1330-1337.
[10] Apache Hadoop project: http://hadoop.apache.org.
[11] Colin White, “MapReduce and the Data Scientist,” BI Research January
2012.
[12] Dean, J. and Ghemawat, S.: MapReduce: Simplified data processing on
large clusters. In Proceedings of the 6th Conference on Symposium on
Operating Systems Design & Implementation, USENIX Association,
Volume 6, pp. 10-10, December 2004.
[13] M. Miller, Cloud Computing: Web-Based Applications That Change the
Way You Work and Collaborate Online: Que, Aug. 2008.