


2013 5th Conference on Information and Knowledge Technology (IKT)

Credit Cards Fraud Detection by Negative Selection Algorithm on Hadoop
(To Reduce the Training Time)

Hadi Hormozi
Computer Engineering and Information Technology
Arak University, Arak, IRAN
h.hormozi@qazd.ir

Elham Hormozi
Computer Engineering and Information Technology
Mazandaran University of Science and Technology, Babol, IRAN
e.hormozi@ustmb.ac.ir

Mohammad Kazem Akbari
Computer Engineering and Information Technology
Amirkabir University of Technology (Tehran Polytechnic), Tehran, IRAN
akbarif@aut.ac.ir

Morteza Sargolzaei Javan
Computer Engineering and Information Technology
Amirkabir University of Technology (Tehran Polytechnic), Tehran, IRAN
msjavan@aut.ac.ir

Abstract—This paper proposes a model for a credit card fraud detection system that aims to improve current risk management by adding an Artificial Immune System (AIS) algorithm to the fraud detection system. To achieve this goal, we parallelize the Negative Selection Algorithm (NSA) on a cloud platform, namely Apache Hadoop and MapReduce. The algorithm is executed with three detector sets. The experiments show that by implementing our fraud detection system on the cloud, the training time of the algorithm decreases significantly in comparison with the basic (serial) algorithm.

Keywords—credit card; fraud detection; artificial immune system; Apache Hadoop; MapReduce.

I. INTRODUCTION

Cloud computing is an up-and-coming architecture with strengths and room for improvement. Cloud vendors use a large number of identically configured, low-end servers to scale the computing supply, and cloud systems can provide the processing power such workloads demand. Much of today's commerce depends on plastic: people use their credit and debit cards to purchase products, get cash, and pay bills. Because of the quick development of electronic commerce technology, the use of credit cards has dramatically increased. Credit and debit cards are issued by banks, and fraud detection methods come into action when security approaches fail to stop fraud [1].

We address credit card fraud detection in this paper. Fraud detection is the act of recognizing illegal activity and stopping it as soon as possible, before the transaction is completed, so a real-time fraud detection system is necessary for financial institutions [1, 2]. The paper therefore discusses the use of AIS for one aspect of security management, namely the detection of credit card fraud, since an AIS has natural similarities to a fraud detection system. This paper implements a fraud detection system based on AIS, with the Negative Selection Algorithm selected. Because AIS algorithms have a long training time, the proposed model has been implemented using Apache Hadoop and the MapReduce paradigm. With cloud computing, organizations can access a common database of several types of fraud; cloud computing provides computing power and can process a high volume of transactions. The remainder of this paper is structured as follows: Section II describes fraud and fraud detection, Section III provides an overview of Artificial Immune Systems, Section IV presents the proposed model, and Section V provides the results before the paper closes with a conclusion.

II. CREDIT CARD FRAUD DETECTION

Credit card fraud is a major issue in the financial industry, responsible for billions of dollars in losses per annum globally. It is well known that the credit card as a method of payment increases the quantum of spending [2], so credit card payment systems must be supported by efficient fraud detection capability to minimize unwanted activities by adversaries. Credit card fraud detection has drawn very large interest from the research community, and a number of methods have been offered to counter fraud in this field [3]. Fraud detection is of great interest to financial institutions. The appearance of new technologies such as the telephone, the internet, automated teller machines (ATMs), and credit card systems has amplified the amount of fraud loss for many banks. Analyzing whether each transaction is legitimate or not is very expensive, and verifying whether a transaction was made by the client or a fraudster by phoning all card holders is cost prohibitive if every transaction is checked [4]. Fraud detection should therefore recognize fraud cases and stop them as soon as possible, before the transaction is completed.



III. ARTIFICIAL IMMUNE SYSTEM

Artificial Immune Systems (AIS) [5] are algorithms and systems that use the human immune system as inspiration. The algorithms typically exploit the immune system's characteristics of learning and memory to solve a problem. The biological immune system is a highly parallel, distributed, and adaptive system. It uses learning, memory, and associative retrieval to solve recognition and classification tasks. In particular, it learns to recognize relevant patterns, remembers patterns that have been seen previously, and uses combinatorics to construct pattern detectors efficiently [6]. These remarkable information-processing abilities of the immune system provide important aspects in the field of computation.

The immune system (IS) is a complex of cells, molecules, and organs that represents an identification mechanism capable of perceiving and combating dysfunction of our self cells and the action of exogenous nonself cells. The focal root of immunology is self-nonself discrimination through the principles of negative selection and clonal expansion. The first example of an implemented AIS performing a useful computational task was an incarnation of a self-nonself discrimination system, used for the detection of computer virus executables [7]. The self-nonself discrimination system involved creating a behavior profile of sequences of system calls on a computer network during a period of normal function. To help detect malicious intruders, any subsequent sequences were matched against the normal profile, and any deviations were reported as a possible intrusion [8].

Forrest et al. [7] proposed an NSA for several anomaly detection problems. This algorithm defines 'self' by building the normal behavior patterns of a monitored system. It generates a number of random patterns that are compared to each defined self pattern. If a randomly generated pattern matches a self pattern, it fails to become a detector and is removed; otherwise, it becomes a 'detector' pattern and monitors subsequent profiled patterns of the monitored system (Figure 1). Consequently, if a 'detector' pattern matches any newly profiled pattern, it is considered that a new anomaly has occurred in the monitored system (Figure 2).

Figure 1. Detector set generation of the NSA [9]

Figure 2. Detection of nonself patterns by the detector set [9]

IV. METHODOLOGY

One of the big problems in running AIS algorithms is the long training phase needed to generate detectors, i.e., the low speed of the training phase [8]. To resolve this issue, we implement our work on the Hadoop platform. The Apache Hadoop software library is a framework that allows for the distributed processing of big data sets across clusters of computers using the MapReduce programming model. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.

The Apache Hadoop project [10] develops open-source software for scalable, reliable, distributed computing. Hadoop Map/Reduce is a programming model for easily writing applications which process large amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant way [11]. A Map/Reduce job splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Usually, both the input and the output of the job are stored in a file system. The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks [12, 13]. Typically the compute nodes and the storage nodes are the same; that is, the Map/Reduce framework and the Hadoop Distributed File System (HDFS) run on the same set of nodes (Figure 3).

Figure 3. Apache Hadoop [11]
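Before turning to the parallel training procedure, the detector-generation loop described in Section III can be summarized in a minimal sketch, assuming real-valued, normalized transaction records, a Euclidean-distance affinity, and a fixed matching threshold. Function names such as generate_detectors are illustrative and are not taken from the paper.

```python
import math
import random

def euclidean(a, b):
    """Affinity measure: Euclidean distance between two numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def generate_detectors(self_records, n_detectors, dim, threshold, max_tries=100000):
    """Negative selection: keep only random candidates that match no self record."""
    rng = random.Random(0)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]   # random pattern in [0, 1)^dim
        # A candidate within the threshold of any self record is censored (removed).
        if all(euclidean(candidate, s) >= threshold for s in self_records):
            detectors.append(candidate)                  # survives censoring -> detector
    return detectors

def is_nonself(record, detectors, threshold):
    """Testing phase: a record matched by any detector is flagged as nonself (possible fraud)."""
    return any(euclidean(record, d) < threshold for d in detectors)
```

Each candidate must be compared against every self record, which is what makes the serial training phase slow on a 300,000-record dataset and motivates the MapReduce parallelization described next.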

a. Training and Testing Phase

In the training phase, we normalize the input data and prepare the algorithm. Afterwards, normal detectors are generated randomly for each mapper (by the random detector function). Affinity is calculated using the Euclidean distance, and for matching detectors with strings the NSA needs a threshold. In the training phase, if the distance between a detector and a self record is less than the threshold, the detector is removed because it detects a self record; otherwise, the detector detects nonself records and the system keeps it for fraud detection. After that, the detector set is generated and the NSA is used to ensure that no detector matches any self pattern. The reducer then shuffles and sorts the mapper output. Finally, in the testing phase, new data from the system can be matched against these detectors to detect frauds. The testing phase is done serially; in the testing phase, if the distance is less than the threshold, the detector detects a nonself record.
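The training step above can be expressed in a Hadoop Streaming style, where each mapper censors its own random candidates against the self records in its input split and the reducer collects the surviving detectors. The following mapper/reducer pair is a minimal sketch of that idea, not the authors' implementation; THRESHOLD, N_CANDIDATES, and DIM are assumed values, and the file names are placeholders.

```python
#!/usr/bin/env python
# mapper.py -- illustrative Hadoop Streaming mapper for NSA detector generation.
# Reads normalized self records (comma-separated floats) from its input split on stdin,
# draws random candidate detectors, and emits only those whose Euclidean distance to
# every self record in the split is at least THRESHOLD (negative selection).
import sys
import math
import random

THRESHOLD = 1.0        # matching threshold; the paper evaluates 1, 1.5 and 2
N_CANDIDATES = 1000    # candidate detectors drawn per mapper (assumed value)
DIM = 4                # numeric fields per record (assumed value)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

self_records = [list(map(float, line.split(","))) for line in sys.stdin if line.strip()]
rng = random.Random()

for _ in range(N_CANDIDATES):
    candidate = [rng.random() for _ in range(DIM)]
    if all(euclidean(candidate, s) >= THRESHOLD for s in self_records):
        # Key/value output for Hadoop's shuffle-and-sort phase.
        print("detector\t" + ",".join("%.6f" % x for x in candidate))
```

```python
#!/usr/bin/env python
# reducer.py -- collects the detectors emitted by all mappers into the final detector set,
# which the (serial) testing phase then uses to flag nonself records.
import sys

seen = set()
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key == "detector" and value not in seen:
        seen.add(value)
        print(value)   # one detector per output line
```

A job of this shape would typically be submitted with the Hadoop Streaming jar, along the lines of `hadoop jar hadoop-streaming.jar -input self_records -output detectors -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py` (paths shown are placeholders). Note that a detector surviving one mapper's split could still match a self record held by another mapper; a final pass over the full self set, as suggested by the "ensure that no detector matches any self pattern" step above, would close that gap.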

V. EXPERIMENTS

a. Data set

We obtained our database from a large Brazilian bank, with registers within a time window from Jul/14/2004 through Sep/12/2004. The dataset consists of 300,000 records. Each register represents a credit card authorization, with only approved transactions included and denied transactions excluded. All data fields are considered in numerical form. In total, we use 70% of the dataset for training and 30% for testing.

b. Time (second)

We executed the NSA on the cloud with three threshold values. By running the NSA on the cloud platform with several mappers, the time is significantly reduced. Also, when the threshold goes to a higher value, the time of the training phase increases too.

- Threshold=1
As shown in Figure 4, the training time of the basic (serial) algorithm is about 10,620 s, while the parallelized NSA on Hadoop running with several mappers reduces it to 74 s.

Figure 4. Time with Threshold=1

- Threshold=1.5
Figure 5 shows that the running time of the serial algorithm is about 23,820 s, while the parallelized NSA on Hadoop running with several mappers reduces it to 78 s.

Figure 5. Time with Threshold=1.5

- Threshold=2
As can be seen in Figure 6, the time is reduced to 3,084 s, in comparison with the basic algorithm's time of 80,760 s. So, as the threshold value increases, the training time increases as well.

Figure 6. Time with Threshold=2
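For readability, the reported timings translate into the following approximate speedups of the parallelized NSA over the serial baseline. The short script below simply recomputes the ratios from the numbers in Figures 4-6 and is not part of the original paper.

```python
# Speedup of the Hadoop-parallelized NSA over the serial baseline, per threshold,
# using the training times (in seconds) reported in Figures 4-6.
timings = {
    "1.0": (10620, 74),
    "1.5": (23820, 78),
    "2.0": (80760, 3084),
}
for threshold, (serial_s, parallel_s) in timings.items():
    print(f"threshold {threshold}: ~{serial_s / parallel_s:.0f}x faster")
# threshold 1.0: ~144x faster
# threshold 1.5: ~305x faster
# threshold 2.0: ~26x faster
```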

CONCLUSION

The purpose of this paper is to decrease the training time of a fraud detection system for credit card fraud. To achieve this aim, we implement one of the AIS algorithms on Hadoop using the MapReduce programming model. Because AIS algorithms have a long training time, we parallelized the NSA on the cloud with several mappers; two classes, called the Map function and the Reduce function, were added to the NSA. The results show that the training time dramatically decreases, which means that our credit card fraud detection system can detect frauds quickly. As seen in the results, with an increasing threshold value the running time of the algorithm rises too.
ACKNOWLEDGMENT

The authors would like to thank all those who contributed to this paper. Further, we gratefully acknowledge the cloud computing team at the Department of Computer Engineering and Information Technology, Amirkabir University of Technology, Tehran, IRAN, and Mazandaran University of Science and Technology, Babol, IRAN.
REFERENCES

[1] A. Srivastava, A. Kundu, et al., "Credit Card Fraud Detection Using Hidden Markov Model," IEEE Transactions on Dependable and Secure Computing, vol. 5, no. 1, pp. 37-48, 2008.
[2] A. Richard, "Credit Cards as Spending Facilitating Stimuli: A Conditioning Interpretation," J. Consumer Research, vol. 13, no. 3, pp. 348-356, 1986.
[3] S. Ghosh and D. L. Reilly, "Credit Card Fraud Detection with a Neural-Network," Proc. Int'l Conf. System Science, pp. 621-630, 1994.
[4] M. Gadi, X. Wang, and A. Lago, "Credit Card Fraud Detection with Artificial Immune System," Springer, 2008.
[5] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Approach. London, UK: Springer-Verlag, September 2002.
[6] L. N. de Castro and F. J. Von Zuben, "Artificial Immune Systems: Part I—Basic Theory and Applications," FEEC/Univ. Campinas, Brazil, 1999. [Online]. Available: http://www.dca.fee.unicamp.br/~lnunes/immune.html
[7] S. Forrest, A. Perelson, L. Allen, and R. Cherukuri, "Self-nonself discrimination in a computer," in Proc. IEEE Symposium on Security and Privacy, pp. 202-209, IEEE Computer Society, 1994.
[8] S. Forrest, S. Hofmeyr, A. Somayaji, and T. Longstaff, "A sense of self for Unix processes," in Proc. IEEE Symposium on Research in Security and Privacy, pp. 120-128, IEEE Computer Society Press, 1996.
[9] J. Kim and P. Bentley, "Evaluating Negative Selection in an AIS for Network Intrusion Detection," in Proc. Genetic and Evolutionary Computation Conference (GECCO 2001), pp. 1330-1337, 2001.
[10] Apache Hadoop project: http://hadoop.apache.org.
[11] C. White, "MapReduce and the Data Scientist," BI Research, January 2012.
[12] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," in Proc. 6th Conference on Symposium on Operating Systems Design & Implementation, USENIX Association, vol. 6, pp. 10-10, December 2004.
[13] M. Miller, Cloud Computing: Web-Based Applications That Change the Way You Work and Collaborate Online. Que, Aug. 2008.

