Eti Report
REPORT
MICROPROJECT
On
“CREDIT CARD FRAUD DETECTION
USING DATA MINING”
SEMESTER 6TH
RESOURCES USED
1. Introduction
2. Aim of Microproject
3. Types of Data Mining
4. Data Mining techniques
5. Data Mining on Credit Card Fraud Detection
6. Skills Developed Out Of This Micro-project
7. Conclusion
8. Reference
1. Introduction:-
Some of the earliest commercial uses of data mining come from service providers in the mobile
phone and utilities industries. These companies use data mining and business intelligence to
predict ‘churn’, the term they use for a customer leaving to get their phone, gas or broadband
service from another provider. They collate billing information, customer service interactions,
website visits and other metrics to give each customer a churn probability score, then target
offers and incentives at the customers they judge to be at higher risk of churning. Retailers
segment customers into ‘Recency, Frequency, Monetary’ (RFM) groups and target marketing and
promotions at those groups. A customer who spends little but often, and did so recently, will be
handled differently from a customer who spent a large amount only once, some time ago. The former
may receive loyalty, upsell and cross-sell offers, whereas the latter may be offered a win-back
deal, for instance.
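The RFM idea described above can be sketched in a few lines of Python. This is only an illustration: the purchase log, the customer names, the thresholds and the helper names `rfm` and `suggest_offer` are all assumptions made for the example, not part of any real retailer's system.

```python
from datetime import date

# Hypothetical purchase log: (customer_id, purchase_date, amount)
purchases = [
    ("alice", date(2024, 5, 28), 12.0),
    ("alice", date(2024, 5, 20), 9.5),
    ("alice", date(2024, 5, 11), 14.0),
    ("bob", date(2023, 11, 2), 480.0),
]

def rfm(purchases, today):
    """Return {customer: (recency_days, frequency, monetary_total)}."""
    summary = {}
    for customer, day, amount in purchases:
        recency, frequency, monetary = summary.get(customer, (None, 0, 0.0))
        days_ago = (today - day).days
        # Recency is the number of days since the most recent purchase.
        if recency is None or days_ago < recency:
            recency = days_ago
        summary[customer] = (recency, frequency + 1, monetary + amount)
    return summary

def suggest_offer(recency, frequency, monetary):
    # Illustrative thresholds only; real cut-offs would come from the data.
    if recency <= 30 and frequency >= 3:
        return "loyalty / upsell / cross-sell"
    if recency > 90 and frequency == 1:
        return "win-back"
    return "standard"

scores = rfm(purchases, today=date(2024, 6, 1))
offers = {customer: suggest_offer(*s) for customer, s in scores.items()}
```

With this sample data, the frequent recent small spender ("alice") falls into the loyalty group and the one-off big spender from months ago ("bob") falls into the win-back group.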
2. Aim of Microproject:-
This micro-project aims to develop a case study on “Credit card fraud detection” using
data mining.
The Credit Card Fraud Detection project involves building a system that accurately identifies
and flags fraudulent credit card transactions. Credit card fraud is a common problem in the
financial industry, and it can result in significant financial losses for both the card issuer
and the cardholder.
The goal of this report is to propose a model that detects credit card fraud while raising as few
false alarms as possible. For this purpose, we have used four data mining techniques.
Data mining for fraud detection is a technique used to find unusual behaviour in data. It can
help businesses identify fraudulent activities and protect their assets more effectively.
3. Types of Data Mining:-
1. Database:-
A database is managed by a database management system (DBMS). Every DBMS stores data that are
related to each other in one way or another. It also has a set of software programs that manage the
data and provide easy access to it. These programs serve many purposes, including defining the
structure of the database, keeping the stored information secure and consistent, and managing
different types of data access, such as shared, distributed, and concurrent access. A relational
database consists of tables, each with its own name and set of attributes, that can store rows
(records) of large data sets. Every record stored in a table is identified by a unique key. An
entity-relationship (ER) model is often created to represent a relational database in terms of its
entities and the relationships that exist between them.
2. Data Warehouse:-
A data warehouse is a single data storage location that collects data from multiple sources and
stores it under a unified schema. When data is stored in a data warehouse, it undergoes
cleaning, integration, loading, and periodic refreshing. Data in a warehouse is organized in
several parts and is typically summarized: if you want information on data that was stored 6 or
12 months back, you will get it in the form of a summary.
3. Transactional data:-
A transactional database stores records that are captured as transactions. These transactions
include flight bookings, customer purchases, clicks on a website, and others. Every transaction
record has a unique ID and lists all the items that make up the transaction.
There are many other types of data as well, known for their structure, semantic meaning, and
versatility, and used in a wide range of applications. A few of these are data streams,
engineering design data, sequence data, graph data, spatial data, and multimedia data.
4. Data Mining techniques:-
1. Association:-
It is one of the most widely used data mining techniques. In this technique, a transaction and
the relationships between its items are used to identify a pattern, which is why it is also
referred to as the relation technique. It is used to conduct market basket analysis, which finds
the products that customers regularly buy together. This technique is very helpful for retailers,
who can use it to study the buying habits of different customers. Retailers can study past sales
data and look out for products that customers buy together. They can then place those products
close to each other in their stores, which saves customers time and increases sales.
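A minimal sketch of market basket analysis is shown below. It counts how often pairs of products appear in the same basket (support) and how often one product is bought given that another is in the basket (confidence). The baskets and the function names `pair_support` and `confidence` are made up for illustration; real association mining would normally use an algorithm such as Apriori on much larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def pair_support(baskets):
    """Fraction of baskets that contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}

def confidence(baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)

support = pair_support(baskets)
top_pair = max(support, key=support.get)
bread_to_butter = confidence(baskets, "bread", "butter")
```

In this sample, bread and butter appear together in 3 of the 5 baskets, and 3 of the 4 baskets that contain bread also contain butter, so a retailer might shelve the two close together.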
2. Clustering:-
This technique groups objects that share similar characteristics into meaningful clusters. People
often confuse it with classification, but the two work differently: classification puts objects
into predefined classes, whereas clustering puts objects into classes that it defines itself from
the data. Let us take an example. A library is full of books on different topics. The challenge is
to organize those books so that readers have no trouble finding books on a particular topic. We
can use clustering to keep similar books on one shelf and then give each shelf a meaningful name.
Readers looking for books on a particular topic can go straight to that shelf instead of roaming
the entire library to find their book.
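The idea of letting the data define its own groups can be sketched with a very small k-means implementation. This is an illustrative sketch only: the two-dimensional points stand in for numeric descriptions of books, the naive choice of the first k points as starting centroids is an assumption made to keep the example deterministic, and the function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive initialisation (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical feature vectors, e.g. two topic scores per book.
books = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
centroids, clusters = kmeans(books, k=2)
```

No class labels are supplied here; the two "shelves" emerge from the distances between the points, which is the difference from classification described above.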
3. Classification:-
This technique has its origins in machine learning. It assigns items or variables in a data set to
predefined groups or classes. It draws on linear programming, statistics, decision trees, and
artificial neural networks, amongst other techniques. Classification is used to build software
that learns how to assign the items in a data set to different classes. For instance, we can use
it to classify all the candidates who attended an interview into two groups: the first lists the
candidates who were selected and the second lists the candidates who were rejected. Data mining
software can be used to perform this classification job.
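The interview example can be sketched with a simple k-nearest-neighbour classifier, one of many possible classification methods. The candidate features (a test score and years of experience), the labelled examples and the function name `classify` are all hypothetical and chosen only to show how predefined classes are assigned.

```python
from collections import Counter

# Hypothetical labelled candidates: (test_score, years_experience) -> outcome
training = [
    ((85, 5), "selected"),
    ((90, 3), "selected"),
    ((78, 6), "selected"),
    ((40, 1), "rejected"),
    ((55, 0), "rejected"),
    ((35, 2), "rejected"),
]

def classify(candidate, training, k=3):
    """Label a candidate by majority vote among the k closest examples."""
    by_distance = sorted(
        training,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(candidate, item[0])),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

strong = classify((80, 4), training)
weak = classify((45, 1), training)
```

Unlike clustering, the two classes ("selected" and "rejected") are fixed in advance and the labelled examples tell the method what each class looks like.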
4. Prediction:-
This technique models the relationship between independent and dependent variables, as well as
among the independent variables themselves. It can be used, for example, to predict future profit
from sales. Let us assume that profit is the dependent variable and sales is the independent
variable. Based on past sales data, we can then predict future profit using a fitted regression
curve.
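As a minimal sketch of the profit example, the code below fits a straight line to past sales and profit figures using ordinary least squares and uses it to predict profit for a new sales value. The numbers are invented for illustration, and a straight line is only the simplest choice of regression curve.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: sales (independent) and profit (dependent).
sales = [100.0, 200.0, 300.0, 400.0]
profit = [15.0, 35.0, 55.0, 75.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 500.0 + intercept  # forecast for sales of 500
```

Because the sample data lie exactly on a line, the fit recovers a slope of about 0.2 and an intercept of about -5, giving a predicted profit of about 95 for sales of 500.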
5. Sequential Patterns:-
This technique uses transaction data to identify similar trends, patterns, and events over a
period of time. Historical sales data can be used to discover items that buyers bought together
at different times of the year. Businesses can act on this information by recommending those
products to customers at times when the historical data suggests they would not otherwise buy
them, and can use attractive deals and discounts to support the recommendation.
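One simple way to look for sequential patterns is to count how many customers bought one item and later bought another. The sketch below does this for hypothetical purchase histories; the data and the function name `sequential_pairs` are assumptions for illustration, and practical sequence mining would also take time gaps and longer sequences into account.

```python
from collections import Counter

# Hypothetical purchase histories, each already ordered by time.
histories = {
    "c1": ["phone", "case", "charger"],
    "c2": ["phone", "charger"],
    "c3": ["laptop", "mouse"],
    "c4": ["phone", "case"],
}

def sequential_pairs(histories):
    """Count customers who bought `first` and, at some later time, `later`."""
    counts = Counter()
    for items in histories.values():
        seen = set()  # count each ordered pair once per customer
        for i, first in enumerate(items):
            for later in items[i + 1:]:
                if (first, later) not in seen:
                    seen.add((first, later))
                    counts[(first, later)] += 1
    return counts

patterns = sequential_pairs(histories)
```

Here the order matters: "phone then case" is counted for two customers, while "case then phone" never occurs, which is what separates a sequential pattern from a plain association.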
5. Data Mining on Credit Card Fraud Detection:-
This system implements a supervised anomaly detection algorithm from data mining to detect fraud
in real-time transactions on the internet, classifying each transaction as legitimate,
suspicious, or fraudulent. The anomaly detection algorithm is built on Neural Networks, which are
modelled on the working principle of the human brain: just as we humans learn from past
experience and base our present-day decisions on what we have learned, the network learns from
past transactions.
The most cost-effective approach for fraud detection is to “tease out possible evidences of fraud
from the available data using mathematical algorithms”. Data mining techniques, which make use of
advanced statistical methods, are divided into two main approaches: supervised and unsupervised
methods. Both approaches are based on training an algorithm with a record of observations from
the past. Supervised methods require that each of the observations used for learning has a label
stating which class it belongs to. In the context of fraud detection, this means that for each
observation we know whether it belongs to the class “fraudulent” or to the class “legitimate”.
Often we do not know which class an observation belongs to. For example, take the case of an
online order whose payment was rejected. One will never know whether this was a legitimate order
or whether it had been correctly rejected. Such occurrences favour the use of unsupervised
methods, which do not require data to be labelled. These methods look for extreme data
occurrences or outliers. In order to get the best of both worlds, some solutions combine
supervised and unsupervised techniques.
A few authors have studied unsupervised methods for fraud detection. One study explored the use
of graph analysis for fraud detection in a telecommunications setting. Another proposed a mixed
approach in which a self-organising map feeds a Neural Network if a transaction does not fall
into an identified normal behaviour for the given customer. A further study compared supervised
and unsupervised Neural Networks; in that experiment the unsupervised method performed far below
the supervised one.
Supervised methods have dominated the fraud detection literature. In general, the emphasis of
research in the late 90s and early 2000s was on Neural Networks. One work proposed the use of a
Neural Network for fraud detection at a commercial bank, another studied a profiling approach to
telecommunications fraud, and a third discussed the combination of multiple classifiers in an
attempt to create scalable systems able to deal with large volumes of data. More recently, other
works have made use of newer classification techniques. One built a model based on a Hidden
Markov Model, with a focus on fraud detection for credit-card issuing banks. Another also worked
on credit-card fraud detection with data from a bank, in particular addressing the way the data
is pre-processed; its authors studied the use of aggregation of transactions with Random Forests,
Support Vector Machines, Logistic Regression and K-Nearest Neighbour techniques. A further study
compared the performance of Random Forests, Support Vector Machines and Logistic Regression for
detecting fraud in credit-card transactions in an international financial setting.
Two criticisms have been made of data mining studies of fraud detection: the lack of publicly
available data and the lack of published literature on the topic. Most literature on credit-card
fraud detection has focused on classification models with data from banks. Such data invariably
consists of transaction registries, where it is possible to find fraud evidence such as
“collision” or “high velocity” events, i.e. transactions happening at the same time in different
locations.
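A "collision" check of the kind mentioned above can be sketched as a search for transactions on the same card that occur close together in time but in different locations. The registry below, the 30-minute window and the function name `find_collisions` are assumptions made for the example; a real system would work from geographic distance and realistic travel times rather than city names.

```python
from datetime import datetime, timedelta

# Hypothetical transaction registry: (card_id, timestamp, city)
transactions = [
    ("card-1", datetime(2024, 3, 1, 10, 0), "Mumbai"),
    ("card-1", datetime(2024, 3, 1, 10, 5), "London"),
    ("card-2", datetime(2024, 3, 1, 9, 0), "Pune"),
    ("card-2", datetime(2024, 3, 1, 18, 0), "Pune"),
    ("card-1", datetime(2024, 3, 2, 12, 0), "Mumbai"),
]

def find_collisions(transactions, window=timedelta(minutes=30)):
    """Flag consecutive same-card transactions in different cities within `window`."""
    by_card = {}
    for card, when, city in transactions:
        by_card.setdefault(card, []).append((when, city))
    flagged = []
    for card, events in by_card.items():
        events.sort()  # chronological order per card
        # Only neighbouring events are compared, which keeps the sketch short.
        for (t1, c1), (t2, c2) in zip(events, events[1:]):
            if c1 != c2 and t2 - t1 <= window:
                flagged.append((card, t1, c1, t2, c2))
    return flagged

collisions = find_collisions(transactions)
```

In the sample registry only the pair of "card-1" transactions made five minutes apart in two different cities is flagged; the later transaction on the same card a day afterwards is not.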
Some authors have also addressed techniques for finding the best derived features. One study
showed that transaction aggregation improved performance in some situations, with the
aggregation period being an important parameter. However, none of these particularities seems to
apply to detecting fraud with data from a single merchant, as in our case. In this study, we
chose supervised learning methods for the classification problem, because it is common for fraud
detection applications to have labelled data for training. We chose to test three different
models: Logistic Regression because of its popularity, and Random Forests and Support Vector
Machines, which have shown superior performance in a variety of applications. Earlier work has
also shown that Support Vector Machines perform well in classification problems.
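To illustrate what supervised training on labelled transactions looks like, the sketch below implements Logistic Regression with plain gradient descent. It is not the model or data used in the studies discussed above: the two features (a scaled amount and a foreign-transaction flag), the tiny labelled data set, the learning rate and the function names are all assumptions chosen to keep the example self-contained.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Error is the predicted fraud probability minus the true label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def fraud_probability(x, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical labelled transactions: (scaled_amount, is_foreign); 1 = fraudulent
rows = [(0.1, 0), (0.2, 0), (0.3, 1), (0.2, 0), (0.9, 1), (0.8, 1), (1.0, 0), (0.7, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
risky = fraud_probability((0.95, 1), weights, bias)
safe = fraud_probability((0.1, 0), weights, bias)
```

A transaction would then be flagged when its probability exceeds a chosen threshold such as 0.5. Random Forests and Support Vector Machines follow the same train-then-predict pattern but are usually taken from a library rather than written by hand.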
6. Skills Developed Out Of This Micro-project:-
c) We learned about new computer science technologies such as artificial intelligence, data
mining, the Internet of Things, data analytics and much more.
e) Work independently and as part of a team: by working independently, the project was completed
accurately.
f) Show a positive attitude: a positive attitude can be developed through this course and
micro-project.
g) Decision making: carrying out this project improved our decision-making ability.
h) Leadership and management: this project gave us the ability to lead and manage a project.
7. Conclusion:-
Thus, we prepared a report on credit card fraud detection using data mining techniques, applying
emerging trends in computer technology.
Fraud detection is a set of processes and analyses that allow businesses to identify and prevent
unauthorized financial activity. This can include fraudulent credit card transactions, identity
theft, cyber hacking, insurance scams, and more.
It involves using various techniques and technologies to identify potentially fraudulent
transactions in real time or through post-transaction analysis. The goal is to minimize financial
losses for both cardholders and card issuers by quickly identifying and stopping unauthorized or
suspicious transactions.
The project concludes by presenting the best classifier, obtained by training and testing the
supervised techniques.
8. Reference:-
www.iberdola.com
www.tutorialspoint.com
www.simplilearn.com
www.google.com