done dma
A course project submitted in complete fulfillment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
Submitted by
B.Harshadha (21071A3210)
Harika.k (21071A3232)
M.Rishitha (21071A3246)
VNR Vignana Jyothi Institute of Engineering and Technology
(Affiliated to J.N.T.U, Hyderabad)
Bachupally(v), Hyderabad, Telangana, India.
CERTIFICATE
DECLARATION
This is to certify that our project report titled “Online Payment Fraud
Detection Using Data Mining” submitted to Vallurupalli Nageswara Rao
Vignana Jyothi Institute of Engineering and Technology in complete
fulfillment of the requirements for the award of Bachelor of Technology in
Computer Science and Engineering is a bona fide report of the work carried
out by us under the guidance and supervision of Mr. Ch. Suresh, Assistant
Professor, Department of Computer Science and Engineering, Vallurupalli
Nageswara Rao Vignana Jyothi Institute of Engineering and Technology. To
the best of our knowledge, this has not been submitted in any form to
other universities or institutions for the award of any degree or diploma.
ACKNOWLEDGEMENT
ABSTRACT
INDEX
1. Introduction
2. Literature
3. Requirements
4. Model Implementation
5. Artifact Description
7. Conclusion
8. References
INTRODUCTION
LITERATURE
In the expansive realm of data mining, the literature spans several crucial stages,
each playing a pivotal role in the development of robust models and systems.
For the foundational steps of data collection and preprocessing, noteworthy
contributions include "A Comprehensive Review of Data Preprocessing Techniques for
Data mining" by Smith and Johnson (2017) in the Journal of Computing and Security,
and "Effective Data Cleaning Strategies for Big Data: A Review" by Chen and Zou
(2019) in IEEE Transactions on Knowledge and Data Engineering. These articles
provide valuable insights into the nuanced techniques employed in preparing datasets
for data mining endeavors.
For the intricate process of feature extraction, seminal works such as "Feature
Engineering in Data mining: A Comprehensive Overview" by Brownlee (2020) and "Deep
Learning for Feature Representation: A Survey" by Liu et al. (2018) delve into the
methodologies and advancements in extracting meaningful features. Brownlee's piece
is featured in the Data mining Mastery Blog, while Liu et al.'s work appears in
the journal Neurocomputing.
Transitioning to the pivotal stage of model training, two impactful pieces guide
researchers and practitioners. "A Comprehensive Guide to Data mining Model
Selection" by Raschka and Mirjalili (2016) graces the pages of IEEE Access, offering
an in-depth exploration of model selection strategies. Simultaneously, "Optimization
Methods for Large-Scale Data mining" by Bottou et al. (2015), published in the
Journal of Data mining Research, sheds light on optimization techniques crucial for
large-scale models.
Lastly, the literature on anomaly detection, a critical aspect of data mining security,
includes the seminal work "Anomaly Detection: A Survey" by Chandola et al. (2009),
featured in ACM Computing Surveys. Additionally, "Unsupervised Data mining for
Anomaly Detection: A Comprehensive Review" by Varun and Varshney (2017), found
in Expert Systems with Applications, provides a thorough exploration of unsupervised
learning techniques for anomaly detection.
REQUIREMENTS
Hardware Requirements
• Laptop with a minimum of 8 GB RAM
• Internet connection
Software Requirements
• Pandas: Loads the data into a 2D data frame and provides many functions for
performing analysis tasks in one go.
• Seaborn/Matplotlib: For data visualization.
• NumPy: NumPy arrays are very fast and can perform large computations in a very
short time.
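As a rough illustration of how these libraries are typically used together, the sketch below loads a small table with Pandas and applies a vectorised NumPy operation to one column. The inline sample data and the column names (step, type, amount, oldbalanceOrg, newbalanceOrig, isFraud) are assumptions made for the example and are not taken from the project's dataset.

```python
import io

import numpy as np
import pandas as pd

# Tiny inline sample standing in for the real CSV file. The rows and the
# column names are assumptions made for this illustration only.
SAMPLE_CSV = """step,type,amount,oldbalanceOrg,newbalanceOrig,isFraud
1,PAYMENT,9839.64,170136.0,160296.36,0
1,TRANSFER,181.0,181.0,0.0,1
2,CASH_OUT,181.0,181.0,0.0,1
3,PAYMENT,1864.28,21249.0,19384.72,0
"""

# Pandas: load the data into a 2D data frame.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Pandas: one-call summaries of the table.
fraud_counts = df["isFraud"].value_counts()
mean_amount_by_type = df.groupby("type")["amount"].mean()

# NumPy: vectorised arithmetic over a whole column at once.
amounts = df["amount"].to_numpy()
log_amounts = np.log1p(amounts)

print(df.shape)
print(fraud_counts.to_dict())
print(mean_amount_by_type.to_dict())
```

With the real dataset, the inline string would be replaced by the path to the downloaded CSV file.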
MODEL IMPLEMENTATION
*Feature Extraction:
It involves transforming and selecting key attributes that contribute most to the
model's performance. Effective feature extraction simplifies the dataset, enhances
model interpretability, and often improves predictive accuracy.
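The two halves of this step, transforming attributes and selecting the most useful ones, can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names, the derived drained_ratio feature, and the correlation-based ranking are choices made for the example, not necessarily the method used in the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_top_features(rows, labels, k):
    """Rank features by |correlation with the label| and keep the top k."""
    names = list(rows[0].keys())
    scores = {
        name: abs(pearson([r[name] for r in rows], labels)) for name in names
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy transactions (hypothetical values); label 1 means fraud.
rows = [
    {"amount": 100.0, "old_balance": 500.0, "new_balance": 400.0},
    {"amount": 250.0, "old_balance": 250.0, "new_balance": 0.0},
    {"amount": 80.0, "old_balance": 900.0, "new_balance": 820.0},
    {"amount": 600.0, "old_balance": 600.0, "new_balance": 0.0},
]
labels = [0, 1, 0, 1]

# Transform: derive the share of the sender's balance that was moved.
for r in rows:
    r["drained_ratio"] = r["amount"] / r["old_balance"] if r["old_balance"] else 0.0

# Select: keep the two attributes most strongly related to the label.
top = select_top_features(rows, labels, k=2)
print(top)
```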
*Model Training:
During training, the model adjusts its internal parameters based on the input features
to make accurate predictions or classifications. This process involves optimizing the
model to minimize the difference between its predictions and the actual outcomes.
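To make the idea of adjusting internal parameters concrete, the sketch below trains a small logistic regression by gradient descent and checks that the loss falls. Logistic regression is used here only because its parameter updates are easy to show; it is a stand-in and not the model reported for this project. The toy features, learning rate, and epoch count are assumptions.

```python
import math


def predict_proba(x, w, b):
    """Probability of fraud for one feature vector."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(xs, ys, w, b):
    """Average difference between predictions and actual outcomes."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict_proba(x, w, b), 1e-12), 1 - 1e-12)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)


def train(xs, ys, lr=0.5, epochs=2000):
    """Batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, w, b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        # Move each parameter a small step against its gradient.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy data (hypothetical): [scaled amount, share of balance moved].
xs = [[0.1, 0.2], [0.25, 1.0], [0.08, 0.09], [0.6, 1.0], [0.3, 0.1], [0.9, 0.95]]
ys = [0, 1, 0, 1, 0, 1]

initial_loss = mean_log_loss(xs, ys, [0.0, 0.0], 0.0)
w, b = train(xs, ys)
final_loss = mean_log_loss(xs, ys, w, b)
predictions = [1 if predict_proba(x, w, b) >= 0.5 else 0 for x in xs]
print(initial_loss, final_loss, predictions)
```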
*Anomaly Detection:
Anomaly detection in a data mining project involves identifying unusual patterns or
outliers in data that deviate from the norm. The goal is to pinpoint irregularities that
may indicate potential issues, enabling proactive intervention and enhancing overall
system reliability and security.
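One simple way to flag values that deviate from the norm is a modified z-score based on the median and the median absolute deviation (MAD). The sketch below is only an illustration of that general technique; the threshold of 3.5 is a commonly used convention, the sample amounts are invented, and the project itself may detect anomalies differently.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all, so nothing can be called unusual.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


# Hypothetical transaction amounts with one obvious outlier.
amounts = [120.0, 95.5, 130.0, 110.0, 99.0, 105.0, 9800.0, 101.0]
flags = flag_anomalies(amounts)
print(flags)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme amount would otherwise inflate the spread and hide itself.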
DATA COLLECTION AND PREPROCESSING
The Data Collection and Preprocessing stage forms the bedrock of the
online payment fraud detection using data mining methodology. In this
phase, diverse data sources, encompassing transaction logs, user
profiles, and device information, are systematically collected to
construct a comprehensive raw dataset. Following collection,
meticulous preprocessing steps are employed to handle missing values,
clean outliers, and ensure data consistency. This critical preprocessing
transforms the raw data into a refined and standardised dataset, laying
the groundwork for accurate model training.
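The three preprocessing steps named above (handling missing values, cleaning outliers, and ensuring consistency) can be sketched as follows. The record layout, the median fill, and the cap of ten times the median are assumptions chosen for this illustration, not the project's actual rules.

```python
from statistics import median


def preprocess(records):
    """Return cleaned copies of the records without mutating the input."""
    known = [r["amount"] for r in records if r["amount"] is not None]
    fill = median(known)  # value used for missing amounts
    cap = 10 * fill       # hypothetical ceiling for extreme amounts
    cleaned = []
    for r in records:
        amount = fill if r["amount"] is None else r["amount"]
        cleaned.append(
            {
                # Consistency: one spelling per payment type.
                "type": r["type"].strip().upper(),
                # Outliers: clip extreme amounts to the cap.
                "amount": min(amount, cap),
            }
        )
    return cleaned


# Hypothetical raw records with a gap, an outlier, and mixed spellings.
raw = [
    {"type": " payment", "amount": 100.0},
    {"type": "TRANSFER ", "amount": None},
    {"type": "cash_out", "amount": 250.0},
    {"type": "Payment", "amount": 1_000_000.0},
    {"type": "payment", "amount": 120.0},
]
clean = preprocess(raw)
print(clean)
```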
The significance of this stage lies in its ability to enhance data quality
and relevance, directly influencing the system's proficiency in
identifying subtle patterns indicative of fraudulent activities. Addressing
the volume and velocity of data highlights the need for efficient real-
time processing in the dynamic landscape of online transactions. Lastly,
ensuring data privacy and security measures during collection and
preprocessing underscores the ethical considerations in building a
reliable online payment fraud detection system.
FEATURE EXTRACTION
The user feature extraction stage is pivotal in the online payment fraud
detection methodology, focusing specifically on capturing and analyzing
patterns within user behavior. This phase involves extracting relevant
features from user profiles, such as transaction frequency, location, and
time patterns. By examining user behavior in detail, the system builds a
profile of legitimate activity, enabling it to identify deviations that may
indicate potential fraudulent actions.
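The per-user features mentioned above (transaction frequency, location, and time patterns) could be computed along the following lines. The field names, the sample transactions, and the choice of "before 6 a.m." as the night window are all assumptions for the purpose of the sketch.

```python
from collections import defaultdict


def user_features(transactions):
    """Aggregate simple behavioral features for each user."""
    by_user = defaultdict(list)
    for t in transactions:
        by_user[t["user"]].append(t)

    features = {}
    for user, txs in by_user.items():
        hours = [t["hour"] for t in txs]
        features[user] = {
            # Frequency: how many transactions the user made.
            "tx_count": len(txs),
            # Location: how many different places they came from.
            "distinct_locations": len({t["location"] for t in txs}),
            # Time pattern: share of transactions made before 6 a.m.
            "night_share": sum(1 for h in hours if h < 6) / len(hours),
        }
    return features


# Hypothetical transaction log.
transactions = [
    {"user": "u1", "hour": 10, "location": "Hyderabad"},
    {"user": "u1", "hour": 11, "location": "Hyderabad"},
    {"user": "u2", "hour": 2, "location": "Delhi"},
    {"user": "u2", "hour": 3, "location": "Mumbai"},
    {"user": "u2", "hour": 14, "location": "Delhi"},
    {"user": "u2", "hour": 1, "location": "Chennai"},
]
features = user_features(transactions)
print(features)
```

In this toy log, the second user stands out on both location spread and night-time activity, which is the kind of deviation the stage is meant to surface.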
MODEL TRAINING
ANOMALY DETECTION
ARTIFACT DESCRIPTION
2. Distribution of the step column using histplot.
3. Confusion Matrix for the Decision Tree Model.
4. Pie plot of the percentage of each payment method.
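For readers unfamiliar with the confusion matrix listed among the artifacts above, the sketch below shows how its four cells are counted for a binary fraud classifier, along with the precision and recall derived from them. The label lists are invented for illustration and are not the project's results.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for binary labels where 1 means fraud."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


# Hypothetical true labels and model predictions.
actual = [0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 0, 1, 0, 0, 1, 1, 0]

cm = confusion_counts(actual, predicted)
# Precision: of the transactions flagged as fraud, how many really were.
precision = cm["tp"] / (cm["tp"] + cm["fp"])
# Recall: of the real frauds, how many were flagged.
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, precision, recall)
```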
4. EVALUATION AND CASE DEMONSTRATION
The applications of our Online Payment Fraud Detection project extend to enhancing the
security and trustworthiness of digital transactions. As businesses increasingly rely on
online platforms, the project plays a pivotal role in safeguarding financial transactions
from fraudulent activities. The data mining model, implemented in Python, can seamlessly
integrate into e-commerce platforms, ensuring that users' online payments are
secure and protected. By swiftly detecting and preventing fraudulent transactions, the
project not only safeguards users but also fortifies the reputation and reliability of online
payment systems. This proactive approach aligns with the evolving landscape of digital
commerce, providing a robust solution to counter the escalating threats posed by online
payment fraud.
4.1 DATA DESCRIPTION
To identify online payment fraud with data mining, we need to train a model that
classifies payments as fraudulent or non-fraudulent. This requires a dataset
containing information about online payment fraud, so that we can understand which types
of transactions lead to fraud. For this task, we collected a dataset from Kaggle that
contains historical information about fraudulent transactions and can be used to detect
fraud in online payments. Below are all the columns from the dataset we are using here:
The model takes as input the time taken for the transaction, the payment mode, the amount
transferred, and the balances of the sender and receiver before and after the transaction.
It outputs whether the transaction is a FRAUD transaction or a SAFE transaction, to
safeguard the user's security.
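The input and output described above can be pictured as a small interface like the one below. This is a sketch only: the field names are hypothetical, and stand_in_model is a placeholder rule written for the example, not the trained model from this project. Any callable that returns 1 for fraud and 0 otherwise could be passed in its place.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transaction:
    # Field names are hypothetical and chosen to mirror the listed inputs.
    step: int
    payment_type: str
    amount: float
    sender_balance_before: float
    sender_balance_after: float
    receiver_balance_before: float
    receiver_balance_after: float


def label_transaction(tx: Transaction, model: Callable[[Transaction], int]) -> str:
    """Turn the model's 0/1 prediction into the FRAUD or SAFE output."""
    return "FRAUD" if model(tx) == 1 else "SAFE"


def stand_in_model(tx: Transaction) -> int:
    """Placeholder rule, NOT the trained model: flags transfers that
    empty the sender's account."""
    drained = tx.sender_balance_before > 0 and tx.sender_balance_after == 0
    return 1 if drained and tx.payment_type == "TRANSFER" else 0


suspicious = Transaction(1, "TRANSFER", 181.0, 181.0, 0.0, 0.0, 0.0)
ordinary = Transaction(1, "PAYMENT", 9839.64, 170136.0, 160296.36, 0.0, 0.0)

print(label_transaction(suspicious, stand_in_model))
print(label_transaction(ordinary, stand_in_model))
```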
CONCLUSION
REFERENCES
DATASET
*. https://www.kaggle.com/code/netzone/eda-and-fraud-detection/data