Online Payment Fraud Detection Using Machine Learning

This document uses machine learning models to detect online payment fraud. It loads and explores transaction data, one-hot encodes the transaction type, and trains and evaluates four classifiers: logistic regression, XGBoost, SVC, and random forest. Each model is scored by ROC AUC on the training and test sets. XGBoost performs best, with an AUC of around 0.85 on both, and a confusion matrix shows its predictions on the test set.

Uploaded by

Dev Ranjan Raut
Copyright
© All Rights Reserved

Online Payment Fraud Detection Using Machine Learning

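# Core libraries for data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the transaction data and take a first look at it
data = pd.read_csv('new_data.csv')
data.head()
data.info()

data.describe()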

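# Count the columns of each basic dtype. select_dtypes matches every
# integer or float width, so the counts do not depend on the platform's
# default integer size.
object_cols = list(data.select_dtypes(include='object').columns)
print("Categorical variables:", len(object_cols))

num_cols = list(data.select_dtypes(include='integer').columns)
print("Integer variables:", len(num_cols))

fl_cols = list(data.select_dtypes(include='floating').columns)
print("Float variables:", len(fl_cols))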

sns.countplot(x='type', data=data)

sns.barplot(x='type', y='amount', data=data)

data['isFraud'].value_counts()
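
If the counts above turn out to be heavily skewed toward the non-fraud class, plain accuracy says little about how well fraud is caught. The sketch below illustrates this with made-up labels; the counts are hypothetical and are not taken from new_data.csv.

```python
from collections import Counter

# Hypothetical labels: 990 legitimate (0) and 10 fraudulent (1) transactions.
# These counts are illustrative only and do not come from new_data.csv.
labels = [0] * 990 + [1] * 10

counts = Counter(labels)
fraud_rate = counts[1] / len(labels)

# A trivial "model" that always predicts the majority class (not fraud).
always_legit = [0] * len(labels)

accuracy = sum(p == t for p, t in zip(always_legit, labels)) / len(labels)
recall = sum(p == 1 and t == 1
             for p, t in zip(always_legit, labels)) / counts[1]

print(f"fraud rate: {fraud_rate:.3f}")
print(f"accuracy of always predicting 0: {accuracy:.3f}")
print(f"fraud recall of always predicting 0: {recall:.3f}")
```

The trivial predictor scores high accuracy while catching no fraud at all, which is one reason to compare models with a ranking metric such as ROC AUC instead.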

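# Distribution of the time step; histplot replaces the deprecated distplot
plt.figure(figsize=(15, 6))
sns.histplot(data['step'], bins=50, kde=True, stat='density')

# Correlation heatmap over the numeric columns only
# (corr() raises on string columns in pandas 2.x without numeric_only)
plt.figure(figsize=(12, 6))
sns.heatmap(data.corr(numeric_only=True),
            cmap='BrBG',
            fmt='.2f',
            linewidths=2,
            annot=True)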

type_new = pd.get_dummies(data['type'], drop_first=True)

data_new = pd.concat([data, type_new], axis=1)

data_new.head()

X = data_new.drop(['isFraud', 'type', 'nameOrig', 'nameDest'], axis=1)


y = data_new['isFraud']

X.shape, y.shape

from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
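
The split above is purely random. With a rare positive class, one common option is to pass stratify so that both parts keep roughly the same fraud rate. This is a suggestion, not something the original notebook does. A minimal sketch on synthetic data, where X_demo and y_demo are hypothetical stand-ins for X and y:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 1000 rows with 2% positives (not from new_data.csv).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 20 + [0] * 980)

# stratify=y_demo keeps the class proportions the same in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo)

print("positive rate, train:", y_tr.mean())
print("positive rate, test: ", y_te.mean())
```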
from xgboost import XGBClassifier
from sklearn.metrics import roc_auc_score as ras
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Candidate classifiers to compare
models = [LogisticRegression(), XGBClassifier(),
          SVC(kernel='rbf', probability=True),
          RandomForestClassifier(n_estimators=7,
                                 criterion='entropy',
                                 random_state=7)]

# Fit each model and report ROC AUC (not accuracy) on the training
# and held-out sets, using the predicted probability of the fraud class
for model in models:
    model.fit(X_train, y_train)
    print(f'{model} : ')

    train_preds = model.predict_proba(X_train)[:, 1]
    print('Training ROC AUC : ', ras(y_train, train_preds))

    y_preds = model.predict_proba(X_test)[:, 1]
    print('Validation ROC AUC : ', ras(y_test, y_preds))
    print()
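
The score printed by the loop is ROC AUC, which can be read as the probability that a randomly chosen fraud case receives a higher score than a randomly chosen non-fraud case. The sketch below computes that quantity directly from pairs. The function roc_auc and the sample labels and scores are hypothetical, written only to illustrate the metric; the notebook itself uses scikit-learn's roc_auc_score.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. The double loop is quadratic, which
    is fine for a small illustration but not for large datasets.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted fraud probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(roc_auc(y_true, scores))  # 3 of 4 pairs are ordered correctly: 0.75
```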
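from sklearn.metrics import ConfusionMatrixDisplay

# plot_confusion_matrix was removed in scikit-learn 1.2;
# ConfusionMatrixDisplay.from_estimator is its replacement.
# models[1] is the XGBClassifier.
ConfusionMatrixDisplay.from_estimator(models[1], X_test, y_test)

plt.show()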
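
The four cells of a binary confusion matrix can be turned into precision and recall for the fraud class. In scikit-learn's layout, rows are true labels and columns are predicted labels, giving [[TN, FP], [FN, TP]]. A minimal sketch, in which the helper fraud_metrics and the counts are hypothetical and are not results from this notebook:

```python
def fraud_metrics(tn, fp, fn, tp):
    """Precision, recall and F1 for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical confusion-matrix counts, for illustration only.
tn, fp, fn, tp = 950, 10, 15, 25

precision, recall, f1 = fraud_metrics(tn, fp, fn, tp)
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
print(f"F1:        {f1:.3f}")
```

Recall here is the share of actual fraud cases the model flags, and precision is the share of flagged transactions that are actually fraud.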
