
BOHR International Journal of Advances in Management Research

2022, Vol. 1, No. 1, pp. 7–14


https://doi.org/10.54646/bijamr.002
www.bohrpub.com

Fraud Detection in E-Commerce Using Machine Learning


Samrat Ray

ISMS Sankalp Business School, Pune, India


E-mail: samratray@rocketmail.com

Abstract. A rise in online customers is driving a rise in transactions, and we observe that the prevalence of fraud in online transactions is increasing with it. Machine learning is therefore becoming more widely used to prevent fraud in online commerce. The goal of this study is to identify the best machine learning algorithm among decision trees, naive Bayes, random forests, and neural networks. The data to be used are not yet balanced; balanced data are created using the synthetic minority oversampling technique (SMOTE). The accuracy of the neural network, as determined by confusion-matrix evaluation, is 96%, followed by naive Bayes (95%), random forest (95%), and decision tree (92%).
Keywords: machine learning, fraud identification, algorithms, confusion matrix, e-commerce.

INTRODUCTION

According to research on internet users in Indonesia published in the October 2019 issue of Free Marketeers Magazine, the country had 142.3 million internet users in 2019, an increase from the 132 million users of the previous year, as depicted in Figure 1. Very many people were using web-based systems and conducting online transactions during COVID-19, but where there are inventions, there are also many problems. There are numerous methods for growing an e-commerce business [1, 3].

Figure 1. Growth of internet users [2].

Based on information from many datasets, it is predicted that by 2022 the volume of retail online business transactions in Indonesia will expand from its current position to 134.6%, to US$ 15.3 million, or almost 217 trillion. Rapid technical advancements that make it easier for customers to shop are supporting this growth.

Numerous e-commerce transactions present a variety of challenges and new problems, particularly the e-commerce fraud shown in Figure 2. The number of internet-business-related scams has also climbed continuously since around 1993. According to a 2013 survey, 5.65 cents out of every $100 of total turnover in web-based business exchanges was fraudulent, and more than 70 trillion dollars will have been stolen by 2019 [4, 5]. Fraud detection is one method to cut down on fraud in online transactions.

The technology for detecting credit card fraud has advanced quickly, moving from machine learning to deep learning [6]. But regrettably, the amount of research on e-commerce fraud detection is still small, and it has so far focused on identifying the traits or qualities [7] that can be used to decide whether an e-commerce transaction is fraudulent or not.

The datasets used in this study had a combined 140,130 records, 11,150 fraud data points, and a fraud rate of 0.093. Datasets with very small minority proportions produce imbalanced information: the imbalanced data produce results that are more heavily weighted toward the larger portion of the records, so the classification of mainly non-fraud as opposed to fraud dominated the findings from the dataset studied. Using the SMOTE (synthetic minority oversampling) strategy to adapt to the data imbalance improves the class outcomes [8, 9].

This study aims to identify the most effective model for identifying fraud in an online transaction. Feature extraction is included in recent research on where to find fraud in e-commerce [10, 11]. This paper concentrates on fraud detection in e-commerce: the use of datasets from Kaggle, upgraded machine learning classification, and the use of SMOTE to handle unbalanced records. After the application of SMOTE, the dataset is trained using machine learning. Decision tree, naive Bayes, random forest, and neural network examinations are used to determine the accuracy, precision, F1-score, and G-mean.

Figure 3. Research steps.

Since fraud cases typically make up only about 2% of transactions, the SMOTE technique is useful for reducing the dominance of the majority class in the dataset and addressing the class-imbalance problem. If the majority class causes the classification to be skewed toward the majority, the predictions of the classifier are not accurate [12, 15].
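The SMOTE idea, creating synthetic fraud records by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines of NumPy. This is a hedged illustration of the technique, not the paper's implementation; all names here are ours.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples: pick a random minority
    sample, pick one of its k nearest minority neighbors, and interpolate
    at a random point on the segment between them."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]     # k nearest neighbors per sample
    base = rng.integers(0, n, size=n_new)        # random base samples
    nb = neighbors[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factors in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Balance a 2%-fraud sample: 980 legitimate vs. 20 fraud records.
rng = np.random.default_rng(1)
fraud = rng.normal(3.0, 0.5, size=(20, 2))
synthetic = smote_oversample(fraud, n_new=960)   # 20 + 960 = 980 fraud rows
print(synthetic.shape)                           # (960, 2)
```

Because each synthetic point is a convex combination of two real fraud records, all generated samples stay inside the bounding region of the minority class.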
In the classification process, machine learning used a decision tree, random forest, artificial neural network, and naive Bayes. The online business takes these machine learning algorithms into account to locate the highest-accuracy results on the transaction dataset.

Figure 2. Sales of e-commerce, statista.com [4].

MATERIALS AND METHODS

Using decision tree, naive Bayes, random forest, and neural network algorithms, this study investigates fraud and non-fraud in online business transactions. The overall process is shown in Figure 3.

Preprocessing Data

New features that will be employed in the machine learning computation cycle are subject to preprocessing, which removes, modifies, scales, and standardizes them. Unreliable data are converted into reliable data through preprocessing. The PCA preprocessing in this study includes extraction, modification, normalization, and scaling.

In order to isolate features from data at a high-dimensional scale, PCA is a linear transformation that is typically applied in data compression. Furthermore, PCA can reduce complex data to smaller dimensions to reveal hidden components and improve the structure of the data. PCA computations include covariance matrix calculations that minimize dimensionality and maximize the variance retained.

The dataset's feature-selection process serves as the starting point for the classification framework.
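The covariance-based PCA step described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions; in practice a library implementation such as scikit-learn's PCA would normally be used.

```python
import numpy as np

def pca_transform(X, n_components):
    """Standardize features, then project onto the top principal
    components of the covariance matrix (directions of maximal variance)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # assumes no constant feature
    cov = np.cov(Xs, rowvar=False)              # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort by variance, descending
    return Xs @ eigvecs[:, order[:n_components]]

# Reduce 11 raw features to 2 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 11))
Z = pca_transform(X, n_components=2)
print(Z.shape)                                  # (500, 2)
```

Sorting the eigenvectors by descending eigenvalue guarantees that the first retained component carries at least as much variance as the second, which is the "maximize variance" property the text describes.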
Transformation, normalization, and scaling of the features are employed to express the relationships so that they can be used for classification once the SMOTE procedure has finished. The data are then preprocessed using principal component analysis (PCA). SMOTE is essential for balancing the imbalanced data.

Decision Tree

Figure 4. Architecture of decision trees.

Figure 5. Architecture of random forest.

Decision trees are valuable for investigating fraud data and finding hidden connections between various candidate variables and a target variable. The decision tree [20] combines fraud data investigation and modeling, so it is excellent as the first phase in the modeling process, even when used as the final model of several different procedures [16, 18].
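As a concrete toy illustration of the decision-tree stage (not the paper's actual Kaggle experiment; the data and parameters below are ours), a scikit-learn classifier can be fitted to an imbalanced synthetic sample:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced data: ~5% "fraud", shifted away from legitimate traffic.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (950, 4)),    # legitimate transactions
               rng.normal(2.5, 1.0, (50, 4))])    # fraudulent transactions
y = np.array([0] * 950 + [1] * 50)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {tree.score(X_te, y_te):.2f}")
```

Capping `max_depth` is one simple form of the pruning discussed below: it stops the tree from splitting the training data into ever-smaller, noise-fitting segments.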
Decision trees are excellent for classification tasks and are a type of supervised learning algorithm. The decision tree organizes the dataset into several increasingly fine segments in line with decision rules, emphasizing the connection between input and output attributes.

• Root node: represents the entire population or sample, and is further divided into at least two sub-nodes.
• Splitting: the process of dividing a node into two or more sub-nodes.
• Decision node: a sub-node that splits into further sub-nodes.
• Leaf/terminal node: a node that does not split is called a leaf or terminal node.
• Pruning: the removal of a decision node's sub-nodes.
• Branch/sub-tree: a subdivision of the whole tree is called a branch or sub-tree.
• Parent and child node: a node that is divided into sub-nodes is the parent of those sub-nodes [19].

As shown in Figure 4, fraud detection employs a decision tree with a root node, internal nodes, and leaf nodes.

Naive Bayes

Naive Bayes predicts outcomes based on prior experience [23]. It uses the estimation equation below:

    P(A|B) = P(B|A) × P(A) / P(B)    (1)

where

    B: the data whose class is unknown
    A: the hypothesis that the data belong to a specific class
    P(A|B): probability of the hypothesis given the evidence (posterior probability)
    P(A): probability of the hypothesis (prior probability)
    P(B|A): probability of the evidence given the hypothesis (likelihood)
    P(B): probability of the evidence

The equation above can be used to score both fraudulent and legitimate transactions.

Random Forest

When a large amount of data is involved, the random forest (RF) algorithm is used. The classification and regression tree (CART) method evolved into RF through the addition of the bootstrap aggregating (bagging) method and random feature selection. The RF architecture is displayed in Figure 5. A random forest model is an ensemble of decision trees; in the e-commerce fraud detection system, the trees employ a random vector distribution that is the same across all trees. Each decision tree votes for a category, and the votes are used to select the classification method's final category.

Neural Network

A neural network is a system of connected nodes, such as the architecture seen in Figure 6; the artificial neural network technique is modeled on the neurons of the human body.

Before preprocessing there were 11 input features; after preprocessing there were 17 input features. The hidden layer of the neural network was decided by genetic algorithms applied to the hidden layer in addition to the number of input layers [18]. This forecasting procedure uses the GA-NN [19] algorithm, which is as follows:
Figure 6. Architecture of neural network.

The steps are as follows:

• Initialization: the count is zero, fitness is one, and there are no cycles.
• Generate the initial population. Each consecutive gene sequence that makes up a chromosome codes for an input.
• Choose a suitable network architecture.
• Assign the weights.
• Train with backpropagation; examine the fitness metric and the accumulated error, and assess whether the current value of fitness is greater than the prior value of fitness.
• Count = count + 1.
• Selection: a roulette-wheel mechanism is used to choose the two parents. Crossover, mutation, and reproduction are examples of genetic operations that create new individuals.
• If the number of cycles equals the count, return to step 4.
• Train the network with the selected attributes.
• Evaluate performance using the test results.

Confusion Matrix

A confusion matrix for a dataset with just two class categories is shown in Table 1 [20]. False Negative and False Positive count the number of incorrectly classified positive and negative objects, respectively, whereas True Positive and True Negative count the number of correctly classified positive and negative objects.

Table 1. Confusion matrix.

Class             Predictive Positive   Predictive Negative
Actual Positive   TP                    FN
Actual Negative   FP                    TN

The most popular metric for assessing classification ability is accuracy, but in an imbalanced setting this assessment is flawed, since the minority class makes up only a very small portion of the accuracy metric. The F1-score, G-mean, and recall evaluation criteria are therefore advised. The G-mean is used to quantify overall classification performance, while the F1-score is used to evaluate how minority classes are classified in imbalanced classes.

Recall, precision, F1-score, and G-mean classification ability were examined in this study.

    Accuracy = (TP + TN) / (TP + TN + FP + FN)                        (2)

    Recall = TP / (TP + FN)                                           (3)

    Precision = TP / (TP + FP)                                        (4)

    G-Mean = sqrt( (TP / (TP + FN)) × (TN / (TN + FP)) )              (5)

    F1-Score = (2 × Precision × Recall) / (Precision + Recall)        (6)

RESULTS

Dataset

This study utilizes an e-commerce fraud dataset obtained from Kaggle. The dataset has 151,112 records; of these, 14,151 records are classified as fraudulent activity, and the proportion of fraudulent records is 0.094. After oversampling, the fraud transaction dataset contains 152,122 total records, with 14,152 records classified as fraud and a fraud fraction of 0.094, as shown in Figures 7 and 8. SMOTE reduces class imbalance by synthesizing minority-class data.
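To make Eqs. (2)-(6) concrete, the metrics can be computed directly from confusion-matrix counts. The counts below are illustrative values of ours, not figures from the paper's tables; Eq. (5) is read here as the geometric mean of recall and specificity.

```python
import math

# Illustrative confusion-matrix counts (ours, not the paper's):
TP, FN, FP, TN = 70, 30, 10, 890

accuracy    = (TP + TN) / (TP + TN + FP + FN)       # Eq. (2) -> 0.96
recall      = TP / (TP + FN)                        # Eq. (3) -> 0.70
precision   = TP / (TP + FP)                        # Eq. (4) -> 0.875
specificity = TN / (TN + FP)                        # true-negative rate
g_mean      = math.sqrt(recall * specificity)       # Eq. (5) -> ~0.832
f1_score    = (2 * precision * recall
               / (precision + recall))              # Eq. (6) -> ~0.778

print(accuracy, recall, precision, round(g_mean, 3), round(f1_score, 3))
```

Note how accuracy (0.96) looks strong even though recall on the minority class is only 0.70, which is exactly why the text recommends F1-score and G-mean for imbalanced data.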
Decision Trees

Data that have undergone preprocessing are prepared for the experimental phase using the decision tree model. Subsequent to preprocessing, the data will be oversampled before classification using a decision tree is performed. Moreover, the decision tree will also be run on data that have not been oversampled. The findings of these two experiments will be utilized to
analyze decision trees and demonstrate the classification outcomes using the SMOTE oversampling technique.

Without SMOTE, the decision tree's precision is 53.2%, F1-score is 56.8%, accuracy is 90%, recall is 57.7%, and G-mean is 76.3%. Results from the confusion matrix of the decision tree without SMOTE are shown in Table 2.

Table 2. Confusion matrix decision tree without SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   38782                 38782
Actual Negative   1746                  2595

With SMOTE, the decision tree's recall is 61.4%, precision is 90.5%, F1-score is 90.2%, and G-mean is 72.2%. Accuracy is 90%. Results from the confusion matrix of the decision tree with SMOTE are shown in Table 3.

Table 3. Confusion matrix decision tree with SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   38651                 2342
Actual Negative   1724                  2617

Figure 7. Ratio fraud.

Figure 8. Ratio fraud after oversampling.

Naive Bayes

Preparing data that have already been handled during preprocessing is how the naive Bayes model test is done. Following preprocessing, classification with naive Bayes will be performed on the two sorts of data: data that have been oversampled and data that have not. Through a side-by-side comparison of naive Bayes with and without the oversampling approach, the findings of these two experiments will be used to demonstrate the classification outcomes.

Without SMOTE, naive Bayes recall is 52.1%, precision is 90.2%, F1-score is 67.9%, and G-mean is 72.3%. Accuracy is 95%. Table 4 displays the results of the confusion matrix of naive Bayes without SMOTE.

Table 4. Confusion matrix Naïve Bayes without SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   40764                 229
Actual Negative   1993                  2348

With SMOTE, naive Bayes recall is 53.1%, precision is 93.8%, F1-score is 95.4%, and G-mean is 72.2%. Accuracy is 95%. Results from the confusion matrix of naive Bayes with SMOTE are shown in Table 5.

Table 5. Confusion matrix Naïve Bayes with SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   40760                 233
Actual Negative   1988                  2353

Random Forest

The random forest model trial is carried out by preparing data that have already been processed during the preprocessing step. After preprocessing, the data will be subjected to classification oversampling using random forest; both oversampled and non-oversampled data will be used in the random forest process. Using the SMOTE oversampling approach and a random forest comparison, the classification findings from these two experiments will be shown.

Without SMOTE, the random forest recall is 54%, precision is 93.3%, F1-score is 62.7%, and G-mean is 73.1%. Accuracy is 95%. The results of the confusion matrix of random forest without SMOTE are shown in Table 6.

Table 6. Confusion matrix random forest without SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   40881                 112
Actual Negative   1954                  2387

With SMOTE, the random forest recall is 58.1%, precision is 80%, F1-score is 94.3%, and G-mean is 75.7%. Accuracy is 95%. The results of the confusion matrix of random forest with SMOTE are shown in Table 7.

Neural Network

Data that have previously undergone preprocessing are prepared for training using the neural network
model. Following preprocessing, classification oversampling using a neural network will be performed on the data; the neural network will be applied to both the oversampled and the non-oversampled data. The findings of these two experiments will demonstrate how classification outcomes were attained using a neural network comparison with the synthetic minority oversampling technique (SMOTE).

Without SMOTE, the neural network's precision is 96.1%, F1-score is 95.1%, accuracy is 96%, recall is 56%, and G-mean is 74.5%. Results from the confusion matrix of the neural network without SMOTE are shown in Table 8.

With SMOTE, the neural network's recall is 76.7%, precision is 92.5%, F1-score is 85.1%, and G-mean is 82.4%. The accuracy is 85%. Table 9 displays the findings from the confusion matrix of the neural network with SMOTE.

Table 7. Confusion matrix random forest with SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   40383                 610
Actual Negative   1820                  2521

Table 8. Confusion matrix neural network without SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   41113                 24
Actual Negative   1932                  2265

Table 9. Confusion matrix neural network with SMOTE.

Class             Predictive Positive   Predictive Negative
Actual Positive   38566                 2539
Actual Negative   9585                  31487

Figure 9. Accuracy result.

Figure 10. Recall result.

Figure 11. Precision result.

The accuracy values from the experiments employing the various methods are displayed in Figure 9. The neural network algorithm has the best accuracy rating, 96%.

Recall values from the experiments with the different algorithms are displayed in Figure 10. When the machine learning algorithms are combined with SMOTE, rather than using decision trees, random forests, naive Bayes, and neural networks alone, recall values increase more quickly. The neural network combined with SMOTE provided the biggest rise in recall.

As displayed in Figure 11, the tests with the different algorithms show that precision values decline when the machine learning algorithms are combined with SMOTE rather than used alone, with the largest decline occurring when the neural network is combined with SMOTE.

As shown in Figure 12, the experiments with the various algorithms show that integrating the machine learning algorithms with SMOTE results in higher F1-score values than using the algorithms alone. The F1-score evaluates how minority classes are classified in imbalanced classes.

Rather than using the algorithms alone, combining the machine learning algorithms with SMOTE also raised the G-mean value, which evaluates overall classification performance, as displayed in Figure 13.
Figure 12. F1-score result.

Figure 13. G-mean result.

CONCLUSION AND FUTURE WORK

A genetic algorithm can be used to determine the number of hidden nodes and layers, as well as to select the appropriate attributes for neural networks. The recall, F1-score, and G-mean values increased in the analysis when the SMOTE approach was used. Recall using neural networks rose from 52% to 74.6%, recall using naive Bayes rose from 41.2% to 41.3%, recall using random forests rose from 54% to 57%, and recall using decision trees rose from 57.7% to 62.3%. The F1-score likewise increased for all of the machine learning techniques when SMOTE was applied, rising from 69.8% to 85.1% for neural networks, 67.9% to 94.5% for naive Bayes, 69.8% to 94.3% for random forest, and 56.8% to 91.2% for decision trees.

In light of these findings, it was determined that SMOTE was able to improve the performance of neural networks, random forests, decision trees, and naive Bayes, and to address the imbalance of the e-commerce fraud dataset by raising their G-mean and F1-scores. This shows the viability of the SMOTE approach in improving classification performance on imbalanced data.

Future research is anticipated to apply additional algorithms or deep learning to the detection of e-commerce fraud, as well as further investigation to increase the accuracy of the neural network employing the SMOTE approach.

REFERENCES

[1] Asosiasi Penyelenggara Jasa Internet Indonesia. "Magazine APJII (Asosiasi Penyelenggara Jasa Internet Indonesia)" (2019): 23 April 2018.
[2] Asosiasi Penyelenggara Jasa Internet Indonesia. "Mengawali integritas era digital 2019 – Magazine APJII (Asosiasi Penyelenggara Jasa Internet Indonesia)" (2019).
[3] Laudon, Kenneth C., and Carol Guercio Traver. E-commerce: business, technology, society. 2016.
[4] statista.com. Retail e-commerce revenue forecast from 2017 to 2023 (in billion U.S. dollars). (2018). Retrieved April 2018, from Indonesia: https://www.statista.com/statistics/280925/e-commerce-revenueforecast-in-indonesia/.
[5] Kiziloglu, M., and Ray, S., 2021. Do we need a second engine for Entrepreneurship? How well defined is intrapreneurship to handle challenges during COVID-19? In SHS Web of Conferences (Vol. 120). EDP Sciences.
[6] Roy, Abhimanyu, et al. "Deep learning detecting fraud in credit card transactions." 2018 Systems and Information Engineering Design Symposium (SIEDS). IEEE, 2018.
[7] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109–121.
[8] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109–121.
[9] Pumsirirat, Apapan, and Liu Yan. "Credit card fraud detection using deep learning based on auto-encoder and restricted Boltzmann machine." International Journal of Advanced Computer Science and Applications 9.1 (2018): 18–25.
[10] Srivastava, Abhinav, et al. "Credit card fraud detection using hidden Markov model." IEEE Transactions on Dependable and Secure Computing 5.1 (2008): 37–48.
[11] Lakshmi, S. V. S. S., and S. D. Kavilla. "Machine Learning For Credit Card Fraud Detection System." International Journal of Applied Engineering Research 13.24 (2018): 16819–16824.
[12] Ray, S., and Leandre, D. Y., 2021. How Entrepreneurial University Model is changing the Indian COVID-19 Fight? Entrepreneur's Guide, 14(3), pp. 153–162.
[13] Bouktif, Salah, et al. "Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches." Energies 11.7 (2018): 1636.
[14] Xuan, Shiyang, Guanjun Liu, and Zhenchuan Li. "Refined weighted random forest and its application to credit card fraud detection." International Conference on Computational Social Networks. Springer, Cham, 2018.
[15] Samrat, R., 2021. Why Entrepreneurial University Fails to Solve Poverty Eradication? Herald Tuva State University. No. 1 Social and Human Sciences, (1), pp. 35–43.
[16] Zhao, Jie, et al. "Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce." Decision Support Systems 86 (2016): 109–121.
[17] Sharma, Shiven, et al. "Synthetic oversampling with the majority class: A new perspective on handling extreme imbalance." 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018.
[18] Kim, Jaekwon, Youngshin Han, and Jongsik Lee. "Data imbalance problem solving for smote based oversampling: Study on fault detection prediction model in semiconductor manufacturing process." Advanced Science and Technology Letters 133 (2016): 79–84.
[19] Sadaghiyanfam, Safa, and Mehmet Kuntalp. "Comparing the Performances of PCA (Principle Component Analysis) and LDA (Linear Discriminant Analysis) Transformations on PAF (Paroxysmal Atrial Fibrillation) Patient Detection." Proceedings of the 2018 3rd International Conference on Biomedical Imaging, Signal Processing. ACM, 2018.
[20] Harrison, Paula A., et al. "Selecting methods for ecosystem service assessment: A decision tree approach." Ecosystem Services 29 (2018): 481–498.
[21] Ray, S., 2021. Are Global Migrants At Risk? A Covid Referral Study of National Identity. In Transformation of identities: the experience of Europe and Russia (pp. 26–33).
