Fraud Detection in E-Commerce Using Machine Learning
Abstract. The growing number of online customers is driving a rise in online transactions, and the prevalence of
fraud in those transactions is rising with it. Machine learning is becoming more widely used to prevent
fraud in online commerce. The goal of this study is to identify the best machine learning algorithm
among decision trees, naive Bayes, random forests, and neural networks. The data used have not yet
been balanced, so balanced data are created with the synthetic minority over-sampling technique (SMOTE).
The accuracy of the neural network, as determined by confusion matrix evaluation, is 96%, followed by naive Bayes (95%),
random forest (95%), and decision tree (92%).
Keywords: machine learning, fraud detection, algorithms, confusion matrix, e-commerce.
8 Samrat Ray
Figure 2. Sales of e-commerce, statista.com [4].

random forest, and neural network machine learning evaluations are used to determine the accuracy, precision, recall, F1-score, and G-mean.

MATERIALS AND METHODS

Using the decision tree, naive Bayes, random forest, and neural network algorithms, this study classifies fraud and non-fraud in e-commerce transactions. The process is shown in Figure 3. The classification process starts with feature selection on the dataset. Transformation, normalization, and scaling of the features are applied to capture the relationships among them, so that the features can be used for classification once the SMOTE procedure has finished. After that, the data are preprocessed using principal component analysis (PCA). SMOTE is essential for balancing imbalanced data.

Preprocessing Data

Preprocessing extracts, transforms, scales, and normalizes the new features that will be used in the machine learning process. It converts unreliable data into reliable data. In this study, PCA preprocessing covers feature extraction, transformation, normalization, and scaling.

PCA is a linear transformation, typically applied in data compression, that extracts features from high-dimensional data. PCA can also reduce complex data to fewer dimensions, revealing hidden components and simplifying the structure of the data. PCA computation involves calculating covariance matrices to minimize redundancy and maximize variance.

Decision Tree

Decision trees are useful for exploring fraud data and finding hidden relationships between a number of candidate input variables and a target variable. The decision tree [20] combines fraud data exploration and modeling, so it is very good as the first step in the modeling process in any
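The preprocessing described in this section can be sketched in a few lines. The following is a minimal illustration, assuming NumPy is available and using randomly generated data in place of the fraud dataset. It oversamples a minority class with a simplified SMOTE-style interpolation, standardizes the features, and projects them onto principal components obtained from the covariance matrix. The function names, the choice of k = 5 neighbours, the two retained components, and the order of the steps are assumptions made for illustration, not details taken from the study.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority samples.

    Simplified SMOTE-style step: pick a minority sample, pick one of its k
    nearest minority neighbours, and place a new point on the segment
    between them. Assumes X_min is a float array with at least two rows.
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances inside the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sum(diff ** 2, axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # starting samples
    pick = rng.integers(0, k, size=n_new)   # which neighbour rank to use
    other = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # position along the segment
    return X_min[base] + gap * (X_min[other] - X_min[base])


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


def pca_project(Z, n_components):
    """Project standardized data onto the top principal components."""
    cov = np.cov(Z, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    explained = eigvals[order] / eigvals.sum()
    return Z @ eigvecs[:, order], explained


# Hypothetical data: 90 majority rows and 10 minority rows with 4 features.
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(90, 4))
X_minor = rng.normal(2.0, 1.0, size=(10, 4))

# Add 80 synthetic minority rows so both classes have 90 rows.
X_syn = smote_like_oversample(X_minor, n_new=80)
X_bal = np.vstack([X_major, X_minor, X_syn])
y_bal = np.array([0] * 90 + [1] * 90)

scores, explained = pca_project(standardize(X_bal), n_components=2)
print(scores.shape)        # (180, 2)
print(explained.round(3))  # share of variance kept by each component
```

Each synthetic row lies on a line segment between two existing minority rows, so the oversampled class stays inside the region the original minority samples already occupy. A production setup would more likely rely on an established library implementation of SMOTE and PCA than on this hand-written version.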
Figure 6. Architecture of neural network.

The steps are as follows:

1. Initialization: the count is zero, the fitness is one, and no cycles have been run.
2. Initial population generation. Each consecutive sequence of genes that makes up a chromosome encodes the input.
3. Design a suitable network architecture.
4. Assign weights.
5. Train with backpropagation. Examine the accumulated error and the fitness value, then evaluate according to the fitness value: check whether the current fitness value is greater than the previous fitness value.
6. Count = count + 1.
7. Selection: a roulette wheel mechanism is used to choose the two parents. Crossover, mutation, and reproduction are the genetic operations used to create new candidates.
8. If the count has not yet reached the number of cycles, return to step 4.
9. Train the network with the selected attributes.
10. Evaluate performance using the test data.

Confusion Matrix

The confusion matrix is a technique that can be used to assess classification performance. Table 1 [20] shows the case of a dataset with only two classes. False Positive and False Negative count the number of objects categorized as positive and negative, respectively,

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FN + FP} \tag{2}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{G-Mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \tag{5}$$

$$\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{6}$$

RESULTS

Dataset

This study uses an e-commerce fraud dataset obtained from Kaggle. The dataset has 151,112 records. Of these, 14,151 records are classified as fraudulent activity, and the proportion of fraud data is 0.094. The fraud transaction dataset results in 152,122 total records, 14,152 records classified as fraud, and a fraud data proportion of 0.094, as shown in Figures 7 and 8. SMOTE reduces class imbalance by synthesizing data. The data have been oversampled.

Decision Trees

Data that have undergone preprocessing are prepared for the experimental phase using the decision tree model. After preprocessing, the data are oversampled before classification with a decision tree is performed. The decision tree is also run on data that have not been oversampled. The findings of these two experiments will be utilized to
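The evaluation measures in Equations (2) to (6) can be computed directly from the four confusion-matrix counts. The sketch below uses the standard definitions: recall is TP / (TP + FN), and G-mean is the geometric mean of recall (true positive rate) and specificity (true negative rate). The function name and the counts for the two runs are hypothetical and chosen only to illustrate the arithmetic. They are not results from this study.

```python
from math import sqrt


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, recall, precision, G-mean, and F1 from counts.

    Assumes every denominator is non-zero.
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)          # true positive rate (sensitivity)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)     # true negative rate
    g_mean = sqrt(recall * specificity)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "g_mean": g_mean,
        "f1": f1,
    }


# Hypothetical counts for one classifier evaluated on 1,000 test records,
# trained once without and once with oversampling.
without_oversampling = classification_metrics(tp=50, tn=900, fp=10, fn=40)
with_oversampling = classification_metrics(tp=70, tn=880, fp=30, fn=20)

for name, result in [
    ("without oversampling", without_oversampling),
    ("with oversampling", with_oversampling),
]:
    print(name, {key: round(value, 3) for key, value in result.items()})
```

In this made-up example both runs have the same accuracy of 0.95, yet recall, F1-score, and G-mean are higher in the second run. This is why measures other than accuracy are worth reporting for an imbalanced dataset.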
Figure 12. F1-score result.

Figure 13. G-mean result.

CONCLUSION AND FUTURE WORK

A genetic algorithm can be used to determine the number of hidden nodes and layers, as well as to select the appropriate attributes for neural networks. The recall, F1-score, and G-mean values increased in the experiments that used the SMOTE approach. Recall with neural networks rose from 52% to 74.6%, recall with naive Bayes rose from 41.2% to 41.3%, recall with random forests rose from 54% to 57%, and recall with decision trees rose from 57.7% to 62.3%. The F1-score increased for all machine learning techniques, rising from 69.8% to 85.1% for neural networks, 67.9% to 94.5% for naive Bayes, 69.8% to 94.3% for random forest, and 56.8% to 91.2% for decision trees. However, SMOTE increases the value.

Based on the findings of the experiments above, it was concluded that SMOTE was able to improve the performance of neural networks, random forests, decision trees, and naive Bayes. It addresses the imbalance of the e-commerce fraud dataset by increasing G-mean and F1 scores compared with neural networks, decision trees, random forests,

REFERENCES

[1] Asosiasi Penyelenggara Jasa Internet Indonesia, “Magazine APJI (Asosiasi Penyelenggara Jasa Internet Indonesia)” (2019): 23 April 2018.
[2] Asosiasi Penyelenggara Jasa Internet Indonesia, “Mengawali integritas era digital 2019 – Magazine APJI (Asosiasi Penyelenggara Jasa Internet Indonesia)” (2019).
[3] Laudon, Kenneth C., and Carol Guercio Traver. E-commerce: business, technology, society. 2016.
[4] statista.com. Retail e-commerce revenue forecast from 2017 to 2023 (in billion U.S. dollars). (2018). Retrieved April 2018, from Indonesia: https://www.statista.com/statistics/280925/e-commerce-revenueforecast-in-indonesia/.
[5] Kiziloglu, M. and Ray, S., 2021. Do we need a second engine for Entrepreneurship? How well defined is intrapreneurship to handle challenges during COVID-19?. In SHS Web of Conferences (Vol. 120). EDP Sciences.
[6] Roy, Abhimanyu, et al. “Deep learning detecting fraud in credit card transactions.” 2018 Systems and Information Engineering Design Symposium (SIEDS). IEEE, 2018.
[7] Zhao, Jie, et al. “Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce.” Decision support systems 86 (2016): 109–121.
[8] Zhao, Jie, et al. “Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce.” Decision support systems 86 (2016): 109–121.
[9] Pumsirirat, Apapan, and Liu Yan. “Credit card fraud detection using deep learning based on auto-encoder and restricted boltzmann machine.” International Journal of advanced computer science and applications 9.1 (2018): 18–25.
[10] Srivastava, Abhinav, et al. “Credit card fraud detection using hidden Markov model.” IEEE Transactions on dependable and secure computing 5.1 (2008): 37–48.
[11] Lakshmi, S. V. S. S., and S. D. Kavilla. “Machine Learning For Credit Card Fraud Detection System.” International Journal of Applied Engineering Research 13.24 (2018): 16819–16824.
[12] Ray, S. and Leandre, D.Y., 2021. How Entrepreneurial University Model is changing the Indian COVID–19 Fight?. Entrepreneur’s Guide, 14(3), pp. 153–162.
[13] Bouktif, Salah, et al. “Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches.” Energies 11.7 (2018): 1636.
[14] Xuan, Shiyang, Guanjun Liu, and Zhenchuan Li. “Refined weighted random forest and its application to credit card fraud detection.” International Conference on Computational Social Networks. Springer, Cham, 2018.
[15] Samrat, R., 2021. Why Entrepreuneral University Fails to Solve Poverty Eradication?. Herald Tuva State University. No. 1 Social and Human Sciences, (1), pp. 35–43.
[16] Zhao, Jie, et al. “Extracting and reasoning about implicit behavioral evidences for detecting fraudulent online transactions in e-Commerce.” Decision support systems 86 (2016): 109–121.
[17] Sharma, Shiven, et al. “Synthetic oversampling with the majority class: A new perspective on handling extreme imbalance.” 2018 IEEE International Conference on Data Mining (ICDM). IEEE, 2018.
[18] Kim, Jaekwon, Youngshin Han, and Jongsik Lee. “Data imbalance problem solving for smote based oversampling: Study on fault detection prediction model in semiconductor manufacturing process.” Advanced Science and Technology Letters 133 (2016): 79–84.
[19] Sadaghiyanfam, Safa, and Mehmet Kuntalp. “Comparing the Performances of PCA (Principle Component Analysis) and LDA (Linear Discriminant Analysis) Transformations on PAF (Paroxysmal Atrial Fibrillation) Patient Detection.” Proceedings of the 2018 3rd International Conference on Biomedical Imaging, Signal Processing. ACM, 2018.
[20] Harrison, Paula A., et al. “Selecting methods for ecosystem service: A decision tree approach.” Ecosystem services 29 (2018): 481–498.
[21] Ray, S., 2021. Are Global Migrants At Risk? A Covid Referral Study of National Identity. In Transformation of identities: the experience of Europe and Russia (pp. 26–33).