Fraud Detection in Mobile Payment Systems Using An Xgboost Based Framework
https://doi.org/10.1007/s10796-022-10346-6
Abstract
Mobile payment systems are becoming more popular due to the increase in the number of smartphones, which, in turn,
attracts the interest of fraudsters. Extant research has therefore developed various fraud detection methods using supervised
machine learning. However, sufficient labeled data are rarely available and their detection performance is negatively affected
by the extreme class imbalance in financial fraud data. The purpose of this study is to propose an XGBoost-based fraud detection
framework while considering the financial consequences of fraud detection systems. The framework was empirically
validated on a large dataset of more than 6 million mobile transactions. To demonstrate the effectiveness of the proposed
framework, we conducted a comparative evaluation of existing machine learning methods designed for modeling imbalanced
data and outlier detection. The results suggest that in terms of standard classification measures, the proposed semi-supervised
ensemble model integrating multiple unsupervised outlier detection algorithms and an XGBoost classifier achieves the best
results, while the highest cost savings can be achieved by combining random under-sampling and XGBoost methods. This
study therefore has financial implications for organizations making decisions about the implementation of
effective fraud detection systems.
Keywords Mobile payment · Fraud detection · Machine learning · Imbalanced data · Outlier detection
1986 Information Systems Frontiers (2023) 25:1985–2003
to increase customer trust and security, as reported in existing mobile payment acceptance models (Chin et al., 2022; Jia et al., 2022; Kar, 2021; Pal et al., 2021).
The growing use of mobile payments has boosted the chances of criminals committing mobile phone fraud in an illegal effort to circumvent security measures of mobile payment services. There is consequently a lot of pressure to investigate potential security threats that may be exploited, with the ultimate aim of preventing fraud on a mobile payment service and developing countermeasures against attacks (Chen et al., 2021; Lopez-Rojas et al., 2016; Rieke et al., 2013). Early detection of fraudulent transactions is a key task in this effort. Recent developments in mobile payment services have therefore heightened the need for automated detection systems that enable immediate detection and prevention of fraudulent transactions.
The main challenges currently facing researchers involved in detecting fraud in mobile payment transactions include: (1) extreme class imbalance (only a small proportion of customers have fraudulent intentions); (2) changing patterns of fraud over time (fraudsters are always looking for new ways to bypass systems and commit crimes); and (3) inadequate selection of performance metrics. The consequence of the first challenge is a poor user experience for legitimate customers, as the detection of fraudsters usually also implies rejecting some legitimate mobile payment transactions. The second challenge usually leads to a decrease in the performance and efficiency of the detection model. Therefore, machine learning models must be constantly updated, otherwise they will not meet their objectives. Regarding the last challenge, in some cases the providers of mobile payment systems should prefer a higher false positive rate in exchange for a lower false negative rate, and vice versa. However, choosing the right trade-off between these two errors remains a challenge in the field of fraud detection in mobile payment transactions.
A relatively high detection accuracy was reported in earlier research using both traditional supervised learning methods (Choi & Lee, 2017, 2018) and deep learning-based methods (Mubalaike & Adali, 2018; Xenopoulos, 2017). However, a major problem with this kind of application is the extreme class imbalance of transactions, with a considerable dominance of legitimate transactions in the data. This in turn leads to poor classification performance on the minority class of fraudulent transactions. To address this issue, two approaches have been utilized. The first approach relies on under-sampling methods used to generate a balanced dataset (Pambudi et al., 2019). The main limitation of this approach is the loss of potentially important information stored in discarded legitimate transactions, which can reduce detection accuracy. Alternatively, an attempt has been made to isolate fraudulent transactions in an unsupervised fashion (Buschjäger et al., 2021), inspired by outlier detection methods. Nevertheless, a comprehensive evaluation of machine learning methods is not yet available in the literature. Moreover, little is known about how the two approaches can be integrated to improve the detection performance. To overcome the above problems, here we propose to enhance the performance of eXtreme Gradient boosting (XGBoost), a state-of-the-art machine learning method, by including a data sampling component addressing the issue of extreme class imbalance of mobile payment transactions.
In many financial applications it is necessary to filter out unusual observations to ensure the reliability of the system and prevent attempts to maliciously use it. This is particularly useful for detecting financial fraud attempts, as their behaviour patterns differ significantly from normal financial transactions (Bernard et al., 2021). Outlier detection methods are capable of processing all available data in real time to uncover patterns that evade traditional supervised learning methods. By doing so, organised crime groups can be identified with higher accuracy and fewer false positives. Outlier detection methods have indeed proved effective for credit card fraud detection (Carcillo et al., 2021), online banking fraud detection (Carminati et al., 2015), and health insurance fraud detection (Yamanishi et al., 2004). Overall, however, there has been limited use of these methods to detect financial fraud, although some review studies suggest that they deserve more attention because the detection performance of supervised algorithms is negatively affected by the inherently heavily imbalanced class distribution of financial fraud data (Ngai et al., 2011). The scarce use of outlier detection methods can be attributed to the difficulty of detecting fraudulent behaviour (e.g., abnormal frequency of transactions or spending behaviour) when overlapping with legitimate behaviour in datasets contaminated with outliers and noise. Moreover, several other challenges have been identified that make it difficult to detect outliers in the financial domain. First, efficient general purpose outlier detection methods are lacking because an outlier detection method in one fraud domain may not be appropriate for other scenarios, as legitimate and fraudulent behaviour is different from domain to domain (Ahmed et al., 2016). Second, unsupervised learning is preferred as sufficient labelled data for building models are rarely available. Third, legitimate behaviour may change over time, and fraudsters try to make their activities look legitimate. To take advantage of both supervised machine learning and outlier detection methods, for the first time, we propose a semi-supervised ensemble fraud detection model combining unsupervised outlier detection and supervised XGBoost methods that exploit all transactions contained in a large, highly imbalanced mobile payment transaction dataset.
Finally, financial implications of fraud detection methods in mobile payment transactions have also been neglected in earlier research. Therefore, our third contribution is to
propose a novel performance measure of cost savings that takes into account the financial implications of false positive and false negative rates of fraud detection systems. Using the PaySim dataset, our findings provide evidence for the effectiveness of both XGBoost leveraged by an under-sampling class-balancing procedure and extreme gradient boosting outlier detection (XGBOD), thus providing important tools to support operation and management of mobile payment services.
In summary, the contributions of this study are threefold:
1. Developing a novel fraud detection framework for mobile payment systems by integrating the XGBoost method with class-balancing adjustments and unsupervised outlier detection methods, making it suitable for detecting fraud in a typical class-imbalanced mobile payment scenario.
2. Proposing a novel cost savings measure to evaluate the performance of mobile payment fraud detection systems. Unlike the traditional performance measures, the proposed measure considers both the cost savings from the correct detection of fraudulent transactions and the decrease in the margin for the transactions incorrectly identified as fraudulent.
3. Using the benchmark PaySim dataset of more than 6 million mobile payment transactions, we demonstrate that the proposed fraud detection framework not only outperforms state-of-the-art fraud detection methods in terms of detection accuracy but also generates substantial financial savings to the providers of mobile payment systems.
The remainder of this paper is organized as follows. Section 2 reviews the related work on fraud detection in mobile payment transactions with respect to data sources, methods used and performance achieved in earlier studies. Section 3 outlines the proposed fraud detection framework. Section 4 provides the results of the evaluation on the PaySim dataset, robustness check, and financial implications. Section 5 concludes by providing some possible directions for future research.
2 Fraud Detection in Mobile Payment Systems – Literature Review
A considerable amount of literature has been published on financial fraud detection, see West and Bhattacharya (2016) for a review and Hajek and Henriques (2017) for a comprehensive evaluation of financial fraud detection methods. Risk factors of financial fraud were investigated, indicating that pressure / incentive to commit fraud is the most important risk factor (Huang et al., 2017). Related studies can be broadly categorized according to the financial fraud type as follows (Onwubiko, 2020): (1) account takeover fraud, (2) payment fraud, and (3) application fraud. Onwubiko (2020) also identified four main fraud channels, namely physical, web, telephony, and mobile. Frauds in mobile payment transactions have increasingly been recognized as a major concern in finance due to recent developments in mobile payment services (Chen & Sivakumar, 2021). Therefore, security requirements must be met to address security issues related to mobile payment transactions, such as mobile malware and SMS-based attacks (Kang, 2018). Heterogeneous software and hardware mobile platforms make the security problems more challenging (Li & Clark, 2013).
Regarding the data used in previous studies and summarized in Table 1, the lack of real-world datasets has been identified as a major problem in the application domain. Therefore, most earlier research tended to generate simulated synthetic data based on features captured from real-world fraud and legitimate transactions. To do so, Rieke et al. (2013) extracted payment laundering patterns from real-world events. However, the number of instances was insufficient for efficient fraud detection, as indicated by relatively low false negative (legitimate) rates in early studies (Coppolino et al., 2015; Rieke et al., 2013). Considerable progress has been made by introducing the PaySim financial simulator (Lopez-Rojas et al., 2016, 2018) that resembles normal mobile transactions and injects fraudulent behaviour to produce a larger number of financial frauds. Agent-based simulations and statistical analysis confirmed that the simulated data are as prudent as the original aggregated anonymized real data, thus representing an optimal control environment for fraud detection in mobile payment transactions. By leveraging the PaySim data, Lopez-Rojas and Barneaud (2019) demonstrated their advantages over the relatively small real-world dataset. In addition, the simulated data retained the transactions and causal dynamics of the original data. It should be noted, however, that by preserving the statistical properties of the real-world data, the high class imbalance in favour of legitimate transactions is also maintained in the simulated dataset.
Traditional machine learning methods with supervised or unsupervised learning are not effective in handling extreme class imbalance in the data. Although a relatively high overall accuracy was reported in several studies, these methods performed well only in terms of majority (legitimate) class accuracy (Choi & Lee, 2017, 2018; Du et al., 2018; Zhou et al., 2018). This also holds for more recent deep learning models, such as deep belief networks (Xenopoulos, 2017) and restricted Boltzmann machines (Mubalaike & Adali, 2018). To overcome this major limitation, class imbalance was first approached by using under-sampling methods and then machine learning methods were trained on the balanced dataset (Pambudi et al., 2019). Similarly, Xenopoulos (2017)
Study | Data | Method | Performance
Rieke et al. (2013) | synthetic logs (20/5,297) | predictive security analyser | FNR=0.550
Coppolino et al. (2015) | synthetic logs | Dempster-Shafer theory | FNR=0.240
Xenopoulos (2017) | PaySim (492/284,315) | ensemble of deep belief networks | Acc=89.05, AUC=0.961
Choi and Lee (2017; 2018) | Korean payment data (2,402/274,670) | unsupervised (EM, K-means, FarthestFirst, X-means, MakeDensity), supervised (NB, SVM, LR, OneR, C4.5, RF) | Acc=99.97
Mubalaike and Adali (2018) | PaySim (8,213/6M) | restricted Boltzmann machines | Acc=91.53
Du et al. (2018) | PaySim (8,213/6M) | SVM with LogDet regularization | Acc=97.57, AUC=0.978
Zhou et al. (2018) | Chinese bankcard enrolment (5,753/∼52M) | GB DT, LR, RF, rule-based expert | Precision=50.83, Recall=0.25
Pambudi et al. (2019) | PaySim (4,093/246,033) | RUS+SVM | F1=0.900, AUC=0.880
Misra et al. (2020) | PaySim (492/284,315) | Autoencoder+MLP | Acc=0.999, F1=0.827
Mendelson and Lerner (2020) | PaySim (8,213/6M) | cluster drift detection | AUC=0.898
Turner et al. (2021) | Bitcoin blockchain transactions | DeepWalk network analysis | −
Schlör et al. (2021) | PaySim (8,213/6M) | deep MLP with ReLU and iNALU | F1=0.880, AUC=0.960
Buschjäger et al. (2021) | PaySim (269/572K) | generalized Isolation Forest | AUC=0.821
This study | PaySim (8,213/6M) | RUS+XGBoost, XGBOD |
Legend: Acc – accuracy, AUC – area under ROC curve, DT – decision tree, EM – expectation-maximization, F1 – F1-score (harmonic mean of precision
and recall), FNR – false negative (legitimate) rate, GB – gradient boosting, LR – logistic regression, MLP – multilayer perceptron, NB – Naïve
Bayes, PNN – probabilistic neural network, RF – random forest, SVM – support vector machine, XGBOD – extreme gradient boosting outlier
detection, and XGBoost – eXtreme Gradient boosting
used under-sampling to produce balanced bootstraps for the ensemble learning, and Misra et al. (2020) and Schlör et al. (2021) applied it to generate balanced training data for deep learning-based detection models. The main drawback of the under-sampling approach is that potentially useful instances are often excluded from the training data, which can significantly degrade the detection accuracy. Alternatively, isolation-based approaches were used to approximate the data distribution and build a generative model using mixture components. This outlier detection method was successfully applied to fraud detection by Buschjäger et al. (2021).
However, a comprehensive evaluation of state-of-the-art machine learning-based approaches exploiting under-sampling methods for handling the class imbalance problem is lacking in the literature. Hybrid semi-supervised methods taking advantage of supervised learning and unsupervised outlier detection methods have also been overlooked. Finally, only standard performance measures have been used to evaluate fraud detection performance in mobile payment systems, thus neglecting the financial implications of fraud detection.
problem of extremely imbalanced classes in mobile payment transaction data. We will demonstrate that this approach is not only more accurate than supervised machine learning and outlier detection methods used in existing studies but that our approach is also more profitable in terms of the proposed cost savings measure.
3.1 Proposed Fraud Detection Models
This section outlines two fraud detection models proposed in this study. First, the eXtreme Gradient boosting (XGBoost) method, augmented with random under-sampling, is introduced to leverage both the supervised learning capability and robustness of XGBoost, a state-of-the-art machine learning method, and the data sampling component to overcome the class imbalance problem inherent in mobile payment transaction data. The second model exploits the extreme gradient boosting outlier detection (XGBOD) method, a semi-supervised algorithm that improves the performance of the XGBoost method on highly imbalanced mobile payment transaction data by introducing outlier scores obtained from multiple unsupervised outlier detection methods.
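The data preparation behind the two models can be illustrated with a minimal, library-free sketch. It is not the implementation used in this study: the function names, the toy data, and the k-th nearest neighbour distance used as a stand-in for the unsupervised outlier detectors are all assumptions made for illustration, and the gradient boosting classifier that would be trained on the resulting data is omitted.

```python
import random
from math import dist


def random_under_sample(X, y, ratio=1.0, seed=0):
    """Keep every fraud row (label 1) and a random subset of legitimate rows (label 0).

    `ratio` is the assumed number of legitimate rows kept per fraud row.
    """
    rng = random.Random(seed)
    fraud = [i for i, label in enumerate(y) if label == 1]
    legit = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(legit), round(ratio * len(fraud)))
    kept = fraud + rng.sample(legit, n_keep)
    rng.shuffle(kept)
    return [X[i] for i in kept], [y[i] for i in kept]


def kth_neighbour_distance(X, k=3):
    """Toy unsupervised outlier score: distance to the k-th nearest neighbour."""
    scores = []
    for i, xi in enumerate(X):
        distances = sorted(dist(xi, xj) for j, xj in enumerate(X) if j != i)
        scores.append(distances[k - 1])
    return scores


def augment_with_outlier_scores(X, detectors):
    """Append one outlier-score column per detector to each feature row."""
    score_columns = [detector(X) for detector in detectors]
    return [list(row) + [col[i] for col in score_columns] for i, row in enumerate(X)]


# Toy data (hypothetical): six legitimate transactions near the origin, two frauds far away.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
     [0.05, 0.05], [0.2, 0.1], [5.0, 5.0], [6.0, 5.5]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

# First model: a balanced training sample on which a classifier would be fitted.
X_balanced, y_balanced = random_under_sample(X, y, ratio=1.0)

# Second model: all rows are kept, each extended with unsupervised outlier scores.
X_augmented = augment_with_outlier_scores(X, [kth_neighbour_distance])
```

The sketch makes the contrast between the two models explicit: under-sampling discards legitimate rows to balance the classes, whereas score augmentation keeps every transaction and adds columns that a supervised learner can exploit.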
Fig. 2. The RUS component is first used to generate balanced training samples, and XGBoost then generates additive models to produce the final prediction on whether the mobile payment transaction is fraudulent or not.
Under-Sampling for Handling Class Imbalance Problem. The extremely high imbalance between legitimate and fraud classes makes detecting financial fraud a challenge (Du et al., 2018). Considering the importance of class imbalance in financial fraud detection, numerous methods have been used to improve the classification performance of supervised learning methods. In the related literature (Pambudi et al., 2019), data-level solutions have been particularly successful because they allow the imbalance problem to be addressed before training machine learning methods. In addition, data-level methods integrated into classifier ensembles appear to be particularly effective (Galar et al., 2012). Among the data-level methods, over-sampling methods create artificial instances in the minority class to balance the training data. However, this can lead to problems of overfitting and overgeneralization as instances of the majority class are ignored. Moreover, given the gradual increase in data on financial fraud, under-sampling methods should be a better choice than their over-sampling counterpart.
The RUS method used in this study enables controlling for the number of samples selected from the original data. RUS is a non-heuristic method that randomly selects a data subset from the majority class, which is computationally effective and enables sampling heterogeneous data (Haixiang et al., 2017).
Extreme Gradient Boosting. XGBoost is a computationally efficient and scalable implementation of gradient boosted decision trees that build additive models in a stepwise fashion. The overall error is minimized incrementally by introducing additive models based on the errors obtained in the previous steps. This results in an ensemble of base learners with better prediction ability than the individual classifiers. This is achieved through gradual improvement in accuracy, low tree depth, and equal contribution of the base learners to the final combined model. To further improve robustness to noise and overfitting, gradient boosting was augmented with a random sampling scheme (stochastic gradient boosting). XGBoost is an enhanced implementation with a more regularized model to control overfitting. The objective function of XGBoost to be minimized is given as follows (Chen & Guestrin, 2016):
\[ obj^{(t)} = \sum_{i=1}^{n} \left( y_i - \left( \hat{y}_i^{(t-1)} + f_t(x_i) \right) \right)^2 + \sum_{t=1}^{T} \Omega(f_t), \tag{1} \]
where $y_i$ is the target value of the i-th instance, $\hat{y}_i^{(t)}$ is its predicted value at the t-th iteration, $f_t(x_i)$ is the additive decision tree model greedily added to improve the model performance, and $\Omega(f_t)$ is a regularization term penalizing the model complexity. The goal of this regularization procedure is to compress the weights for many features to zero to
ability, which is due to its robustness to overfitting and data imbalance.
In the proposed XGBOD-based fraud detection model, a variety of unsupervised outlier detection methods (presented in Section 3.2.2) are used to produce the TOS features. To maintain the balance between their diversity and accuracy, the balance selection algorithm (Zhao & Hryniewicki, 2018) is used to perform TOS selection. This algorithm applies a discounted accuracy function $\Psi(TOS_i)$ to pick the subset of p most relevant TOS. The function is defined as follows:
\[ \Psi(TOS_i) = \frac{AUC_i}{\sum_{j=1}^{k} \left| \rho(TOS_i, TOS_j) \right|}, \tag{2} \]
where $AUC_i$ is the AUC performance of the i-th outlier detection method, and $\rho(TOS_i, TOS_j)$ denotes the Pearson correlation coefficient between a pair of TOS.
3.2 Machine Learning Methods for Comparative Evaluation
In this section, we present the machine learning methods used for comparative evaluation in detecting fraud in mobile payment transactions. The methods can be broadly divided into (1) machine learning methods with supervised learning that address the class imbalance problem typical for financial fraud detection data, and (2) outlier detection methods.
3.2.1 Supervised Learning Methods for Imbalanced Data
k-nearest Neighbour Classifier. The k-nearest neighbour (k-NN) method is an instance-based non-parametric classifier that uses training instances for comparison purposes. An instance is classified considering its k most-similar instances (typically in terms of Euclidean distance) using a majority vote. This simple approach proved to be accurate in a comparative analysis of machine learning methods for highly imbalanced credit card fraud detection (Awoyemi et al., 2017). In financial fraud detection, it is assumed that fraud instances are far from the samples of the legitimate class. Therefore, k-NN can be effectively used even in unsupervised outlier detection mode (Ramaswamy et al., 2000).
Support Vector Machine. SVM is a particularly effective classifier for financial fraud detection due to its capacity to deal with high-dimensional data (Du et al., 2018; Pambudi et al., 2019; Seera et al., 2021). The SVM algorithm aims to find the optimal separating hyperplane that maximizes the margin between instances from different classes. The decision boundary is represented by a subset of the data known as support vectors. Finding the parameters of the hyperplane is an optimization problem that takes into consideration both minimizing the training error and maximizing the margin. To handle nonlinear relationships in the data, kernel functions (e.g., linear, polynomial or radial basis functions) are employed to map the classification problem from the original feature space to a new feature space of higher dimension where linear separation is possible.
Random Forest. Random forest (RF) integrates multiple decision tree predictors trained independently on different data samples. This allows a number of trees to be generated, ensuring that the generalization error converges to a certain limit. Another major advantage of RF is its non-differentiable decision boundary. In addition, random feature selection is used to split the nodes in each tree, making the RF classifier more robust to noise. The application of RF in financial fraud detection is particularly effective when the class distribution is imbalanced because its hierarchical structure enables learning patterns from both classes (Nami & Shajari, 2018). These advantages explain the good performance of RF on financial fraud detection tasks (Zhou et al., 2018).
3.2.2 Outlier Detection Methods
Outlier detection is typically conducted using unsupervised machine learning methods. The methods presented in this section are trained to represent the legitimate data using clusters of similar data observations. Then, an unseen instance is assigned a score that is compared to a threshold representing the decision boundary separating legitimate instances from outliers.
The evaluation conducted in this study contains four types of outlier detection methods, namely (1) proximity-based methods, (2) linear model-based methods, (3) ensembling methods, and (4) neural network-based methods.
Proximity-Based Methods. To detect outliers, proximity-based methods investigate the neighbourhood of each data instance. For example, the local outlier factor (LOF) method (Breunig et al., 2000) uses the Euclidean distance between the data instance and its closest neighbour to obtain an outlier score. In the k-NN method (KNN) (Ramaswamy et al., 2000), a partition-based algorithm is first used to identify candidate partitions containing outliers, and then the distances of instances from these partitions are calculated to detect outliers. An important advantage of proximity-based methods is their independence of the data distribution. In other words, no a priori knowledge about the data distribution is required. However, these methods usually do not scale well for high-dimensional data. To reduce the sensitivity of LOF to the curse of dimensionality, the cluster-based local outlier factor (CBLOF) method (He et al., 2003) replaces closest neighbours with closest clusters, and the angle-based
outlier detection (ABOD) method (Kriegel et al., 2008) replaces distances with the angular radius and variance of each data vector. The histogram-based outlier detection (HBOS) method assumes independence of features to score instances in linear time and is thus computationally more efficient compared to nearest-neighbour-based methods. However, HBOS fails in detecting local outliers because the density estimation produced by histograms does not allow modelling local outliers.
Linear Model-Based Methods. Linear model-based methods rely on the construction of a decision boundary separating instances in the legitimate class from the rest of the input data space. The one-class SVM (OCSVM) method (Schölkopf et al., 2000) constructs a separating hyperplane in high-dimensional space by minimizing the structural risk to capture regions of data belonging to the legitimate class. To prevent overfitting, this method allows a certain percentage of data instances (regularization parameter) to fall outside the separation boundary. The minimum covariance determinant (MCD) method (Hardin & Rocke, 2004) combines a multivariate location and scale estimator with a robust clustering algorithm so that the determinant of the covariance matrix is minimized for each cluster. This method is first trained to fit a minimum covariance determinant model and then the outlier score is calculated using the Mahalanobis distance. However, problems can arise when clusters overlap significantly, leading to poor convergence of the algorithm.
Ensembling Methods. Isolation Forest (Liu et al., 2008) aims to separate outliers from the rest of the data samples. To calculate an isolation score for the data instances, random forest is employed. The method assumes that outliers are susceptible to isolation and, therefore, can be isolated closer to the root of the tree. Specifically, the average path length from the root of the trees can be used to obtain the isolation score. Isolation trees are thus able to build sub-models on different data samples while maintaining low computational complexity and the ability to scale to handle large volumes of data and high-dimensional problems. Similarly, lightweight on-line detector of anomalies (LODA) comprises a collection of weak learners represented by one-dimensional histograms approximating probabilities of random data projections. The use of sparse projections makes LODA robust to both the large number of samples and missing data, allowing the detection of anomalous samples in real-time (Pevny, 2016).
Neural Network-Based Methods. Neural network-based methods utilize feature learning to reduce dimensionality. An autoencoder is an unsupervised neural network capable of nonlinear dimensionality reduction and reproducing input data vectors. Sakurada and Yairi (2014) showed that an autoencoder (AE) can be successfully applied to outlier detection. To detect outliers in financial fraud, AEs can be trained to learn legitimate behaviour and compute a reconstruction error representing the outlier score (Sakurada & Yairi, 2014). To achieve robustness in learning disentangled representations, the variational autoencoder (VAE) was proposed, which utilizes both the joint data distribution and their latent generative factors (Burgess et al., 2018). VAE represents a probabilistic graphical model whose posterior distribution is estimated using a neural network. The outlier score of VAE is calculated as the reconstruction probability. Recently, generative adversarial networks (GANs) have been applied to unsupervised outlier detection. Specifically, multi-objective generative adversarial active learning (MO-GAAL) uses GANs to sample informative potential outliers following a mini-max game between a discriminator and a generator (Liu et al., 2019). Thus, GANs assist the discriminative algorithm in finding a boundary that can effectively separate fraudulent outliers from legitimate normal data. This has been exploited in several studies on financial fraud (Sethia et al., 2018; Delecourt & Guo, 2019).
3.3 Performance Evaluation
In many related studies (Du et al., 2018; Misra et al., 2020; Mubalaike & Adali, 2018), the ratio of correctly classified transactions to the total number of transactions (i.e., accuracy) has been used as the evaluation measure. However, in the scenario of class-imbalanced data, this measure fails to capture the model performance for the minority (fraud) class.
Table 2 Confusion matrix for fraud detection
Prediction/Target | Positive | Negative
Positive (fraudulent transaction) | TP | FP
Negative (legitimate transaction) | FN | TN
As noted in previous research (Lopez-Rojas & Barneaud, 2019), an inherent problem in detecting financial fraud that needs to be addressed is the unknown distribution and impact of all fraudulent transactions. In the absence of an adequate measure of fraud detection performance, existing fraud detection approaches rely on traditional measures of classification performance. The most desirable performance measure is the ability to correctly identify fraudulent transactions (true positive rate). In addition, minimizing false positive and false negative transaction rates (see confusion matrix in Table 2) is also a desirable quality of fraud detection systems, especially in a changing fraudulent
Barneaud, 2019):
\[ Cost_{FP} = (TN \times AL \times 0.035) - \sum_{j=1} (FP_j \times AT_j \times 0.035), \tag{10} \]
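A short sketch can make the arithmetic of Eq. (10) concrete. The definitions of the symbols are not part of this excerpt, so the reading below is an assumption: TN is taken as the number of correctly accepted legitimate transactions, AL as their average amount, FP_j as an indicator of the j-th false positive, AT_j as the amount of that transaction, and 0.035 as the margin rate. The function name and the input values are hypothetical.

```python
MARGIN_RATE = 0.035  # coefficient appearing in Eq. (10); assumed to be a margin rate


def cost_fp(n_true_negatives, avg_legit_amount, false_positive_amounts,
            margin_rate=MARGIN_RATE):
    """Evaluate Eq. (10) under the assumed reading of its symbols.

    The first term is the margin earned on accepted legitimate transactions;
    the sum is the margin lost on transactions wrongly flagged as fraudulent.
    """
    margin_earned = n_true_negatives * avg_legit_amount * margin_rate
    margin_lost = sum(amount * margin_rate for amount in false_positive_amounts)
    return margin_earned - margin_lost


# Hypothetical numbers, not taken from the study.
example = cost_fp(n_true_negatives=1000, avg_legit_amount=200.0,
                  false_positive_amounts=[500.0, 1500.0])
print(round(example, 2))
```

With these illustrative inputs the earned margin is about 7,000 and the lost margin about 70, so every additional false positive reduces the result in proportion to the amount of the rejected transaction.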
Table 3 Attributes in the PaySim dataset
Attribute                          Mean value / Range
Step                               1-743
Type of transaction                cash-out (35%), cash-in (34%), transfer and debit (31%)
Amount of transaction              180K
Customer name                      6.35M unique values
Initial balance                    834K
New balance                        855K
Recipient name                     2.72M unique values
Initial balance of the recipient   1.1M
New balance of the recipient       1.22M
Fraud                              0 (legitimate 6.36M) / 1 (fraud 8.2K)

PaySim dataset² in this study. The main objective of the simulations performed by Lopez-Rojas and his research team (Lopez-Rojas et al., 2016, 2018; Lopez-Rojas & Barneaud, 2019) was to replicate typical fraud scenarios that have similar statistical characteristics to the original mobile payment transaction data. To this end, different types of fraudulent transactions were injected, including cash-in (increasing account balance), cash-out (withdrawing cash), payment (paying for goods or services), transfer (to another user) and debit (sending money to a bank account). PaySim simulated 743 time steps, representing thirty days of real-time data. To introduce fraudulent behaviour into the system, 1,000 fraudsters were included with a 3% probability of committing fraud at any time step. A total of 6,362,620 mobile transactions were involved in the dataset, of which 8,213 were fraudulent. Table 3 provides descriptive statistics of the dataset, and Fig. 4 shows the numbers and amounts of transactions in time steps.

We opted for this dataset for several reasons (Lopez-Rojas & Barneaud, 2019). First, real-time historical data do not include enough fraudulent transactions. Therefore, some previous studies have considered all abnormal transactions to be fraudulent (Choi & Lee, 2017). Second, privacy protections prevent companies from making datasets public. Third, fraudulent behaviour is adaptive, making it difficult to create sufficiently diverse real-world fraud data. In addition, a similar approach based on typical real attack scenarios was taken in studies related to online banking fraud detection (Carminati et al., 2015).

4.2 Experimental Setup

For data partitioning, we randomly created training and testing data with a 3:1 ratio (75% training data, 25% testing data). To ensure reliable performance evaluation, we repeated this process five times. Since the performance of the fraud detection methods strongly relies on their hyperparameter selection, we then conducted their optimal selection using 5-fold cross-validation on the training data (for the list of hyperparameters and their values, see Appendix Table 9). Then, we performed fraud detection in mobile payment transactions using the above supervised learning and outlier detection methods. For experiments, we used the following implementations: (1) supervised learning methods in the Python library Scikit-Learn 0.23.0, (2) the RUS algorithm available in the library Imbalanced-Learn 0.6.2, and (3) the outlier detection methods available in the library PyOD (Zhao et al., 2019). The performance of the methods was evaluated using the measures defined in the following subsection.

4.3 Empirical Results

We performed empirical experiments using the PaySim dataset. This section consists of four subsections. First, we investigate the performance of supervised learning methods and the effect of random under-sampling on their effectiveness. Second, the performance of outlier detection methods is evaluated. Third, the financial consequences of the fraud detection models are evaluated. Finally, the robustness of the models is tested using a credit card fraud dataset.

4.3.1 Supervised Learning Methods

In the first set of experiments, we compared the performance of four supervised learning methods (XGBoost, k-NN, SVM, and RF), without using RUS, to obtain baseline performance. Table 4 shows the testing results of overall accuracy Acc, AUC, F1, Precision and Recall. The values of performance measures were obtained as the average of five experiments. For each performance measure, the number in bold represents the best value among the tested methods. The non-parametric Wilcoxon test was performed on the performance measure values obtained in the five experiments to statistically compare the performance between the best performing method and the remaining methods. Significantly similar results at the 5% level with respect to AUC and F1 are marked with an asterisk.

In terms of accuracy, all the supervised learning methods used performed well. However, as noted above, the extreme class imbalance suggests that this evaluation measure is not as relevant in this case. As for the AUC measure, XGBoost was superior to the other methods, indicating a well-balanced performance for both legitimate and fraud classes. The good balance between Precision and Recall caused XGBoost to achieve the best results also in terms of F1 measure. By contrast, SVM and k-NN performed well only with respect to Precision and Recall,

² https://www.kaggle.com/ealaxi/paysim1
Table 4 Fraud detection performance of supervised learning methods
Method         AUC       F1        Acc     Precision  Recall  Execution time [s]
k-NN           0.9313    0.1588    0.9881  0.0873     0.8744  4,581.4
SVM            0.6543    0.4655    0.9991  0.9474     0.3086  12,082.9
RF             0.8961    0.8394*   0.9996  0.9146     0.7756  1,196.2
XGBoost        0.9350    0.8410*   0.9998  0.8794     0.8059  207.0
RUS+k-NN       0.8996    0.0405    0.9475  0.0207     0.8516  145.3
RUS+SVM        0.8344    0.0321    0.9431  0.0164     0.7255  1,041.5
RUS+RF         0.9933*   0.2305    0.9914  0.1303     0.9947  12.6
RUS+XGBoost    0.9955*   0.2812    0.9934  0.1637     0.9976  2.4
The best results are in bold, * statistically similar at 5% as the best performer in bold. The experiments were performed on Intel® Core™ i5-8400 CPU @ 2.8GHz, 32 GB RAM with six cores on a Windows 10 operating system in the Python libraries Scikit-Learn 0.23.0 and Imbalanced-Learn 0.6.2
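The RUS rows in Table 4 rely on random under-sampling of the majority (legitimate) class. The following is a minimal pure-Python sketch of that idea, not the Imbalanced-Learn implementation used in the study. The function name `random_under_sample` and the toy data are hypothetical; the `sampling_strategy` argument is assumed to mean the desired ratio of minority to majority samples after resampling, which is how the values 0.5, 0.75 and 1.0 in Appendix Table 9 are read here.

```python
import random
from collections import Counter


def random_under_sample(X, y, sampling_strategy=1.0, majority_label=0, seed=0):
    """Randomly drop majority-class rows until n_minority / n_majority
    is approximately equal to sampling_strategy (illustrative helper)."""
    rng = random.Random(seed)
    majority_idx = [i for i, label in enumerate(y) if label == majority_label]
    minority_idx = [i for i, label in enumerate(y) if label != majority_label]
    # Number of majority rows to keep so that the target ratio is reached.
    n_keep = min(len(majority_idx), int(len(minority_idx) / sampling_strategy))
    kept = sorted(rng.sample(majority_idx, n_keep) + minority_idx)
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data (an assumption): 1,000 legitimate rows (0) and 10 fraudulent rows (1).
X = [[float(i)] for i in range(1010)]
y = [0] * 1000 + [1] * 10

X_res, y_res = random_under_sample(X, y, sampling_strategy=0.5)
counts = Counter(y_res)
print(counts)
```

All minority rows are kept, so Recall tends to rise, while most legitimate rows are discarded, which is one plausible reason for the shorter execution times and the lower Precision reported for the RUS variants.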
respectively, making them unsuitable methods for fraud detection in mobile payment transactions. Overall, these results indicate that only XGBoost without class-balancing adjustment is suitable for detecting fraud in such a class-imbalanced scenario.
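The role of the Precision/Recall balance in the F1 values can be illustrated with a short calculation. The sketch below assumes the standard definition of F1 as the harmonic mean of Precision and Recall and uses the XGBoost and RUS+XGBoost values from Table 4. Because the table reports averages over five runs, the recomputed F1 is only expected to agree approximately with the reported one.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (Precision, Recall, reported F1) taken from Table 4.
table4 = {
    "XGBoost": (0.8794, 0.8059, 0.8410),
    "RUS+XGBoost": (0.1637, 0.9976, 0.2812),
}

recomputed = {m: f1_score(p, r) for m, (p, r, _) in table4.items()}
for method, (p, r, reported) in table4.items():
    print(f"{method}: F1 = {recomputed[method]:.4f} (reported {reported})")
```

The harmonic mean is dominated by the smaller of the two inputs, so a method with very high Recall but low Precision still obtains a low F1.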
Table 5 Fraud detection performance of outlier detection methods
Method            AUC     F1      Acc     Precision  Recall  Execution time [s]
ABOD              0.8353  0.0680  0.9953  0.0675     0.0685  2,646.5
CBLOF             0.8593  0.0822  0.9954  0.0829     0.0822  41.3
HBOS              0.7731  0.0077  0.9951  0.0078     0.0076  4.1
LODA              0.6818  0.1060  0.9954  0.1026     0.1096  14.8
Isolation Forest  0.8358  0.0189  0.9964  0.0307     0.0137  189.9
KNN               0.8618  0.1260  0.9957  0.1288     0.1233  1,948.5
MCD               0.7705  0.1084  0.9956  0.1087     0.1081  127.4
OCSVM             0.6732  0.0273  0.9951  0.0272     0.0274  802.9
AE#               0.8050  0.0869  0.9954  0.0870     0.0868  931.1
VAE#              0.8050  0.0869  0.9954  0.0870     0.0868  2,922.9
MO-GAAL           0.9071  0.6059  0.9980  0.5902     0.6225  13,184.4
XGBOD             0.9958  0.8737  0.9994  0.9942     0.7793  4,256.3
The best results are in bold, # The experiments were performed on GPU NVIDIA GeForce GTX 1060 6GB, 1280 cores on a Windows 10 operating system in the Python library PyOD
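XGBOD, the best performer in Table 5, augments the original features with outlier scores from unsupervised detectors before a supervised XGBoost model is trained on the enlarged feature space. The sketch below illustrates only that augmentation step in plain Python. The two scorers (`zscore_distance`, `knn_distance`) are simple stand-ins chosen for illustration, not the PyOD detectors listed in Appendix Table 9, and the supervised step is omitted.

```python
import math


def zscore_distance(X):
    """Stand-in score 1: distance from the feature-wise mean in std units."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
        for j in range(d)
    ]
    return [
        math.sqrt(sum(((row[j] - means[j]) / stds[j]) ** 2 for j in range(d)))
        for row in X
    ]


def knn_distance(X, k=2):
    """Stand-in score 2: Euclidean distance to the k-th nearest neighbour."""
    scores = []
    for i, a in enumerate(X):
        dists = sorted(math.dist(a, b) for j, b in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores


def augment_with_outlier_scores(X, scorers):
    """Append one column per unsupervised scorer to the original features."""
    score_columns = [scorer(X) for scorer in scorers]
    return [list(row) + [col[i] for col in score_columns] for i, row in enumerate(X)]


# Toy data (an assumption): four similar points and one obvious outlier.
X = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 0.95], [8.0, 9.0]]
X_aug = augment_with_outlier_scores(X, [zscore_distance, knn_distance])
for row in X_aug:
    print([round(v, 3) for v in row])
```

A labelled classifier trained on `X_aug` can then exploit both the raw attributes and the unsupervised scores, which is the semi-supervised element attributed to XGBOD in the text.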
Then, we investigated the effect of the RUS under-sampling procedure on the performance of the supervised learning methods. On the one hand, Table 4 shows that RUS greatly improved the values of AUC for SVM, RF and XGBoost. On the other hand, there was a considerable deterioration in F1, which can be attributed to the lower Precision achieved at the cost of higher Recall. In other words, RUS caused almost all fraudulent transactions to be detected, but this was accompanied by a substantial increase in the number of FP transactions. This resulted in a bias for the minority class while reducing the accuracy for the majority class. It is worth noting that we also experimented with other heuristic-based under-sampling methods, such as edited nearest neighbour and Tomek links, to address the class imbalance problem but without improvement in detection performance. Finally, it should be noted that the execution time (training time + testing time) was substantially reduced by using RUS. For example, RUS+XGBoost was computationally most efficient with 2.38 seconds compared to 207.02 seconds required for XGBoost without using RUS.

4.3.2 Outlier Detection Methods

In the second run of experiments, the performance of XGBOD was evaluated compared with other outlier detection methods. Table 5 shows that XGBOD significantly outperformed the remaining methods in terms of AUC and F1. In addition, XGBOD was also dominant with respect to both Precision and Recall, indicating excellent performance on both classes.

These results can be explained by the semi-supervised learning approach used in the XGBOD method. This is because, unlike other outlier detection methods, XGBOD leverages the labels assigned to mobile transactions. In addition, the transactions contained in the majority class of legitimate transactions are fully utilized by the multiple unsupervised outlier detection methods that produce outlier scores in XGBOD. The XGBoost algorithm applied in the improved XGBOD feature space exhibits good robustness to overfitting and data imbalance, and outperforms the supervised learning methods reported in Table 4 in terms of AUC and F1. However, it should be admitted that the drawback of XGBOD is the longer execution time, on average 4,256.25 seconds.

4.4 Financial Impact of Fraud Detection

To investigate the financial consequences of the evaluated fraud detection systems, we used the performance measures defined in Eqs. 9-11. Table 6 shows the average financial performance of all methods in terms of cost savings from TP transactions, cost of FP transactions and total cost savings. To calculate these results, we used the average amounts of fraudulent and legitimate transactions in the data, i.e., AF = 1,468,000 and AL = 178,200.

In general, supervised learning methods outperformed outlier detection methods in terms of overall cost savings, which can be attributed to the high Recall values of supervised learning methods. Note that cost savings from TP transactions were considered to have a stronger financial impact on total cost savings compared to FP transactions. In contrast, XGBOD delivered the lowest costs associated with FP transactions, which is related to its high Precision performance. Surprisingly, SVM and unsupervised outlier detection methods used in previous studies (Buschjäger et al., 2021; Du et al., 2018) did not perform well in terms of financial impact and provided negative overall cost savings due to their low Recall values.
Table 6 Financial impact of fraud detection methods
Method            CSTP*      CostFP   CStotal
k-NN              3,576.4    120.3    3,456.1
SVM               −2,135.4   218.3    −2,135.6
RF                2,575.1    923.1    2,574.2
XGBoost           3,630.7    380.5    3,630.3
RUS+k-NN          3,443.3    519.3    2,924.0
RUS+SVM           2,155.9    561.4    1,594.5
RUS+RF            4,903.3    85.8     4,817.5
RUS+XGBoost       4,932.9    65.9     4,866.9
ABOD              −4,556.9   23.5     −4,580.4
CBLOF             −4,418.7   22.6     −4,441.3
HBOS              −5,171.0   24.2     −5,195.2
LODA              −4,142.3   23.8     −4,166.2
Isolation Forest  −5,109.6   10.7     −5,120.3
KNN               −4,004.2   20.7     −4,024.9
MCD               −4,157.7   22.2     −4,179.7
OCSVM             −4,971.4   24.3     −4,995.7
AE                −4,372.6   22.6     −4,395.3
VAE               −4,372.6   22.6     −4,395.3
MO-GAAL           1,031.6    10.7     1,020.9
XGBOD             2,612.9    0.1      2,612.8
* amounts are given in mil. units of an African currency that could not be disclosed by data providers, the best results are in bold

Table 7 Comparison of fraud detection performance of the proposed XGBoost-based models with the results of previous studies
Study                         Method                              AUC
Xenopoulos (2017)             ensemble of deep belief networks    0.961
Du et al. (2018)              SVM with LogDet regularization      0.978
Pambudi et al. (2019)         RUS+SVM                             0.880
Mendelson and Lerner (2020)   cluster drift detection             0.898
Schlör et al. (2021)          deep MLP with ReLU and iNALU        0.960
Buschjäger et al. (2021)      generalized Isolation Forest        0.821
This study                    RUS+XGBoost                         0.996
This study                    XGBOD                               0.996
The best results are in bold

4.5 Comparison with State-of-the-art Methods

To further show the effectiveness of the proposed fraud detection model, the obtained AUC was compared with that of previous studies that examined the same dataset (Table 7). The best AUC performance thus far reported was achieved using SVM with LogDet regularization (Du et al., 2018). Our results in Table 4 obtained for SVM confirm that equipping SVM with LogDet regularization improves the AUC performance. Indeed, the traditional SVM method is reportedly sensitive to outliers and noisy data (Shajalal et al., 2021). Table 7 also shows that deep neural networks performed well in previous studies (Schlör et al., 2021; Xenopoulos, 2017). However, their performance is limited by the relatively low number of fraudulent transactions in the dataset. By contrast, the worst performance was reported for the Isolation Forest method (Buschjäger et al., 2021). Note that the results for Isolation Forest obtained here (Table 5) are consistent with those from Buschjäger et al. (2021). The results in Table 7 suggest that the proposed XGBoost-based models perform better than those used in previous studies in terms of AUC, which can be attributed to their good scalability and efficient processing of sparse data.

4.6 Robustness Check on Bank Payment Datasets

To confirm the obtained performance evaluation, we checked the robustness of the considered fraud detection methods using a bank payment dataset. The BankSim dataset³ (Lopez-Rojas & Axelsson, 2014) was generated using a multi-agent simulation based on a sample of transactional data from a Spanish bank. The dataset was validated using statistical techniques and social network analysis of customer-merchant relationships, thus approximating key features of real bank payment frauds. Each transaction was characterized by payment amount (in EUR), customer and merchant zip codes, customer gender and age, and merchant category (e.g., fashion, technology, transport, and travel). A total of 594,643 transaction records were included, of which 7,200 were fraudulent transactions. The simulation was run for 180 steps representing months. Thieves were injected to steal or clone an average of three credit cards at each step and conduct approximately two fraudulent transactions per day. The result of the simulation is depicted in Fig. 5.

The BankSim dataset provides a benchmark for detecting fraud in bank payment transactions, as several recent studies have shown (Cui et al., 2021; Vaughan, 2020). As a robustness check, we trained the evaluated models on the BankSim dataset using the same experimental setup as for the PaySim dataset. Note that the sampling process and data collection system were unique and heterogeneous for both datasets, which allowed us to verify the robustness of the tested fraud detection models. The results in Table 8 suggest that the under-sampling procedure is not as effective for smaller financial fraud datasets, improving the performance of supervised learning methods only in terms of AUC. In contrast, the performance of unsupervised outlier detection methods substantially improved compared to the large PaySim

³ https://www.kaggle.com/ealaxi/banksim1
Appendix A

Table 9.
Method            Hyperparameters and their values
ABOD              contamination = the proportion of frauds in the training dataset, neighbours k = {5, 10}
CBLOF             number of clusters = 8, clustering estimator = K-means, alpha = 0.9
HBOS              alpha = 0.1
LODA              number of bins = 10, number of random cuts = 100
Isolation Forest  number of estimators = {100, 200}
KNN               neighbours k = {2, 3, 5}, radius = 1.0
MCD               contamination = the proportion of frauds in the training dataset
OCSVM             kernel function: {linear, polynomial, RBF with gamma = 0.01}, nu = 0.1
AE                hidden activation = ReLU, optimizer = adam, epochs = 100, dropout rate = 0.2, L2 regularizer = 0.2, hidden neurons = [8, 4, 4, 8]
VAE               hidden activation = ReLU, optimizer = adam, epochs = 100, gamma = 1.0, dropout rate = 0.2, L2 regularizer = 0.1, encoder neurons = [8, 4, 2], decoder neurons = [2, 4, 8]
MO-GAAL           contamination = the proportion of frauds in the training dataset, number of sub generators = 10, learning rate of the discriminator = 0.01, learning rate of the generator = 0.0001, epochs = 20
XGBOD             estimator list = {ABOD, CBLOF, HBOS, LODA, Isolation Forest, KNN, MCD, OCSVM, AE, VAE}, p = 5, learning rate = 0.1
k-NN              k = {2, 3, 5}
SVM               complexity parameter C = 1, kernel function: {linear, polynomial, RBF with gamma = 0.01}
RF                number of trees = {100, 200}
XGBoost           booster = gbtree, eta = 0.3, gamma = 0, maximum depth of a tree = {3, 6, 9}, sampling method = uniform, lambda = 1, alpha = 0
RUS               sampling strategy = {0.5, 0.75, 1.0}
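The size of the search implied by these settings can be sketched for the RUS+XGBoost combination. The example below assumes, for illustration only, that the RUS sampling strategy and the XGBoost tree depth are tuned jointly with 5-fold cross-validation; how the two grids were actually combined is not stated in the appendix. The dictionary keys are illustrative names, not a specific library's parameter names.

```python
from itertools import product

# Tuned values from Table 9 (joint tuning is an assumption of this sketch).
grid = {
    "max_depth": [3, 6, 9],                 # XGBoost: maximum depth of a tree
    "sampling_strategy": [0.5, 0.75, 1.0],  # RUS: minority/majority ratio
}
# Values reported as fixed for XGBoost in Table 9.
fixed = {"booster": "gbtree", "eta": 0.3, "gamma": 0, "lambda": 1, "alpha": 0}

# Enumerate every combination of the tuned values, merged with the fixed ones.
candidates = [
    {**dict(zip(grid, values)), **fixed} for values in product(*grid.values())
]

n_folds = 5
n_fits = len(candidates) * n_folds
print(len(candidates), "candidate settings,", n_fits, "model fits per repetition")
```

Under this assumption, each of the five repetitions of the 75/25 split would require fitting every candidate on every fold before the selected setting is refitted on the full training data.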
Acknowledgements This article was supported by the scientific research project of the Czech Sciences Foundation Grant No. 19-15498S.

Declarations

Conflicts of interest The authors have no competing interests to declare that are relevant to the content of this article.

References

Ahmed, M., Mahmood, A. N., & Islam, M. R. (2016). A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, 278–288.
Akanfe, O., Valecha, R., & Rao, H. R. (2020). Assessing country-level privacy risk for digital payment systems. Computers & Security, 99, 102065.
Awoyemi, J. O., Adetunmbi, A. O., & Oluwadare, S. A. (2017). Credit card fraud detection using machine learning techniques: A comparative analysis. IEEE international conference on computing, networking and informatics, ICCNI 2017 (pp. 1–9). IEEE.
Bansal, S., Bruno, P., Denecker, O., & Niederkorn, M. (2019). Global Payments Report 2019: Amid Sustained Growth, Accelerating Challenges Demand Bold Actions.
Bernard, P., De Freitas, N. E. M., & Maillet, B. B. (2021). A financial fraud detection indicator for investors: an IDeA. Annals of Operations Research, 1–24.
Blumenstock, J. (2020). Machine learning can help get COVID-19 aid to those who need it most. Nature, 13.7.2020, 1–3.
Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. In 2000 ACM SIGMOD international conference on management of data - SIGMOD ’00 (pp. 93–104) New York, New York, USA.
Burgess, C. P., Higgins, I., Pal, A., Matthey, L., Watters, N., Desjardins, G., & Lerchner, A. (2018). Understanding disentangling in β-VAE. In Proc. of the 31st conference on neural information processing systems (pp. 1–11).
Buschjäger, S., Honysz, P. J., & Morik, K. (2021). Randomized outlier detection with trees. International Journal of Data Science and Analytics, 1–14.
Carcillo, F., Le Borgne, Y. A., Caelen, O., Kessaci, Y., Oblé, F., & Bontempi, G. (2021). Combining unsupervised and supervised learning in credit card fraud detection. Information Sciences, 557, 317–331.
Carminati, M., Caron, R., Maggi, F., Epifani, I., & Zanero, S. (2015). BankSealer: A decision support system for online banking fraud analysis and investigation. Computers & Security, 53, 175–186.
Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proc. of the 22nd ACM SIGKDD int. conf. on knowledge discovery and data mining (pp. 785–794).
Chen, Y., & Sivakumar, V. (2021). Investigation of finance industry on risk awareness model and digital economic growth. Annals of Operations Research, 1–22.
Chen, S., Yuan, Y., Luo, X. R., Jian, J., & Wang, Y. (2021). Discovering group-based transnational cyber fraud actives: A polymethodological view. Computers & Security, 104, 102217.
Chin, A. G., Harris, M. A., & Brookshire, R. (2022). An empirical investigation of intent to adopt mobile payment systems using a trust-based extended valence framework. Information Systems Frontiers, 24, 329–347.
Choi, D., & Lee, K. (2017). Machine learning based approach to financial fraud detection process in mobile payment system. IT CoNvergence PRActice (INPRA), 5(4), 12–24.
Choi, D., & Lee, K. (2018). An artificial intelligence approach to financial fraud detection under IoT environment: A survey and implementation. Security and Communication Networks, 2018, 5483472.
Coppolino, L., D’Antonio, S., Formicola, V., Massei, C., & Romano, L. (2015). Use of the Dempster-Shafer theory to detect account takeovers in mobile money transfer services. Journal of Ambient Intelligence and Humanized Computing, 6(6), 753–762.
Cui, J., Yan, C., & Wang, C. (2021). ReMEMBeR: Ranking metric embedding-based multicontextual behavior profiling for online banking fraud detection. IEEE Transactions on Computational Social Systems, 8(3), 643–654.
Davidovic, S., Nunhuck, S., Prady, D., Tourpe, H., & Anderson, E. (2020). Beyond the COVID-19 crisis: a framework for sustainable government-to-person mobile money transfers. IMF Working Papers, 198, 1–38.
David-West, O., Oni, O., & Ashiru, F. (2022). Diffusion of innovations: Mobile money utility and financial inclusion in Nigeria. Insights from agents and unbanked poor end users. Information Systems Frontiers, 1–21.
Delecourt, S., & Guo, L. (2019). Building a robust mobile payment fraud detection system with adversarial examples. In 2019 IEEE 2nd int. conf. on artificial intelligence and knowledge engineering (AIKE) (pp. 103–106). IEEE.
Dhieb, N., Ghazzai, H., Besbes, H., & Massoud, Y. (2019). Extreme gradient boosting machine learning algorithm for safe auto insurance operations. In 2019 IEEE international conference on vehicular electronics and safety, ICVES 2019 (pp. 1–5). IEEE.
Du, J. Z., Lu, W. G., Wu, X. H., Dong, J. Y., & Zuo, W. M. (2018). L-SVM: A radius-margin-based SVM algorithm with LogDet regularization. Expert Systems with Applications, 102, 113–125.
Franque, F. B., Oliveira, T., & Tam, C. (2022). Continuance intention of mobile payment: TTF model with Trust in an African context. Information Systems Frontiers, 1–19.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H., & Herrera, F. (2012). A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 42(4), 463–484.
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239.
Hajek, P. (2019). Interpretable fuzzy rule-based systems for detecting financial statement fraud. In IFIP international conference on artificial intelligence applications and innovations, AIAI 2019 (pp. 425–436). Springer.
Hajek, P., & Henriques, R. (2017). Mining corporate annual reports for intelligent detection of financial statement fraud - A comparative study of machine learning methods. Knowledge-Based Systems, 128, 139–152.
Hardin, J., & Rocke, D. M. (2004). Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Computational Statistics and Data Analysis, 44(4), 625–638.
He, Z., Xu, X., & Deng, S. (2003). Discovering cluster-based local outliers. Pattern Recognition Letters, 24(9–10), 1641–1650.
Huang, S. Y., Lin, C. C., Chiu, A. A., & Yen, D. C. (2017). Fraud detection using fraud triangle risk factors. Information Systems Frontiers, 19(6), 1343–1356.
Iman, N. (2018). Is mobile payment still relevant in the fintech era? Electronic Commerce Research and Applications, 30, 72–82.
Jia, L., Song, X., & Hall, D. (2022). Influence of habits on mobile payment acceptance: An ecosystem perspective. Information Systems Frontiers, 24, 247–266.
Jocevski, M., Ghezzi, A., & Arvidsson, N. (2020). Exploring the growth challenge of mobile payment platforms: A business model perspective. Electronic Commerce Research and Applications, 40, 100908.
Kang, J. (2018). Mobile payment in Fintech environment: trends, security challenges, and services. Human-Centric Computing and Information Sciences, 8(1), 1–16.
Kar, A. K. (2021). What affects usage satisfaction in mobile payments? Modelling user generated content to develop the “digital service usage satisfaction model”. Information Systems Frontiers, 23(5), 1341–1361.
Kriegel, H. P., Schubert, M., & Zimek, A. (2008). Angle-based outlier detection in high-dimensional data. In Proc. of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 444–452).
Li, Q., & Clark, G. (2013). Mobile security: A look ahead. IEEE Security and Privacy, 11(1), 78–81.
Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008). Isolation forest. In IEEE int. conf. on data mining, ICDM (pp. 413–422). IEEE.
Liu, Y., Li, Z., Zhou, C., Jiang, Y., Sun, J., Wang, M., & He, X. (2019). Generative adversarial active learning for unsupervised outlier detection. IEEE Transactions on Knowledge and Data Engineering, 32(8), 1517–1528.
Lopez-Rojas, E. A., & Axelsson, S. (2014). Banksim: A bank payments simulator for fraud detection research. In the 26th European Modeling and Simulation Symposium (EMSS) (pp. 144–152).
Lopez-Rojas, E., Elmir, A., & Axelsson, S. (2016). Paysim: A financial mobile money simulator for fraud detection. In 28th European modeling and simulation symposium, EMSS 2016, Dime University of Genoa, Larnaca (pp. 249–255).
Lopez-Rojas, E. A., & Barneaud, C. (2019). Advantages of the PaySim simulator for improving financial fraud controls. Advances in Intelligent Systems and Computing, 998, 727–736.
Lopez-Rojas, E. A., Axelsson, S., & Baca, D. (2018). Analysis of fraud controls using the PaySim financial simulator. International Journal of Simulation and Process Modelling, 13(4), 377–386.
Mahbobi, M., Kimiagari, S., & Vasudevan, M. (2021). Credit risk classification: an integrated predictive accuracy algorithm using artificial and deep neural networks. Annals of Operations Research, 1–29.
Mendelson, S., & Lerner, B. (2020). Online cluster drift detection for novelty detection in data streams. In Proc. of the 19th IEEE international conference on machine learning and applications, ICMLA 2020 (pp. 171–178).
Misra, S., Thakur, S., Ghosh, M., & Saha, S. K. (2020). An autoencoder based model for detecting fraudulent credit card transaction. Procedia Computer Science, 167, 254–262.
Mubalaike, A. M., & Adali, E. (2018). Deep learning approach for intelligent financial fraud detection system. In UBMK 2018 3rd int. conf. on computer science and engineering (pp. 598–603).
Nami, S., & Shajari, M. (2018). Cost-sensitive payment card fraud detection based on dynamic random forest and k-nearest neighbors. Expert Systems with Applications, 110, 381–392.
Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., & Sun, X. (2011). The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems, 50(3), 559–569.
Onwubiko, C. (2020). Fraud matrix: a morphological and analysis-based classification and taxonomy of fraud. Computers & Security, 96, 101900.
Pal, A., De, R., & Herath, T. (2020). The role of mobile payment technology in sustainable and human-centric development: evidence from the post-demonetization period in India. Information Systems Frontiers, 22(3), 607–631.
Pal, A., Herath, T., De, R., & Rao, H. R. (2021). Is the convenience worth the risk? An investigation of mobile payment usage. Information Systems Frontiers, 23(4), 941–961.
Pambudi, B. N., Hidayah, I., & Fauziati, S. (2019). Improving money laundering detection using optimized support vector machine. In 2019 2nd international seminar on research of information technology and intelligent systems, ISRITI 2019 (pp. 273–278).
Papouskova, M., & Hajek, P. (2019). Two-stage consumer credit risk modelling using heterogeneous ensemble learning. Decision Support Systems, 118, 33–45.
Pevny, T. (2016). Loda: Lightweight on-line detector of anomalies. Machine Learning, 102(2), 275–304.
Ramaswamy, S., Rastogi, R., & Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. In Proc. of the 2000 ACM SIGMOD int. conf. on management of data (pp. 427–438).
Reunanen, N., Räty, T., & Lintonen, T. (2020). Automatic optimization of outlier detection ensembles using a limited number of outlier examples. International Journal of Data Science and Analytics, 10, 377–394.
Rieke, R., Zhdanova, M., Repp, J., Giot, R., & Gaber, C. (2013). Fraud detection in mobile payments utilizing process behavior analysis. In 2013 int. conf. on availability, reliability and security, ARES 2013 (pp. 662–669).
Sakurada, M., & Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proc. of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis (pp. 4–11).
Schlör, D., Ring, M., Krause, A., & Hotho, A. (2021). Financial fraud detection with improved neural arithmetic logic units. Lecture Notes in Computer Science, 12591, 40–54.
Schölkopf, B., Williamson, R., Smola, A., Shawe-Taylor, J., Platt, J., & Holloway, R. (2000). Support vector method for novelty detection. In Advances in neural information processing systems (pp. 582–588). MIT Press.
Seera, M., Lim, C. P., Kumar, A., Dhamotharan, L., & Tan, K. H. (2021). An intelligent payment card fraud detection system. Annals of Operations Research, 1–23.
Sethia, A., Patel, R., & Raut, P. (2018). Data augmentation using generative models for credit card fraud detection. In 4th international conference on computing communication and automation (ICCCA) (pp. 1–6). IEEE.
Shajalal, M., Hajek, P., & Abedin, M. Z. (2021). Product backorder prediction using deep neural network on imbalanced data. International Journal of Production Research, 1–18.
Turner, A., Mccombie, S., & Uhlmann, A. (2021). Follow the money: Revealing risky nodes in a Ransomware-Bitcoin network. In Proc. of the 54th Hawaii int. conf. on system sciences (pp. 1560–1572).
Vaughan, G. (2020). Efficient big data model selection with applications to fraud detection. International Journal of Forecasting, 36(3), 1116–1127.
Verkijika, S. F. (2020). An affective response model for understanding the acceptance of mobile payment systems. Electronic Commerce Research and Applications, 39, 100905.
Wang, C., Deng, C., & Wang, S. (2020). Imbalance-XGBoost: leveraging weighted and focal losses for binary label-imbalanced classification with XGBoost. Pattern Recognition Letters, 136, 190–197.
West, J., & Bhattacharya, M. (2016). Intelligent financial fraud detection: A comprehensive review. Computers & Security, 57, 47–66.
Wong, M. L., Seng, K., & Wong, P. K. (2020). Cost-sensitive ensemble of stacked denoising autoencoders for class imbalance problems in business domain. Expert Systems with Applications, 141, 112918.
Xenopoulos, P. (2017). Introducing DeepBalance: Random deep belief network ensembles to address class imbalance. In 2017 IEEE Int. Conf. on Big Data, Big Data 2017 (pp. 3684–3689).
Yamanishi, K., Takeuchi, J. I., Williams, G., & Milne, P. (2004). On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8(3), 275–300.
Ye, X., Dong, L. A., & Ma, D. (2018). Loan evaluation in P2P lending based on random forest optimized by genetic algorithm with profit score. Electronic Commerce Research and Applications, 32, 23–36.
Zhao, Y., & Hryniewicki, M. K. (2018). XGBOD: Improving supervised outlier detection with unsupervised representation learning. In Proc. of the int. joint conf. on neural networks (pp. 1–8).
Zhao, Y., Nasrullah, Z., & Li, Z. (2019). PyOD: A Python toolbox for scalable outlier detection. Journal of Machine Learning Research, 20(96), 1–7.
Zhou, H., Chai, H. F., & Qiu, M. L. (2018). Fraud detection within bankcard enrollment on mobile device based payment using machine learning. Frontiers of Information Technology and Electronic Engineering, 19(12), 1537–1545.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Petr Hajek is a Professor at the Science and Research Centre, University of Pardubice, Czech Republic. He holds a Ph.D. degree in system engineering and informatics. Professor Hajek is the author or coauthor of 5 books and more than 70 articles in leading journals such as Information Sciences, Decision Support Systems, and Knowledge-Based Systems. His current research interests include business decision-making, soft computing, text mining, and knowledge-based systems. He is a fellow of the Association for Computing Machinery (ACM), KES International, and Association for Information Systems (AIS).

Mohammad Zoynul Abedin is a Senior Lecturer in Fintech and Financial Innovation at Teesside University International Business School, Teesside University, UK. He received his B.B.A. and M.B.A. degrees in finance from the University of Chittagong, Bangladesh, and his
D.Phil. degree in investment theory from the Dalian University of Technology, China. Dr. Abedin has published more than 70 papers, including peer-reviewed full-length articles, conference papers, and book chapters. His work appears in the Annals of Operations Research, International Journal of Production Research, and IEEE Transactions on Industrial Informatics, to mention a few. His current research interests include business data analytics, fintech, and computational finance. He is a fellow of the Financial Management Association (FMA) and the British Accounting and Finance Association (BAFA).

Uthayasankar Sivarajah is a Professor of Technology Management and Circular Economy. His passion for research and teaching is interdisciplinary in nature, focusing on the use of emerging digital technology for the betterment of society, be it in a business or government context. He has published over 50 scientific articles in leading peer-reviewed journals and conferences. His research has been featured in reputable media/trade publications such as The World Economic Forum, BBC Yorkshire, Computer Weekly and The Conversation. To date, he has a successful track record as Principal and Co-investigator in over £3 million worth of Research and Innovation and consultancy projects funded by reputable funding bodies and commercial organisations. Some of the notable funders have been the European Commission (FP7, H2020, Marie Curie), Qatar National Research Fund (QNRF), Innovate UK/DEFRA and British Council, focusing on projects addressing business and societal challenges surrounding themes such as AI Innovation Strategy Development, Smart Cities and Sustainable Societies. He is a Fellow of the UK Higher Education Academy (FHEA) and a member of the British Academy of Management (BAM).