Risk Analysis
3rd INTERNATIONAL CONFERENCE ON QUANTITATIVE, SOCIAL, BIOMEDICAL & ECONOMIC ISSUES 2019 - ICQSBEI 2019
Efstathios Kirkos
Alexander Technological Educational Institute of Thessaloniki
e-mail: stkirk@acc.teithe.gr
Abstract
Customs administrations develop automated systems so that their limited inspection force can
optimally inspect a rapidly increasing number of transactions. The literature suggests that Data
Mining techniques outperform traditional methods, especially in the domain of risk analysis and
fraud detection. Data Mining is the process of analyzing vast datasets in order to reveal valuable
information. It is also a key tool for the analysis of structured and unstructured data, known as
Big Data. This paper aims to review research studies conducted worldwide in the Customs Risk
Management domain using data mining techniques. Various combinations of keywords were used,
and the search yielded a sample of 16 relevant articles that deal with different problems that
Customs face, such as profiling of economic operators and efficient use of manpower. Most of
the studies were conducted in China. It is remarkable that very few studies were conducted in
European countries and none of them in Greece. The majority of the studies, about 63%, deal
with the detection of smuggling and miscoding, two of the biggest problems faced by Customs.
Out of the 24 data mining techniques used in the articles analyzed, the Decision Trees model
appears to be the leading one in detecting fraud. In general, supervised learning tools have
been used more frequently than unsupervised ones. The best-performing technique for a Customs
Risk Management model is the combination of Neural Network and Decision Trees. Most of the
models achieved high accuracy, above 90%. In most studies, 10 out of 16, the data come from
customs declarations. This review summarizes the current trends in the field of Customs Risk
Management, reveals opportunities and needs, constitutes a source of knowledge for relevant
research and aspires to stimulate the interest of future researchers and practitioners of
Customs administrations.
Keywords: data mining, customs administrations, smuggling, fraud detection, customs risk
management
1. INTRODUCTION
Customs administrations are responsible for implementing a wide range of policies in areas
as diverse as the collection of duties and taxes, the defense of security and safety, and
compliance with trade regulations. Customs also often support the work of other services such
as police and immigration authorities. The main problem that customs face is smuggling, that
is, the clandestine import of goods or the evasion of taxes by circumvention of customs controls.
The consequences are the avoidance of customs duties and taxes and considerable difficulty in
trade due to the increase of unfair competition. The detection of smuggling
requires physical examination of goods. These controls must be quick and effective in order
not to disturb trade flows in a fast-moving economy. However, due to the large increase in
trade over the last few years and the limited manpower and resources of customs, the physical
examination of all shipments is impossible.
For this reason, selectivity and risk targeting methods are applied through automated risk
analysis systems, so as to maintain a proper balance between customs controls and facilitation
of legitimate trade. The existing systems are based on simple selectivity criteria, focusing
on the goods, the importer, the exporter and the carrier, plus a random target. They are also
strengthened by the exchange of information between customs authorities. However, they
have not proven to be particularly effective. They require customs officers to control a large
number of transactions, which frequently results in very low recorded offence rates. For
example, in Greece in 2017, 79,513 checks were carried out and only 5,608 of them revealed
actual infringements, i.e. only 7%, while for products subject to Excise Duty the rate was much
lower, i.e. only 3.3% [1]. Also, the existing systems cannot make use of unstructured data,
causing them to ignore a large share of all available data.
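The offence rate quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration; the function name hit_rate is a hypothetical helper, and the two figures are the ones reported in the text for Greece in 2017.

```python
# Figures quoted in the text for Greece, 2017.
checks_carried_out = 79_513
actual_infringements = 5_608


def hit_rate(infringements: int, checks: int) -> float:
    """Share of checks that uncovered an actual infringement."""
    if checks <= 0:
        raise ValueError("checks must be positive")
    return infringements / checks


rate = hit_rate(actual_infringements, checks_carried_out)
print(f"Hit rate: {rate:.1%}")
```

The computed rate rounds to the 7% reported in the text, which means that roughly 93 out of every 100 checks found no infringement.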
Thus, many customs administrations are focusing on other practices, such as data mining, to
maintain more efficient automated systems and improve their targeting/selectivity processes.
Data mining refers to extracting or “mining” knowledge from large amounts of data and can be
used to develop systems that support strategic decisions in customs risk management.
Data mining techniques are already being used in many areas that face the risk of fraud, whose
detection is one of the fundamental applications of data mining: for example, insurance fraud,
credit card fraud, telecommunications fraud, and check forgery. Data mining is also widespread
in the area of sales promotion, where companies use it as an important tool for improving their
competitiveness. It is even used in medicine in order to recognize predisposition to illness.
The World Customs Organization and the European Union have already underlined the
importance of data mining in the domain of Customs risk management, something that has
also been recognized by the Greek customs administration.
This paper aims to review research studies conducted worldwide in the Customs Risk
Management domain using data mining techniques and is structured in five sections. The first
section describes the meaning of customs controls and customs risk management. In the
second section, a theoretical approach to data mining is undertaken to consolidate all
relevant knowledge discovery processes. The third section presents sixteen studies that have
been published in scientific journals and conference proceedings around the world, describing
the customs problems that have been solved, the datasets used and the data mining models
that have been developed. A description of the algorithm of each model is given, their
accuracy is reported and a comparative analysis of all studies is made. Finally, the last section
sets out the conclusions, the contributions of the survey and suggestions for future research.
Customs controls are all acts performed by the customs authorities in order to ensure
compliance with the customs legislation and other legislation governing the movement of
goods from and to non-EU countries [18]. For the purposes of customs controls, customs
authorities verify the accuracy and completeness of the declaration, which is the document
that lists the details of the imported or exported product. In particular, the elements of the
declaration to be controlled are:
a) Origin of goods: the country or territory where the goods were obtained.
b) Code of goods. All imported/exported products are classified under a 10-digit tariff code
that carries information about the customs duty rates and non-tariff measures. The EU
classification system consists of the Combined Nomenclature (CN), which serves the EU’s
common customs tariff, plus the Integrated Tariff (TARIC), which provides information on all
trade policy and tariff measures applicable to specific goods in the EU (e.g. suspension of
duties, tariff quotas, tariff preferences, anti-dumping measures) [20]. The Combined
Nomenclature (CN) is based on the Harmonized Commodity Description and Coding System
(HS) developed by the WCO. The classification of goods also determines the applicable taxes.
The most important taxes according to their share of revenue are VAT and Excise Duty. VAT is
calculated as a percentage of the taxable amount. Its share of total customs revenue in Greece
is quite high, 38.57% in 2017, which corresponds to 4,896.0m euro. Excise Duties cover
alcohol, alcoholic drinks, energy products, electricity, and tobacco products, and their
share of total Greek customs revenue in 2017 was 55.61% [3]. Most cases of smuggling are also
recorded in Excise Duty products. In 2017 huge quantities of products were seized in Greece,
and the total amount of tax evaded was 81,835,712.60 euro [2].
c) Value of goods. The economic value of goods declared for importation.
All the above elements constitute the “identity” of the good that provides the basis for
assessment of duty that has to be paid.
Other significant elements of the declaration are the economic operators, such as the importer,
the exporter and the carrier, i.e. the person who brings the goods into or out of the customs
territory of the Union, or assumes responsibility for their carriage.
Customs controls, other than random checks, shall primarily be based on risk analysis using
electronic data-processing techniques, with the purpose of identifying and evaluating the
risks.
2.2 RISK
Risk is the probability of non-compliance with the customs laws. Customs risk management
comprises all practices that provide customs with the necessary information to identify and
effectively deal with high-risk transactions. The EU has gradually developed a common risk
management framework (CRMF) providing an equivalent level of protection at its external
borders, which is implemented by all Member States. It is based upon the exchange of risk
information (RIF) and risk analysis results between customs administrations. It also establishes
common risk criteria (CRC) and priority control areas (PCA) that include indicators of risk,
the nature and the duration of customs controls, and the types of goods, traffic routes or
economic operators which are subject to increased levels of risk analysis [19].
However, the existing system is not used with sufficient accuracy. According to the European
Court of Auditors [8], there are weaknesses in the information exchange tools, both in terms of
content and of their use. Too much information, inappropriate feedback from the Member States
and many messages about relatively small and local risks were reported. All of the above led to
an excess of information and to difficulties in identifying the key risks.
3. DATA MINING
Data are divided into two categories: structured data, collected from business information
monitoring systems, and unstructured data, which come from additional data sources such as
social media, email, customers' feedback on a company’s products, smart phones, in-vehicle
infotainment devices, etc. Together, these two categories constitute Big Data, which keeps
expanding to a great volume.
Big Data can be analyzed for insights that lead to better decisions and strategic business
moves. Some Customs administrations have already embarked on Big Data initiatives,
leveraging the power of analytics, ensuring the quality of data (regarding cargo, shipments
and conveyances) and widening the scope of data they can use for analytical purposes to
ensure that better-informed and smarter decisions are taken [15]. For example, the New
Zealand and Hong Kong customs have created joint repositories that combine sets of data
from various government authorities. The United Kingdom aims at gaining access to commercial
data flows in the supply chain, and Canada is focusing on integrating additional resources and
information into its existing Enterprise Data Warehouse, so as to improve the ability to
assess the risk to goods and people through biometric examination, facial recognition, and
automated lie detection.
The analysis of Big Data necessitates the application of new, efficient methods such as
Knowledge Discovery in Databases, or in other words data mining. Data mining combines
methodologies from statistics, machine learning, artificial intelligence and other disciplines
and provides useful knowledge by revealing trends, patterns, exceptions and relationships in
vast datasets [12]. The steps of processing Big Data are a) storage, b) cleaning (removing the
seemingly abnormal data that may distort the results), and c) analysis (applying the appropriate
technique, depending on the problem that needs to be solved, and mining the information that
solves it). Data mining techniques are divided into supervised learning techniques (the target
is known in advance and the algorithm tries to find the relationship between the data and the
target) and unsupervised learning techniques (the target is not known in advance).
The main techniques of supervised learning are classification and regression. Both of them
learn the relationship between the stored data and a known target. The algorithm "learns" from
a training set and then predicts the target for the data of a test set according to the
knowledge that it acquired. Regression is used for numerical targets, while classification is
used for nominal class values. A well-known example of a classification problem is the approval
of bank loans. Methods of classification are:
a) Neural Networks use a set of nodes connected to each other like human brain
neurons and learn from experience as humans do.
b) Decision Trees represent the data relationships in the form of a tree.
c) Bayesian Networks are probabilistic models based on conditional probability, i.e. the
probability of case A given that B is true.
d) In k-nearest neighbor, the classification is based on the k nearest points of the training
set.
e) Support Vector Machines focus on the construction of a hyperplane that optimally separates
the data, represented in a multidimensional space.
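As an illustration of how a classifier "learns" from a training set and then labels unseen data, the following minimal sketch implements the k-nearest neighbor idea from item d). It is not taken from any of the reviewed studies; the function name knn_classify, the two features and the toy labels are all hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(training_set, query, k=3):
    """Return the majority label among the k training points closest to query.

    training_set is a list of (features, label) pairs with numeric feature tuples.
    """
    if not 1 <= k <= len(training_set):
        raise ValueError("k must be between 1 and the training set size")
    # Sort the training points by Euclidean distance to the query point.
    nearest = sorted(training_set, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per declaration, scaled to [0, 1].
training = [
    ((0.10, 0.20), "compliant"),
    ((0.15, 0.25), "compliant"),
    ((0.20, 0.10), "compliant"),
    ((0.80, 0.90), "suspicious"),
    ((0.85, 0.80), "suspicious"),
    ((0.90, 0.95), "suspicious"),
]

print(knn_classify(training, (0.82, 0.88), k=3))
```

In practice the features would need to be scaled consistently, since the distance measure treats all dimensions equally.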
The literature suggests that data mining may improve the efficiency and rationality of customs
decision-making policies that involve the least risk and the best probable outcomes. This
paper reviews research studies conducted to resolve the main problems that customs face
during customs controls using data mining techniques. Various combinations of keywords
were used to identify the pertinent articles, and the search yielded a sample of 16 relevant
articles. The table below presents all the customs problems solved by applying data mining
techniques in the sixteen studies, which are the main problems customs administrations face
in conducting controls. Column 3 "STUDIES" in the table shows how many articles of this
survey dealt with resolving each problem, and the last column of the table shows the countries
where the studies were conducted.
The majority of these studies deal with the development of models that detect fraud,
such as smuggling, misclassification, drug detection, empty container verification and the
identification of smuggling vessels. Other studies present methodologies for building a
structured database for Customs, capable of improving the accuracy of fraud detection systems.
Three of the studies deal with the classification of economic operators into groups of
risk/reliability for the direct identification of potential changes in compliance behavior and
of emerging risks. Another study proposes a model which predicts ship routes, and in the last
study a data mining model identifies the association of import-export commodities with other
commercial factors, such as market price and exchange rate. This knowledge allows Customs to
understand variations in the demand for services, so that they may reallocate resources more
efficiently.
Most of the studies were conducted in China and one of them in the United States. It is
remarkable that very few studies were conducted in European countries and none of them in
Greece.
The following tables summarize the 16 articles per data mining technique. They also present
the data sets that were used, the algorithms applied in each case and the performance of each
algorithm.
The best-performing technique for a Risk Management model is the combination of Neural
Network and Decision Trees. It achieved very good accuracy (93.87%) and even coped with the
problem of "out-of-balance" databases (where the sample of fraudulent declarations is too
small compared to the non-fraudulent one).
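One way to see why an "out-of-balance" database is a problem in its own right is to look at what accuracy measures when fraud is rare. The sketch below is an illustration only and does not reproduce any reviewed model; the helper accuracy_and_recall, the labels and the 7-in-100 sample are hypothetical, with the proportion chosen to echo the infringement rate quoted in the introduction.

```python
def accuracy_and_recall(actual, predicted, positive="fraud"):
    """Return (accuracy, recall on the positive class) for two label lists."""
    pairs = list(zip(actual, predicted))
    correct = sum(a == p for a, p in pairs)
    positives = sum(a == positive for a in actual)
    true_positives = sum(a == positive and p == positive for a, p in pairs)
    accuracy = correct / len(actual)
    recall = true_positives / positives if positives else 0.0
    return accuracy, recall


# Hypothetical sample: 7 fraudulent declarations out of 100.
actual = ["fraud"] * 7 + ["legit"] * 93
# A trivial "model" that never flags anything.
always_legit = ["legit"] * 100

acc, rec = accuracy_and_recall(actual, always_legit)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")  # accuracy=0.93, recall=0.00
```

A model that flags nothing already scores 93% accuracy on such a sample while detecting no fraud at all, which is why handling the imbalance matters alongside the headline accuracy.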
In most studies, 10 out of 16, the data come from customs declarations. Therefore, the
information provided by the customs databases contributes significantly to the detection of
fraud through data mining. Of course, this does not reduce the importance of information
from other sources, such as X-ray images or ship routes, since they also give very high
accuracy rates.
Most of the models achieved high accuracy, above 90%. Thus, by adopting data mining
techniques Customs can develop more efficient Customs Risk Management models.
In total, 24 data mining techniques were used in the articles analyzed. The table below
summarizes all the data mining techniques presented in the papers. In total 27 algorithms were
developed. Some of them were combined to create hybrid models, while others were applied to the
same database and the performance of each was evaluated. However, each of them addressed the
problem for which it was created with very high accuracy.
Table 4. Most used data mining methods and their usage frequency
No | Technique | Frequency | Description
1 | Naïve Bayes | 1 | Uses Bayes' rule of conditional probability to compute the probability of a label given a set of features.
2 | Tree-Augmented Naïve Bayes | 1 | Relaxes the independence assumption by specifying a tree structure on the feature set in which each feature has as parents only the label and at most one other feature.
3 | Markov blanket | 1 | The set of all parents, children and children’s other parents of a certain node.
4 | Mahalanobis | 1 | Measures the distance between observations while taking into account the correlations between them.
5 | density based method | 1 | The cluster expands as long as the neighborhood of the adjacent points has the required density.
6 | X-mean | 1 | An advanced form of the K-means clustering algorithm, where there is no need to determine the number of clusters exactly in advance.
7 | Apriori | 2 | Prunes many of the itemsets that are unlikely to be frequent (after measuring their support), thus saving extra effort. Candidate itemsets are produced using only the frequent itemsets that were found during the previous pass.
8 | GBAD-MDL | 1 | Uses MDL to find the best substructure in a graph and then examines all the instances of this substructure to find those that are similar to this pattern.
16 | Interpretation Rules | 1 | Specifies how to extract the value of a feature from the noisy information given in an unstructured text. The system then constructs the feature-value pair that is returned by the rule.
17 | chi-square | 1 | Determines whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.
18 | C4.5 | 1 | Uses the gain ratio as the splitting criterion, which is the ratio of the information gain to the entropy of the split.
19 | multi-dimension-criterion | 1 | The multi-dimension-criterion model discovers the association relationship between attributes, which is different from the usual association algorithm.
20 | k-means | 1 | n samples are randomly selected as the centers of the n groups. Then each sample is assigned to the nearest cluster and the center of each newly formed cluster is updated.
21 | Dynamic k-means | 1 | Through repeated clustering, separates the heterogeneous clusters until it reaches the optimum number of clusters.
22 | J-48 Decision Trees | 1 | The C4.5 algorithm for building decision trees is implemented in Weka as a classifier called J48.
23 | Back Propagation NN | 1 | The basic idea is that if the output is not the expected one, the error is propagated back toward the input and the weights are recalculated until the output agrees with the expected one.
24 | Q-cluster | 1 | An effective clustering is one in which the samples within one cluster are similar, while samples from different clusters are less similar.
25 | Tertius | 1 | Creates rules from the values of the pairs of attributes in the training data. It uses a first-order logical representation and displays dependence on the number of lines in the rules.
26 | Predictive Apriori | 1 | Candidates are ranked according to predictive accuracy. It tries to maximize the predictive accuracy of an association rule rather than its confidence.
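The k-means entry of Table 4 (row 20) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the implementation used in the reviewed study: the function name kmeans, the toy points and the fixed random seed are hypothetical, and the number of clusters is called k here.

```python
import random
from math import dist


def kmeans(points, centers, max_iter=100):
    """Plain k-means: returns (centers, assignments) after convergence or max_iter."""
    centers = [tuple(c) for c in centers]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        assignments = [
            min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignments


# Hypothetical toy data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
k = 2
initial = random.Random(0).sample(points, k)  # randomly chosen samples as initial centers
centers, labels = kmeans(points, initial)
print(centers, labels)
```

The result depends on the initial centers, which is one reason variants such as X-means and dynamic k-means (rows 6 and 21) adjust the number of clusters during the run.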
Supervised learning tools have been used more often, and the Decision Trees model appears to be
the leading one, since it was used in 4 articles and detected different types of fraud. Neural
Networks and Logistic Regression follow. Thus, it could be stated that supervised learning
techniques are better-performing tools than the unsupervised ones in addressing the customs
difficulties which arise during the controls. Among the unsupervised learning techniques, the
Apriori algorithm was used most often and also gave high accuracy.
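To make the Apriori idea from Table 4 concrete (candidates are built only from the itemsets that were frequent in the previous pass, and infrequent ones are pruned by their support), the following is a minimal sketch. It is an illustration rather than the code of any reviewed study; the function name apriori, the attribute strings and the support threshold are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count: number of transactions that contain the candidate.
        return {c: sum(c <= t for t in transactions) for c in candidates}

    # Pass 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support}
    result = dict(frequent)
    size = 2
    while frequent:
        prev = set(frequent)
        # Join step: candidates of the next size come only from the previous pass.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support}
        result.update(frequent)
        size += 1
    return result


# Hypothetical declarations described by categorical attributes.
declarations = [
    {"origin=X", "code=A", "flagged"},
    {"origin=X", "code=A", "flagged"},
    {"origin=X", "code=B"},
    {"origin=Y", "code=A"},
    {"origin=X", "code=A", "flagged"},
]
frequent_itemsets = apriori(declarations, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Association rules would then be derived from these frequent itemsets, for example by comparing the support of an itemset with the support of its subsets.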
5. CONCLUSIONS
Customs are responsible for preserving commercial stability and the timely settlement of a large
number of transactions, while at the same time detecting and combating "fraudulent" behaviors.
The detection of smuggling requires physical examination of all goods that are imported and
exported, which is impossible due to the large volume of international trade and the
limitations in customs manpower and resources. For this reason, automated targeting systems
that identify high-risk transactions are the most important tool for customs controls.
However, the present methods do not have the desired effect. They result in a large number of
controls with a very small percentage of detected violations. Thus, customs
administrations are turning to data mining in order to create more sophisticated information
systems. Data mining provides useful knowledge by revealing the “valuable” information
in vast databases. Through its techniques, systems based on neural networks, decision trees,
Bayes networks, clustering and association rules can be developed to support strategic
decisions in customs risk management. It has been used for years by a large number of
companies to increase profits and improve their competitiveness by describing past
trends and predicting future ones. It is also a key tool for the analysis of Big Data, i.e.
high-volume structured and unstructured data. Leveraging Big Data capabilities is
important for turning the available data into new insights. This will improve overall customs
operations and will provide many opportunities in many areas.
This study presents papers that have been published in scientific journals and conference
proceedings worldwide and highlights the usefulness of data mining in solving major
problems faced by customs when conducting their controls. It describes the customs
problems encountered, the data sets used, and the data mining models developed. Their
comparative analysis shows that most of the studies have dealt with fraud detection
and risk targeting. Supervised learning techniques have been used more
often. This does not reduce the importance of the unsupervised learning techniques, which have
also yielded very high accuracy. Finally, in most studies, the data are derived from import /
export declarations. Therefore, the information provided by the customs databases contributes
significantly to the detection of fraud through data mining techniques.
The first contribution of this review is to highlight the usefulness of applying data
mining techniques in Customs decision-support systems, especially in the domain of fraud
detection, where they give considerably better accuracy than the systems that Customs
already use. Furthermore, it aspires to be used as guidance for both researchers and
practitioners of Customs administrations in order to select the most appropriate technique
when dealing with Customs Risk Management decisions.
It is necessary to mention that other factors should be exploited in order to make data mining
tools more efficient for Customs. The important ones are listed below:
a) Access to Big Data. Apart from the information derived from customs declarations,
customs authorities should take advantage of the information available from other sources
such as other public services (tax authorities, etc.), other government agencies (Single
Window, E-government) and private operators. They should also exploit information
derived from the “cloud”, multilingual news sources and platforms
containing data from various electronic devices and objects, such as "connected
containers", which collect and exchange data as part of what is known as the "Internet of
Things".
b) The application of advanced data mining software such as SPSS, SAS, Maple,
Wolfram Mathematica, RapidMiner, Neural Designer, Oracle Data Miner, SAP, R,
Python, etc. could help analyze data in the least costly way.
c) Finally, there is a need to develop an enterprise culture in Customs based on the
“value of data analysis”.
A very interesting direction for research would be the application of data mining techniques to
a Greek database for the detection of fraudulent declarations (undervaluation, miscoding and
smuggling). We leave these issues open for future research.
References
[15] Okazaki, Y. (2017). Implications of Big Data for Customs - How It Can Support Risk
Management Capabilities, WCO Research Paper, No. 39.
[16] Rad, H., Arash, S., Rahbar, F., Rahmani, R., Heshmati, Z. and Fard, M. (2015). A Novel
Unsupervised Classification Method for Customs Fraud Detection, Indian Journal of Science
and Technology, 8(35).
[17] Roman, N., Ferreira, C., Meira, L., Rezende, R., Digiampietri, L. and Filho, J. (2019).
Attribute-Value Specification in Customs Fraud Detection: A Human-Aided Approach, Paper
presented at the 10th International Digital Government Research Conference, Puebla, Mexico.
[18] The European Parliament and the Council of the European Union. (2013). Union Customs
Code: Definitions (Article 5). Strasbourg.
[19] The European Parliament and the Council of the European Union. (2013). Union Customs
Code: Risk management and customs control (Article 46). Strasbourg.
[20] The European Parliament and the Council of the European Union. (2013). Union Customs
Code: Tariff classification of goods (Article 57). Strasbourg.
[21] Triepels, R., Feelders, A. and Daniels, H. (2015). Uncovering Document Fraud in
Maritime Freight Transport Based on Probabilistic Classification, Computer Information
Systems and Industrial Management, 282-293.
[22] Wen, C., Hsu, P., Wang, C., Wu, T. and Hsu, M. (2012). E-government Information
Application: Identifying Smuggling Vessels with Data mining Technology, Electronic
Journal of e-Government, 10(2), 47-58.
[23] Xiao, Z., Xiao, H. and Wang, Y. (2016). A Risk Decision-making Approach to Customs
Targeting, The Open Cybernetics & Systemics Journal, 10(1), 250-262.
[24] Zehero, B., Soro, E., Gondo, Y., Brou, P. and Asseu, O. (2018). Elicitation of
Association Rules from Information on Customs Offences on the Basis of Frequent
Motives, Engineering, 10(09), 588-605.
[25] Zhu, Y., Wang, L. and Zhang, W. (2018). Detection of Contraband in Milk Powder
Cans by Using Stacked Auto-Encoders Combination with Support Vector Machine, IOP
Conference Series: Earth and Environmental Science, 170(3), 032114.