Effective and Efficient Approach in IoT Botnet Detection
Susanto1, Deris Stiawan2*, M. Agus Syamsul Arifin1, Mohd Yazid Idris3, Rahmat Budiarto4
1 Faculty of Engineering Science, Universitas Bina Insan, Indonesia
2 Faculty of Computer Science, Universitas Sriwijaya, Indonesia
3 Faculty of Computing, Universiti Teknologi Malaysia, Malaysia
4 College of Computer Science, Albaha University, Saudi Arabia
Abstract
The Internet of Things (IoT) enables the interaction of physical systems connected to the internet, resulting in extensive, high-dimensional data traffic. While IoT applications offer benefits and convenience to users, network security remains uncertain. One example is vulnerability to cyber-attacks, such as botnets targeting consumers' IoT devices. In network security analysis, high-dimensional data poses distinct challenges for researchers. These challenges include the curse of dimensionality, which can complicate feature definitions; predominantly unordered datasets; combinations of clusters; and exponential data growth. In this study, we applied feature reduction using the Linear Discriminant Analysis (LDA) method to minimize the features needed to detect botnets on an IoT network. The reduction is carried out on the N-BaIoT dataset, whose 115 features are reduced to 2. With feature reduction, the detection system becomes more effective and efficient. Experimental results showed that LDA combined with the Decision Tree classification method detected botnets with an accuracy of 100% in 98.58 s using only two features.

Keywords: Dimensionality reduction; IoT; LDA

Article History:
Received: May 16, 2023
Revised: July 23, 2023
Accepted: July 28, 2023
Published: February 2, 2024

Corresponding Author:
Deris Stiawan
Faculty of Computer Science, Universitas Sriwijaya, Indonesia
Email: deris@unsri.ac.id
it is very efficient in memory space required for data storage [9].
Many studies have used dimensionality reduction methods for data analysis in attack detection [10]–[12]; however, most of these methods do not succeed at lower dimension scales, because reducing the data dimension does not necessarily improve classification [13]. This research work contributes an analysis of dimension reduction on an IoT botnet dataset. The best reduced dimensions obtained from the experiment are used as the selected features for the detection system. Effectiveness and efficiency are measured through detection accuracy and detection time. The measurement results are compared with a detection system that does not use the selected features. Moreover, the experimental results are compared with existing detection systems. The metrics used for comparison are accuracy, execution time, detection rate, false positive rate (FPR), false negative rate (FNR), sensitivity, specificity, and precision.
The rest of this paper is divided into five sections. Section 2 presents related work on dimensionality reduction methods for IoT botnet detection. Section 3 describes the dataset, LDA, the classification algorithms, the performance evaluation, and the analysis tools. Section 4 presents the results of the experimental analysis and a comparison with other works. Section 5 presents the Discussion. Section 6 presents the conclusions and further work on dimensionality reduction methods for IoT botnets.

RELATED WORK
The development of IoT technology has increased the need for research into effective protection against attacks. Attacks may be facilitated by the network traffic of heterogeneous IoT devices, which generates high-scale data and thus leaves room for attack [14]. Few researchers have used dimensionality reduction to detect IoT botnets. The first step in protecting an IoT network with high-scale data traffic against botnet attacks is to apply dimensionality reduction. Bahsi et al. [15] performed feature reduction on the N-BaIoT dataset [16] from 115 features to 2, 3, and 10 features using the Fisher Score method. Their method uses fewer features to reach a high accuracy level. With the Decision Tree classification method, accuracy decreased slightly as fewer features were used, although it remained above 98% for each feature set. With k-NN, accuracy increased as the number of features decreased, reaching up to 98.05%.
With the same N-BaIoT dataset, Nomm and Bahsi [17] reduced the data to 3, 5, and 10 features using three unsupervised methods for comparison, i.e., entropy, Hopkins statistics, and variance, followed by the SVM and isolation forest classification methods. The authors reported that the entropy-SVM method increased the accuracy up to 93.15% when fewer features were used. In contrast, the variance-SVM and Hopkins-SVM methods significantly lowered the accuracy, whereas with the Hopkins-isolation forest the decrease was not significant but the accuracy was low. The variance-isolation forest and entropy-isolation forest methods gave different results, with varying accuracy when few features were used.
Using the same N-BaIoT dataset, Liu et al. [18] used a triangle area map-based multivariate correlation analysis (TAM-based MCA) algorithm to reduce the features to 23 dimensions. Combined with a convolutional neural network (CNN), their approach offered very high accuracy, up to 99.57%. Alqahtani et al. [19] optimized the performance of the GXGBoost classification method by reducing the data to three features using the Fisher Score. In their experiment, IoT botnet detection using the N-BaIoT dataset was effective and efficient, with an accuracy of 99.96%.
In another study with the IoT network intrusion detection dataset, Desai et al. [20] used a dimensionality reduction method and optimized the classification function for IoT botnet detection. The authors chose principal component analysis (PCA) to reduce the data dimension. The IoT network intrusion detection dataset, which has 115 features, was reduced to 10, 15, and 20 features and then classified using multi-class classification. The experimental results showed an accuracy level reaching 99.97% using the Random Forest classifier. This result is superior to the Decision Tree and SVM classification methods.
Besides the N-BaIoT dataset, other studies have applied dimensionality reduction to IoT botnet detection with other datasets. Alshamkhany et al. [21] reduced the data dimension using the PCA method and machine learning. Their experiments with the UNSW-NB15 [22] and Bot-IoT [23] datasets, classified using the SVM-RBF machine learning method, achieved a very high accuracy of 99.9%. The PCA method was used to reduce the number of features from 43 to 20. Popoola et al. [24] reduced the features in the Bot-IoT dataset to six using the long short-term memory autoencoder (LAE) method. The method showed that classification of deep
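To make the reduction step concrete, the following is a minimal sketch of the two-class Fisher discriminant, the simplest case of LDA. It is not the authors' implementation: the function names (`fisher_direction`, `project`), the toy 2-D points, and the hand-written 2x2 matrix inverse are assumptions chosen for illustration. In the setting studied here, 115 N-BaIoT features are projected onto 2 discriminant axes; since LDA yields at most one fewer component than the number of classes, two output features imply at least three classes, whereas this sketch reduces two features to one.

```python
from statistics import fmean


def class_mean(rows):
    """Column-wise mean of a list of 2-D points."""
    return [fmean(col) for col in zip(*rows)]


def within_scatter(rows, mean):
    """2x2 scatter matrix of one class around its own mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in rows:
        d = [x[0] - mean[0], x[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class0, class1):
    """Return w = Sw^-1 (m1 - m0), the two-class Fisher direction."""
    m0, m1 = class_mean(class0), class_mean(class1)
    s0, s1 = within_scatter(class0, m0), within_scatter(class1, m1)
    # Pooled within-class scatter matrix Sw
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def project(rows, w):
    """Project each 2-D point onto the 1-D discriminant axis."""
    return [x[0] * w[0] + x[1] * w[1] for x in rows]


# Hypothetical toy traffic features (not taken from N-BaIoT)
benign = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
botnet = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(benign, botnet)
z_benign = project(benign, w)
z_botnet = project(botnet, w)
```

On this toy data the two classes do not overlap after projection, so a single threshold on the projected value separates them. This is the property that lets a simple classifier such as a Decision Tree work on very few reduced features.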
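The metrics reported in the comparison tables (accuracy, precision, sensitivity, specificity, FPR, and FNR) can all be derived from the four confusion-matrix counts. The following is a minimal sketch, assuming the standard textbook definitions. The function names and the example counts are hypothetical and do not come from the experiments.

```python
def ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp, tn, fp, fn):
    """Compute detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      tn: benign flagged as benign
    fp: benign flagged as attacks       fn: attacks flagged as benign
    """
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "fpr": ratio(fp, fp + tn),           # 1 - specificity
        "fnr": ratio(fn, fn + tp),           # 1 - sensitivity
    }


# Hypothetical counts for illustration only
m = detection_metrics(tp=990, tn=9990, fp=10, fn=10)
```

Because FPR = 1 - specificity and FNR = 1 - sensitivity, a classifier with specificity and sensitivity of 1 necessarily has an FPR and FNR of 0.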
only for specificity value. In contrast, the values for AB and GB decrease.
The experimental results clearly show that using the LDA method for dimensionality reduction has an impact on IoT botnet detection, as shown in Table 8. DT and RF show the highest accuracy and stability, reaching 100% with or without the LDA method. k-NN, AB, and GB show a slight decrease in accuracy when LDA is used.

Table 8. Comparison of accuracy values (%)
Method   Without LDA   With LDA
k-NN     99.99         99.82
DT       100           100
RF       100           100
AB       99.97         98.93
GB       99.99         99.48

The use of the LDA method also has an impact on the precision value. Table 9 compares the precision with and without LDA. Dimensionality reduction with LDA has no impact on DT and RF, which keep a stable value of 1. However, k-NN, AB, and GB experience a slight decrease in precision.

Table 9. Comparison of precision values
Method   Without LDA   With LDA
k-NN     0.9998        0.9984
DT       1             1
RF       1             1
AB       0.9997        0.9908
GB       0.9999        0.9948

We further evaluate the sensitivity of the proposed method. Table 10 compares the sensitivity values with and without the LDA method. DT and RF have stable sensitivity values with and without LDA, while k-NN, AB, and GB show a decrease with the LDA method.

Table 10. Comparison of sensitivity values
Method   Without LDA   With LDA
k-NN     0.9993        0.995
DT       1             1
RF       1             1
AB       0.9998        0.9852
GB       0.9994        0.9939

The results of the performance evaluation of specificity are shown in Table 11, which summarizes how the specificity values are affected by the LDA method. For k-NN, DT, and RF, the specificity values are stable and not affected. In contrast, AB and GB experience a decrease in specificity when the LDA method is used.

Table 11. Comparison of specificity values
Method   Without LDA   With LDA
k-NN     0.9999        0.9999
DT       1             1
RF       1             1
AB       0.9999        0.9992
GB       0.9999        0.9993

FPR performance was evaluated under the same conditions, as shown in Table 12. Two classification methods, AB and GB, show a significant decrease in FPR with the LDA method. In contrast, the FPR of k-NN increases significantly with LDA, reaching 6.255, whereas DT and RF remain at 0.

Table 12. Comparison of FPR values
Method   Without LDA   With LDA
k-NN     1.2511        6.255
DT       0             0
RF       0             0
AB       8.2619        0.0008
GB       5.0043        0.0007

The FNR results are shown in Table 13. The FNR of DT and RF remains 0 with or without LDA, whereas the FNR of k-NN, AB, and GB increases with LDA. AB shows the largest increase, from 0.0002 to 0.0148.

Table 13. Comparison of FNR values
Method   Without LDA   With LDA
k-NN     0.0007        0.005
DT       0             0
RF       0             0
AB       0.0002        0.0148
GB       0.0006        0.006

Overall, using LDA for dimensionality reduction has a positive impact on execution time, as shown in Table 14. The classification times of the k-NN, DT, RF, AB, and GB classifiers all decrease. Classification using k-NN requires 30908.87 seconds without LDA and drops drastically to 73.95 seconds with LDA. The execution time of DT falls from 163.75 to 98.58 seconds, and AB and GB are also significantly faster with LDA. The fastest classification time is achieved by k-NN, which needs only 73.95 seconds.
The experimental results show that each classification model performs well. Validation was then carried out to detect overfitting using K-fold cross-validation [48]. In Intrusion Detection System
(IDS) research, cross-validation has been widely used, for example to validate the KNN, NB, SVM, and RF classification models in detecting DDoS attacks [49], to validate an LSTM deep learning model for detecting different types of attacks such as R2L and U2R [50], and to validate a convolutional neural network model for anomaly attack detection [51]. In this experiment, a value of k=10 is used for each classification model. In each iteration, the sampled data are shuffled, and each subset contains an equal number of samples [52]. The results of the performance evaluation with cross-validation are presented in Table 15.

Table 14. Comparison of execution time
Method   Without LDA   With LDA
k-NN     30908.87 s    73.95 s
DT       163.75 s      98.58 s
RF       675.74 s      270.36 s
AB       1143.27 s     289.11 s
GB       5404.97 s     665.13 s

Table 15. Evaluation with cross-validation
Method   Average Accuracy (%)   Error (%)
k-NN     99.76                  0.0001
DT       100                    0.0001
RF       100                    0.0001
AB       98.95                  0.0006
GB       99.48                  0.0003

Comparison with other datasets and other work
To determine the effectiveness of the proposed LDA method, we compare it against the full (100%) N-BaIoT dataset and other datasets using the DT classification method, and against previous research works that use the same dataset and also implement dimensionality reduction methods. Only results obtained with low-dimensional data are considered. The comparison with other datasets is shown in Table 16, and the comparison with other works is shown in Table 17.

Table 16. Comparison results with other datasets
Metric        20% of N-BaIoT   100% of N-BaIoT   30% of MedBIoT
Accuracy      100              100               100
Precision     1                1                 1
Sensitivity   1                1                 1
Specificity   1                1                 1
FPR           0                0                 0
FNR           0                0                 0

Table 17. Comparison results with other works
Ref.        Method                   No. of Features   Accuracy
[15]        Fisher Score + DT        2                 98.43
[17]        Entropy + SVM            3                 93.15
[19]        Fisher Score + XGBoost   3                 99.96
This Work   LDA + DT                 2                 100

Discussion
We have presented detection systems with a high accuracy level and a low FPR for the identification of IoT botnets. Without LDA, DT and RF had the highest accuracy level among the classification models, reaching 100%. With LDA, DT and RF remain stable at 100% (Table 8), whereas k-NN, AB, and GB show a slight decrease in accuracy. Overall, the achieved accuracy levels show that the LDA dimensionality reduction method is very effective and efficient for IoT botnet detection. DT and RF generate more accurate results than the other classifiers, i.e., AB, k-NN, and GB.
A detection system is better when it achieves a high accuracy level and a very low FPR. Models with a high accuracy level but a high FPR cannot be used. With or without LDA, DT and RF both show an FPR of 0. Combining LDA dimensionality reduction with DT and RF yields a high accuracy level. Thus, the proposed system performs well, because it is both highly accurate and produces a low FPR.
Precision shows the reliability of the detection model for the samples it classifies as positive. DT and RF have an excellent ability to classify samples as positive, with a value of 1 with or without LDA. This differs from k-NN, AB, and GB, whose precision decreases when LDA is used.
Specificity represents the proportion of negative samples that the detection system classifies correctly. DT and RF classify negative samples well, with or without LDA. The result is different for AB and GB, which show a decrease in specificity with LDA. k-NN has a flat specificity value with or without LDA.
FNR, a critical measure for a detection model, is the complement of the model's sensitivity. DT and RF have the highest sensitivity levels with or without LDA, and their FNR values
were 0. For k-NN, AB, and GB, the sensitivity decreases when LDA is used, with a corresponding increase in FNR. Thus, with a sensitivity of 1 and an FNR of 0, DT and RF using LDA are the best models for the IoT botnet detection system, because they cover all chances of detecting botnets.
The efficiency of the detection system is observed through the execution time. During the IoT botnet detection experiments, we observed an increase in execution speed with LDA, which was significant for the k-NN, DT, RF, AB, and GB classifiers. Considering both the highest accuracy level and the lowest FPR, DT has the fastest execution time, requiring only 98.58 s.
Compared to other studies on the same theme that use the N-BaIoT dataset with a dimensionality reduction method, the proposed LDA-based system shows the highest accuracy, with the DT classification reaching 100%. Bahsi et al. [15] use the Fisher Score dimensionality reduction method to reduce the data to two features, and with the DT classification method their botnet detection accuracy reaches 98.43%. Nomm and Bahsi [17] use the entropy method to reduce the data to three features and SVM for detection, reaching an accuracy of only 93.15%. Alqahtani et al. [19] also use the Fisher Score to reduce the data to three features and select the XGBoost classification method to detect botnets, reaching an accuracy of 99.96%.
The effectiveness of the detection system can be observed from its accuracy in relation to the number of features used. With all 115 features (without LDA) and with only two features (with LDA), the accuracy level of our DT and RF models remains the same. This suggests that reducing the number of features used in the IoT botnet detection system was very effective. Compared to previous studies, the proposed system is more effective in detecting IoT botnets, as indicated by its higher accuracy.

CONCLUSION
The LDA dimensionality reduction method has been implemented and used to detect IoT botnets effectively and efficiently. We showed that a detection system with very few features can reach a very high accuracy level, and that using fewer features also shortens execution time. We observed that combining the LDA method with the DT and RF classifiers in an IoT botnet detection system provides the best accuracy of 100% with an FPR of 0. Detection times for DT and RF are 98.58 seconds and 270.36 seconds, respectively.
We propose that future studies investigate the efficiency of the LDA method from the perspective of energy consumption and memory use. We also consider extending the research framework to detect botnets in real time using balanced data, which could improve execution time and maximize accuracy.

REFERENCES
[1] M. S. Mahdavinejad, M. Rezvan, M. Barekatain, P. Adibi, P. Barnaghi, and A. P. Sheth, “Machine learning for internet of things data analysis: a survey,” Digital Communications and Networks, vol. 4, no. 3, pp. 161–175, 2018, doi: 10.1016/j.dcan.2017.10.002.
[2] K. Somasundaram and K. Selvam, “IOT – Attacks and Challenges,” International Journal of Engineering Research & Technology, vol. 8, no. 9, pp. 9–12, 2018, doi: 10.31873/ijetr.8.9.67.
[3] X. S. Yang, S. Lee, S. Lee, and N. Theera-Umpon, “Information Analysis of High-Dimensional Data and Applications,” Mathematical Problems in Engineering, vol. 2015, no. ii, pp. 2–4, 2015, doi: 10.1155/2015/126740.
[4] A. Ullah, F. H. Khan, U. Qamar, and S. Bashir, “Dimensionality reduction approaches and evolving challenges in high dimensional data,” ACM International Conference Proceeding Series, pp. 1–8, 2017, doi: 10.1145/3109761.3158407.
[5] J. Wang, S. Yue, X. Yu, and Y. Wang, “An efficient data reduction method and its application to cluster analysis,” Neurocomputing, vol. 238, pp. 234–244, 2017, doi: 10.1016/j.neucom.2017.01.059.
[6] Z. Cheng and Z. Lu, “A novel efficient feature dimensionality reduction method and its application in engineering,” Complexity, vol. 2018, pp. 1–14, 2018, doi: 10.1155/2018/2879640.
[7] T. Zhang and B. Yang, “Dimension reduction for big data,” Statistics and its Interface, vol. 11, no. 2, pp. 295–306, 2018, doi: 10.4310/SII.2018.v11.n2.a7.
[8] J. Yan et al., “Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing,” IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, pp. 320–332,