The Use of Machine Learning in Digital Forensics: Review Paper
Modern College of Business and Science, 3 Bawshar St, Muscat 133, Oman
{20202196,hothefa.shaker,basant}@mcbs.edu.om
1 Introduction
Digital forensics (DF) is a process utilized to analyze and present digital evidence
gathered from various sources such as databases, computers, and digital images
[1]. The increasing number of smart devices in our daily life results in a wide
variety of data, with different categories and characteristics. The digital forensics
investigation process collects and analyzes data to help investigators identify
and prevent unauthorized access to the collected information [2]. In most cases,
the data and evidence on a device can be deleted after the crime has occurred.
The investigation process is therefore very important for investigators, as it
can help them determine the exact nature of the crime and identify the victims
[3]. Unfortunately, when human resources are insufficient, a thorough
investigation can take a long time.
Although many techniques can be used to manage the massive amount of data
collected by a digital forensics investigator, such as Hadoop, they do not function
© The Author(s) 2023
N. Bacanin and H. Shaker (Eds.): ICIITB 2022, ACSR 104, pp. 96–113, 2023.
https://doi.org/10.2991/978-94-6463-110-4_9
performed in different ways depending on the complexity of the study [12]. Due
to the rise of digital technology, there has been an increase in the number of
data sources that can be collected. Figure 1 illustrates the digital forensics
investigation process. Each stage is explained below.
b) Examination: The second stage examines the data that has been collected.
Using digital forensics techniques and tools, the necessary pieces of
information are extracted from the data. This stage also identifies the data
files that contain information of interest, including information concealed
through file compression, access control, and encryption [13,14].
d) Reporting: The final phase of the investigation is reporting. This step
involves reviewing the data gathered during the analysis phase and presenting
the findings to the analyst in formal documentation. It can be challenging to
determine the cause of an event or provide an accurate explanation. However, by
gathering information from the data, an analyst can improve their understanding
of the event and help prevent any recurrence in the future [15].
Islam et al. proposed a model that detects copy-move and splicing attacks in
color images using local binary pattern (LBP) and discrete cosine transform
(DCT) operators. The proposed system was evaluated using an SVM classifier. The
DCT and LBP operators capture the changes in the local frequency distribution
and detect micro-patterns. The proposed method considers the inter-cell values
of the LBP blocks and arranges them as feature vectors. The images are then
classified as authentic or tampered using an SVM with a radial basis function
(RBF) kernel. The study results show that the proposed method is well suited to
image forgery detection in terms of accuracy metrics [43].
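To make the LBP operator mentioned above more concrete, the following is a minimal sketch of the basic 8-neighbor LBP code and its histogram feature vector. It is not the pipeline of Islam et al., which also applies the DCT and uses inter-cell values; the function names, the neighbor ordering, and the `>=` thresholding convention are assumptions made for illustration.

```python
from typing import List

# Clockwise 8-neighbour offsets starting at the top-left pixel. This ordering
# is an assumption; any fixed ordering yields a valid LBP variant.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOUR_OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """Count LBP codes over all interior pixels; the 256 bins form a feature vector."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram
```

In a forgery detector of this kind, histograms such as the one returned by `lbp_histogram` would be computed per block and passed to a classifier, for example an SVM with an RBF kernel.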
Barni et al. proposed a system that detects contrast enhancement by adaptive
histogram equalization in the presence of JPEG compression. The method feeds
color SPAM features to an SVM detector, which is trained to recognize
JPEG-compressed images with enhanced contrast. The researchers tested the
system's performance by training it on sets of JPEG-compressed images with
different quality factors (QFs). It works well only if the QF used in training
matches the one used in testing and the QFs are greater than 80 [44]. The
proposed system can be applied to multimedia forensics analysis.
Li et al. developed a mobile forensics system using the Apriori and K-means
algorithms. The Apriori algorithm improves mining efficiency by mining rules
in two parts: generating frequent item sets and extracting the rules that meet
the minimum confidence requirement. Furthermore, it enhances the intimacy of
the database by using a vertical structure to represent the data. The
clustering results are classified according to the relationship between the
various individuals. The researchers used the association rules to analyze the
data. They found that the high-confidence rules indicate that the user’s daily
habits are consistent with the characteristics of the data [60] (Table 1).
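The two Apriori parts described above (frequent item set generation, then rule extraction under a minimum confidence) can be sketched as follows. This is a textbook, horizontal-format Apriori written for illustration, not the vertical-structure variant of the cited system; the function names, thresholds, and the phone-usage transactions are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float]  # (antecedent, consequent, confidence)


def frequent_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Part 1: level-wise search for item sets with support >= min_support."""
    total = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, float] = {}
    size = 1
    while candidates:
        level: Dict[Itemset, float] = {}
        for candidate in candidates:
            support = sum(candidate <= t for t in transactions) / total
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        size += 1
        # Join frequent sets into candidates one item larger, then prune any
        # candidate that has an infrequent subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        }
    return frequent


def association_rules(frequent: Dict[Itemset, float], min_confidence: float) -> List[Rule]:
    """Part 2: keep rules A -> B with support(A u B) / support(A) >= min_confidence."""
    rules: List[Rule] = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules


# Hypothetical phone-usage events, one set per observation window.
transactions = [frozenset(t) for t in (
    {"call", "sms"}, {"call", "sms", "web"}, {"call", "web"}, {"sms"})]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.8)
```

With these hypothetical transactions, the only rule that survives the 0.8 confidence threshold is `web -> call`, with confidence 1.0. A high-confidence rule of this kind is what an investigator would read as a consistent user habit.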
ML Algorithm        Limitation
SVM algorithm       Not applicable for huge datasets
DT algorithm        Not adequate for solving regression issues
KNN algorithm       Less effective with large datasets and a high number of dimensions
K-Means algorithm   Requires the K value to be specified from the beginning
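The K-Means limitation listed in the table can be seen directly in code: the number of clusters is an input that the analyst must choose before any data is examined. The sketch below is a minimal one-dimensional version of Lloyd's algorithm, written for illustration only; the function name and the choice to seed the centroids with the first k points are assumptions.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], k: int, max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on 1-D data; k must be supplied by the caller."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = list(points[:k])
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    return centroids, labels
```

Calling `kmeans_1d([1.0, 2.0, 10.0, 11.0], k=2)` separates the two obvious groups, whereas `k=1` merges them into a single cluster. The algorithm itself gives no indication of which value of k is appropriate.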
5 Conclusion
The digital forensics domain has grown in many aspects. Forensic analysts have
reported many difficulties in analyzing big data, such as images and video,
that may assist in revealing events. Over time, new challenges continue to
emerge in digital forensics. This has led to the use of automation and
intelligent techniques that facilitate the work of investigators. This research
has reviewed various ML algorithms used to solve digital forensic challenges,
e.g., SVM, KNN, DT, PCA, SVD, K-Means, NB, ANN, LR and RF. These algorithms
help distinguish authentic data from fake data for use as evidence in court.
Finally, the paper summarized the best practice for each algorithm in digital
forensics according to its features, advantages, and disadvantages. Based on
the reviewed research papers, K-Means focuses on recovering removed digital
evidence from memory locations. SVM, PCA, and SVD are the best candidates for
image forensics investigations, while KNN and NB support network forensics. In
the past few years, machine learning developers have made significant progress
in making these systems think like humans. They now perform complex tasks and
make decisions based on in-depth analysis. Despite this progress, machine
learning still has many limitations, such as ethical concerns, lack of
interpretability, insufficient data to train models, and lack of
reproducibility.
References
1. Joakim Kävrestad. Fundamentals of Digital Forensics. Springer, 2020.
2. Konstantinos Karampidis, Ergina Kavallieratou, and Giorgos Papadourakis. A
review of image steganalysis techniques for digital forensics. Journal of information
security and applications, 40:217–235, 2018.
3. Graeme Horsman. Tool testing and reliability issues in the field of digital forensics.
Digital Investigation, 28:163–175, 2019.
4. Godson Kalipe, Vikas Gautham, and Rajat Kumar Behera. Predicting malarial
outbreak using machine learning and deep learning approach: a review and analysis.
In 2018 International Conference on Information Technology (ICIT), pages 33–38.
IEEE, 2018.
5. Anand Handa, Ashu Sharma, and Sandeep K Shukla. Machine learning in cyber-
security: A review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery, 9(4):e1306, 2019.
6. R Saravanan and Pothula Sujatha. A state of art techniques on machine learning
algorithms: a perspective of supervised learning approaches in data classification.
In 2018 Second International Conference on Intelligent Computing and Control
Systems (ICICCS), pages 945–949. IEEE, 2018.
7. Athanasios Dimitriadis, Nenad Ivezic, Boonserm Kulvatunyou, and Ioannis
Mavridis. D4i-digital forensics framework for reviewing and investigating cyber
attacks. Array, 5:100015, 2020.
8. Sana Qadir and Basirah Noor. Applications of machine learning in digital forensics.
In 2021 International Conference on Digital Futures and Transformative Technolo-
gies (ICoDT2), pages 1–8. IEEE, 2021.
28. Shahadat Uddin, Arif Khan, Md Ekramul Hossain, and Mohammad Ali Moni.
Comparing different supervised machine learning algorithms for disease prediction.
BMC medical informatics and decision making, 19(1):1–16, 2019.
29. Iqbal H Sarker. Machine learning: Algorithms, real-world applications and research
directions. SN Computer Science, 2(3):1–21, 2021.
30. Susmita Ray. A quick review of machine learning algorithms. In 2019 International
conference on machine learning, big data, cloud and parallel computing (COMIT-
Con), pages 35–39. IEEE, 2019.
31. Mei Sze Tan, Siow-Wee Chang, Phaik Leng Cheah, and Hwa Jen Yap. Integrative
machine learning analysis of multiple gene expression profiles in cervical cancer.
PeerJ, 6:e5285, 2018.
32. Joshua P Parreco, Antonio E Hidalgo, Alejandro D Badilla, Omar Ilyas, and Rishi
Rattan. Predicting central line-associated bloodstream infections and mortality
using supervised machine learning. Journal of critical care, 45:156–162, 2018.
33. Loong Chuen Lee and Abdul Aziz Jemain. On overview of pca application
strategy in processing high dimensionality forensic data. Microchemical Journal,
169:106608, 2021.
34. Lian Niu. A review of the application of logistic regression in educational research:
Common issues, implications, and suggestions. Educational Review, 72(1):41–67,
2020.
35. Steven L Brunton and J Nathan Kutz. Data-driven science and engineering:
Machine learning, dynamical systems, and control. Cambridge University Press,
2022.
36. M Sornalakshmi, S Balamurali, M Venkatesulu, M Navaneetha Krishnan, Laksh-
mana Kumar Ramasamy, Seifedine Kadry, Gunasekaran Manogaran, Ching-Hsien
Hsu, and Bala Anand Muthu. Hybrid method for mining rules based on enhanced
apriori algorithm with sequential minimal optimization in healthcare industry.
Neural Computing and Applications, pages 1–14, 2020.
37. Dijana Jovanovic, Milos Antonijevic, Milos Stankovic, Miodrag Zivkovic, Marko
Tanaskovic, and Nebojsa Bacanin. Tuning machine learning models using a group
search firefly algorithm for credit card fraud detection. Mathematics, 10(13):2272,
2022.
38. Nebojsa Bacanin, Catalin Stoean, Miodrag Zivkovic, Dijana Jovanovic, Milos
Antonijevic, and Djordje Mladenovic. Multi-swarm algorithm for extreme learn-
ing machine optimization. Sensors, 22(11):4204, 2022.
39. Nebojsa Bacanin, Miodrag Zivkovic, Fadi Al-Turjman, K Venkatachalam, Pavel
Trojovskỳ, Ivana Strumberger, and Timea Bezdan. Hybridized sine cosine algo-
rithm with convolutional neural networks dropout regularization application. Sci-
entific Reports, 12(1):1–20, 2022.
40. Mohamed Salb, Luka Jovanovic, Miodrag Zivkovic, Eva Tuba, Ali Elsadai, and
Nebojsa Bacanin. Training logistic regression model by enhanced moth flame opti-
mizer for spam email classification. In Computer Networks and Inventive Commu-
nication Technologies, pages 753–768. Springer, 2023.
41. Nebojsa Bacanin, Miodrag Zivkovic, Marko Sarac, Aleksandar Petrovic, Ivana
Strumberger, Milos Antonijevic, Andrija Petrovic, and K Venkatachalam. A novel
multiswarm firefly algorithm: An application for plant classification. In Interna-
tional Conference on Intelligent and Fuzzy Systems, pages 1007–1016. Springer,
2022.
42. Ehsan Nowroozi, Ali Dehghantanha, Reza M Parizi, and Kim-Kwang Raymond
Choo. A survey of machine learning techniques in adversarial image forensics.
Computers & Security, 100:102092, 2021.
43. Mohammad Manzurul Islam, Gour Karmakar, Joarder Kamruzzaman, Manzur
Murshed, Gayan Kahandawa, and Nahida Parvin. Detecting splicing and copy-
move attacks in color images. In 2018 Digital Image Computing: Techniques and
Applications (DICTA), pages 1–7. IEEE, 2018.
44. Mauro Barni, Ehsan Nowroozi, and Benedetta Tondi. Detection of adaptive his-
togram equalization robust against jpeg compression. In 2018 International Work-
shop on Biometrics and Forensics (IWBF), pages 1–8. IEEE, 2018.
45. Sara Ferreira, Mário Antunes, and Manuel E Correia. Exposing manipulated pho-
tos and videos in digital forensics analysis. Journal of Imaging, 7(7):102, 2021.
46. Ricard Durall, Margret Keuper, Franz-Josef Pfreundt, and Janis Keuper. Unmask-
ing deepfakes with simple features. arXiv preprint arXiv:1911.00686, 2019.
47. Gurpal Singh Chhabra, Varinderpal Singh, and Maninder Singh. Hadoop-based
analytic framework for cyber forensics. International Journal of Communication
Systems, 31(15):e3772, 2018.
48. Nighat Usman, Saeeda Usman, Fazlullah Khan, Mian Ahmad Jan, Ahthasham
Sajid, Mamoun Alazab, and Paul Watters. Intelligent dynamic malware detection
using machine learning in ip reputation for forensics data analytics. Future Gen-
eration Computer Systems, 118:124–141, 2021.
49. Amit V Kachavimath, Shubhangeni Vijay Nazare, and Sheetal S Akki. Distributed
denial of service attack detection using naïve bayes and k-nearest neighbor for
network forensics. In 2020 2nd International conference on innovative mechanisms
for industry applications (ICIMIA), pages 711–717. IEEE, 2020.
50. Paola Barra, Carmen Bisogni, Michele Nappi, David Freire-Obregón, and Modesto
Castrillón-Santana. Gait analysis for gender classification in forensics. In Interna-
tional Conference on Dependability in Sensor, Cloud, and Big Data Systems and
Applications, pages 180–190. Springer, 2019.
51. Anton Yudhana, Imam Riadi, and Faizin Ridho. Ddos classification using neural
network and naïve bayes methods for network forensics. International Journal of
Advanced Computer Science and Applications, 9(11), 2018.
52. T Satya Sudha and Ch Rupa. Analysis and evaluation of integrated cyber crime
offences. In 2019 Innovations in Power and Advanced Computing Technologies (i-
PACT), volume 1, pages 1–6. IEEE, 2019.
53. Muhammad Faris Ruriawan, Bintaran Anggono, Isaac Anugerah Siahaan, and
Yudha Purwanto. Development of digital evidence collector and file classification
system with k-means algorithm. In 2019 IEEE Asia Pacific Conference on Wireless
and Mobile (APWiMob), pages 64–68. IEEE, 2019.
54. Dixit, Roy, Naskar, and Chakraborty. Digital image forensics: Theory and
implementation. Studies in Computational Intelligence, 755, 2020.
55. Muhammad Ali, Stavros Shiaeles, Nathan Clarke, and Dimitrios Kontogeorgis. A
proactive malicious software identification approach for digital forensic examiners.
Journal of Information Security and Applications, 47:139–155, 2019.
56. Maryam Hina, Mohsan Ali, Abdul Rehman Javed, Gautam Srivastava,
Thippa Reddy Gadekallu, and Zunera Jalil. Email classification and foren-
sics analysis using machine learning. In 2021 IEEE SmartWorld, Ubiquitous
Intelligence & Computing, Advanced & Trusted Computing, Scalable Comput-
ing & Communications, Internet of People and Smart City Innovation (Smart-
World/SCALCOM/UIC/ATC/IOP/SCI), pages 630–635. IEEE, 2021.
57. Belal Ahmed, T Aaron Gulliver, and Saif alZahir. Blind copy-move forgery detec-
tion using svd and ks test. SN Applied Sciences, 2(8):1–12, 2020.
58. Jobin Varghese and C Sathish Kumar. Robust copy-move forgery detection algo-
rithm using singular value decomposition and discrete orthonormal stockwell trans-
form. Australian Journal of Forensic Sciences, 52(6):711–727, 2020.
59. Turker Tuncer, Fatih Ertam, and Sengul Dogan. Automated malware identifica-
tion method using image descriptors and singular value decomposition. Multimedia
Tools and Applications, 80(7):10881–10900, 2021.
60. Huan Li, Bin Xi, Shunxiang Wu, Jingchun Jiang, and Yu Rao. The application of
association analysis in mobile phone forensics system. In International Conference
on Intelligence Science, pages 126–133. Springer, 2018.
61. Timothy Bollé, Eoghan Casey, and Maëlig Jacquet. The role of evaluations in
reaching decisions using automated systems supporting forensic analysis. Forensic
Science International: Digital Investigation, 34:301016, 2020.
62. Abiodun A Solanke. Explainable digital forensics ai: Towards mitigating distrust
in ai-based digital forensics analysis using interpretable models. Forensic Science
International: Digital Investigation, 42:301403, 2022.
63. Nighat Usman, Saeeda Usman, Fazlullah Khan, Mian Ahmad Jan, Ahthasham
Sajid, Mamoun Alazab, and Paul Watters. Intelligent dynamic malware detection
using machine learning in ip reputation for forensics data analytics. Future Gen-
eration Computer Systems, 118:124–141, 2021.
64. Felix Anda, David Lillis, Nhien-An Le-Khac, and Mark Scanlon. Evaluating auto-
mated facial age estimation techniques for digital forensics. In 2018 IEEE Security
and Privacy Workshops (SPW), pages 129–139. IEEE, 2018.
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution-NonCommercial 2.5 International License (http://creativecommons.org/
licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, dis-
tribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons
license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter’s
Creative Commons license, unless indicated otherwise in a credit line to the material. If
material is not included in the chapter’s Creative Commons license and your intended
use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder.