Special Issue On Application of AI in Digital Forensics


KI - Künstliche Intelligenz (2022) 36:121–124

https://doi.org/10.1007/s13218-022-00777-3

EDITORIAL

Special Issue on Application of AI in Digital Forensics


Johannes Fähndrich1 · Wilfried Honekamp2 · Roman Povalej3 · Heiko Rittelmeier4 · Silvio Berner5

Published online: 21 November 2022


© The Author(s) 2022

Keywords  Digital forensics · Artificial intelligence · Legal investigation · Cybercrime

Roman Povalej, Heiko Rittelmeier and Silvio Berner have contributed equally to this work.

* Johannes Fähndrich, johannesfaehndrich@hfpol-bw.de
Wilfried Honekamp, wilfried.honekamp@hochschule-stralsund.de
Roman Povalej, roman.povalej@polizei.niedersachsen.de
Heiko Rittelmeier, heiko@rittelmeier.de
Silvio Berner, Silvio.Berner@polizei.sachsen.de

1 Police College Baden-Württemberg, Villingen-Schwenningen, Germany
2 University of Applied Science Stralsund, Stralsund, Germany
3 Police Academy of Lower Saxony, Nienburg (Weser), Germany
4 Central Office for Information Technology in the Security Sector (ZITiS), Munich, Germany
5 University of Applied Police Sciences Saxony, Rothenburg O.L., Germany

1 Introduction

When crimes are committed, countless traces are created. A constantly growing proportion of these are digital traces. For example, approximately 400,000 new variants of malware enter circulation every day [1]. The increasing proliferation of information systems offers an ever-growing gateway for these malicious programs. With the increasing use of digital communication channels such as instant messaging, the number of traces to be analyzed has grown far beyond human capabilities [2]. With the simplified use of anonymization techniques, new challenges arise, such as the use of author determination methods [3]. This has been researched for years for traditional media such as e-mail [4] but has found little application in the forensic context. Approaches to the use of machine learning in forensics have also been part of the scientific discourse for some time [5]. The field of digital forensics has specific requirements for the methods used. Chain of custody and legal certainty as well as data protection are major hurdles for the use of Artificial Intelligence (AI).

With this growing amount of potential sources of evidence, the application of AI in forensics is essential. Machine learning and data science methods must be extended to be explainable and valid for legal purposes. One example is the article by Bergmann et al. in this special issue, which uses deep learning for the classification of blood spatter patterns in criminal investigations. This article argues that the use of such methods should be based not on trust but on controls of the data used and of the learned features that influenced the output of the methods.

In machine learning, the use of statistical models is validated through experiments in which the data is separated into three sets [6]. This is done to build models that perform well outside the training data set and to reduce problems like overfitting [7].

Training set: Most of the data, which is used for training the model. The data points are normally chosen randomly (following a probability distribution) to reduce bias.

Validation set: A smaller part of the data, used to validate hyperparameters of the learned model. After training a model with some parameters, one validates its performance on this data. The data should have the same probability distribution as the training set to reduce bias.


Test set: Some of the data, which is used to test the model. This data set should also have the same probability distribution as the training set and the validation set. The final performance of the model is tested on this data. Sometimes, this data is not available to the creators of the model.

Depending on the size and quality of the data sets, the learned model has different properties like generalizability, robustness to error and bias, or accuracy. With this approach, the model's performance on the test set determines whether the learned model should be put to use. For the use of machine learning in forensics, the learned models must meet high standards, because errors could have fatal consequences for human lives. This means that the test data set must be large and well analyzed to reduce bias or other errors.

Developments in recent years have shown that the heterogeneity of the traces to be processed and their data errors, such as incorrect or outdated information, inconsistencies or missing values, and the amount of irrelevant data, are particularly problematic [8]. The lack of automatic detection of data types, for example through entropy analysis, and the lack of ontological integration, such as data property classification [9], and thus of an understanding of the meaning of unstructured data, make this work a high manual effort [2]. One example of such data is the analysis of videos with the goal of identifying humans based on their gait, clothing, or body type. The article by Becker et al. presents the result of a research project which uses AI for a forensic analysis of persons.

Artificial intelligence methods have not yet been commonly used in forensic investigations, not only for technical reasons, but also for legal reasons [10]. Typical applications include automatic profiling of suspects, vehicle identification, analysis of cryptocurrencies, or automatic recognition of child pornography imagery [11]. Explainable AI (XAI) is an important methodological approach to meet the legal requirements mentioned above, as it can be used to trace how the systems come to their conclusions. The article by Szepannek and Lübke investigates partial dependence plots as a means of increasing the interpretability of machine learning methods. They show that partial dependence plots can be used in automated classification of chemical analysis for glass identification tasks.

At many points in an investigation, artificial intelligence methods can facilitate the work, even when the flow of an investigation changes between several people. Here, errors could be avoided, and automatable process steps could be mapped by machine learning and automatically adopted in the future. The article by Solanke et al. analyzes common methods of machine learning and discusses techniques for evaluating their effectiveness, e.g. for classification and regression algorithms. In this regard, the interaction between forensic scientists and investigators must be redefined depending on the context. An attempt to formalize and analyze this process has already been made. The article by Spranger et al. describes a system designed for the analysis of mobile communication, enabling investigators to deal with the massive amount of communication found in evidence like smartphones. Various support systems have been presented and their problems and limitations have been discussed. Unfortunately, language models such as BERT and image models such as Image GPT-3 have not yet been integrated into forensic applications [12].

The ever-increasing flood of data to be analyzed, and thus the information content, can only be handled by automation. Artificial intelligence methods can and will increasingly support investigative authorities in the future. One challenge in itself is the multimodal processing of data. This includes, for example, object recognition and thus the linking of pictorial and textual representations. In this context, research is still being done today on the semantic analysis of images or videos. Image GPT is a recent example of how a system can be taught identifiers, also called labels, using images. Through one-shot learning, this system can recognize objects in images without having seen them before [13].

Many areas of AI research can find application in forensics. Unfortunately, this connection has not yet been established to the point where the scientific community wants to evaluate it on a large scale. Data sets, problem sets, and application scenarios could and should be created so that more of the new methods can be applied. Specific to the application area under investigation, however, is that the prototypes developed in research must each be tested for legal explainability and forensic replicability. A sufficient understanding of the methods used is necessary to ensure that no errors have occurred in the classification. However, with most black-box methods, such as large neural networks, comprehensibility is only possible to a limited extent [14]. Several hurdles therefore stand in the way of their use in an investigative procedure. One must provide the right insights and then stand up in court by explaining why this method and its result can be used. We interviewed one of the leading experts in the science of AI and explainability, Prof. Dr. Müller at the Technical University Berlin, who explains the basic ideas of XAI. An interesting interdisciplinary field of research between computer science and law is emerging here.

This special issue collects papers on AI with application to forensics, focusing on the fusion of computer science, data analytics, and machine learning with discussion of law and ethics for their application to cyberforensics.
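The separation of data into training, validation, and test sets described in the introduction can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the procedure of any article in this issue: the function name `three_way_split`, the 70/15/15 proportions, and the fixed random seed are all hypothetical choices made for the example.

```python
import random


def three_way_split(records, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle records and cut them into training, validation, and test sets.

    The fractions and the seed are illustrative assumptions, not values
    prescribed by the editorial or by the cited literature.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")

    # Copy before shuffling so the caller's data stays untouched.
    shuffled = list(records)
    # Random selection keeps the three sets close to the same distribution.
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, val, test


samples = list(range(100))  # stand-in for labelled evidence items
train, val, test = three_way_split(samples)
print(len(train), len(val), len(test))
```

Fixing the seed makes the split reproducible, which matters when a result has to be replicated later. A purely random split like this one only approximates equal distributions across the three sets; for small or imbalanced data, a stratified split may be the better choice.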


2 Content

This special issue includes the following content.

2.1 Technical Contributions

• Digital Forensics and Strong AI
• Explaining Artificial Intelligence with Care: Analyzing the Explainability of Black Box Multiclass Machine Learning Models in Forensics
  (Gero Szepannek, Karsten Lübke)
• Automatic Classification of Bloodstains with Deep Learning Methods
  (Tommy Bergmann, Martin Klöden, Jan Dreßler, Dirk Labudde)

2.2 Discussion Paper

• Digital Forensics AI: Evaluating, Standardizing and Optimizing Digital Evidence Mining Techniques
  (Abiodun Abdullahi Solanke, Maria Angela Biasiotti)

2.3 System Description

• MoNA: A Forensic Analysis Platform for Mobile Communication
  (Michael Spranger, Jian Xi, Lukas Jaeckel, Jenny Felser, Dirk Labudde)

2.4 Project Report

• COMBI: Artificial intelligence for computer-based forensic analysis of persons
  (Sven Becker, Marie Heuschkel, Sabine Richter, Dirk Labudde)

2.5 Interviews

• Interview: AI Expert Prof. Müller on XAI
  (Johannes Fähndrich, Roman Povalej, Heiko Rittelmeier, Silvio Berner)

3 Service

3.1 Conferences

• International Workshop on Digital Forensics - An interexchange of law enforcement and science. https://informatik2022.polizeiinformatik.de/
• Police Informatics. https://polizeiinformatik.de/
• The International Conference on Forensic Computer Science. http://icofcs.org/
• European Academy of Forensic Science Conference. https://www.eafs2022.eu/

3.2 Journals

• IEEE Transactions on Information Forensics and Security. http://www.signalprocessingsociety.org/publications/periodicals/forensics/
• Forensic Science International: Digital Investigation. https://www.journals.elsevier.com/forensic-science-international-digital-investigation
• Forensic Science Communications. https://archives.fbi.gov/archives/about-us/lab/forensic-science-communications
• International Journal of Cyber-Security and Digital Forensics. http://sdiwc.net/ijcsdf/
• International Journal of Digital Crime and Forensics. https://www.igi-global.com/journal/international-journal-digital-crime-forensics/1112
• International Journal of Electronic Security and Digital Forensics. https://dl.acm.org/journal/ijesdf
• International Journal of Forensic Computer Science. http://ijofcs.org/policies-focus.html
• The Journal of Digital Forensics, Security and Law. https://www.jdfsl.org/
• International Journal of Cyber Forensics and Advanced Threat Investigations. https://conceptechint.net/index.php/CFATI

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. BSI (2021) Die Lage der IT-Sicherheit in Deutschland. http://www.bmi.bund.de/SharedDocs/downloads/DE/publikationen/themen/it-digitalpolitik/bsi-lagebericht-cybersicherheit-2021.pdf;jsessionid=90C4A17D4F4086D74E308B211D0A66C1.2_cid287?__blob=publicationFile&v=3. Accessed 11 Apr 2022


2. Spranger M, Heinke F, Appelt L, Puder M, Labudde D (2016) MoNA: automated identification of evidence in forensic short messages. Int J Adv Secur 9:1
3. Iqbal F, Debbabi M, Fung BC (2020) Machine learning for authorship attribution and cyber forensics. Springer, Berlin
4. De Vel O (2000) Mining e-mail authorship. In: Proceedings of the workshop on text mining, ACM international conference on knowledge discovery and data mining (KDD'2000). Citeseer
5. McClendon L, Meghanathan N (2015) Using machine learning algorithms to analyze crime data. Mach Learn Appl Int J (MLAIJ) 2(1):1–12
6. Kohavi R et al (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: IJCAI, vol 14. Montreal, Canada, pp 1137–1145
7. Hawkins DM (2004) The problem of overfitting. J Chem Inf Comput Sci 44(1):1–12
8. Garfinkel S (2012) Lessons learned writing digital forensics tools and managing a 30 TB digital evidence corpus. Digit Investig 9:80–89
9. Glimm B, Horrocks I, Motik B, Stoilos G (2010) Optimising ontology classification. In: International semantic web conference. Springer, Berlin, pp 225–240
10. Rademacher T (2020) Artificial intelligence and law enforcement. In: Regulating artificial intelligence. Springer, Berlin, pp 225–254
11. Raaijmakers S (2019) Artificial intelligence for law enforcement: challenges and opportunities. IEEE Secur Priv 17(5):74–77
12. Povalej R, Rittelmeier H, Fähndrich J, Berner S, Honekamp W, Labudde D (2021) Die Enkel von Locard. Inf Spekt 44(5):355–363
13. Chen M, Radford A, Sutskever I (2020) Image GPT. https://openai.com/blog/image-gpt/. Accessed 11 Apr 2022
14. Samek W, Montavon G, Vedaldi A, Hansen LK, Müller K-R (2019) Explainable AI: interpreting, explaining and visualizing deep learning, vol 11700. Springer, Berlin
