Digital Forgery Detection
Submitted in
partial fulfillment of the requirements for the degree of
Bachelor of Technology
by
Rikith Devangam (2021KUCP1138)
Sanjay Bhargav (2021KUCP1076)
Mourya Arnepalli (2021KUCP1097)
Boreddy Muktheswar Reddy (2021KUCP1081)
Under the supervision of
Dr. Ajay Nehra
Assistant Professor
Department of Computer Science and Engineering
Indian Institute of Information Technology Kota
SELF-DECLARATION
We, the undersigned, hereby declare that the project work titled:
DIGITAL FORGERY DETECTION
submitted to the Department of Computer Science and Engineering at the Indian Institute of
Information Technology Kota, in partial fulfillment of the requirements for the degree of
Bachelor of Technology in Computer Science and Engineering, is an original work carried out by
us.
We further declare that:
1. This project report contains novel and independent work completed solely by the
undersigned.
2. No part of this work has been plagiarized or copied from any other source without proper
acknowledgment, and the similarity index complies with the institute's prescribed norms.
3. This work has not been submitted, in full or in part, to any other university or institution
for the award of any degree, diploma, or certificate.
We understand that any violation of the above declaration will lead to the rejection of the project
report and may result in disciplinary action as per the institute's rules.
We affirm the above statements to be true to the best of our knowledge and belief.
Date: 11/12/2024
Place: Kota
Certificate
This is to certify that the thesis entitled, “DIGITAL FORGERY DETECTION”, submitted by
Rikith Devangam (2021KUCP1138), Sanjay Bhargav (2021KUCP1076), Mourya Arnepalli
(2021KUCP1097) and Boreddy Muktheswar Reddy (2021KUCP1081) in partial fulfillment
of the requirements for the award of the Bachelor of Technology degree in the Department of
Computer Science & Engineering at the Indian Institute of Information Technology Kota, is an
authentic work carried out by them under my supervision and guidance.
To the best of my knowledge, the matter embodied in this report has not been submitted
elsewhere to any other university/institute for the award of any other degree.
Acknowledgment
We, the students of the Indian Institute of Information Technology, Kota, from the
Department of Computer Science and Engineering, express our heartfelt gratitude to all
those who have contributed to the successful completion of our project titled "Digital
Forgery Detection".
Firstly, we extend our deepest appreciation to our project supervisor, Dr. Ajay Nehra, for
his unwavering guidance, insightful feedback, and mentorship throughout this research
journey. His expertise and encouragement have been pivotal in shaping the direction and
outcomes of our work.
We are also grateful to our mentors and faculty members for their valuable inputs and
support. Their constructive criticism and suggestions have greatly enhanced the quality of
this study.
We sincerely thank the Indian Institute of Information Technology, Kota, for providing us
with an inspiring academic environment and access to the necessary resources to
accomplish this project. Our gratitude extends to our colleagues and peers for their
collaborative spirit and helpful discussions, which enriched our learning experience.
Lastly, we acknowledge the constant encouragement and moral support of our families
and friends, without whom this endeavor would not have been possible. This project has
been a remarkable learning experience, and we are truly thankful for the collective efforts
that have contributed to its success.
Abstract
The rise of advanced digital manipulation tools has led to a significant increase in image
forgeries, making it increasingly challenging to verify the authenticity of visual content.
Traditional techniques for detecting image forgeries often rely on manually crafted
features, which struggle to capture subtle alterations effectively. This paper proposes a
hybrid model for image forgery detection, integrating Error Level Analysis (ELA)
preprocessing, DenseNet-201 feature extraction, and Squeeze-and-Excitation (SE)
blocks for attention mechanisms.
The proposed model is evaluated using the CASIA v2 dataset, demonstrating superior
performance in detecting both overt and covert image manipulations. By combining
advanced feature extraction, attention mechanisms, and preprocessing techniques, this
hybrid approach achieves improved detection accuracy and provides a degree of
interpretability. Its robustness and effectiveness make it particularly suitable for practical
applications in digital forensics, content verification, and other domains requiring reliable
forgery detection.
Table of Contents
1. Introduction
   1.1 Background
   1.2 Motivation and Objectives
   1.3 Scope of the Project
2. Literature Review
   2.1 Research Gaps
3. Problem Statement and Objectives
   3.1 Problem Statement
   3.2 Objectives
   3.3 Research Questions
4. Methodology
   4.1 Overview of the Approach
   4.2 Data Preprocessing
   4.3 Feature Extraction and Encoding
5. System Design and Implementation
   5.1 Architecture Design
   5.2 Tools and Technologies
   5.3 Algorithms
6. Results
   6.1 Performance Metrics
   6.2 Results and Analysis
7. Discussion
8. Conclusion and Future Work
   8.1 Conclusion
   8.2 Future Directions
9. References
10. Appendices
CHAPTER 1
Introduction
With the development of sophisticated editing tools, digital image forgery has seen a
significant rise in frequency, raising concerns about the authenticity of visual content.
Identifying manipulated images has become increasingly challenging, particularly when
the forgeries are subtle or skillfully executed. Traditional forgery detection techniques
often rely on manually designed features, such as inconsistencies in illumination or
compression artifacts. However, these methods have struggled to detect complex
manipulations effectively, especially with the rapid advancements in digital editing tools.
1.1 Background
Deep learning approaches have transformed the field of forgery detection, allowing
models to learn intricate hierarchies of features directly from data. In this regard, CNNs
are particularly effective in identifying manipulated regions in images. However, standard
CNNs tend to miss subtle manipulation artifacts, requiring preprocessing techniques and
attention mechanisms to enhance their performance.
This research paper presents a hybrid model for image forgery detection that integrates
Error Level Analysis (ELA) preprocessing, DenseNet-201 for feature extraction, and
Squeeze-and-Excitation (SE) blocks for attention. ELA preprocessing focuses the model's
attention on critical inconsistencies in tampered regions, while SE blocks learn to
dynamically adjust the importance of features channel by channel, improving detection
accuracy, especially for subtle forgeries.
The accessibility of image processing software has made it easy to manipulate photographs
and share them over the Internet, creating a growing need for advanced forgery detection
methods. Among the techniques used for tampering identification and for assessing the
quality of compressed or reconstructed images, PSNR and ELA are applied extensively.
These techniques reveal anomalies or changes in an image so that suspicious photos can be
flagged for further investigation.
Detection and analysis of forgeries by machine learning algorithms play a critical role in
the entire process. Algorithmic segmentation and identification of image regions can be
very helpful, especially in forensic and scientific investigations where precision matters,
and significantly improves the efficiency and accuracy of detecting forged images,
contributing to better forensic and law enforcement investigations.
In summary, this work combines machine learning, image processing techniques, and advanced
architectures such as DenseNet-201 with attention mechanisms to create a robust solution
to the ever-increasing challenge of digital forgery detection. This study aims to improve
the detection of both overt and covert manipulations within visual content.
1.2 Motivation and Objectives
The objectives of this study are:
1. To develop a hybrid forgery detection model: The study integrates Error Level Analysis (ELA)
preprocessing, DenseNet-201 for feature extraction, and Squeeze-and-Excitation
(SE) blocks for attention mechanisms, to improve forgery detection accuracy.
2. To enhance the detection of subtle manipulations: By integrating preprocessing
methods like ELA and leveraging DenseNet-201's deep architecture for feature
extraction, the study aims to capture intricate details and subtle artifacts, which are
often overlooked by traditional methods.
3. To evaluate the effectiveness of the proposed model using standard datasets: The
proposed model will be evaluated using well-known datasets such as the CASIA
v2 dataset for image forgery and DFDC for video forgery to assess its performance
in detecting both visible and hidden forgeries.
4. To improve detection for both images and videos: The model will be designed to
detect forgeries in both images and videos, ensuring that it can handle a wide
range of digital media formats.
5. To provide a practical solution for digital forensics: The research seeks to
contribute to the field of digital forensics by offering a model that can be used for
real-world applications, including content verification, media authentication, and
legal investigations.
6. To explore the use of attention mechanisms in forgery detection: The study aims to
explore how attention mechanisms (such as SE blocks) can enhance the detection
process by focusing the model's attention on the most relevant features of
manipulated images and videos, thereby improving detection accuracy.
By achieving these objectives, the study seeks to provide an effective, scalable, and
practical solution for detecting digital forgeries in various types of visual content.
1.3 Scope of the Project
Hybrid Model Development:
The project integrates different state-of-the-art techniques for forgery detection. The
model uses ELA preprocessing, DenseNet-201 for feature extraction, and SE blocks for
attention mechanisms. This hybrid approach is designed to enhance the model's ability to
detect both subtle and overt manipulations in digital media.
Dataset Utilization:
The project will use public datasets: CASIA v2, which contains both authentic and forged
images, and DFDC, which consists of both original and tampered videos. These datasets serve
as benchmarks for evaluating the model's performance on different kinds of manipulations
and are used for training and testing the model.
Practical Application:
While the project primarily aims to advance the field of forgery detection through
research, its scope also extends to practical applications in digital forensics, content
verification, and law enforcement. The model could potentially be used to verify the
authenticity of digital media in real-world scenarios, such as social media content, legal
proceedings, and journalistic investigations.
In summary, the scope of this project is to create a hybrid approach for detecting forgery
in both images and videos. At its core, the technique targets improving the detection of
subtle manipulations by deep learning models, using attention mechanisms and preprocessing
methods to produce a reliable tool for forensics and content verification.
CHAPTER 2
Literature Review
Image forgery detection is crucial due to the increasing ease of digital manipulation.
Methods such as Convolutional Neural Networks (CNNs) and Error Level Analysis
(ELA) have gained popularity for detecting these types of forgeries. ELA is a
technique used to identify parts of an image that have different levels of compression,
which indicates potential manipulation. Patekar, Khan, Bhusare, Bhujbal, and Hegde [2]
showed that the method involves re-saving an image at a known compression quality and
comparing it with the original; differences between the original and the re-saved image
highlight areas of tampering. Studies have demonstrated the efficacy of ELA in revealing
altered regions, with various tools and software implementing this technique. [2]
The area of digital forgery detection has seen great advancements with the help of deep
learning algorithms. These methods streamline feature extraction and representation,
reducing the need for manual engineering. Despite their potential, deep learning
techniques require substantial amounts of training data, posing challenges in data-scarce
scenarios.[9]
The approach in [9] involves training a deep learning model from scratch, which requires
large datasets and significant computational resources; it allows customization of the
architecture and settings but is less effective with limited data. [9]
The choice of an algorithm depends on a number of criteria, including the nature
of the forgeries, the size of the dataset, the processing resources available, and the
attributes that are accessible. Each technique has its pros and cons. [3]
Support Vector Machines (SVMs) are supervised learning algorithms frequently used for
regression and classification. They function by identifying the hyperplane that best
separates the classes. Key characteristics include:
● Binary classification: highly effective for distinguishing between authentic and altered
images.
● Handling high-dimensional feature spaces: suitable for the diverse characteristics
extracted from images.
● Non-linear interaction: capable of handling non-linear relationships through the use of
kernel functions. [3]
CNNs are deep neural network architectures that have become immensely important in
computer vision and are particularly valuable for identifying manipulated or counterfeit
images. Among their qualities are:
● Autonomous feature extraction: hierarchical features are extracted automatically from
raw image data, minimizing the need for manual feature engineering.
● Local and global feature acquisition: adept at capturing fine details as well as the
overall structure of images. [3]
In [3], several machine learning techniques, each with its own advantages, are used to
detect image forgeries. The selection of an algorithm for forgery detection depends on the
particular needs of the task, the attributes of the dataset, and the available computational
resources, ranging from conventional SVMs and Random Forests to sophisticated deep
learning architectures such as CNNs and ensemble methods. Combining conventional and
deep learning methodologies continues to improve the precision and robustness of image
forgery detection systems. [3]
In [4], copy-move forgery entails duplicating and pasting segments of the same image to
hide or replicate elements, thereby undermining the image’s integrity. Detecting such
forgeries is crucial for maintaining the authenticity of digital images, especially in legal,
journalistic, and scientific contexts.[4]
The detection process generally involves block division, feature extraction, matching and
clustering, localization, and post-processing. Effective detection methods must balance
sensitivity to small-scale and large-scale forgeries.[4]
2.1 Research Gaps
High Computational Requirements
Deep learning models, including CNNs and hybrid approaches, require high-performance
hardware to function properly. The complexity of such models requires powerful GPUs
or specialized processors for training. The cost can be very high. Also, large datasets are
needed to achieve high accuracy, thereby increasing the computational cost. It is
particularly challenging for researchers with limited access to high-performance
machines or cloud computing resources. Hence, the trade-off between model complexity
and computational efficiency should be addressed in further research, or more lightweight
models can be explored that would run efficiently on less powerful hardware.
Storage Requirements
One of the major research gaps in forgery detection involves the storage requirements for
large datasets. Most image and video datasets for training deep learning models are
enormous in size, thus requiring substantial storage space. Efficient
data storage and retrieval mechanisms are also important for handling these large
datasets, but current systems often struggle to manage the scale of data required
for robust model training. Research into optimizing data storage and access speeds will
be vital in handling the massive amounts of data involved.
Although deep learning models have advanced the field of image and video forgery
detection to unprecedented levels, the demand for high-performance hardware, large
storage capacity, and efficient processing times remains a significant gap in research. It is
important to address these issues to make forgery detection systems more accessible,
scalable, and practical for real-world applications.
CHAPTER 3
Problem Statement and Objectives
3.1 Problem Statement
With millions of images and videos being created and disseminated every day, it is
virtually impossible to identify forgeries manually. This process becomes even more
complicated considering the sophisticated techniques involved in modern digital
manipulation, such as copy-move, splicing, and object insertion, which often escape the
human eye. In addition, manual verification requires a great deal of time and specific
knowledge, making it an impractical approach for validating images at a large scale.
Consequently, there is a significant need for automated solutions which can detect
manipulated areas within images or videos and distinguish between genuine and forged
content. The solutions developed have to be accurate, scalable, fast, and adaptable to
different kinds of manipulations and various formats of images. Here comes a promising
solution with the integration of machine learning and deep learning technologies.
Machine learning algorithms can be trained to recognize subtle patterns and artifacts that
indicate tampering, offering a more reliable and efficient alternative to manual detection.
This project addresses this growing need by leveraging machine learning techniques to
develop a robust forgery detection system. The proposed system will include advanced
methods such as Convolutional Neural Networks (CNNs), Error Level Analysis (ELA),
and hybrid approaches to achieve high accuracy in overt and subtle forgeries. The
designed system will be capable of handling large datasets, thereby making it applicable
for real-world applications where thousands or even millions of images are to be
processed rapidly. To improve performance, the system should be optimized for real-time
operation so that it can be deployed in critical settings such as digital forensics, social
media verification, and legal investigations.
Ultimately, this project aims to increase the credibility of digital media by providing a
reliable automated tool for forgery detection. By offering a scalable and efficient solution,
it seeks to combat the growing threat of digital manipulation and support the preservation
of truth in visual content.
3.2 Objectives
This project aims to design an automated digital forgery detection system that uses
machine learning techniques to identify regions of digital media that have been
manipulated. By employing advanced methods like Error Level Analysis (ELA) and
Convolutional Neural Networks (CNNs), the system aims to detect both subtle and overt
forgeries. The project aims to improve the accuracy and efficiency of forgery detection,
offering a scalable solution for applications in digital forensics, media verification, and
legal investigations, ensuring the authenticity and credibility of visual content at a time
when the digital landscape is rapidly changing.
Squeeze-and-Excitation (SE) blocks recalibrate channel-wise feature responses by modelling
interdependencies between feature maps. SE blocks have been shown to improve performance
across various vision tasks, including object detection and classification, by allowing
models to focus on the most informative features.
3.3 Research Questions
What are the design considerations necessary to ensure that a forgery detection model is
both efficient and scalable for large datasets and real-time applications?
To what extent does training on diverse datasets improve the model’s accuracy in
detecting forgeries in digital media with varying levels of manipulation complexity?
Feature Extraction:
How do attention mechanisms, such as Squeeze-and-Excitation (SE) blocks, improve the
model's ability to focus on the most relevant features for forgery detection?
CHAPTER 4
Methodology
This method uses a hybrid approach which combines Convolutional Neural Networks
(CNN) with Error Level Analysis (ELA) for image and video forgery detection. ELA is a
widely used forensic technique which detects parts of an image or video frame that have
been altered based on differences in error levels caused by compression. The technique
assumes that the manipulated areas of the image or video frame will have a different
compression level than the rest of the image. Comparing the error levels between the
original and re-compressed versions of the image, ELA can then highlight inconsistencies
that suggest tampering.
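As a rough illustration of this re-compression idea, the sketch below uses the Pillow library; the re-save quality of 90 and the brightness scaling are illustrative choices rather than values taken from this project.

# Minimal ELA sketch using Pillow; quality=90 and the brightness scaling are illustrative.
from PIL import Image, ImageChops, ImageEnhance

def compute_ela(image_path, resave_quality=90, resaved_path="ela_tmp.jpg"):
    """Re-save the image as JPEG and return the amplified pixel-wise difference."""
    original = Image.open(image_path).convert("RGB")
    original.save(resaved_path, "JPEG", quality=resave_quality)  # re-compressed copy
    resaved = Image.open(resaved_path)

    # Pixel-wise difference between the original and the re-compressed version
    ela_image = ImageChops.difference(original, resaved)

    # Amplify the (usually faint) differences so tampered regions stand out
    extrema = ela_image.getextrema()
    max_diff = max(channel_max for _, channel_max in extrema) or 1
    return ImageEnhance.Brightness(ela_image).enhance(255.0 / max_diff)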
For video, the detection method is applied to a series of individual frames. Analyzing each
frame captures the compression artifacts that appear if the video has been spliced or an
object inserted; the analysis is performed on every frame in the sequence so that any
tampering can be detected frame-by-frame or over time.
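A frame-by-frame loop of this kind might look like the following sketch, which assumes OpenCV for decoding and reuses the hypothetical compute_ela helper above; the sampling rate is an arbitrary choice.

# Frame-by-frame analysis sketch for video; assumes OpenCV (cv2) and the compute_ela helper above.
import cv2

def analyze_video(video_path, sample_every=3):
    """Extract frames from a video and run ELA on every sampled frame."""
    capture = cv2.VideoCapture(video_path)
    ela_maps, index = [], 0
    while True:
        ok, frame = capture.read()  # frame is a BGR numpy array
        if not ok:
            break
        if index % sample_every == 0:
            cv2.imwrite("frame_tmp.png", frame)            # hand the frame to the PIL-based helper
            ela_maps.append(compute_ela("frame_tmp.png"))
        index += 1
    capture.release()
    return ela_maps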
CNNs, in turn, are strong at automatically extracting features from images and videos.
They use convolutional layers, which apply multiple filters to an image or video frame to
extract features hierarchically. These layers learn patterns such as edges, textures, and
other visual elements that are important for differentiating between original and modified
content. Applied to video forgery detection, CNNs analyze not only the spatial features of
each frame but also the temporal features across consecutive frames, so that
inconsistencies in motion or transitions can indicate manipulation.
The features obtained by the CNN are then used for classification, where the
model determines whether the content is authentic or manipulated. This hybrid approach, combining
ELA for error detection and CNNs for feature extraction, makes the solution robust for
both images and videos. The ELA method helps to highlight potential areas of
manipulation, while CNN enhances the model's ability to classify these areas effectively;
thus, the system is very effective in detecting a broad range of forgery types in static
images and dynamic video content.
4.1 Overview of the Approach
This approach merges state-of-the-art techniques into one method for image and video
forgery detection through ELA and CNNs. The hybrid method ensures that detection is highly
accurate and robust, handling even very slight manipulations in static images or video
frames.
For Images: The original image is compressed, and the differences between the original
and compressed images are calculated to expose potential forgeries.
For Videos: ELA is applied frame-by-frame to analyze compression artifacts and detect
inconsistencies in individual frames, which may indicate tampering. This frame-by-frame
approach ensures detailed scrutiny of video content.
The CNNs are used for feature extraction and classification. They process the input data,
that is, the processed images or frames, and automatically learn hierarchical features that
distinguish between authentic and manipulated content.
Feature Extraction: The convolutional layers of the CNN extract spatial features such as
edges, textures, and other patterns crucial for detecting manipulations. In videos,
temporal features across consecutive frames are also captured to identify inconsistencies
in motion or transitions.
Classification: The extracted features are fed through fully connected layers and
classification units to decide whether the image or video is authentic or not.
Hybrid Integration:
The ELA preprocessing, the CNN, and the SE blocks each contribute to the combined
system:
ELA Contribution: It exposes compression-level inconsistencies that act as cues for
potentially tampered regions.
CNN Contribution: It further processes these cues and learns complex patterns,
increasing the model's ability to detect forgeries that might not be visible to the naked eye
or identifiable by ELA alone.
SE Contribution: It helps to identify subtle changes in the images or frames that might
otherwise be missed by the CNN and directs the model's attention to them, further
increasing the model's performance.
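For reference, a Squeeze-and-Excitation block of the kind used here can be sketched in Keras roughly as follows; the reduction ratio of 16 follows Hu et al. [9], and the exact placement of the blocks in this project's network is not reproduced.

# Minimal Squeeze-and-Excitation block sketch in Keras; reduction ratio 16 follows Hu et al. (2018).
from tensorflow.keras import layers

def se_block(feature_map, reduction=16):
    """Recalibrate channel importance: squeeze (global pooling) then excite (two dense layers)."""
    channels = feature_map.shape[-1]
    squeeze = layers.GlobalAveragePooling2D()(feature_map)          # one value per channel
    excite = layers.Dense(channels // reduction, activation="relu")(squeeze)
    excite = layers.Dense(channels, activation="sigmoid")(excite)   # per-channel weights in [0, 1]
    excite = layers.Reshape((1, 1, channels))(excite)
    return layers.Multiply()([feature_map, excite])                 # reweight the feature map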
The approach is extended seamlessly to video forgery detection by analyzing each frame
individually through ELA and CNNs. Temporal analysis is incorporated to capture
inconsistencies across frames, including motion mismatches or unnatural transitions. This
ensures that manipulations both at the frame level and sequence level, such as splicing,
duplication, or frame insertion, are detected.
The system is trained and evaluated on standard datasets (e.g., CASIA v2 for images and
datasets like DFDC for videos) to ensure its robustness and reliability. Model optimization
techniques, such as attention mechanisms (e.g., Squeeze-and-Excitation blocks), are
employed to enhance the system's focus on critical features, improving detection
accuracy and computational efficiency.
This integrated approach not only gives a scalable and effective solution to detecting
image forgery but also can be extended to video content, enabling this work for practical
applications in digital forensics, media verification, and law enforcement.
4.2 Data Preprocessing
In forgery detection, data preparation to highlight potentially tampered regions and
standardization for input into the Convolutional Neural Network (CNN) are required.
The key steps for data preprocessing in this project include:
a. Error Level Analysis (ELA):
For images, the original file is compressed and re-saved, and the differences between the
two versions are calculated. These differences highlight tampered regions, which are
emphasized as distinct patterns or artifacts in the ELA output.
For videos, ELA is applied frame-by-frame to check for manipulation in each frame
automatically by the model. This ensures that even slight tampering in certain frames is
detected, regardless of the impact on the video sequence.
b. Normalization:
Normalization scales pixel values to a consistent range across the dataset. Scaling the
pixel intensities to a given range (0 to 1, for example) during preprocessing reduces
variations due to lighting and image quality, so that the CNN is able to learn relevant
patterns more easily.
c. Resizing and Data Augmentation:
All images and video frames are resized to a fixed resolution to ensure compatibility with
the CNN model. Data augmentation techniques, such as flipping, rotation, and cropping,
increase the diversity of the training dataset, thus improving the model's robustness and
generalizability.
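These preprocessing steps might be expressed as in the following sketch, assuming TensorFlow utilities; the 224x224 target size and the augmentation parameters are illustrative rather than the project's exact settings.

# Preprocessing sketch: resizing, [0, 1] normalization, and simple augmentation (illustrative values).
import tensorflow as tf

def preprocess(image, target_size=(224, 224), training=False):
    """Resize, scale to [0, 1], and optionally augment a single image tensor."""
    image = tf.image.resize(image, target_size)
    image = tf.cast(image, tf.float32) / 255.0                    # normalize pixel intensities to 0-1
    if training:
        image = tf.image.random_flip_left_right(image)            # flipping
        k = tf.random.uniform([], minval=0, maxval=4, dtype=tf.int32)
        image = tf.image.rot90(image, k)                          # rotation by a random multiple of 90 degrees
        image = tf.image.resize_with_crop_or_pad(image, target_size[0] + 16, target_size[1] + 16)
        image = tf.image.random_crop(image, size=(*target_size, 3))  # cropping
    return image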
4.3 Feature Extraction and Encoding
a. CNN Feature Extraction:
Features are extracted using deep architectures of Convolutional Neural Networks (CNNs).
Images: CNNs are used to extract spatial features such as edges, textures, and regions of
interest from static images. These features are useful in detecting inconsistencies that
may be introduced by tampering.
Videos: Besides the spatial features, temporal features are extracted across consecutive
frames to detect inconsistencies in motion or transitions. This is especially useful in
detecting manipulations like frame insertion or duplication.
b. Batch Encoding:
Images and frames are divided into smaller batches for detailed analysis. The batch is
analyzed independently to extract localized features, which are later combined to provide
a holistic view of potential forgeries. This granular approach ensures that even subtle
manipulations are detected.
c. Attention Mechanisms:
Squeeze-and-Excitation (SE) blocks recalibrate the importance of the extracted feature
channels so that the network concentrates on the most informative responses.
Classification:
a. Fully Connected Layers:
The features extracted by the CNN are passed to fully connected layers, where the model
learns to differentiate between authentic and tampered content. These layers analyze the
hierarchical features and provide a confidence score for each class, either authentic or
forged.
b. Output Layers and Decision Thresholds:
The final layer of the CNN uses a softmax activation function to produce probability
distributions over the possible classes. A decision threshold is applied to classify the
input based on the highest probability score.
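For instance, the decision step might look like the sketch below, using NumPy; the 0.5 threshold and the class ordering are illustrative assumptions.

# Decision sketch: softmax probabilities mapped to a label with an illustrative 0.5 threshold.
import numpy as np

CLASS_NAMES = ("authentic", "forged")   # assumed class ordering, for illustration only

def decide(softmax_probs, threshold=0.5):
    """Return the predicted label and its probability for one sample."""
    probs = np.asarray(softmax_probs)
    best = int(np.argmax(probs))
    label = CLASS_NAMES[best] if probs[best] >= threshold else "uncertain"
    return label, float(probs[best])

print(decide([0.18, 0.82]))   # -> ('forged', 0.82)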
The system marks the suspected regions of tampering in the image or video frames and thus
gives insights into the detection results. This visualization helps in understanding the
nature of the forgery and adds transparency to the model's decision-making process.
The approach here combines the strength of ELA for preprocessing, CNNs for feature
extraction, and advanced classification techniques to ensure that it will detect forgeries in
images and videos across different datasets with accuracy and robustness.
CHAPTER 5:
System Design and Implementation
5.1 Architecture Design
The image forgery detection workflow processes each input image through the following stages:
● ELA Computation:
○ The input image is re-saved at a fixed JPEG quality, and the pixel-wise difference
between the original and the re-saved image is computed.
○ This difference map highlights subtle artifacts introduced during
manipulation, forming the input feature image.
● Reshaping:
○ The feature image is reshaped into a format suitable for further processing
by convolutional neural networks (CNNs).
● Forgery Detection:
○ The reshaped feature image is analyzed using a deep CNN model trained to
detect forgery patterns.
○ The output of the CNN model classifies the image as either genuine or
forged (a minimal model sketch follows this list).
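A minimal sketch of how such an image-level model could be assembled is given below, assuming TensorFlow/Keras and the hypothetical se_block helper sketched in Chapter 4; the input size, pooling, and head sizes are illustrative rather than the project's exact configuration.

# Sketch of the hybrid image model: ELA image -> DenseNet-201 features -> SE block -> softmax head.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet201

def build_image_model(input_shape=(224, 224, 3), num_classes=2):
    inputs = layers.Input(shape=input_shape)                  # ELA-preprocessed image
    backbone = DenseNet201(include_top=False, weights="imagenet", input_tensor=inputs)
    features = se_block(backbone.output)                      # channel-wise attention (see Chapter 4 sketch)
    pooled = layers.GlobalAveragePooling2D()(features)
    hidden = layers.Dense(256, activation="relu")(pooled)
    outputs = layers.Dense(num_classes, activation="softmax")(hidden)
    model = Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model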
The video forgery detection system is designed to detect deep fake videos or other
manipulated content. The system consists of two workflows: training flow and
prediction flow.
Training Flow:
● Dataset Preparation:
○ A dataset containing real and fake videos is used.
○ Videos are split into individual frames.
● Preprocessing:
○ Each video frame undergoes face detection and cropping to focus on
relevant regions.
○ The cropped face videos are stored as a processed dataset for training and
divided into batches.
● Model Training:
○ The processed dataset is divided into training and testing sets.
○ Features are extracted using a DenseNet-201 model.
○ An LSTM (Long Short-Term Memory) network is employed for video
classification, leveraging the temporal information across video frames (see the
sketch after this list).
● Model Evaluation:
○ The trained model is evaluated using metrics like the confusion matrix to
assess performance.
○ The trained model is then exported for deployment.
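The training flow above could be realized roughly as in the following sketch, assuming TensorFlow/Keras; the sequence length, feature dimensions, and LSTM size are illustrative choices.

# Sketch of the video classifier: per-frame DenseNet-201 features fed to an LSTM over the sequence.
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import DenseNet201

def build_video_model(frames=10, frame_shape=(224, 224, 3), num_classes=2):
    frame_input = layers.Input(shape=frame_shape)
    cnn = DenseNet201(include_top=False, weights="imagenet", input_tensor=frame_input)
    frame_encoder = Model(frame_input, layers.GlobalAveragePooling2D()(cnn.output))

    sequence = layers.Input(shape=(frames, *frame_shape))        # a clip of face-cropped frames
    features = layers.TimeDistributed(frame_encoder)(sequence)   # one feature vector per frame
    temporal = layers.LSTM(128)(features)                        # temporal information across frames
    outputs = layers.Dense(num_classes, activation="softmax")(temporal)

    model = Model(sequence, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model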
Prediction Flow:
● Input Video:
○ A video is uploaded for prediction.
○ It undergoes preprocessing steps, including splitting into frames.
● Forgery Detection:
○ The preprocessed frames are fed into the trained model.
○ The model processes the frames and classifies the video as real or fake.
5.2 Tools and Technologies
Tools:
● Matplotlib/Seaborn: For visualizing results (e.g., plotting images, evaluation
metrics, and confusion matrices).
● OS: To handle file paths and access directories containing images.
● Scikit-learn: For evaluating model performance using metrics like classification
reports and confusion matrices (a short usage sketch follows this list).
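For example, the evaluation step might use scikit-learn as in this sketch; y_true and y_pred are placeholders for the test labels and the model's predictions.

# Evaluation sketch with scikit-learn: classification report and confusion matrix.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 1, 1, 0, 1]   # placeholder ground-truth labels (0 = authentic, 1 = forged)
y_pred = [0, 1, 0, 0, 1]   # placeholder model predictions

print(classification_report(y_true, y_pred, target_names=["authentic", "forged"]))
print(confusion_matrix(y_true, y_pred))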
Technologies
Deep Learning:
● DenseNet-201 is used for feature extraction from images and video frames, and an LSTM
network is used for temporal classification of videos.
Error Level Analysis (ELA):
● A digital image forensics technique used to highlight regions of an image that may
have been tampered with.
● Multi-channel ELA (using different quality levels) helps to improve the detection
process.
Image Processing:
● Techniques to enhance and prepare images for input into the neural network.
Model Inference:
● The trained model is exported and used to classify new images and video frames as
authentic or forged.
5.3 Algorithms
DenseNet-201
DenseNet (Densely Connected Convolutional Networks) is a deep learning
architecture introduced by Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q.
Weinberger in the paper Densely Connected Convolutional Networks (2017) [5]. It aims to
address the vanishing gradient problem, encourage feature reuse, and improve the flow of
information across layers. DenseNet-201 is a variant of the DenseNet family, which
stands out because of its deep structure, containing 201 layers. DenseNet-201 is
particularly known for its efficiency in terms of parameter usage, while maintaining high
accuracy for a wide range of vision-related tasks.
In a DenseNet architecture, each layer receives input from all previous layers. This is in
contrast to traditional convolutional neural networks (CNNs), where each layer only
receives input from the previous layer. DenseNet’s dense connectivity pattern allows the
model to reuse features effectively, significantly reducing the number of parameters
needed for training while improving the overall performance.
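As a quick illustration, DenseNet-201 can be loaded as an off-the-shelf feature extractor through the Keras applications module; the 224x224 input size is the library default, and the printed counts are only meant to show the model's relative compactness.

# Load DenseNet-201 without its classification head and inspect its size (illustrative sketch).
from tensorflow.keras.applications import DenseNet201

extractor = DenseNet201(include_top=False, weights="imagenet",
                        input_shape=(224, 224, 3), pooling="avg")
print("Layers:", len(extractor.layers))
print("Parameters:", extractor.count_params())   # relatively compact for a 201-layer network

# features = extractor.predict(batch_of_images)  # each image becomes a 1920-dimensional vector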
Architecture of DenseNet-201
DenseNet-201 consists of several key components that set it apart from standard deep
learning models:
● Dense Blocks:
The central concept of DenseNet is the dense block, which consists of several
convolutional layers where each layer’s output is concatenated with the inputs to
all subsequent layers. In other words, for each layer in a dense block, all preceding
layers’ outputs are used as inputs. This facilitates feature reuse across layers,
improving gradient flow during backpropagation and reducing the risk of
overfitting. In DenseNet-201, the architecture consists of four dense blocks.
● Transition Layers:
Between dense blocks, DenseNet includes transition layers. Transition layers are
used to reduce the size of feature maps and the number of channels by applying a
convolution followed by a pooling operation. These layers help in managing the
computational complexity and control the growth of feature maps between blocks.
● Growth Rate:
DenseNet introduces the concept of the growth rate, which defines how many
feature maps are generated by each layer. For DenseNet-201, the growth rate is
typically 32, meaning each layer in a dense block generates 32 feature maps,
which are then concatenated with the inputs of the subsequent layers. The growth
rate significantly impacts the model’s efficiency and performance.
● Bottleneck Layers:
To further improve efficiency, DenseNet uses bottleneck layers within each dense
block. These bottleneck layers apply 1x1 convolutions before the more computationally
expensive 3x3 convolutions, reducing the number of feature maps that each layer has to
process. This lowers the computational load while maintaining high model
performance. (A toy dense-block sketch follows this list.)
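The ideas of dense connectivity, growth rate, bottleneck layers, and transition layers can be illustrated with the toy Keras sketch below; this is a simplified, hypothetical dense block rather than the exact DenseNet-201 implementation.

# Toy dense block sketch: 1x1 bottleneck, 3x3 convolution producing growth_rate maps, then concatenation.
from tensorflow.keras import layers

def dense_block(x, num_layers=6, growth_rate=32):
    """Each layer sees the concatenation of all previous outputs and adds growth_rate feature maps."""
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(4 * growth_rate, kernel_size=1, padding="same")(y)  # 1x1 bottleneck
        y = layers.BatchNormalization()(y)
        y = layers.Activation("relu")(y)
        y = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)      # 32 new feature maps
        x = layers.Concatenate()([x, y])   # dense connectivity: reuse all earlier features
    return x

def transition_layer(x, compression=0.5):
    """Reduce the number of channels and the spatial size between dense blocks."""
    channels = int(x.shape[-1] * compression)
    x = layers.Conv2D(channels, kernel_size=1, padding="same")(x)
    x = layers.AveragePooling2D(pool_size=2)(x)
    return x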
CHAPTER 6:
Results
6.2 Results and Analysis
Hybrid Model:
Fig 7: Comparison of performance metrics of our model and a 3D CNN model
Fig 8: Training and validation accuracy of our digital forgery detection model for
images and videos
Fig 9: Confusion matrix of the predicted labels vs actual labels over a testing dataset
of 2861 files consisting of both images and videos
CHAPTER 7:
Discussion
Impact of SE Blocks on Feature Representation
The comparison of models with and without SE blocks shows that SE-enhanced models
achieve better recall and F1-scores, indicating a reduced rate of false negatives. The
recalibration process introduced by SE blocks helps in accurately identifying tampered
regions, even when they are not visually prominent. By suppressing irrelevant features
and amplifying crucial ones, SE blocks enhance the model's ability to capture
distinguishing characteristics of both authentic and tampered images, leading to improved
overall accuracy.
Role of ELA Preprocessing
Error Level Analysis (ELA) was a crucial preprocessing method for improving the model's
capacity to identify image forgeries. By analyzing compression artifacts, ELA provided
feature representations that highlighted manipulated regions and enabled the model to
concentrate on areas suggestive of tampering. This preprocessing step made it easier to
expose discrepancies created by manipulative procedures such as splicing and copy-move
forgeries.
Because ELA provided features that strengthened the differentiation between real and
manipulated images, it played a significant role in the model's strong performance
metrics, such as high precision and recall. ELA reliably highlighted tampered areas,
allowing the model to identify forgeries consistently. Its incorporation into the detection
pipeline was crucial for identifying minute yet significant differences between authentic
and modified content.
Insights from Correct and Incorrect Classifications
Analyzing cases where the model made correct classifications provides insights into the
strengths of the proposed approach. In correctly classified authentic images, the model's
visualizations generally focus on the overall structure and textural consistency, indicating
the model’s ability to recognize natural patterns in unaltered images. This pattern of focus
suggests the model has learned essential characteristics of authentic images, allowing it to
differentiate them reliably from manipulated ones.
However, incorrect classifications reveal areas for further improvement. False positives
(authentic images misclassified as tampered) tend to occur in images with high
compression artifacts or unusual textures, which may resemble tampering cues. False
negatives (tampered images misclassified as authentic) are typically seen in cases of
subtle or professionally edited forgeries where tampered regions lack strong ELA
artifacts or are well-integrated into the image. These misclassifications highlight the
limitations of relying on compression artifacts alone and suggest that incorporating
additional spatial or contextual information could improve the model’s robustness.
CHAPTER 8:
Conclusion and Future Work
8.1 Conclusion
This study introduces a hybrid model for image forgery detection that combines Error
Level Analysis (ELA), DenseNet-201, and Squeeze-and-Excitation (SE) blocks. ELA as
a preprocessing step highlights tampered regions by capturing compression artifacts,
enhancing the model's ability to detect even subtle alterations that might be imperceptible
to the human eye. DenseNet-201 serves as the backbone, enabling efficient feature
extraction through its densely connected layers, which reduce redundancy and promote
better gradient flow during training. This architecture allows the model to effectively
differentiate between authentic and tampered content.
We presented a neural network-based approach to classify a video as deepfake or real,
along with the confidence of the proposed model. Our method is capable of predicting the
output by processing 1 second of video (10 frames per second) with good accuracy. We
implemented the model using a pre-trained DenseNet CNN to extract frame-level features
and an LSTM for temporal sequence processing to spot the changes between frame t and
frame t-1.
8.2 Future Directions
Improvement of Detection Algorithms with Deep Learning:
While traditional methods in forgery detection have been effective, the future of the field
lies in leveraging more advanced deep learning techniques. By incorporating cutting-edge
models such as Generative Adversarial Networks (GANs) for detecting deep fakes and
using convolutional neural networks (CNNs) for feature extraction, it is possible to
improve accuracy, robustness, and the ability to detect more sophisticated forgeries.
CHAPTER 9:
References
[1] Fridrich, Jessica; Soukal, David; Lukáš, Jan. (2003). Detection of Copy-Move Forgery in
Digital Images. Int. J. Comput. Sci. Issues. 3. 55-61.
[2] Bayar, Belhassen; Stamm, Matthew. (2016). A Deep Learning Approach to Universal Image
Manipulation Detection Using a New Convolutional Layer. 5-10. 10.1145/2909827.2930786.
[3] Farid, Hany. (2009). Exposing Digital Forgeries from JPEG Ghosts. IEEE Transactions on
Information Forensics and Security. 4. 154-160. 10.1109/TIFS.2008.2012215.
[4] Carvalho, Tiago; Riess, Christian; Angelopoulou, Elli; Pedrini, Helio; Rocha, Anderson.
(2013). Exposing Digital Image Forgeries by Illumination Color Classification. IEEE
Transactions on Information Forensics and Security. 8. 1182-1194. 10.1109/TIFS.2013.2265677.
[5] G. Huang, Z. Liu, L. Van Der Maaten and K. Q. Weinberger, "Densely Connected
Convolutional Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), Honolulu, HI, USA, 2017, pp. 2261-2269, doi: 10.1109/CVPR.2017.243.
[6] Ali, Syed; Ganapathi, Iyyakutti. (2022). Image Forgery Detection Using Deep Learning by
Recompressing Images. Electronics. 11. 403. 10.3390/electronics11030403.
[7] J. Dong, W. Wang and T. Tan, "CASIA Image Tampering Detection Evaluation Database,"
2013 IEEE China Summit and International Conference on Signal and Information Processing,
Beijing, China, 2013, pp. 422-426, doi: 10.1109/ChinaSIP.2013.6625374.
[9] Hu, J.; Shen, L.; Sun, G. (2018). Squeeze-and-Excitation Networks. Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:
10.1109/CVPR.2018.00745.
[12] Andreas Rössler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies,
Matthias Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images,"
arXiv:1901.08971.
[14] D. Güera and E. J. Delp, "Deepfake Video Detection Using Recurrent Neural Networks,"
2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance
(AVSS), Auckland, New Zealand, 2018, pp. 1-6.
Appendices
Fig 12: ELA of Authentic Images