Splicebuster: a new blind image splicing detector

Davide Cozzolino, Giovanni Poggi, Luisa Verdoliva


DIETI, University Federico II of Naples, Italy

Abstract—We propose a new feature-based algorithm to detect image splicings without any prior information. Local features are computed from the co-occurrence of image residuals and used to extract synthetic feature parameters. Splicing and host images are assumed to be characterized by different parameters. These are learned from the image itself through the expectation-maximization algorithm, together with the segmentation into genuine and spliced parts. A supervised version of the algorithm is also proposed.
Preliminary results on a wide range of test images are very encouraging, showing that a limited-size, but meaningful, learning set may be sufficient for reliable splicing localization.

Index Terms—Image forensics, forgery detection and localization, local image descriptors, blind algorithm.

Fig. 1: Splicebuster working on a toy example. Local features extracted from the input image (left) are used to learn a model with two classes, associated with genuine and forged areas. The output heat map (right) clearly indicates a splicing in correspondence with the ghost.

I. INTRODUCTION
Images and videos account already for the biggest share
of traffic and storage space over the internet, and this trend
is only going to increase in the near future. As manipulating multimedia content becomes ever more widespread and easy, the interest in digital image forensics is rapidly growing.

Image forensic tools must address a wide variety of specific goals, from establishing the authenticity of an image, to discovering the presence of a manipulation, its type, its location, and so on. Indeed, many different forms of manipulation exist, like copy-moving parts of an image, covering objects through inpainting, retouching details, or inserting material taken from a different source (splicing). Such diverse scenarios call for specific approaches and techniques. For example, to find copy-moves one looks for near-duplicates in the image, while to find a splicing one must discover anomalies with respect to a typical behavior. These anomalies may be macroscopic, related to illumination or perspective inconsistencies, but skilled attackers easily avoid such errors. To detect accurate forgeries, statistical signal analysis tools are necessary.

In the last decade, many techniques have been proposed for splicing detection and localization, which can be classified based on the amount of prior information they rely upon. When either the host camera or an arbitrary number of images taken from it are available, one can estimate the so-called camera fingerprint, or photo-response non-uniformity (PRNU) noise pattern [1]. Being unique for any camera sensor, it allows one to reliably identify the source camera, and also to detect and localize possible manipulations [1], [2], provided they are not too small.

A step below in this prior-information scale, one can know or estimate the color filter array (CFA) and the interpolation filter characterizing the camera model. Given these pieces of information, one can detect transitions between original and spliced regions, as already suggested back in 2005 [3]. Several effective algorithms are based on this simple idea, like [4] and [5]. Alternatively, detection and localization may rely on the different intensity and properties of the noise introduced in the image by different camera sensors [6], [7].

A different form of prior information concerns the processing history of the host image and the splicing. In particular, assuming the images are always saved in compressed JPEG format, performing a splicing induces a double JPEG compression which leaves clear traces in the DCT coefficients of image blocks. Therefore, several methods have been proposed, like [8], [9] or [10], which exploit the statistical distribution of such coefficients.

All the above techniques rely on some strong and very specific hypotheses, which are not always met in practice. A more general approach consists in assuming that the different in-camera processing chain or out-camera processing history of host and splicing give rise to subtle differences in the high-pass content of the image. Whatever their origin, these patterns can be captured by suitable features and classified by machine learning. In this context, the research focuses on the definition of the most expressive local features that account for such subtle differences. A first step in this direction dates back to 2004, with the model proposed in [11]. However, a major impulse comes only some years later with [12], where features based on both first-order and higher-order statistics of DCT coefficients are used, providing a performance gap with respect to the previous state of the art. In [13] the approach is extended to include also wavelet-based features, while [14] resorts to a noncausal Markov model. A local feature proposed originally for steganalysis [15], based on the co-occurrence of image residuals, is used in [16] for splicing detection with excellent results. In [17], the same features are used, but with a switch from the machine learning paradigm to model-based detection. Assuming that only genuine images are available, a model is learned for the host camera and used to detect data departing from the model.

978-1-4673-6802-5/15/$31.00 ©2015 IEEE 2015 IEEE International Workshop on Information Forensics and Security (WIFS)
This latter work, therefore, borders the anomaly detection field, and also the camera model identification problem [18], [19].

Methods based on machine learning and feature modeling, though more general than the previous ones, have a serious handicap of their own: the need for a large training set. Sometimes, this set is simply not available. One may be given a single image and urged to decide whether it is pristine or forged, and which part of it has been manipulated. Barring fortunate cases, like copy-moves or double JPEG compression, this "blind" forgery detection problem may be very challenging.

In this paper we propose a new algorithm for the blind detection and localization of forgeries, nicknamed splicebuster. No prior knowledge is available on the host camera, on the splicing, or on their processing history. We use the co-occurrence based features proposed in [15] and, as in [17], follow an anomaly detection approach, learning a model for the features from the very same image under analysis. In a first, supervised, scenario, the user is required to select a tentative training set to learn the model parameters, while in the unsupervised scenario, segmentation and model learning are pursued jointly by means of the expectation-maximization (EM) algorithm. Experimental results show that, despite the obvious loss of reliability due to the lack of an adequate training set, a very good performance can be obtained in most cases of interest.

II. PROPOSED METHOD

To localize possible forgeries in the image we start from the approach proposed in [17], which is based on three major steps:

• defining an expressive feature that captures the traces left locally by in-camera processing;
• computing synthetic feature parameters (mean vector and covariance matrix) for the class of images under test, based on a suitable training set;
• using these statistics to discover where the features computed locally depart from the model, pointing to some possible image manipulation.

With respect to this paradigm, we have the major additional problem that no training set is available. A single image is given, with no prior information. Still, we want to follow the same approach as before, computing model parameters and testing model fitting. This raises two distinct problems: i) even if an oracle told us which part of the image is pristine, the data available for training may be too scarce for a reliable decision; and ii) we have no oracle, actually, so we must localize the forgery and estimate the parameters of interest at the same time. Indeed, if ideal single-image training does not provide reliable results, the whole approach is unsuitable for this task, no matter what we do. However, in Section III, we will provide experimental evidence that single-image training is sufficient in most cases. Turning to the second issue, we will consider two scenarios: a supervised case, in which the user acts as an oracle, and an unsupervised case, where an EM-based procedure is used for simultaneous parameter estimation and image segmentation. These cases are explored in the following, after describing the proposed feature.

A. Co-occurrence based local feature

Feature extraction is based on three main steps [15]:
1) computation of residuals through high-pass filtering;
2) quantization of the residuals;
3) computation of a histogram of co-occurrences.

The final histogram is the feature vector associated with the whole image, which can be used for classification. To compute the residual image we use a linear high-pass filter of the third order, which provided good performance for both forgery detection [16], [17] and camera identification [19], defined as

    r_ij = x_{i,j-1} - 3 x_{i,j} + 3 x_{i,j+1} - x_{i,j+2}    (1)

where x and r are the original and residual images, and i, j indicate spatial coordinates. The next step is to compute residual co-occurrences. To this end, residuals must first be quantized, using a very small number of bins to obtain a limited feature length. Therefore, we perform quantization and truncation as

    r̂_ij = trunc_T(round(r_ij / q))    (2)

with q the quantization step and T the truncation value. We compute co-occurrences on four pixels in a row, that is,

    C(k0, k1, k2, k3) = Σ_{i,j} I(r̂_{i,j} = k0, r̂_{i+1,j} = k1, r̂_{i+2,j} = k2, r̂_{i+3,j} = k3)

where I(A) is the indicator function of event A, equal to 1 if A holds and 0 otherwise. The homologous column-wise co-occurrences are pooled with the above based on symmetry considerations. Unlike in [15], we pass the normalized histograms through a square-root non-linearity, to obtain a final feature with unitary L2 norm. In fact, in various contexts, such as texture classification and image categorization, histogram comparison is performed by measures such as χ2 or Hellinger, which are found to work better than the Euclidean distance. After square rooting, the Euclidean distance between features is equivalent to the Hellinger distance between the original histograms [20].

B. Supervised scenario

In this case, the user is assumed to take an active role in the process. She is required to select a bounding box, including the possible forgery, that will be subject to the analysis, while the rest of the image is used as training set (see Fig. 1 for an example). The analysis is carried out in sliding-window modality [17], using blocks of size W × W, large enough to extract a meaningful feature, that is, the normalized histogram of co-occurrences, h. The N blocks taken from the training area are used to estimate in advance the mean and covariance of the feature vector.
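For concreteness, the per-block feature computation of Section II-A can be sketched in a few lines. This is only an illustrative re-implementation, not the authors' code: it uses the horizontal direction only, omitting the column-wise pooling and symmetry merging described above, with the quantization step q and truncation value T as free parameters.

```python
import numpy as np

def cooccurrence_feature(x, q=2, T=1):
    """Illustrative sketch of the co-occurrence feature (Section II-A).

    x : 2-D grayscale image, any numeric dtype.
    Returns a square-rooted, L2-normalized histogram of length (2T+1)**4.
    """
    x = np.asarray(x, dtype=np.float64)
    # step 1: third-order horizontal high-pass residual, eq. (1)
    r = x[:, :-3] - 3 * x[:, 1:-2] + 3 * x[:, 2:-1] - x[:, 3:]
    # step 2: quantization and truncation to {-T, ..., T}, eq. (2)
    rq = np.clip(np.round(r / q), -T, T).astype(int)
    # step 3: histogram of co-occurrences of 4 horizontally adjacent values
    base = 2 * T + 1
    codes = ((rq[:, :-3] + T) * base**3 + (rq[:, 1:-2] + T) * base**2
             + (rq[:, 2:-1] + T) * base + (rq[:, 3:] + T))
    hist = np.bincount(codes.ravel(), minlength=base**4).astype(np.float64)
    hist /= hist.sum()
    # square-root non-linearity: Euclidean distance between features
    # then equals the Hellinger distance between the histograms [20]
    return np.sqrt(hist)
```

With T=1 this histogram has 3^4 = 81 bins before any pooling; the symmetry pooling described in the paper is what reduces it to the length-50 feature used in the experiments.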
Fig. 2: Performance as a function of the training set size M: from left to right, M=50, M=10, M=5, M=1. For each FPR level, the bar ranges from the worst to the best TPR over the training sets. In parentheses, the worst, median and best AUC: [0.9479; 0.9595; 0.9666] for M=50, [0.9037; 0.9455; 0.9634] for M=10, [0.8120; 0.9323; 0.9626] for M=5, [0.5536; 0.7326; 0.9363] for M=1.

The mean vector and covariance matrix are computed as

    μ = (1/N) Σ_{n=1}^{N} h_n    (3)

    Σ = (1/N) Σ_{n=1}^{N} (h_n − μ)(h_n − μ)^T    (4)

Then, for each block of the test area, the associated feature h′ is extracted, and its Mahalanobis distance w.r.t. the reference feature μ is computed as

    D(h′, μ; Σ) = (h′ − μ)^T Σ^{-1} (h′ − μ)    (5)

Large distances indicate blocks that deviate significantly from the model. In the output map provided to the user, each block is given a color associated with the computed distance. Eventually, the user decides based on visual inspection of the map (see again Fig. 1).

Note that the user may repeat the process several times with different bounding boxes, implying that a meaningful analysis can be conducted even in the absence of any initial guess about the presence and location of a forgery.

C. Unsupervised scenario

In this case, after the feature extraction phase, carried out on the whole image with unit stride, we rely on an automatic algorithm to jointly compute the model parameters and the two-class image segmentation. Although there are many tools available for this task, for the time being we resort to a simple expectation-maximization clustering.

As input, we need the mixture model of the data, namely, the number of classes, their probabilities, π0, π1, ..., and the probability model of each class. For us, the number of classes is always fixed to two, corresponding to the genuine area of the image (hypothesis H0) and the tampered area (hypothesis H1). We will consider two cases for the class models:

1) both classes are modeled as multivariate Gaussian,

    p(h) = π0 N(h|μ0, Σ0) + π1 N(h|μ1, Σ1)

2) class H0 is modeled as Gaussian, while class H1 is modeled as Uniform over the feature domain Ω,

    p(h) = π0 N(h|μ0, Σ0) + π1 α1 I(Ω)

We note explicitly that the Gaussian model is only a handy simplification, lacking more precise information on the feature distribution.

The first model is conceived for the case when the forged area is relatively large w.r.t. the whole image. Then, the two classes have equal standing, and can be expected to emerge easily through the EM clustering. The block-wise decision statistic is the ratio between the two Mahalanobis distances.

When the forged region is very small, instead, the intra-class variability, mostly due to image content (e.g., flat vs. textured areas), may become dominant w.r.t. inter-class differences, leading to wrong results. Therefore, we consider the Gaussian-Uniform model, which can be expected to deal better with these situations, and in fact has often been considered to account for the presence of outliers, e.g., [21]. Note that, in this case, the decision test reduces to comparing the Mahalanobis distance from the Gaussian model with a threshold λ, as already done in [17].

We do not choose between these two models, leaving the final say to the experimental analysis.

III. EXPERIMENTS

We now present a number of experiments which provide insight into the potential of the blind techniques proposed here. There is a wide variety of manipulations of possible interest, and we have shown in [17] that the co-occurrence based feature allows one to detect and localize most of them very well. Here we focus only on splicings from other cameras and use 6 cameras of 6 different models from 4 manufacturers: Canon EOS 450D, Canon IXUS 95IS, Nikon D200, Nikon Coolpix S5100, Digimax 301, Sony DSC S780. For each camera we have a large number of images, which are cropped to size 768×1024 to speed up processing.

Considering the limited training data available in this case, we must reduce the feature length as much as possible, so as to allow reliable estimates. Therefore, the truncation parameter is set to T=1, implying only three quantization levels for the residual, including 0. To balance losses, a relatively large quantization step, q=2, is used. Thanks to symmetries, the final feature has length 50, which is further reduced to 25 through PCA. The block size is 128×128, a good compromise between accuracy and resolution.
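Given such per-block features, the supervised score of eqs. (3)-(5) is straightforward to compute. The following is a minimal sketch, not the authors' code; the small eps ridge added to Σ for numerical invertibility is our assumption, not mentioned in the paper.

```python
import numpy as np

def fit_reference(features):
    """Sample mean and covariance of the training features, eqs. (3)-(4).
    features: (N, d) array, one row per training block."""
    mu = features.mean(axis=0)
    diff = features - mu
    sigma = diff.T @ diff / len(features)
    return mu, sigma

def mahalanobis_scores(features, mu, sigma, eps=1e-8):
    """Squared Mahalanobis distance of each feature from the model, eq. (5).
    eps: small ridge for invertibility (our addition, not in the paper)."""
    inv = np.linalg.inv(sigma + eps * np.eye(len(mu)))
    diff = features - mu
    # per-row quadratic form diff^T inv diff
    return np.einsum('nd,de,ne->n', diff, inv, diff)
```

Arranged on the block grid, these scores form the heat map shown to the user; large values flag blocks departing from the host-camera model.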
Fig. 3: Sample ROCs (left) obtained with single-image training and corresponding training images (right).
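For the Gaussian-Gaussian variant of the unsupervised scenario, the two-class EM clustering can be sketched with an off-the-shelf implementation. Here scikit-learn's GaussianMixture stands in for the paper's own EM code (an assumption on our part), with n_init playing the role of the multiple random restarts, and class responsibilities, rather than the paper's ratio of Mahalanobis distances, as the soft decision statistic.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def unsupervised_split(features, n_restarts=30, seed=0):
    """Two-class EM clustering of block features (GG model sketch).

    features: (N, d) array of per-block features.
    Returns hard labels in {0, 1} and a per-block log-ratio of class
    responsibilities, usable as a heat map.
    """
    gmm = GaussianMixture(n_components=2, covariance_type='full',
                          n_init=n_restarts,  # keep the best-likelihood run
                          random_state=seed)
    gmm.fit(features)
    labels = gmm.predict(features)
    post = gmm.predict_proba(features)
    score = np.log(post[:, 1] + 1e-12) - np.log(post[:, 0] + 1e-12)
    return labels, score
```

The GU variant would replace one Gaussian component with a uniform density over Ω, which requires a custom EM loop rather than GaussianMixture.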

Since the results of the iterative EM algorithm depend strongly on the initialization, we run it 30 times with different random initial parameters, eventually selecting the outcome for which the data exhibit the highest likelihood. Note that saturated and very dark areas tend to cause false alarms, and are hence excluded from this analysis.

A. Dependence on training set size and quality

Before showing results in the blind context, we carry out an experiment to study how results depend on the size and quality of the training set. We select a single camera as our host, and all the others as sources of spliced material. The feature parameters for the host camera are estimated on a certain number M of training images. Then we test an equal number of genuine and fake blocks, deciding on their nature based on how well their associated features fit the camera model. Performance is measured in terms of true positive rate (TPR) vs. false positive rate (FPR). Notice that a very similar experiment was presented in [17], always using M=200. In Fig. 2 we show the results obtained for M = 50, 10, 5 and 1, the latter amounting to single-image training. Since results may depend very much on the specific training images chosen, especially when just a few of them are used, the experiment is repeated several times with random instances of the training set, 200 times for the case M=1. In the figure, for each value of FPR, we show a bar going from the worst to the best TPR. The solid curve corresponds to median values.

It is clear that, with a large training set, say, 50 images, results are very good and depend very weakly on the specific set of images. With smaller sizes, 10 or 5, results are still generally good but present a larger variability. Going down to M=1, the dependence on the single training image becomes very strong. It is worth underlining, however, that for some instances of single-image training the performance is quite good, not far from that of the 50-image case.

Fig. 3 sheds some light on the nature of the good and bad training images. As could be expected, bad training images (red/magenta curves and boxes) are characterized by low contrast and a limited variety of textures, sometimes highly unusual. On the contrary, good training images (green/blue curves and boxes) are quite varied, presenting bright and dark areas, with both textured and smooth content. Considering that with such images performance is so good, one may argue that size is not really a limiting factor (at this level) provided sufficient variety is guaranteed. In addition, turning to our blind scenario, the training image is automatically well fitted to the test, since most textures can be expected to be present in both sections.

Fig. 4: Results for some selected examples. Top to bottom: forged images, maps obtained with the unsupervised method (GG and GU mixtures), and the supervised method.

B. Analysis in controlled conditions

To assess the performance of splicebuster we use visual inspection of results for some images with known splicing. In Fig. 4 we show three selected examples, where the spliced area is highlighted, together with the maps provided by the variants of our method, that is, the unsupervised method with the two-Gaussian (GG) and Gaussian-Uniform (GU) mixture models (middle rows), and the supervised method (last row). The GU mixture always provides good results, while the GG mixture leads to some false alarms, a behavior also observed more generally. The supervised method is always very accurate. Note that the result of the unsupervised case can be used as a guide for selecting the areas to investigate in more depth with the supervised approach.

C. Comparison with the state of the art

We now consider some comparisons with state-of-the-art approaches. We used the 180 images coming from the Columbia Dataset¹. The images are all in uncompressed formats, with sizes from 757 × 568 to 1152 × 768. Spliced images were created using Adobe Photoshop with material coming from exactly two cameras, and no post-processing was performed [22].

¹ http://www.ee.columbia.edu/ln/dvmm/downloads/authsplcuncmp/.
Fig. 5: Pixel-level ROCs (average TPR vs. average FPR) on the Columbia database, for the proposal (GU mixture), Popescu and Farid [3], Lyu et al. [7], and Bianchi and Piva (A-DJPG and NA-DJPG) [9].

We implemented the approaches of Popescu and Farid [3], based on CFA artifacts, and Lyu et al. [7], exploiting noise level inconsistencies. The code for the method of Bianchi and Piva [9], based on double JPEG compression, was available online. Fig. 5 shows ROCs obtained at pixel level, and it can be seen that splicebuster performs much better than all references.

We also considered more realistic scenarios by using images publicly available on the net, where no information is provided about the nature of the splicings, hence it is possible that the images have undergone some post-processing operations. The first three are taken from the training set of the first IEEE Image Forensics Challenge², and come with a ground truth. The following four come from the test set of the same challenge, and the last two are drawn from the Worth1000 site³. In both cases no ground truth is available.

In Fig. 6, next to each image, we show the heat maps obtained by the reference methods and those of the proposed approach in unsupervised (GU mixture) and supervised modality. In the latter case, we tested various bounding boxes. The visual inspection of the heat maps confirms the very good performance of splicebuster, except for some false alarms in the unsupervised case (dark blue areas correspond to saturated or very dark image regions and are not considered at all). Only in some cases, instead, do the reference techniques provide sensible results, and their maps are typically less readable than those of the proposed method.

IV. CONCLUSION AND FUTURE WORK

We proposed a new blind splicing detector. Results are definitely encouraging, especially when compared with reference methods. Still, there is much work ahead. Key parameters (like α in the GU mixture) are selected heuristically, for the time being. Likewise, the conversion from heat map to binary decision is still to be worked out. A major effort is then required to set up a sensible paradigm for objective performance assessment, and robustness to JPEG compression and other forms of post-processing should be explored.

² http://ifc.recod.ic.unicamp.br/fc.website/index.py?sec=0.
³ http://www.Worth1000.com

REFERENCES

[1] M. Chen, J. Fridrich, M. Goljan, and J. Lukás, "Determining image origin and integrity using sensor noise," IEEE Transactions on Information Forensics and Security, vol. 3, no. 1, pp. 74–90, March 2008.
[2] G. Chierchia, G. Poggi, C. Sansone, and L. Verdoliva, "A Bayesian-MRF approach for PRNU-based image forgery detection," IEEE Transactions on Information Forensics and Security, vol. 9, no. 4, pp. 554–567, 2014.
[3] A.C. Popescu and H. Farid, "Exposing digital forgeries in color filter array interpolated images," IEEE Transactions on Signal Processing, vol. 53, no. 10, pp. 3948–3959, 2005.
[4] A. Swaminathan, M. Wu, and K.J. Ray Liu, "Digital image forensics via intrinsic fingerprints," IEEE Transactions on Information Forensics and Security, vol. 3, no. 1, pp. 101–117, March 2008.
[5] P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, "Image forgery localization via fine-grained analysis of CFA artifacts," IEEE Transactions on Information Forensics and Security, vol. 7, no. 5, pp. 1566–1577, October 2012.
[6] B. Mahdian and S. Saic, "Using noise inconsistencies for blind image forensics," Image and Vision Computing, vol. 27, no. 10, pp. 1497–1503, 2009.
[7] S. Lyu, X. Pan, and X. Zhang, "Exposing region splicing forgeries with blind local noise estimation," International Journal of Computer Vision, vol. 110, no. 2, pp. 202–221, 2014.
[8] Y.-L. Chen and C.-T. Hsu, "Detecting recompression of JPEG images via periodicity analysis of compression artifacts for tampering detection," IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 396–406, June 2011.
[9] T. Bianchi and A. Piva, "Image forgery localization via block-grained analysis of JPEG artifacts," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 1003–1017, June 2012.
[10] I. Amerini, R. Becarelli, R. Caldelli, and A. Del Mastio, "Splicing forgeries localization through the use of first digit features," in IEEE Workshop on Information Forensics and Security, 2014, pp. 143–148.
[11] T.T. Ng and S.F. Chang, "A model for image splicing," in IEEE International Conference on Image Processing, 2004, pp. 1169–1172.
[12] Y.Q. Shi, C. Chen, and G. Xuan, "Steganalysis versus splicing detection," in International Workshop on Digital Watermarking, 2008, vol. 5041, pp. 158–172.
[13] Z. He, W. Lu, W. Sun, and J. Huang, "Digital image splicing detection based on Markov features in DCT and DWT domain," Pattern Recognition, vol. 45, pp. 4292–4299, 2012.
[14] J. Zhao and W. Zha, "Passive forensics for region duplication image forgery based on Harris feature points and local binary patterns," Mathematical Problems in Engineering, 2013, pp. 1–12.
[15] J. Fridrich and J. Kodovský, "Rich models for steganalysis of digital images," IEEE Transactions on Information Forensics and Security, vol. 7, no. 3, pp. 868–882, June 2012.
[16] D. Cozzolino, D. Gragnaniello, and L. Verdoliva, "Image forgery detection through residual-based local descriptors and block-matching," in IEEE International Conference on Image Processing (ICIP), October 2014, pp. 5297–5301.
[17] L. Verdoliva, D. Cozzolino, and G. Poggi, "A feature-based approach for image tampering detection and localization," in IEEE Workshop on Information Forensics and Security, 2014, pp. 149–154.
[18] M. Kirchner and T. Gloe, "Forensic camera model identification," in Handbook of Digital Forensics of Multimedia Data and Devices, T.S. Ho and S. Li, Eds., Wiley-IEEE Press, 2015.
[19] F. Marra, G. Poggi, C. Sansone, and L. Verdoliva, "Evaluation of residual-based local features for camera model identification," in International Workshop on Recent Advances in Digital Security: Biometrics and Forensics (BioFor), September 2015.
[20] R. Arandjelović and A. Zisserman, "Three things everyone should know to improve object retrieval," in IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 2911–2918.
[21] A.C. Popescu and H. Farid, "Exposing digital forgeries by detecting traces of resampling," IEEE Transactions on Signal Processing, vol. 53, no. 2, pp. 758–767, 2005.
[22] Y.F. Hsu and S.F. Chang, "Detecting image splicing using geometry invariants and camera characteristics consistency," in IEEE International Conference on Multimedia and Expo, 2006, pp. 549–552.
Fig. 6: Results of reference and proposed algorithms on some images available on the net. From left to right: forged image,
heat maps obtained with the method of Popescu and Farid [3], Lyu et al. [7], Bianchi and Piva [9], splicebuster in unsupervised
(GU mixture) and supervised modality.
