Splicebuster: A New Blind Image Splicing Detector
2015 IEEE International Workshop on Information Forensics and Security (WIFS)
a model is learned for the host camera and used to detect data departing from the model. This latter work, therefore, borders on the anomaly detection field, and also on the camera model identification problem [18], [19].

Methods based on machine learning and feature modeling, though more general than the previous ones, have a serious handicap of their own: the need for a large training set. Sometimes this set is simply not available. One may be given a single image and urged to decide whether it is pristine or forged, and which part of it has been manipulated. Barring fortunate cases, like copy-moves or double JPEG compression, this "blind" forgery detection problem may be very challenging.

In this paper we propose a new algorithm for the blind detection and localization of forgeries, nicknamed Splicebuster. No prior knowledge is available on the host camera, on the splicing, or on their processing history. We use the co-occurrence based features proposed in [15] and, as in [17], follow an anomaly detection approach, learning a model for the features based on the very same image under analysis. In a first, supervised scenario, the user is required to select a tentative training set to learn the model parameters, while in the unsupervised scenario, segmentation and model learning are pursued jointly by means of the expectation-maximization (EM) algorithm. Experimental results show that, despite the obvious loss of reliability due to the lack of an adequate training set, very good performance can be obtained in most cases of interest.

II. PROPOSED METHOD

To localize possible forgeries in the image we start from the approach proposed in [17], which is based on three major steps:
• defining an expressive feature that captures the traces left locally by in-camera processing;
• computing synthetic feature parameters (mean vector and covariance matrix) for the class of images under test, based on a suitable training set;
• using these statistics to discover where the features computed locally depart from the model, pointing to some possible image manipulation.

With respect to this paradigm, we face the major additional problem that no training set is available. A single image is given, with no prior information. Still, we want to follow the same approach as before, computing model parameters and testing model fitting. This raises two distinct problems: i) even if an oracle told us which part of the image is pristine, the data available for training may be too scarce for a reliable decision; and ii) we have no oracle, actually, so we must localize the forgery and estimate the parameters of interest at the same time. Indeed, if ideal single-image training does not provide reliable results, the whole approach is unsuitable for this task, no matter what we do. However, in Section 3 we will provide experimental evidence that single-image training is sufficient in most cases. Turning to the second issue, we will consider two scenarios: a supervised case, in which the user acts as an oracle, and an unsupervised case, where an EM-based procedure is used for simultaneous parameter estimation and image segmentation. These cases are explored in the following, after describing the proposed feature.

A. Co-occurrence based local feature

Feature extraction is based on three main steps [15]:
1) computation of residuals through high-pass filtering;
2) quantization of the residuals;
3) computation of a histogram of co-occurrences.

The final histogram is the feature vector associated with the whole image, which can be used for classification. To compute the residual image we use a linear high-pass filter of the third order, which assured us good performance for both forgery detection [16], [17] and camera identification [19], defined as

    r_{i,j} = x_{i,j-1} − 3 x_{i,j} + 3 x_{i,j+1} − x_{i,j+2}    (1)

where x and r are the original and residual images, and i, j indicate spatial coordinates. The next step is to compute residual co-occurrences. To this end, residuals must first be quantized, using a very small number of bins to obtain a limited feature length. Therefore, we perform quantization and truncation as

    r̂_{i,j} = trunc_T(round(r_{i,j}/q))    (2)

with q the quantization step and T the truncation value. We compute co-occurrences on four pixels in a row, that is,

    C(k_0, k_1, k_2, k_3) = Σ_{i,j} I(r̂_{i,j} = k_0, r̂_{i+1,j} = k_1, r̂_{i+2,j} = k_2, r̂_{i+3,j} = k_3)

where I(A) is the indicator function of event A, equal to 1 if A holds and 0 otherwise. The homologous column-wise co-occurrences are pooled with the above based on symmetry considerations. Unlike in [15], we pass the normalized histograms through a square-root non-linearity, to obtain a final feature with unitary L2 norm. In fact, in various contexts, such as texture classification and image categorization, histogram comparison is performed with measures such as χ² or Hellinger, which are found to work better than the Euclidean distance. After square rooting, the Euclidean distance between features is equivalent to the Hellinger distance between the original histograms [20].

B. Supervised scenario

In this case, the user is assumed to take an active role in the process. She is required to select a bounding box, including the possible forgery, that will be subject to the analysis, while the rest of the image is used as training set (see Fig. 1 for an example). The analysis is carried out in sliding-window modality [17], using blocks of size W × W, large enough to extract a meaningful feature, that is, the normalized histogram of co-occurrences, h. The N blocks taken from the training area are used to estimate in advance the mean and
[Fig. 2: four ROC curve panels (TPR vs. FPR), one per training set size.]
Fig. 2: Performance as a function of the training set size M: from left to right, M=50 (worst/median/best AUC: 0.9479/0.9595/0.9666), M=10 (0.9037/0.9455/0.9634), M=5 (0.8120/0.9323/0.9626), M=1 (0.5536/0.7326/0.9363). For each FPR level, the bar ranges from the worst to the best TPR over the training sets.
covariance of the feature vector:

    μ = (1/N) Σ_{n=1}^{N} h_n    (3)

    Σ = (1/N) Σ_{n=1}^{N} (h_n − μ)(h_n − μ)^T    (4)

Then, for each block of the test area, the associated feature h′ is extracted, and its Mahalanobis distance w.r.t. the reference feature μ is computed:

    D(h′, μ; Σ) = (h′ − μ)^T Σ^{−1} (h′ − μ)    (5)

Large distances indicate blocks that deviate significantly from the model. In the output map provided to the user, each block is given a color associated with the computed distance. Eventually, the user decides based on visual inspection of the map (see again Fig. 1).

Note that the user may repeat the process several times with different bounding boxes, implying that a meaningful analysis can be conducted even in the absence of any initial guess about the presence and location of a forgery.

We note explicitly that the Gaussian model is only a handy simplification, adopted for lack of more precise information on the feature distribution.

The first model is conceived for the case when the forged area is relatively large w.r.t. the whole image. Therefore, the two classes have equal standing, and can be expected to emerge easily through the EM clustering. The block-wise decision statistic is the ratio between the two Mahalanobis distances.

When the forged region is very small, instead, the intra-class variability, mostly due to image content (e.g., flat vs. textured areas), may become dominant w.r.t. inter-class differences, leading to wrong results. Therefore, we consider the Gaussian-Uniform model, which can be expected to deal better with these situations, and which has in fact often been considered to account for the presence of outliers, e.g., [21]. Note that, in this case, the decision test reduces to comparing the Mahalanobis distance from the Gaussian model with a threshold λ, as already done in [17].

We do not choose between these two models, leaving the final say to the experimental analysis.
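As a concrete illustration, the supervised pipeline of Eqs. (1)-(5) can be sketched in a few lines of NumPy. The window size W, quantization step q, truncation T, the toy random data, and the small ridge added to Σ before inversion are illustrative assumptions, not the paper's settings; the column-wise co-occurrence pooling is omitted for brevity.

```python
import numpy as np

Q, T = 2.0, 1            # quantization step q and truncation T (illustrative)
B = 2 * T + 1            # number of symbols per quantized residual

def residual(x):
    """Third-order horizontal high-pass residual, Eq. (1)."""
    x = x.astype(np.float64)
    return x[:, :-3] - 3 * x[:, 1:-2] + 3 * x[:, 2:-1] - x[:, 3:]

def cooc_feature(x):
    """Quantize/truncate residuals (Eq. 2), histogram co-occurrences of four
    row-adjacent symbols, then square-root the normalized histogram so the
    feature has unit L2 norm. Column-wise pooling is omitted in this sketch."""
    rq = np.clip(np.round(residual(x) / Q), -T, T).astype(np.int64) + T
    idx = ((rq[:, :-3] * B + rq[:, 1:-2]) * B + rq[:, 2:-1]) * B + rq[:, 3:]
    h = np.bincount(idx.ravel(), minlength=B ** 4).astype(np.float64)
    return np.sqrt(h / h.sum())

def fit_model(feats):
    """Sample mean and covariance over the N training blocks, Eqs. (3)-(4)."""
    mu = feats.mean(axis=0)
    d = feats - mu
    return mu, d.T @ d / len(feats)

def mahalanobis(h, mu, Sigma):
    """Squared Mahalanobis distance, Eq. (5); a small ridge keeps Sigma invertible."""
    d = h - mu
    return float(d @ np.linalg.solve(Sigma + 1e-6 * np.eye(len(mu)), d))

# Toy usage: random "pristine" blocks as training set, one anomalous block.
rng = np.random.default_rng(0)
W = 64                                           # sliding-window block size
feats = np.stack([cooc_feature(rng.integers(0, 256, (W, W)))
                  for _ in range(20)])
mu, Sigma = fit_model(feats)
d_norm = mahalanobis(cooc_feature(rng.integers(0, 256, (W, W))), mu, Sigma)
d_anom = mahalanobis(cooc_feature(np.full((W, W), 128)), mu, Sigma)
# The constant block has degenerate residual statistics, so d_anom >> d_norm.
```

On a real image, the distances computed over the test-area blocks would then be color-coded into the output map described above.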
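To make the Gaussian-Uniform idea concrete, the following is a minimal EM sketch in one dimension: pristine-block statistics are modeled as Gaussian, forged blocks as uniform outliers, and each block is labeled by its posterior responsibility. The actual method operates on the multivariate co-occurrence features; the scalar data, initialization, and iteration count below are illustrative assumptions only.

```python
import numpy as np

def gaussian_uniform_em(x, lo, hi, n_iter=50):
    """EM for a Gaussian-plus-Uniform mixture (1-D illustration).

    Pristine blocks follow N(mu, var); forged blocks are modeled as
    uniform outliers on [lo, hi], as in robust mixture modeling."""
    u = 1.0 / (hi - lo)                       # fixed uniform density
    mu, var, pi = np.median(x), np.var(x), 0.9
    for _ in range(n_iter):
        g = pi * np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        gamma = g / (g + (1 - pi) * u)        # E-step: P(pristine | x_n)
        w = gamma.sum()
        mu = (gamma * x).sum() / w            # M-step: weighted mean,
        var = (gamma * (x - mu) ** 2).sum() / w   # variance, and prior
        pi = w / len(x)
    return mu, var, gamma

# Toy data: 180 "pristine" block statistics plus 20 clear outliers.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(1.0, 0.2, 180),
                    rng.uniform(3.0, 8.0, 20)])
mu, var, gamma = gaussian_uniform_em(x, x.min(), x.max())
forged = gamma < 0.5                          # threshold on responsibility
```

Thresholding the responsibility at 0.5 is equivalent, up to a monotone transformation, to comparing the (here one-dimensional) Mahalanobis distance from the Gaussian component with a threshold λ, matching the decision test described above.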