Cloudinary Duplicate Image Detection (Beta)

Last updated: Jul-16-2024

Important
Cloudinary Duplicate Image Detection is currently in Beta. There may be minor changes to parameter names or other implementation details before the general access release. We invite you to reach out to us to try it out, and we would appreciate any feedback via our support team.

Cloudinary is a cloud-based service that provides solutions for image and video management, including server or client-side upload, on-the-fly image and video transformations, quick CDN delivery, and a variety of asset management options.

The Cloudinary Duplicate Image Detection add-on can be invoked either on image upload, or on images already stored in your Cloudinary product environment, to determine if duplicate images exist in your media storage. The add-on uses hashing algorithms to provide 'fingerprints' for selected images. A configurable threshold determines how close a fingerprint has to be to produce a match. Therefore, images do not need to be identical - for example, they can differ subtly in compression, resolution, contrast or brightness and still be close enough to be termed a duplicate. The add-on uses the moderation flow, so you can manually override any decisions made about the image.

Specifying which images are included in the search

Start by telling Cloudinary which of the images stored in your Cloudinary product environment you want to be included in the duplicate detection search. For each of these images use the explicit API method, with the moderation parameter set to duplicate:0, for example:
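
As a minimal sketch, assuming the Cloudinary Python SDK is installed and configured, the explicit call might look like the following. The public ID `sample_image`, the helper names, and `type="upload"` are assumptions made for illustration.

```python
def search_set_options():
    """Options that add an already stored image to the duplicate search set."""
    # type="upload" is an assumption about how the asset was originally stored.
    return {"type": "upload", "moderation": "duplicate:0"}


def add_to_search_set(public_id):
    """Call the explicit method for one stored image."""
    # Imported lazily so the option builder above works without the SDK.
    import cloudinary.uploader

    return cloudinary.uploader.explicit(public_id, **search_set_options())


# Usage (needs Cloudinary credentials, for example via CLOUDINARY_URL):
# add_to_search_set("sample_image")
```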

When uploading subsequent images to your product environment, you can add these to the set of images that are searched, in a similar way, using the upload method:
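
A hedged sketch of the upload variant, again assuming the Cloudinary Python SDK. The function name and the `upload` parameter are hypothetical; the parameter exists only so the helper can be exercised with a stand-in instead of a real network call.

```python
def upload_into_search_set(file_paths, upload=None):
    """Upload each file and add it to the duplicate search set."""
    if upload is None:
        # Default to the real SDK call; requires configured credentials.
        import cloudinary.uploader

        upload = cloudinary.uploader.upload
    # moderation="duplicate:0" adds the image to the set that is searched.
    return [upload(path, moderation="duplicate:0") for path in file_paths]


# Usage with hypothetical local files:
# upload_into_search_set(["catalog_1.jpg", "catalog_2.jpg"])
```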

Additionally, all images approved by the Cloudinary Duplicate Image Detection add-on, either automatically or via a manual override, are added to the set of images to search in subsequent duplicate detection requests.

Automatic image moderation flow

The Cloudinary Duplicate Image Detection add-on uses the following moderation flow to mark images as approved or rejected based on whether duplicate images are detected in the product environment:

  1. Image upload
    1. Upload an image to Cloudinary, requesting duplicate detection and specifying a confidence threshold.
    2. The uploaded image is set to a 'pending' status, with short term CDN caching.
  2. Image moderation
    1. The uploaded image is sent to the Duplicate Image Detection algorithm for asynchronous analysis in the background.
    2. The add-on approves the image if the confidence score is below the threshold, or rejects it if the score is above the threshold.
    3. An optional notification callback is sent to your webhook with the image moderation result.
    4. If the image is approved, i.e. no duplicate images are detected, its cache settings are modified to be long-term.
    5. If the image is rejected, i.e. duplicate or near-duplicate images are found in your product environment, the image does not appear in your listed assets, but is backed up, consuming storage, so that it can be restored if necessary.
  3. Manual override
    1. Pending, approved and rejected images can be listed programmatically using Cloudinary's API or interactively using our online Media Library web interface.
    2. You can manually override the automatic moderation using the API or Media Library.
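
The approve/reject step of the flow above can be sketched as a small pure function. This illustrates the decision rule only and is not the add-on's implementation; the function name and the input mapping are hypothetical.

```python
def moderation_decision(scores, threshold):
    """Decide the moderation status for a newly uploaded image.

    scores maps the public ID of each stored image to the confidence
    that it duplicates the new image.
    """
    # Assumption: a score equal to the threshold counts as a match. The
    # text only says "below or above", so equality handling is a guess.
    matches = {pid: score for pid, score in scores.items() if score >= threshold}
    if matches:
        return "rejected", matches
    return "approved", {}


# Hypothetical scores against three stored images, threshold 0.8:
print(moderation_decision({"img_a": 1.0, "img_b": 0.98, "img_c": 0.4}, 0.8))
```

Keeping the rule separate from any API call makes it easy to try different thresholds against scores you have already collected.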

Note
By default, assets that are marked for moderation are deliverable and available via the Media Library. If you'd like the assets marked for moderation to be blocked (until approved) either from the Media Library, or from being delivered, or from both, contact support.

Detecting duplicate images

To activate duplicate detection when uploading an image, set the moderation parameter in the upload method to duplicate:<threshold>. The threshold is a float greater than 0 and less than or equal to 1.0 that specifies how similar an image needs to be in order to be considered a duplicate (see our threshold guidelines for an idea of what to set this to). A value of 1.0 means the image is an exact duplicate, whereas lower values allow for subtle differences between images. For example, to detect images that are almost identical to new_pic.jpg, where the threshold for a positive detection is 0.8:
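
A minimal sketch of this call with the Cloudinary Python SDK, assuming the SDK is installed and configured. The helper names are hypothetical, and the range check simply mirrors the constraint described above.

```python
def duplicate_moderation(threshold):
    """Build the moderation value 'duplicate:<threshold>' for detection."""
    # Detection thresholds must be greater than 0 and at most 1.0.
    # (duplicate:0 is the separate "add to search set" value.)
    if not 0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1.0")
    return f"duplicate:{threshold}"


def upload_with_duplicate_check(file_path, threshold):
    """Upload a file and request duplicate detection at the given threshold."""
    # Imported lazily so duplicate_moderation() works without the SDK.
    import cloudinary.uploader

    return cloudinary.uploader.upload(
        file_path, moderation=duplicate_moderation(threshold)
    )


# Usage (needs Cloudinary credentials):
# upload_with_duplicate_check("new_pic.jpg", 0.8)
```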

Tip
You can use upload presets to centrally define a set of upload options including add-on operations to apply, instead of specifying them in each upload call. You can define multiple upload presets, and apply different presets in different upload scenarios. You can create new upload presets in the Upload Presets page of the Console Settings or using the upload_presets Admin API method. From the Upload page of the Console Settings, you can also select default upload presets to use for image, video, and raw API uploads (respectively) as well as default presets for image, video, and raw uploads performed via the Media Library UI.

Learn more: Upload presets

The uploaded image is available for delivery based on the randomly assigned public ID with short-term caching of 10 minutes. Image analysis by the Duplicate Image Detection add-on is performed asynchronously and should be completed within a few minutes.

The following snippet shows the response of the upload API call that signifies that the duplicate detection is in the pending status.
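
The response itself is not reproduced here. As a hedged sketch, assuming the upload response reports moderation as a list of entries with `kind` and `status` keys (an assumption about the response shape), the pending status could be read like this. The sample dictionary is an illustrative fragment, not a real response.

```python
def duplicate_status(response):
    """Return the status of the 'duplicate' moderation entry, or None."""
    for entry in response.get("moderation", []):
        if entry.get("kind") == "duplicate":
            return entry.get("status")
    return None


# Illustrative fragment only; a real response carries many more fields.
sample = {"moderation": [{"kind": "duplicate", "status": "pending"}]}
print(duplicate_status(sample))  # prints: pending
```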

If you want to apply duplicate detection to an already uploaded image, you can use the explicit method in a similar way:
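
A sketch of the explicit variant, assuming the Cloudinary Python SDK. The function name, the `explicit` parameter (present only to allow a stand-in during testing), and `type="upload"` are assumptions.

```python
def check_existing_image(public_id, threshold, explicit=None):
    """Request duplicate detection for an image that is already stored."""
    if explicit is None:
        # Default to the real SDK call; requires configured credentials.
        import cloudinary.uploader

        explicit = cloudinary.uploader.explicit
    # type="upload" assumes the asset was stored with the default delivery type.
    return explicit(public_id, type="upload", moderation=f"duplicate:{threshold}")


# Usage with a hypothetical public ID:
# check_existing_image("existing_pic", 0.8)
```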

Status notification

Because the Cloudinary Duplicate Image Detection add-on analyzes images asynchronously, you might want to be notified when the analysis is complete.

When calling the upload API with duplicate image detection, you can request a notification by setting the notification_url parameter to a webhook. Cloudinary sends a POST request to the specified endpoint when the analysis is complete.
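
For illustration, the upload options for such a request could be assembled as below. The helper name and the webhook URL are hypothetical, and the URL check is a convenience added for this sketch rather than a documented requirement.

```python
from urllib.parse import urlparse


def notification_options(threshold, webhook_url):
    """Upload options that request duplicate detection plus a callback."""
    parts = urlparse(webhook_url)
    # Guard against obviously unusable endpoints before calling the API.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("notification_url must be an absolute HTTP(S) URL")
    return {
        "moderation": f"duplicate:{threshold}",
        "notification_url": webhook_url,
    }


# Usage with the Cloudinary Python SDK (needs credentials):
# import cloudinary.uploader
# cloudinary.uploader.upload(
#     "new_pic.jpg",
#     **notification_options(0.8, "https://example.com/hooks/cloudinary"),
# )
```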

The following JSON snippet is an example of a POST request sent to the notification URL when moderation is completed. The moderation_status value in this case can be either approved or rejected:

If the image is rejected, the response includes the public IDs of all images that scored higher than the threshold. In this case, one identical image was found, along with one that differed very slightly in brightness.

Figure: Original image to upload. Identical image (confidence: 1). Near-duplicate image (confidence: 0.98).
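
On the receiving side, the notification body can be handled with a small parser. The sketch below relies only on `moderation_status`, which is named above; the top-level `public_id` field and the function name are assumptions, and the list of matching images is left out because its field name is not shown here.

```python
import json


def handle_moderation_notification(body):
    """Parse a notification body and summarize the moderation result."""
    payload = json.loads(body)
    status = payload.get("moderation_status")
    # A completed analysis reports either "approved" or "rejected".
    if status not in ("approved", "rejected"):
        raise ValueError(f"unexpected moderation_status: {status!r}")
    # Assumption: the asset is identified by a top-level "public_id" field.
    return {"public_id": payload.get("public_id"), "approved": status == "approved"}


# Hypothetical minimal body, for illustration only:
print(handle_moderation_notification('{"moderation_status": "rejected", "public_id": "new_pic"}'))
```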

Image moderation list

Cloudinary's Admin API can be used to list all moderated images. You can list all approved, pending or rejected images by specifying the value of the status parameter of the resources_by_moderation API method. For example, to list all rejected images:
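
A minimal sketch using the Cloudinary Python SDK, assuming it is installed and configured. The wrapper name and its `resources_by_moderation` parameter (there only so a stand-in can be passed in tests) are hypothetical.

```python
def list_rejected_duplicates(resources_by_moderation=None, **options):
    """List images rejected by the duplicate moderation kind."""
    if resources_by_moderation is None:
        # Default to the real Admin API call; requires configured credentials.
        import cloudinary.api

        resources_by_moderation = cloudinary.api.resources_by_moderation
    # First argument is the moderation kind, second is the status to list.
    return resources_by_moderation("duplicate", "rejected", **options)


# Usage (needs Cloudinary credentials):
# result = list_rejected_duplicates(max_results=50)
```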

Example response:
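
The response itself is not reproduced here. Assuming it follows the usual Admin API listing shape, with a `resources` list and an optional `next_cursor` for paging, the public IDs could be collected as sketched below. The function and parameter names are hypothetical.

```python
def collect_public_ids(fetch_page):
    """Gather public IDs across all pages of a listing.

    fetch_page(next_cursor) must return one response dict; it is called
    with None for the first page.
    """
    public_ids, cursor = [], None
    while True:
        page = fetch_page(cursor)
        public_ids.extend(item["public_id"] for item in page.get("resources", []))
        cursor = page.get("next_cursor")
        if not cursor:
            # No cursor means this was the last page.
            return public_ids


# Usage with the Cloudinary Python SDK (needs credentials):
# import cloudinary.api
# ids = collect_public_ids(
#     lambda cursor: cloudinary.api.resources_by_moderation(
#         "duplicate", "rejected", next_cursor=cursor
#     )
# )
```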

Manual override

While the automatic image analysis of the Cloudinary Duplicate Image Detection add-on is very accurate, in some cases you may want to manually override the moderation decision. You can either approve a previously rejected image or reject an approved one.

One way to manually override the moderation result is using Cloudinary's Media Library web interface. From the left navigation menu, select Moderation. Then, from the drop-down list of moderation types in the top menu, select Duplicate and then select the status of the images you want to display (Pending, Rejected, or Approved).

  • When displaying the images rejected by the add-on, you can click on the thumbs up Approve button to revert the decision and recover the original rejected image.
  • When displaying the images approved by the add-on, you can click on the thumbs down Reject button to revert the decision and prevent a certain image from being publicly available to your users.

Alternatively, you can use Cloudinary's Admin API to manually override the moderation result. The following sample code uses the update API method while specifying a public ID of a moderated image and setting the moderation_status parameter to the approved status.
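
As a hedged sketch with the Cloudinary Python SDK, the override might be wrapped as follows. The wrapper name, the public ID in the usage line, and the `update` parameter (present only to allow a stand-in during testing) are assumptions.

```python
VALID_OVERRIDES = ("approved", "rejected")


def override_moderation(public_id, status, update=None):
    """Manually set the moderation status of a moderated image."""
    if status not in VALID_OVERRIDES:
        raise ValueError(f"status must be one of {VALID_OVERRIDES}")
    if update is None:
        # Default to the real Admin API call; requires configured credentials.
        import cloudinary.api

        update = cloudinary.api.update
    return update(public_id, moderation_status=status)


# Usage with a hypothetical public ID:
# override_moderation("rejected_pic", "approved")
```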

Threshold guidelines

The tables below show the returned confidence scores for images with various modifications, to give an idea of the thresholds you should expect to be using to determine if images with slight variations are regarded as duplicates or not.

Cropped images

Images cropped even by a small amount are generally not detected as duplicates if the cropped-out area is significant to the image. If the cropped-out area is just plain background, then the image is detected as a duplicate with a higher confidence.

                Original image   Crop to 98% of width   Crop to 96% of width   Crop to 93% of width   Crop to 90% of width
Sample image 1  1.0              0.844                  0.629                  0.551                  0.395
Sample image 2  1.0              0.883                  0.824                  0.707                  0.648

Resized images

Resized images are detected as duplicates with, or close to, 100% confidence. This is true for both downscaled and upscaled images (upscaled not shown here).

                Original image   Scale to 90%   Scale to 50%   Scale to 20%
Sample image 1  1.0              1.0            1.0            1.0
Sample image 2  1.0              1.0            0.98           0.98

Images with overlays

For some images, overlays must be quite prominent for an image not to be detected as a duplicate.

                Original image   Overlay of 10% width   Overlay of 25% width   Overlay of 50% width   Overlay of 80% width
Sample image 1  1.0              0.863                  0.766                  0.512                  0.258
Sample image 2  1.0              0.883                  0.551                  0.238                  <0.1

Blurred images

Blurred images are detected as duplicates with a high confidence, even if the level of blur is high.

                Original image   Blur 200   Blur 500   Blur 1500
Sample image 1  1.0              1.0        0.98       0.98
Sample image 2  1.0              1.0        1.0        0.98

Images of different formats and quality

Images of different formats and/or quality (compression) are detected as duplicates with a high confidence.

                Original image (JPEG Quality 100)   JPEG Quality 80   WebP Quality 80   JPEG Quality 10   WebP Quality 10
Sample image 1  1.0                                 1.0               1.0               0.98              0.98
Sample image 2  1.0                                 0.98              0.98              0.961             0.98
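
To get a feel for how a threshold interacts with scores like these, the following sketch filters the first set of cropped-image scores listed above. The function name is hypothetical, and treating a score equal to the threshold as a match is an assumption.

```python
# Confidence scores for the first cropped sample image, taken from the
# cropped images guideline above.
CROP_SCORES = {
    "Crop to 98% of width": 0.844,
    "Crop to 96% of width": 0.629,
    "Crop to 93% of width": 0.551,
    "Crop to 90% of width": 0.395,
}


def flagged_variants(scores, threshold):
    """Return the variants whose score reaches the threshold."""
    # Assumption: a score equal to the threshold counts as a duplicate.
    return [name for name, score in scores.items() if score >= threshold]


print(flagged_variants(CROP_SCORES, 0.8))  # ['Crop to 98% of width']
print(flagged_variants(CROP_SCORES, 0.6))  # ['Crop to 98% of width', 'Crop to 96% of width']
```

Lowering the threshold from 0.8 to 0.6 brings the 96% crop into the duplicate set, which shows how sensitive cropped images are to the chosen value.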
