
add CLIP w/ TORCH backend to inference_experimental #1415


Merged
41 commits merged into main on Jul 18, 2025

Conversation

hansent (Contributor) commented on Jul 9, 2025

Description

This adds a torch implementation of CLIP and full test coverage for:

  • shared preprocessing between the onnx and torch implementations
  • e2e tests ensuring matching text and image embeddings, using cosine similarity between the original CLIP, clip_torch, and clip_onnx (a rough sketch of the comparison follows below)
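
A minimal sketch of what such a cosine-similarity check could look like, assuming the embeddings come back as (batch, dim) torch tensors; the helper name and the 0.99 threshold are illustrative assumptions, not the actual test code:

```python
import torch
import torch.nn.functional as F


def assert_embeddings_match(
    reference: torch.Tensor, candidate: torch.Tensor, min_similarity: float = 0.99
) -> None:
    # Both inputs are (batch, dim); compare along the embedding dimension.
    similarity = F.cosine_similarity(reference, candidate, dim=-1)
    assert torch.all(similarity > min_similarity), f"embeddings diverged: {similarity}"
```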

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How has this change been tested? Please provide a test case or example of how you tested the change.

Tested locally; torch weights are only registered for RN50 at the moment.

Any specific deployment considerations

n/a

Docs

n/a

@hansent hansent changed the title initial stab at adding CLIP add CLIP to inference_experimental Jul 9, 2025
Base automatically changed from inference-exp/add-perception-encoder to feature/inference-v1-models July 10, 2025 15:31
Base automatically changed from feature/inference-v1-models to main July 11, 2025 16:41
    device: torch.device,
):
    self.model = model
    self.preprocess = preprocess
Collaborator:

I am not sure if I understand the preprocess parameter

Contributor Author:

It is no longer a parameter; the preprocessor is now instantiated in __init__ so the torch implementation uses the preprocessor shared with the onnx implementation.
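
A hypothetical sketch of that pattern; the class name is made up and the normalisation constants are the standard CLIP values assumed here, but the transform chain mirrors the one quoted later in this thread:

```python
import torch
from torchvision.transforms import CenterCrop, Compose, InterpolationMode, Normalize, Resize

# Standard CLIP normalisation constants, assumed here for illustration.
MEAN = (0.48145466, 0.4578275, 0.40821073)
STD = (0.26862954, 0.26130258, 0.27577711)


class ClipTorch:  # hypothetical class name
    def __init__(self, model: torch.nn.Module, device: torch.device, image_size: int = 224):
        self.model = model
        self.device = device
        # Built in __init__ rather than passed in, so the torch and onnx
        # backends share one preprocessing definition.
        self.preprocess = Compose(
            [
                Resize(image_size, interpolation=InterpolationMode.BICUBIC, antialias=True),
                CenterCrop(image_size),
                lambda x: x.to(torch.float32) / 255.0,
                Normalize(MEAN, STD),
            ]
        )
```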

for img in images:
    tensor = _to_tensor(img)
    if tensor.dtype == torch.uint8:
        tensor = tensor.to(torch.float32) / 255.0
Collaborator:

It seems normalisation to 0-1 should be done regardless of data type(?). Hard to say, tbh, as this is probably just a convention. I would keep the assumption that an image usually comes as [0-255]; that is how it is implemented for the other models from what I remember, but you may check.

Contributor Author:

Normalisation is now always applied, based on how you had it in the onnx preprocessing:

```python
transforms = Compose(
    [
        Resize(image_size, interpolation=InterpolationMode.BICUBIC, antialias=True),
        CenterCrop(image_size),
        lambda x: x.to(torch.float32) / 255.0,
        Normalize(MEAN, STD),
    ]
)
```
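
For context, a usage sketch under the assumption that images reach this chain as uint8 CHW tensors; the random input is purely illustrative:

```python
import numpy as np
import torch

# Fake uint8 CHW image; the chain resizes, crops, rescales to [0, 1] and normalises.
image = torch.from_numpy(np.random.randint(0, 256, size=(3, 480, 640), dtype=np.uint8))
pixel_values = transforms(image)  # float32 tensor of shape (3, image_size, image_size)
```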

        images_to_stack.append(cropped)
    tensor_batch = torch.stack(images_to_stack, dim=0)
else:
    # Handle single image or 4D batch for optimized processing
Collaborator:

How about a single ndarray? It seems it will fall into this branch.

(screenshot attached in the original comment)

Contributor Author:

I think this is handled properly by the new shared pre-processor.

The tests here cover (a rough sketch of one such test follows the list):

  • test_embed_single_numpy_image
  • test_embed_single_tensor_image
  • test_embed_list_of_numpy_images
  • test_embed_list_of_tensor_images
  • test_embed_batch_of_tensor_images
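
A rough sketch of what one of these tests could look like; the clip_model fixture, the embed_images method name, and the shape check are assumptions, not the real test code:

```python
import numpy as np


def test_embed_single_numpy_image(clip_model):  # hypothetical pytest fixture
    image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)

    embedding = clip_model.embed_images(image)  # hypothetical method name

    # A single image should yield exactly one embedding vector.
    assert embedding.shape[0] == 1
```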

@hansent hansent changed the title add CLIP to inference_experimental add CLIP w/ TORCH backend to inference_experimental Jul 15, 2025
@hansent hansent marked this pull request as ready for review July 15, 2025 21:19
]
)

def _preprocess(
Collaborator:

It seems this function could be extracted from being an inner function and just passed as the first callable in a chain?

Contributor Author:

I extracted it into a standalone function in the module to avoid the nesting/inner function, but I think it is tricky to make it part of the Compose chain.

The torchvision.transforms.Compose pipeline expects each transform to take a single argument. However, our _preprocess function is designed to be the main entry point and does more:

  • handles multiple input types: it accepts a single np.ndarray, a single torch.Tensor, a list of arrays, or a list of tensors.
  • calls the Compose pipeline on the prepared tensor batch; for lists this has to happen in a for loop, because the images may have different sizes, and the batch tensor is only created after each image has been processed.

I might be wrong; I'm not sure I completely understand how torchvision.transforms.Compose works, or whether we could always convert the list to a batch tensor first and then run the transforms on that?
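
To illustrate the point, here is a minimal sketch with hypothetical names: each transform in a Compose chain takes a single input, so a list of differently sized images has to go through the chain one image at a time and only then be stacked into a batch.

```python
from typing import Sequence, Union

import numpy as np
import torch
from torchvision.transforms import Compose


def _to_tensor_sketch(image: Union[np.ndarray, torch.Tensor]) -> torch.Tensor:
    # Hypothetical helper: accept HWC numpy arrays or already-CHW tensors.
    if isinstance(image, np.ndarray):
        return torch.from_numpy(image).permute(2, 0, 1)
    return image


def _preprocess_sketch(
    images: Union[np.ndarray, torch.Tensor, Sequence[Union[np.ndarray, torch.Tensor]]],
    transforms: Compose,  # the chain quoted earlier in the thread
) -> torch.Tensor:
    if isinstance(images, (list, tuple)):
        # Sizes may differ, so each image goes through the chain individually
        # and the batch tensor is created only afterwards.
        processed = [transforms(_to_tensor_sketch(img)) for img in images]
        return torch.stack(processed, dim=0)
    tensor = _to_tensor_sketch(images)
    if tensor.ndim == 3:
        tensor = tensor.unsqueeze(0)  # single image -> batch of one
    return transforms(tensor)
```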

@@ -0,0 +1,203 @@
import os

os.environ["ROBOFLOW_API_HOST"] = "https://api.roboflow.one"
Collaborator:

The models should be registered in the prod API.

Contributor Author:

Removed; the models are registered in prod now.

import os

os.environ["ROBOFLOW_API_HOST"] = "https://api.roboflow.one"

Collaborator:

[1] Functions are missing type annotations on their arguments.

Contributor Author:

Not sure what you mean by this.

@PawelPeczek-Roboflow PawelPeczek-Roboflow self-requested a review July 17, 2025 12:55
@PawelPeczek-Roboflow PawelPeczek-Roboflow self-requested a review July 17, 2025 13:20
@PawelPeczek-Roboflow PawelPeczek-Roboflow self-requested a review July 17, 2025 14:09
@PawelPeczek-Roboflow PawelPeczek-Roboflow self-requested a review July 17, 2025 17:04
@PawelPeczek-Roboflow PawelPeczek-Roboflow self-requested a review July 18, 2025 06:11
@PawelPeczek-Roboflow PawelPeczek-Roboflow merged commit 87abccd into main Jul 18, 2025
40 checks passed
@PawelPeczek-Roboflow PawelPeczek-Roboflow deleted the inference-exp-add-clip branch July 18, 2025 06:12