add CLIP w/ TORCH backend to inference_experimental #1415
Conversation
inference_experimental/inference_exp/models/clip/clip_pytorch.py
        device: torch.device,
    ):
        self.model = model
        self.preprocess = preprocess
I am not sure I understand the `preprocess` parameter.
No longer a parameter; it is now instantiated in `__init__` so the shared preprocessor is used with the ONNX implementation.
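For illustration, a minimal sketch of that shape (the `ClipTorch` class name and the constants are assumptions, mirroring the shared Compose pipeline quoted further down in this thread):

```python
import torch
from torchvision.transforms import (
    CenterCrop,
    Compose,
    InterpolationMode,
    Normalize,
    Resize,
)

# Standard OpenAI CLIP normalisation statistics (assumed here for illustration).
MEAN = (0.48145466, 0.4578275, 0.40821073)
STD = (0.26862954, 0.26130258, 0.27577711)


class ClipTorch:  # hypothetical class name, for illustration only
    def __init__(self, model: torch.nn.Module, device: torch.device):
        self.model = model
        self.device = device
        # The preprocessor is built here rather than injected, so the same
        # pipeline definition can be shared with the ONNX implementation.
        self.preprocess = Compose(
            [
                Resize(224, interpolation=InterpolationMode.BICUBIC, antialias=True),
                CenterCrop(224),
                lambda x: x.to(torch.float32) / 255.0,
                Normalize(MEAN, STD),
            ]
        )
```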
for img in images:
    tensor = _to_tensor(img)
    if tensor.dtype == torch.uint8:
        tensor = tensor.to(torch.float32) / 255.0
It seems like normalisation to 0-1 should be done regardless of data type(?). Hard to say definitively, tbh, as this is probably just a convention. I would keep the assumption that the image usually comes as [0-255]; that is how it is implemented for the other models, from what I remember, but you may want to check.
Now always done, based on how you had it in the ONNX preproc:
```
transforms = Compose(
    [
        Resize(image_size, interpolation=InterpolationMode.BICUBIC, antialias=True),
        CenterCrop(image_size),
        lambda x: x.to(torch.float32) / 255.0,
        Normalize(MEAN, STD),
    ]
)
```
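For example, assuming `image_size = 224` and the `transforms` object from the snippet above, a quick illustrative check (not part of the PR) would behave like this:

```python
import torch

image = torch.randint(0, 256, (3, 512, 384), dtype=torch.uint8)  # CHW uint8 input
processed = transforms(image)
print(processed.shape)  # torch.Size([3, 224, 224])
print(processed.dtype)  # torch.float32 after the / 255.0 lambda and Normalize
```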
        images_to_stack.append(cropped)
    tensor_batch = torch.stack(images_to_stack, dim=0)
else:
    # Handle single image or 4D batch for optimized processing
I think this is handled properly by the new shared pre-processor.
Tests here cover (one of them is sketched below):
- test_embed_single_numpy_image
- test_embed_single_tensor_image
- test_embed_list_of_numpy_images
- test_embed_list_of_tensor_images
- test_embed_batch_of_tensor_images
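For reference, a rough sketch of what the single-numpy-image case could look like (the `clip_model` fixture and the `embed_images` method name are assumptions, not necessarily the real test code):

```python
import numpy as np
import torch


def test_embed_single_numpy_image(clip_model) -> None:
    # given: a single HWC uint8 image, as it would typically come from cv2
    image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)

    # when: embedding a single image (method name assumed for illustration)
    embeddings = clip_model.embed_images(image)

    # then: a single embedding vector comes back
    assert isinstance(embeddings, torch.Tensor)
    assert embeddings.shape[0] == 1
```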
This reverts commit ee700e4.
…eights from our registry
    ]
)


def _preprocess(
It seems this function could be extracted from being an inner one and just passed as the first callable in the chain?
I extracted it into a standalone function in the module to avoid the nesting/inner function, but I think it is tricky to make it part of the Compose chain. A `torchvision.transforms.Compose` pipeline expects each transform to take a single argument, whereas our `_preprocess` function is designed to be the main entry point and does more:
- handles multiple input types: it accepts a single np.ndarray, a single torch.Tensor, a list of arrays, or a list of tensors.
- calls the Compose pipeline on the prepared tensors, but for lists has to do it in a for loop, because the images may have different sizes, and only then creates a batch tensor from them.

I might be wrong; I am not sure I completely understand how `torchvision.transforms.Compose` works, or whether we should always convert the list to a batch tensor first and then transform that?
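A simplified sketch of the shape being described (names and details are illustrative, and the 4D-batch fast path is omitted for brevity):

```python
from typing import List, Union

import numpy as np
import torch
from torchvision.transforms import Compose


def _to_tensor(image: Union[np.ndarray, torch.Tensor]) -> torch.Tensor:
    # np.ndarray inputs are assumed HWC; torch.Tensor inputs are assumed CHW.
    if isinstance(image, np.ndarray):
        return torch.from_numpy(image).permute(2, 0, 1)
    return image


def _preprocess(
    images: Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]],
    transforms: Compose,
    device: torch.device,
) -> torch.Tensor:
    # Entry point: normalise the input type first, then run the Compose
    # pipeline per image. Images in a list may have different sizes, so they
    # can only be stacked once Resize/CenterCrop has made them uniform.
    if isinstance(images, (np.ndarray, torch.Tensor)):
        images = [images]
    processed = [transforms(_to_tensor(image)) for image in images]
    return torch.stack(processed, dim=0).to(device)
```

Since `Compose` only chains single-argument callables, the input-type handling and batching have to live outside it, which is consistent with keeping `_preprocess` as a standalone module-level entry point rather than an element of the chain.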
@@ -0,0 +1,203 @@
import os

os.environ["ROBOFLOW_API_HOST"] = "https://api.roboflow.one"
The models should be registered in the prod API.
Removed; the models are registered in prod now.
import os

os.environ["ROBOFLOW_API_HOST"] = "https://api.roboflow.one"
[1] lack of type annotations on the objects in these functions
Not sure what you mean by this.
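If the comment refers to missing type hints, an annotated signature could look something like this (names are illustrative only, not the module's actual API):

```python
from typing import List, Union

import numpy as np
import torch

ImagesInput = Union[np.ndarray, torch.Tensor, List[np.ndarray], List[torch.Tensor]]


def embed_images(images: ImagesInput, device: torch.device) -> torch.Tensor:
    """Illustrative signature only: input and return types annotated explicitly."""
    raise NotImplementedError
```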
Description
This adds a torch implementation of CLIP and full test coverage.
Type of change
How has this change been tested? Please provide a test case or example of how you tested the change.
Tested locally; torch weights are only registered for RN50 at the moment.
Any specific deployment considerations
n/a
Docs
n/a