LMM For Classification¶
Class: LMMForClassificationBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm_classifier.v1.LMMForClassificationBlockV1
Classify an image into one or more categories using a Large Multimodal Model (LMM).
You can specify arbitrary classes for the block to classify against.
The block supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm_for_classification@v1
to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
lmm_type | str | Type of LMM to be used. | ✅ |
classes | List[str] | List of classes that LMM shall classify against. | ✅ |
lmm_config | LMMConfig | Configuration of LMM. | ❌ |
remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
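As a rough illustration of the Refs column, the sketch below builds a step definition in Python and lets only the properties marked ✅ be replaced by a selector. The `$inputs.image` selector form comes from the example definition on this page; the helper `build_step` and the input names `classes` and `api_key` are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict

# Properties marked ✅ in the Refs column; only these accept a selector here.
BINDABLE = {"lmm_type", "classes", "remote_api_key"}


def build_step(name: str, **bindings: str) -> Dict[str, Any]:
    """Build a step definition, binding selected properties to workflow inputs.

    `build_step` is a hypothetical helper, not part of the inference library.
    """
    unknown = set(bindings) - BINDABLE
    if unknown:
        raise ValueError(f"not bindable: {sorted(unknown)}")
    step: Dict[str, Any] = {
        "type": "roboflow_core/lmm_for_classification@v1",
        "name": name,  # literal only (Refs: ❌)
        "images": "$inputs.image",
        "lmm_type": "gpt_4v",
        "classes": ["a", "b"],
        "lmm_config": {"max_tokens": 200},  # literal only (Refs: ❌)
    }
    # Replace the literal defaults with the requested selectors.
    step.update(bindings)
    return step


# `$inputs.classes` and `$inputs.api_key` assume workflow inputs with those names.
step = build_step(
    "classifier",
    classes="$inputs.classes",
    remote_api_key="$inputs.api_key",
)
```

With this shape, the class list and API key can be supplied per request, while `name` and `lmm_config` stay fixed in the definition.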
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM For Classification in version v1.
- inputs: OpenAI, Email Notification, Relative Static Crop, Pixelate Visualization, Line Counter Visualization, Image Contours, Circle Visualization, SIFT, Color Visualization, Single-Label Classification Model, Clip Comparison, Trace Visualization, Keypoint Visualization, Image Blur, Florence-2 Model, Buffer, Grid Visualization, Roboflow Dataset Upload, Halo Visualization, OpenAI, Polygon Visualization, Dynamic Zone, Stability AI Image Generation, VLM as Classifier, Multi-Label Classification Model, Image Convert Grayscale, SIFT Comparison, Twilio SMS Notification, Crop Visualization, Classification Label Visualization, Stability AI Inpainting, Depth Estimation, OpenAI, LMM For Classification, Anthropic Claude, Mask Visualization, Image Slicer, Camera Calibration, Perspective Correction, Camera Focus, Model Monitoring Inference Aggregator, CSV Formatter, Background Color Visualization, Bounding Box Visualization, Object Detection Model, Dot Visualization, Reference Path Visualization, Roboflow Custom Metadata, Ellipse Visualization, Blur Visualization, Google Vision OCR, Llama 3.2 Vision, Roboflow Dataset Upload, CogVLM, Triangle Visualization, VLM as Detector, Model Comparison Visualization, Dynamic Crop, Stitch Images, Stitch OCR Detections, Image Threshold, Size Measurement, Image Slicer, Slack Notification, Polygon Zone Visualization, Corner Visualization, OCR Model, Local File Sink, LMM, Google Gemini, Stability AI Outpainting, Webhook Sink, Image Preprocessing, Label Visualization, Keypoint Detection Model, Clip Comparison, Absolute Static Crop, Dimension Collapse, Florence-2 Model, Instance Segmentation Model
- outputs: OpenAI, Perception Encoder Embedding Model, Email Notification, Line Counter Visualization, Circle Visualization, YOLO-World Model, Color Visualization, Clip Comparison, Time in Zone, Trace Visualization, Detections Classes Replacement, Keypoint Visualization, Detections Stitch, Image Blur, Florence-2 Model, Roboflow Dataset Upload, Halo Visualization, OpenAI, Polygon Visualization, Path Deviation, Stability AI Image Generation, SIFT Comparison, Twilio SMS Notification, Classification Label Visualization, Crop Visualization, Segment Anything 2 Model, Stability AI Inpainting, OpenAI, LMM For Classification, Anthropic Claude, Mask Visualization, Perspective Correction, Model Monitoring Inference Aggregator, PTZ Tracking (ONVIF), Bounding Box Visualization, Background Color Visualization, Line Counter, Pixel Color Count, Dot Visualization, Reference Path Visualization, Roboflow Custom Metadata, Ellipse Visualization, CLIP Embedding Model, Google Vision OCR, Llama 3.2 Vision, Roboflow Dataset Upload, CogVLM, Triangle Visualization, Model Comparison Visualization, Dynamic Crop, Image Threshold, Size Measurement, Time in Zone, Slack Notification, Polygon Zone Visualization, Corner Visualization, Local File Sink, LMM, Cache Set, Google Gemini, Webhook Sink, Stability AI Outpainting, Image Preprocessing, Distance Measurement, Path Deviation, Label Visualization, Line Counter, Cache Get, Instance Segmentation Model, Florence-2 Model, Instance Segmentation Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds LMM For Classification in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - lmm_type (string): Type of LMM to be used.
  - classes (list_of_values): List of classes that LMM shall classify against.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - raw_output (string): String value.
  - top (top_class): String value representing top class predicted by classification model.
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - prediction_type (prediction_type): String value with type of prediction.
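To show how the listed outputs might be consumed, the following sketch maps selected step outputs to workflow-level output entries. It assumes the usual Workflows definition conventions, namely a `JsonField` output type and selectors of the form `$steps.<step_name>.<output>`; the helper `output_fields` and the step name `classifier` are hypothetical.

```python
from typing import Dict, List

# Output names as listed in the bindings for this block.
STEP_OUTPUTS = [
    "raw_output",
    "top",
    "parent_id",
    "root_parent_id",
    "image",
    "prediction_type",
]


def output_fields(step_name: str, wanted: List[str]) -> List[Dict[str, str]]:
    """Map selected step outputs to workflow output entries.

    Assumes the `JsonField` type and `$steps.<name>.<output>` selector form.
    """
    missing = [o for o in wanted if o not in STEP_OUTPUTS]
    if missing:
        raise KeyError(f"unknown outputs: {missing}")
    return [
        {"type": "JsonField", "name": o, "selector": f"$steps.{step_name}.{o}"}
        for o in wanted
    ]


# Expose the predicted class and the raw model response.
fields = output_fields("classifier", ["top", "raw_output"])
```

Exposing `raw_output` next to `top` can help when debugging cases where the model's free-text answer does not map cleanly onto one of the requested classes.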
Example JSON definition of step LMM For Classification in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm_for_classification@v1",
    "images": "$inputs.image",
    "lmm_type": "gpt_4v",
    "classes": [
        "a",
        "b"
    ],
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx"
}
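The step definition above is only one entry in a workflow. A minimal sketch of a complete definition wrapping it might look like the following, assuming the common Workflows layout (`version`, `inputs`, `steps`, `outputs`) and the `WorkflowImage` input type. The function `build_workflow`, the step name `classifier`, and the `OPENAI_API_KEY` environment variable are illustrative choices, not requirements of the block.

```python
import json
import os


def build_workflow(api_key: str) -> dict:
    """Wrap the example step in a full workflow definition (assumed layout)."""
    return {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                "name": "classifier",  # hypothetical step name
                "type": "roboflow_core/lmm_for_classification@v1",
                "images": "$inputs.image",
                "lmm_type": "gpt_4v",
                "classes": ["a", "b"],
                "lmm_config": {
                    "gpt_image_detail": "low",
                    "gpt_model_version": "gpt-4o",
                    "max_tokens": 200,
                },
                "remote_api_key": api_key,
            }
        ],
        "outputs": [
            {"type": "JsonField", "name": "top", "selector": "$steps.classifier.top"}
        ],
    }


# Read the key from the environment instead of hardcoding it;
# "xxx-xxx" is the placeholder used in the example definition.
workflow = build_workflow(os.environ.get("OPENAI_API_KEY", "xxx-xxx"))
spec_json = json.dumps(workflow, indent=2)
```

Keeping the key out of the definition file avoids committing a secret alongside the workflow.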