LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as a step in your workflow.
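For orientation, the sketch below (not an official template) shows where this type identifier sits inside a workflow specification; the input, step, and output names are placeholders chosen for illustration:

```python
# Minimal sketch of a workflow specification embedding the LMM block.
# "image", "lmm_step" and "result" are hypothetical names, not part of the block definition.
WORKFLOW_SPECIFICATION = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            "type": "roboflow_core/lmm@v1",  # the type identifier described above
            "name": "lmm_step",
            "images": "$inputs.image",
            "prompt": "Describe the image",
            "lmm_type": "gpt_4v",
            "remote_api_key": "<YOUR_OPENAI_API_KEY>",
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.lmm_step.raw_output"},
    ],
}
```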
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of LMM. | ❌ |
| remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
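To illustrate the distinction, here is a hedged sketch of a step definition that binds the parametrisable properties (prompt, remote_api_key) to workflow inputs while giving the non-parametrisable ones literal values; the selector names assume the workflow declares matching WorkflowParameter inputs:

```python
# Sketch of a step definition mixing dynamic selectors with literal values.
# "$inputs.prompt" and "$inputs.openai_api_key" assume the workflow declares
# parameters with those (hypothetical) names.
lmm_step = {
    "type": "roboflow_core/lmm@v1",
    "name": "lmm_step",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",            # ✅ can be parametrised at runtime
    "lmm_type": "gpt_4v",                  # ✅ parametrisable, here given literally
    "lmm_config": {                        # ❌ literal values only
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.openai_api_key",  # ✅ parametrisable (secret/string)
    "json_output": {                       # ❌ literal values only
        "count": "number of cats in the picture",
    },
}
```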
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: OpenAI, Email Notification, Relative Static Crop, Pixelate Visualization, Line Counter Visualization, Image Contours, Circle Visualization, SIFT, Color Visualization, Single-Label Classification Model, Clip Comparison, Trace Visualization, Keypoint Visualization, Image Blur, Florence-2 Model, Grid Visualization, Roboflow Dataset Upload, Halo Visualization, OpenAI, Polygon Visualization, Stability AI Image Generation, VLM as Classifier, Multi-Label Classification Model, Image Convert Grayscale, SIFT Comparison, Twilio SMS Notification, Crop Visualization, Classification Label Visualization, Stability AI Inpainting, Depth Estimation, OpenAI, LMM For Classification, Anthropic Claude, Mask Visualization, Image Slicer, Camera Calibration, Perspective Correction, Camera Focus, Model Monitoring Inference Aggregator, CSV Formatter, Background Color Visualization, Bounding Box Visualization, Object Detection Model, Dot Visualization, Reference Path Visualization, Roboflow Custom Metadata, Ellipse Visualization, Blur Visualization, Google Vision OCR, Llama 3.2 Vision, Roboflow Dataset Upload, CogVLM, Triangle Visualization, VLM as Detector, Model Comparison Visualization, Dynamic Crop, Stitch Images, Stitch OCR Detections, Image Threshold, Image Slicer, Slack Notification, Polygon Zone Visualization, Corner Visualization, OCR Model, Local File Sink, LMM, Google Gemini, Stability AI Outpainting, Webhook Sink, Image Preprocessing, Label Visualization, Keypoint Detection Model, Absolute Static Crop, Florence-2 Model, Instance Segmentation Model
- outputs: OpenAI, Perception Encoder Embedding Model, Email Notification, Property Definition, Detections Filter, Pixelate Visualization, Byte Tracker, Circle Visualization, SIFT, YOLO-World Model, Color Visualization, Object Detection Model, Image Blur, Detections Merge, Gaze Detection, Buffer, Template Matching, Halo Visualization, Byte Tracker, Moondream2, Polygon Visualization, Stability AI Image Generation, VLM as Classifier, JSON Parser, Image Convert Grayscale, SIFT Comparison, Detection Offset, Twilio SMS Notification, Classification Label Visualization, Crop Visualization, OpenAI, Depth Estimation, LMM For Classification, Cosine Similarity, Anthropic Claude, Mask Visualization, Perspective Correction, Multi-Label Classification Model, Camera Calibration, Barcode Detection, Background Color Visualization, QR Code Detection, Object Detection Model, Dominant Color, Dot Visualization, Data Aggregator, Bounding Rectangle, Google Vision OCR, Single-Label Classification Model, Llama 3.2 Vision, Roboflow Dataset Upload, CogVLM, Model Comparison Visualization, Dynamic Crop, Image Threshold, First Non Empty Or Default, Size Measurement, Time in Zone, Keypoint Detection Model, Polygon Zone Visualization, Continue If, OCR Model, Local File Sink, Cache Set, Detections Consensus, Webhook Sink, Stability AI Outpainting, Velocity, Distance Measurement, Label Visualization, SIFT Comparison, Line Counter, Cache Get, Keypoint Detection Model, Clip Comparison, Instance Segmentation Model, Absolute Static Crop, Dimension Collapse, Florence-2 Model, Relative Static Crop, Line Counter Visualization, Image Contours, Expression, Single-Label Classification Model, Clip Comparison, Time in Zone, Trace Visualization, Detections Classes Replacement, Keypoint Visualization, Detections Stitch, Byte Tracker, Florence-2 Model, Overlap Filter, Grid Visualization, Roboflow Dataset Upload, OpenAI, Dynamic Zone, Rate Limiter, Path Deviation, Qwen2.5-VL, Multi-Label Classification Model, Detections Transformation, SmolVLM2, Segment Anything 2 Model, Stability AI Inpainting, Image Slicer, VLM as Detector, Camera Focus, Model Monitoring Inference Aggregator, CSV Formatter, PTZ Tracking (ONVIF), Bounding Box Visualization, Identify Outliers, Line Counter, Pixel Color Count, Reference Path Visualization, Roboflow Custom Metadata, Ellipse Visualization, Blur Visualization, CLIP Embedding Model, Triangle Visualization, VLM as Detector, Stitch Images, Stitch OCR Detections, Image Slicer, Slack Notification, Detections Stabilizer, Corner Visualization, VLM as Classifier, LMM, Delta Filter, Google Gemini, Identify Changes, Image Preprocessing, Path Deviation, Instance Segmentation Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
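As a hedged sketch of how the output bindings are typically consumed, a workflow's outputs section (or a downstream step) can reference them through step selectors; the step name lmm_step and the output names are placeholders:

```python
# Hypothetical outputs section consuming the LMM step's bindings.
workflow_outputs = [
    # dictionary parsed according to json_output
    {"type": "JsonField", "name": "structured", "selector": "$steps.lmm_step.structured_output"},
    # raw text returned by the model
    {"type": "JsonField", "name": "raw", "selector": "$steps.lmm_step.raw_output"},
]
```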
Example JSON definition of step LMM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
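Finally, a hedged usage sketch: a workflow containing this step can be executed against a Roboflow-hosted or self-hosted inference server via the inference_sdk client. The URL, workspace, workflow id, and parameter names are placeholders, and the exact run_workflow signature may vary between inference_sdk versions, so treat this as an illustration rather than a definitive recipe:

```python
from inference_sdk import InferenceHTTPClient

# Hypothetical client set-up; api_url may also point at a self-hosted inference server.
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="<YOUR_ROBOFLOW_API_KEY>",
)

# Assumes a saved workflow that contains a roboflow_core/lmm@v1 step and exposes
# an image input named "image" plus a text parameter named "prompt".
result = client.run_workflow(
    workspace_name="<your_workspace>",
    workflow_id="<your_workflow_id>",
    images={"image": "path/to/image.jpg"},
    parameters={"prompt": "How many cats are in the picture?"},
)

print(result)  # contains whichever fields (e.g. raw_output, structured_output) the workflow exposes
```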