
PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

OpenVLA is an open-source Vision-Language-Action (VLA) model with 7 billion parameters. Designed to empower robots with human-like perception and decision-making, it integrates visual inputs and natural language instructions to perform diverse manipulation tasks. Trained on nearly a million episodes from the Open X-Embodiment dataset, OpenVLA sets a new standard for generalist robotic control. With a robust architecture combining SigLIP, DINOv2, and Llama 2 7B, it offers strong adaptability and can be fine-tuned efficiently on consumer-grade GPUs, making advanced robotics more accessible than ever. Project Page
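To give a feel for how such a VLA is queried, the model consumes a camera frame plus an instruction wrapped in a fixed prompt template. Below is a minimal sketch of that template; the wording follows the OpenVLA model card, but treat the exact string as an assumption rather than an API guarantee:

```python
def build_openvla_prompt(instruction: str) -> str:
    """Wrap a natural-language instruction in the OpenVLA prompt template."""
    return f"In: What action should the robot take to {instruction.lower()}?\nOut:"

print(build_openvla_prompt("Pick up the salad dressing and place it in the basket"))
# -> In: What action should the robot take to pick up the salad dressing and place it in the basket?
#    Out:
```

The templated prompt and the image are then passed through the model's processor, and the model decodes a low-level robot action rather than free-form text.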

🎬 1. Demo

🌐 Gradio App Screenshot

Screenshot

Watch on YouTube!

PickAgent: OpenVLA-powered Pick and Place Agent (Simulation)

Gradio Demo

πŸš€ PickAgent is an AI-driven pick-and-place system powered by OpenVLA, showcasing advanced vision-language models in action. This simulation demonstrates how LLMs and computer vision work together for precise object manipulation.


πŸŽ₯ Video Results

Prompt: Pick up the salad dressing and place it in the basket.

video1.mp4

Prompt: Pick up the tomato sauce and place it in the basket.

video2.mp4

Prompt: Pick up the cream cheese and place it in the basket.

video3.mp4

Prompt: Pick up the alphabet soup and place it in the basket.

video4.mp4

πŸ”§ 2. Installation

# Create and activate conda environment
conda create -n openvla python=3.11 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y  # UPDATE ME!

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Additionally, install other required packages for simulation:

git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO
pip install -e .

cd ../openvla
pip install -r experiments/robot/libero/libero_requirements.txt

πŸš€ 3. Inference

Gradio App

Run the Gradio app for inference:

   python3 gradio_demo.py

Here’s a summary of the Gradio inputs and outputs.

Inputs:

  • Task: Selects the task type for the simulation.
  • Task ID: Specifies the task instance ID.
  • Prompt: Input for natural language instructions to control the robot.
  • Preview Button: Updates the environment preview for the selected task.
  • Run Simulation Button: Runs the simulation with the given prompt.

Outputs:

  • Preview Image: Shows the environment's first frame.
  • Simulation Video: Shows the simulation result video.

Command Line Interface

Run the Python script for inference:

   python3 inference.py --prompt="pick up the salad dressing and place it in the basket" --task="libero_object" --task_id=2 --image_resize=1024 --output_video="outputs/videos"
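For context, OpenVLA predicts an end-effector delta action whose dimensions are normalized to [-1, 1] and mapped back to physical units using per-dataset statistics. A small sketch of that de-normalization step follows; the bounds are made-up illustrative values, not real dataset statistics:

```python
def unnormalize_action(action, low, high):
    """Map each action dimension from [-1, 1] back to the dataset's range."""
    return [(a + 1.0) / 2.0 * (h - l) + l for a, l, h in zip(action, low, high)]

# illustrative bounds for (dx, dy, dz, droll, dpitch, dyaw, gripper)
low  = [-0.05, -0.05, -0.05, -0.5, -0.5, -0.5, 0.0]
high = [ 0.05,  0.05,  0.05,  0.5,  0.5,  0.5, 1.0]

print(unnormalize_action([0.0, 1.0, -1.0, 0.0, 0.0, 0.0, 1.0], low, high))
# -> [0.0, 0.05, -0.05, 0.0, 0.0, 0.0, 1.0]
```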

4. OpenVLA Models

  • General OpenVLA: πŸ€— HuggingFace
  • OpenVLA fine-tuned on LIBERO Spatial: πŸ€— HuggingFace
  • OpenVLA fine-tuned on LIBERO Object: πŸ€— HuggingFace
  • OpenVLA fine-tuned on LIBERO Goal: πŸ€— HuggingFace
  • OpenVLA fine-tuned on LIBERO 10: πŸ€— HuggingFace

πŸ™ 5. Acknowledgement

The models are borrowed from OpenVLA.

