Vision-Language-Action models
Introduction to VLA
Vision-Language-Action models (VLA)
Arxiv: Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Open X-Embodiment
Arxiv: Open X-Embodiment: Robotic Learning Datasets and RT-X Models
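The Open X-Embodiment data is distributed in RLDS (episode/steps) format and can be streamed with tensorflow_datasets. Below is a minimal sketch; the GCS bucket, dataset name/version, and feature keys are assumptions based on the public release and differ from dataset to dataset.
# Minimal sketch: stream one Open X-Embodiment dataset in RLDS format.
# Requires `tensorflow` and `tensorflow_datasets`; the bucket path, version,
# and feature keys below are assumptions and vary per dataset.
import tensorflow_datasets as tfds
builder = tfds.builder_from_directory(
    builder_dir="gs://gresearch/robotics/fractal20220817_data/0.1.0"  # assumed path/version
)
ds = builder.as_dataset(split="train[:10]")  # a few episodes for inspection
for episode in ds:
    for step in episode["steps"]:
        image = step["observation"]["image"]  # camera frame for this timestep
        action = step["action"]               # robot action (structure varies per dataset)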
Robotics Transformer (RT-1)
Arxiv: RT-1: Robotics Transformer for Real-World Control at Scale
BridgeData V2
Arxiv: BridgeData V2: A Dataset for Robot Learning at Scale
Github: https://github.com/rail-berkeley/bridge_data_v2
OpenVLA
Arxiv: OpenVLA: An Open-Source Vision-Language-Action Model
Installation
!git clone https://github.com/openvla/openvla
%cd openvla
!pip install -r requirements-min.txt
For example, to load openvla-7b for zero-shot instruction following in the BridgeData V2 environments with a WidowX robot:
# Install minimal dependencies (`torch`, `transformers`, `timm`, `tokenizers`, ...)
# > pip install -r https://raw.githubusercontent.com/openvla/openvla/main/requirements-min.txt
from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch
# Load Processor & VLA
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
"openvla/openvla-7b",
attn_implementation="flash_attention_2", # [Optional] Requires `flash_attn`
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
trust_remote_code=True
).to("cuda:0")
# Grab image input & format prompt
image: Image.Image = get_from_camera(...)
prompt = "In: What action should the robot take to {<INSTRUCTION>}?\nOut:"
# Predict Action (7-DoF; un-normalize for BridgeData V2)
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
# Execute...
robot.act(action, ...)
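The returned action is a 7-dimensional continuous vector (end-effector position and rotation deltas plus a gripper command); unnorm_key="bridge_orig" selects the BridgeData V2 statistics used to un-normalize it before execution on the WidowX robot.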
OpenVLA-OFT
Arxiv: Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Github: https://github.com/moojink/openvla-oft
SmolVLA
HuggingFace: lerobot/smolvla_base
Arxiv: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
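A minimal sketch for loading the pretrained checkpoint with the LeRobot library is shown below. The import path, install extra, and training-script arguments follow the lerobot/smolvla_base model card but change between lerobot releases, so treat them as assumptions rather than a fixed API.
# Minimal sketch (assumes lerobot installed with SmolVLA extras, e.g.
#   pip install -e ".[smolvla]"   from a clone of https://github.com/huggingface/lerobot).
# The module path below may differ in newer lerobot releases.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy
policy = SmolVLAPolicy.from_pretrained("lerobot/smolvla_base")
# Fine-tuning on your own LeRobot-format dataset (repo id is a placeholder):
# python lerobot/scripts/train.py \
#   --policy.path=lerobot/smolvla_base \
#   --dataset.repo_id=<your_dataset_repo_id> \
#   --batch_size=64 --steps=20000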
SO-101 robot
SO-ARM101 AI Robotic Arm PRO Kit for LeRobot