October 2025 — Cambridge, MA — Liquid AI, AMD, and Robotec.ai have jointly deployed compact foundation models for autonomous agentic robotics, showcasing a specialized 3-billion-parameter Liquid vision-language model (LFM2-VL-3B) running efficiently on AMD Ryzen™ AI processors to enable real-time multimodal perception and decision-making at the edge. Using Robotec.ai's RAI framework, a flexible AI agent platform designed for developing and deploying Embodied AI features, the team validated autonomous inspection capabilities through hardware-in-the-loop simulation across hundreds of warehouse scenarios. This approach enabled rigorous testing of the complete system before physical deployment, significantly reducing development time and risk while helping ensure robust real-world performance.

The demonstration, to be presented at ROSCon 2025 in Singapore, aims to highlight recent progress in combining multimodal reasoning and edge inference for agentic robotics in industrial environments.

System Overview

The prototype system consists of a mobile robot operating within a simulated warehouse environment with mixed human and machine traffic. The robot is equipped with onboard vision and control modules. All inference runs locally on AMD Ryzen™ AI processors without any internet or network connectivity.

LFM2-VL-3B serves as an agentic perception and reasoning tool within the system's decision-making pipeline. The model receives natural language instructions and raw images as input, then outputs structured assessments identifying package defects, warehouse hazards, OSHA violations, and operational anomalies, along with recommended corrective actions.

LFM2-VL-3B’s visual understanding and language comprehension enable contextual reasoning and autonomous planning. A key advantage of LFM2-VL-3B is its ability to translate visual observations into structured, parsable formats that downstream agents can act upon. This enables seamless integration between perception and planning: the VLM detects hazards, describes them in language, and proposes appropriate responses, all in a single inference pass. For example, when the model identifies a fallen box in a walkway, it outputs structured JSON that both documents the hazard and recommends next steps.

Example:

Prompt:
“You are a warehouse 5S/housekeeping auditor…”

Output:
{
  "scene_description": "Wide angle photo of an indoor warehouse aisle. The left side has tall
  metal racks with orange beams, loaded with many cardboard boxes on several shelf levels.
  Some boxes are stored on the floor directly beneath the racks, close to the rack legs. The
  concrete floor is smooth, with faint scuff marks and small stains, otherwise free of liquid. A 
  small cardboard box lies in the center of the main aisle, on its side, creating an obvious trip
  hazard. A few additional small boxes sit on the floor near the foreground, next to the rack feet,
  reducing usable aisle width. On the right side there is a white modular office or container with
  windows and a door. A white plastic canister and a small red object are located at the base of
  the office, on the floor adjacent to the walkway. The roof is supported by exposed metal
  trusses, with a row of overhead lights that appear to provide even illumination down the aisle.
  Racking, boxes, people, and equipment are all within view, there are no forklifts or pallet jacks
  visible in this frame.",
  "anomaly": true,
  "safety_violations": [
    "Fallen box in walkway"
  ],
  "inspection_results": [
    "Remove loose cardboard box from aisle floor",
    "Pick up fallen box near middle of walkway"
  ]
}
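
To illustrate how downstream agents can consume this output, the sketch below (Python) parses the JSON fields shown above into a typed structure and queues the recommended actions. The field names match the example; the dataclass, function, and queuing step are hypothetical and not part of the released framework.

import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class InspectionReport:
    """Typed view of the VLM's structured output (fields as in the example above)."""
    scene_description: str
    anomaly: bool
    safety_violations: List[str] = field(default_factory=list)
    inspection_results: List[str] = field(default_factory=list)


def parse_report(raw_output: str) -> InspectionReport:
    """Parse the model's JSON string into an InspectionReport; raises if the output is malformed."""
    data = json.loads(raw_output)
    return InspectionReport(
        scene_description=data["scene_description"],
        anomaly=bool(data["anomaly"]),
        safety_violations=list(data.get("safety_violations", [])),
        inspection_results=list(data.get("inspection_results", [])),
    )


if __name__ == "__main__":
    # Abbreviated sample mirroring the example output above.
    sample = json.dumps({
        "scene_description": "Warehouse aisle with a fallen box in the walkway.",
        "anomaly": True,
        "safety_violations": ["Fallen box in walkway"],
        "inspection_results": ["Remove loose cardboard box from aisle floor"],
    })
    report = parse_report(sample)
    if report.anomaly:
        for action in report.inspection_results:
            print("queue corrective action:", action)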

Embedded Inference and Efficiency

The model is executed locally through llama.cpp, which supports native AMD acceleration. The system achieves low-latency performance suitable for both visual reasoning and instruction-following tasks. No external inference or cloud services are required. Preliminary measurements show that the combined hardware–model stack operates efficiently within the real-time constraints typical of closed-loop industrial systems.
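
As a rough sketch of what a local inference call can look like, the example below queries a llama.cpp server over its OpenAI-compatible chat completions endpoint with a base64-encoded camera frame. This assumes the model is served by llama-server together with its multimodal projector on localhost; the actual serving configuration used in the demonstration is not specified here.

import base64
import json
import urllib.request

# Assumed setup: LFM2-VL-3B served locally by llama.cpp's llama-server, listening on localhost.
SERVER_URL = "http://localhost:8080/v1/chat/completions"


def assess_frame(image_path: str, instruction: str) -> str:
    """Send one camera frame plus an instruction to the local server; return the raw model text."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")

    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": instruction},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        "temperature": 0.1,  # low temperature keeps the structured JSON output stable
    }
    req = urllib.request.Request(
        SERVER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]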

Simulation Framework

Validation was performed by Robotec.ai using a ROS 2-based hardware-in-the-loop simulator connected directly to the robot’s sensor and actuator interfaces. The simulator reproduces the physical dynamics of a warehouse environment, enabling controlled evaluation of decision-making accuracy, replanning behavior, and perception reliability under a variety of conditions. This environment served to benchmark model performance in repeatable scenarios before field deployment, reducing the cost and risk associated with testing embodied AI systems in real settings.
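
Because the simulator connects to the same ROS 2 interfaces as the physical robot, the perception pipeline can be exercised unchanged in both settings. The minimal rclpy node below illustrates the pattern; the topic names and the run_vlm placeholder are illustrative assumptions, not the actual RAI interfaces.

import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from std_msgs.msg import String


class InspectionNode(Node):
    """Subscribes to camera frames and publishes the VLM's structured assessment."""

    def __init__(self):
        super().__init__("warehouse_inspection")
        # Topic names are illustrative; the simulator exposes the same interfaces as the robot.
        self.subscription = self.create_subscription(
            Image, "/camera/image_raw", self.on_frame, 10)
        self.publisher = self.create_publisher(String, "/inspection/report", 10)

    def on_frame(self, msg: Image) -> None:
        report_json = self.run_vlm(msg)  # query the locally served model (placeholder)
        self.publisher.publish(String(data=report_json))

    def run_vlm(self, msg: Image) -> str:
        # Placeholder: encode the frame and call the local VLM,
        # e.g. via the assess_frame() sketch shown earlier.
        return '{"anomaly": false, "safety_violations": [], "inspection_results": []}'


def main():
    rclpy.init()
    rclpy.spin(InspectionNode())
    rclpy.shutdown()


if __name__ == "__main__":
    main()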

Closing

The demonstration extends earlier work by Liquid AI and AMD on efficient foundation model deployment for edge and on-device inference, announced earlier this year as part of the LEAP SDK integration. Together with Robotec.ai’s simulation platform, the collaboration connects research in agentic reasoning, multimodal foundation models, and embedded systems design.

Availability

LFM2-VL-3B is available under the LFM Open License on Hugging Face and can be accessed via the LEAP SDK for deployment on AMD hardware. The model, code examples, and benchmark data are provided for research and integration purposes.
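
For local experimentation, the weights can be downloaded from Hugging Face, for example with huggingface_hub as sketched below; the repository id is assumed from the model name and should be confirmed on the model card.

from huggingface_hub import snapshot_download

# Assumed repository id based on the model name; confirm on the Hugging Face model card.
local_dir = snapshot_download(repo_id="LiquidAI/LFM2-VL-3B")
print("Model files downloaded to:", local_dir)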
