Today, we are completing the LFM2 family with the launch of our most capable model yet: LFM2-24B-A2B. While our Technical Blog dives into the architectural specs, the real story is its versatility. This model is designed to run everywhere, from massive cloud clusters to the next generation of AI PCs and mobile devices.

Thanks to our partner ecosystem, developers can now access LFM2-24B-A2B with day-zero optimization.

Whether you’re building a global enterprise application or a privacy-focused local assistant, LFM2-24B-A2B is ready wherever your users are.

Cloud Partners: Instant, Scalable, and On-Demand

Together AI

Together AI, the AI Native Cloud, is our partner for production-ready serverless agentic deployment. Developers can deploy LFM2-24B-A2B with a 99.9% reliability SLA on serverless infrastructure, optimized for high-volume multi-agent workflows. Try the model today in the Together AI Playground or via the API (model="liquid-ai/lfm2-24b-a2b").
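
If you prefer the Python SDK over raw HTTP, a minimal sketch looks like the following (assuming TOGETHER_API_KEY is set in your environment; the model string is the one given above):

from together import Together

# The SDK reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.chat.completions.create(
    model="liquid-ai/lfm2-24b-a2b",
    messages=[{"role": "user", "content": "Outline a three-step agent workflow for triaging support tickets."}],
)
print(response.choices[0].message.content)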

Modal

Modal is best suited for developers seeking to rapidly customize and/or self-deploy Liquid AI models. Modal’s ergonomic Python SDK, global GPU pool, and sub-second cold starts enable developers to instantly scale models in production without complex compute orchestration.

Looking to run LFM2 24B with lightning-fast cold starts and ultra-low latency? Modal has put together a fantastic guide on serving Liquid’s LFM2 models using vLLM. By leveraging Modal’s CPU + GPU memory snapshots and their new low-latency routing service, you can easily deploy endpoints perfectly tailored for latency-sensitive workloads. Check out the step-by-step example to get started.
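
As a rough illustration of what a self-deployed setup can look like, here is a minimal sketch of a Modal app that serves the model with vLLM. It is not Modal’s official guide, and the Hugging Face repo name LiquidAI/LFM2-24B-A2B, the GPU type, and the sampling settings are assumptions:

import modal

# Build a container image with vLLM installed.
image = modal.Image.debian_slim(python_version="3.12").pip_install("vllm")
app = modal.App("lfm2-24b-vllm-sketch", image=image)

@app.function(gpu="H100", timeout=600)
def generate(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    # Loading inside the function keeps the sketch short; a production endpoint
    # would cache the engine and lean on Modal's memory snapshots instead.
    llm = LLM(model="LiquidAI/LFM2-24B-A2B")  # assumed Hub repo name
    params = SamplingParams(temperature=0.7, max_tokens=256)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text

@app.local_entrypoint()
def main():
    print(generate.remote("Summarize why MoE models are cheap to serve."))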

Edge: Privacy-First AI on Your Device

AMD

Our close partnership with AMD allows us to maximize performance across their entire hardware stack, including CPUs, GPUs, and NPUs. By leveraging the RyzenLLM-AI inference engine, the LFM2 24B MoE model delivers a high-efficiency experience for local AI tasks on NPUs.

“AMD is proud to provide day zero support for the latest Liquid Foundation model from Liquid AI. Our broad portfolio of GPUs and NPU-enabled APUs provide users with a wide set of options to use this model with great performance and efficiency.” — Ramine Roane, Corporate Vice President of AI Product Management at AMD.

Intel

We are thrilled to welcome Intel as an official launch partner for our new LFM2 24B MoE model. As part of this collaboration, Intel’s AI inference software, the OpenVINO™ toolkit, now supports our LFM2-24B-A2B model. This integration enables developers and enterprises to seamlessly optimize and deploy our models across Intel’s extensive hardware range, accelerating inference and improving AI performance on AI PCs, edge devices, and data centers.
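
For a sense of what that integration looks like in practice, here is a minimal sketch using the optimum-intel bridge to OpenVINO; the Hub repo name LiquidAI/LFM2-24B-A2B is an assumption, and Intel’s OpenVINO notebooks remain the authoritative reference:

from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "LiquidAI/LFM2-24B-A2B"  # assumed Hub repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the checkpoint to OpenVINO IR on first load;
# the exported model can then be run on Intel hardware through OpenVINO.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Explain expert routing in two sentences.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))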

You can see the model in action in the demo below, showcasing the seamless performance unlocked by combining Liquid’s architecture with Intel’s OpenVINO™ toolkit on an AI PC powered by Intel.

Ollama

For developers who want to integrate LFM2 24B into their scripts or who prefer a terminal-based workflow, we are also live on the Ollama library. You can pull and run the model in a single command, making it instantly available for local RAG pipelines or agentic workflows on your hardware of choice.

Get Started on the Command Line:

ollama run lfm2:24b-a2b
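
Once the model is pulled, the same local instance can also be called programmatically; here is a minimal sketch with the ollama Python client, reusing the tag from the command above:

import ollama

# Talks to the local Ollama server started by `ollama run` / `ollama serve`.
response = ollama.chat(
    model="lfm2:24b-a2b",
    messages=[{"role": "user", "content": "List three uses for a local RAG pipeline."}],
)
print(response["message"]["content"])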

Nexa AI

Running a 24B-parameter model on a phone would typically drain its resources almost instantly. With Nexa AI's optimization SDK, however, high-quality on-device reasoning becomes sustainable.

See LFM2-24B-A2B running on a Qualcomm Snapdragon® 8 Elite for Galaxy device (Samsung Galaxy S25 Ultra), powered by the Qualcomm Hexagon NPU, solving and explaining a difficult math question at a decoding speed of 35.4 tokens/sec.

Try it out today with NexaSDK — a unified local inference framework that runs any model on any backend. Featured on the official Qualcomm website as the simplest way to bring on-device AI to Snapdragon, NexaSDK lets you deploy the latest models across NPU, GPU, and CPU with just a few lines of code. With 7.7K+ GitHub stars and an active open-source community, it's trusted by leading technology partners worldwide.

Cactus

We are excited to share that the Cactus engine has been updated to fully support the LFM2 24B MoE model at launch. The performance on Apple Silicon is particularly impressive: with INT8 quantization on an M4 Pro CPU, the Cactus engine achieves 229 tokens/sec prefill and 27 tokens/sec decode (benchmarked with a 1k-token prefill and a 100-token decode).

"LFM2-24B-A1 excels at coding, keen to see on-device coding agents built with these."— Henry Ndubuaku, Co-Founder, Cactus

To get started with Cactus, check out the example guide, "How to build on-device coding agents with Cactus & LFM2-24B-A2B".

LM Studio

We are thrilled to welcome LM Studio as an official launch partner for the LFM2 series. LM Studio has defined the standard for discovering and running local LLMs with an easy-to-use interface, and we have worked closely with their team to ensure LFM2 24B is optimized from Day 1.

This partnership brings the power of Liquid’s architecture to a broad audience of developers and power users who want a visual way to manage, chat with, and test models locally.
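
Beyond the GUI, LM Studio can expose any loaded model through its built-in OpenAI-compatible local server, so existing client code works unchanged. A minimal sketch, assuming the server is running on the default port and that the loaded model's identifier is lfm2-24b-a2b (a placeholder; use whatever ID LM Studio shows for the model you loaded):

from openai import OpenAI

# Point the standard OpenAI client at LM Studio's local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="lfm2-24b-a2b",  # placeholder identifier
    messages=[{"role": "user", "content": "Draft a short release note for a local assistant."}],
)
print(response.choices[0].message.content)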

Start Building with LFM2 24B Instruct MoE Today

From cloud deployments to running 100% locally on the latest AI PCs, the LFM2 24B MoE model is built to run anywhere without compromising on performance.

Get Started Now:

  • Cloud: Test it instantly on Together AI or Modal.
  • Local: Run it across mobile, desktop, and terminal via Cactus, Ollama, LM Studio, and Nexa AI.

The agentic AI future is here. We can't wait to see what you build. Share your projects with us and join the Liquid AI community on Discord!
