We’re excited to release LFM2-VL-3B, the newest and most capable addition to our family of vision LFMs (450M and 1.6B). Built on the LFM2-2.6B backbone, this 3B parameter model targets applications that require higher accuracy while preserving the speed advantage of the LFM2 architecture. It is available today on LEAP and Hugging Face.

Flexible Architecture

LFM2-VL-3B follows the recipe adopted for our previous VLMs. It builds on our most powerful dense model, LFM2-2.6B, and integrates a SigLIP2 400M NaFlex encoder, enabling image processing at native resolutions with variable aspect ratios. Its flexible architecture lets developers balance accuracy and speed by adjusting the number of vision tokens per image, offering finer control for deployment, especially in edge environments.

You can find more information about the architecture in our LFM2-VL blog post.
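If you want to try the vision-token trade-off yourself, the sketch below shows how it might look with the Hugging Face transformers integration used by the earlier LFM2-VL checkpoints. The repository name `LiquidAI/LFM2-VL-3B`, the `max_image_tokens` processor option, and the sample image URL are assumptions for illustration; check the model card on Hugging Face for the exact names.

```python
# Minimal inference sketch, assuming the 3B checkpoint follows the same
# transformers integration as earlier LFM2-VL releases.
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2-VL-3B"  # assumed repository name
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",       # requires accelerate
    torch_dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(
    model_id,
    trust_remote_code=True,
    max_image_tokens=256,  # assumed knob: caps vision tokens per image to trade accuracy for speed
)

image = load_image("https://example.com/sample.jpg")  # placeholder image URL
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```

Lowering the vision-token budget shrinks the prefill cost per image, which is where most of the latency savings on edge hardware would come from.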

Broader Capabilities

LFM2-VL-3B delivers competitive results across open-source evaluations, achieving an impressive 51.8% on MM-IFEval and 71.4% on RealWorldQA. The model shows strong performance in single- and multi-image comprehension and English OCR, with low hallucination rates on the POPE benchmark.

Its language-only benchmark scores remain comparable to those of its LFM2-2.6B backbone, at 30% on GPQA and 63% on MMLU. In addition, we have significantly expanded multilingual capabilities, extending visual understanding beyond English to Japanese, French, Spanish, German, Italian, Portuguese, Arabic, Chinese, and Korean.

| Model | Average | MMStar | MMMU (val) | MathVista | BLINK | InfoVQA (val) | MMBench (dev en) | OCRBench | POPE | RealWorldQA | MME | MM-IFEval | SEEDBench |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InternVL3_5-2B | 66.63 | 57.67 | 51.78 | 61.60 | 50.97 | 69.29 | 78.18 | 834.00 | 87.17 | 60.78 | 2,128.83 | 47.31 | 75.41 |
| Qwen2.5-VL-3B | 66.61 | 56.13 | 51.67 | 62.50 | 48.97 | 76.12 | 80.41 | 824.00 | 86.17 | 65.23 | 2,163.29 | 38.62 | 73.88 |
| InternVL3-2B | 66.46 | 61.10 | 48.70 | 57.60 | 53.10 | 66.10 | 81.10 | 831.00 | 90.10 | 65.10 | 2,186.40 | 38.49 | 74.95 |
| SmolVLM2-2.2B | 54.85 | 46.00 | 41.60 | 51.50 | 42.30 | 37.75 | 69.24 | 725.00 | 85.10 | 57.50 | 1,792.50 | 19.42 | 71.30 |
| LFM2-VL-3B | 67.31 | 57.73 | 45.33 | 62.20 | 51.03 | 67.37 | 79.81 | 822.00 | 89.01 | 71.37 | 2,050.90 | 51.83 | 76.55 |
We calculated the scores for all models using VLMEvalKit. Qwen3-VL-2B is not included in this table because it was released only the day before this post.

Open and Available

LFM2-VL-3B is now available on Hugging Face under our LFM Open License, and through our LEAP platform, making cutting-edge efficient AI accessible to developers and researchers worldwide. 

The LFM2 series continues to push the boundaries of efficient AI. We're proving that with the right architecture and approach, smaller models can deliver enterprise-grade performance without the computational overhead. In the future, we will continue to scale our foundation models to bring this level of efficiency to more devices and unlock new applications.
