Models | Jun 18, 2026

LFM2.5 Retrievers: Bi-directional LFMs for Fast Multilingual Search

Today, we’re releasing two new multilingual retrieval models: LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M. Both are 350M-parameter models and the first bidirectional members of the LFM family, building on our LFM2.5-350M-Base from March. They are built for fast and reliable multilingual and cross-lingual search across 11 languages, with a footprint small enough to run almost anywhere.

They are especially well-suited for short-context search: product catalogs, FAQ knowledge bases, support docs, and other collections that need to be searched quickly, cost-effectively, and reliably across languages.

The two models suit different needs:

LFM2.5-Embedding-350M turns each document into a single vector. Pick it when you want the fastest search and the smallest, cheapest index.
LFM2.5-ColBERT-350M produces one vector per token instead of a single vector per document. This lets it match queries token by token for higher accuracy and better generalization, at the cost of a larger index. Pick it when accuracy matters more than storage.
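
To make the storage trade-off concrete, here is a rough back-of-the-envelope sketch. The vector dimensions, corpus size, and FP16 storage are assumptions chosen only for illustration (they are not the models' published dimensions), and the helper `index_size_bytes` is hypothetical. Real indexes typically add compression or quantization on top.

```python
def index_size_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=2):
    """Raw size of an uncompressed vector index (no metadata, no quantization)."""
    return num_docs * vectors_per_doc * dim * bytes_per_value


# Hypothetical values for illustration only.
NUM_DOCS = 1_000_000
TOKENS_PER_DOC = 256  # same document length as in the latency tables
DENSE_DIM = 1024      # assumed size of the single pooled vector
TOKEN_DIM = 128       # assumed size of each per-token vector

# Single-vector index: one vector per document.
dense = index_size_bytes(NUM_DOCS, 1, DENSE_DIM)
# Multi-vector index: one (smaller) vector per document token.
multi = index_size_bytes(NUM_DOCS, TOKENS_PER_DOC, TOKEN_DIM)

print(f"single-vector index: {dense / 1e9:.1f} GB")
print(f"multi-vector index:  {multi / 1e9:.1f} GB")
print(f"ratio: {multi / dense:.0f}x")
```

Under these assumptions the multi-vector index grows with document length, while the single-vector index does not, which is why the bi-encoder is the cheaper choice when storage dominates.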

Bar chart titled “LFM2.5-Embedding-350M Retrieval Benchmarks” comparing bi-encoder retrieval quality across NanoBEIR ndcg@10, MKQA recall@20, and MKQA recall@100. LFM2.5-Embedding-350M achieves leading or near-leading scores, including 0.58 on NanoBEIR, 0.69 on MKQA recall@20, and 0.76 on MKQA recall@100.

Architecture Updates

Both models are built from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. We apply a small set of bidirectional patches to the LFM2 architecture, adapting it from a causal decoder to a bidirectional encoder.

In the causal setup, each token can only use information from itself and previous tokens, which is ideal for left-to-right generation but less natural for retrieval. We replace the causal attention mask with a bidirectional one (figure below, left side), so every token can attend to both left and right context. We also make the LFM2 short convolutions non-causal (figure below, right side), so they mix local information symmetrically around each token rather than only from the past. This preserves the efficiency of the LFM2 backbone while producing the full-context representations retrieval tasks need.
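
The two patches can be illustrated with a toy sketch in plain Python. This is not the LFM2 implementation: the function names are hypothetical, the convolution is a simple zero-padded 1-D filter, and an odd kernel size is assumed for the symmetric case.

```python
def attention_mask(n, causal):
    """mask[i][j] is True when token i may attend to token j."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def short_conv(x, kernel, causal):
    """Zero-padded 1-D convolution that keeps the sequence length.

    causal=True:  the window covers the current token and the k-1 tokens before it.
    causal=False: the window is centered on the current token (odd k assumed).
    """
    k = len(kernel)
    left = k - 1 if causal else k // 2
    right = k - 1 - left
    padded = [0.0] * left + list(x) + [0.0] * right
    return [
        sum(w * padded[i + t] for t, w in enumerate(kernel))
        for i in range(len(x))
    ]


# An impulse at position 2 shows where information can flow.
impulse = [0, 0, 1, 0, 0]
ones = [1, 1, 1]
print(short_conv(impulse, ones, causal=True))   # spreads only to later positions
print(short_conv(impulse, ones, causal=False))  # spreads to both neighbors
print(attention_mask(3, causal=True))
print(attention_mask(3, causal=False))
```

In the causal case the impulse only influences positions at or after it; in the non-causal case it reaches one position on each side, which mirrors the symmetric local mixing described above.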

Diagram titled “Bi-directional patches LFM 2.5-350M” showing how attention and ShortConv blocks change from causal processing to full-context bidirectional processing, allowing tokens to use context from both past and future positions.

From this shared bi-directional encoder, the two models differ only in how they represent text. LFM2.5-Embedding-350M uses CLS-style pooling to produce a single dense embedding, while LFM2.5-ColBERT-350M keeps compact per-token embeddings for MaxSim late interaction.

Diagram comparing LFM2.5-Embedding-350M and LFM2.5-ColBERT-350M retrieval methods. The embedding model pools query and document token sequences into single vectors for fast scoring, while the ColBERT model uses multi-vector late interaction with MaxSim matching for more accurate word-by-word retrieval.
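
The difference between the two scoring schemes can be sketched in a few lines of plain Python. The vectors below are toy values, the function names are hypothetical, and cosine similarity is assumed as the similarity measure; the released models may normalize or score differently.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_score(query_vec, doc_vec):
    """Bi-encoder scoring: one similarity between two pooled vectors."""
    return dot(normalize(query_vec), normalize(doc_vec))


def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token keeps its best-matching document
    token, and the per-token maxima are summed."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


# Toy 2-D token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # has an exact match for each query token
doc_b = [[1.0, 1.0]]                          # only a partial match for both

print(maxsim_score(query, doc_a))
print(maxsim_score(query, doc_b))
print(dense_score([1.0, 0.0], [1.0, 1.0]))
```

Because MaxSim scores every query token against every document token, the document side must store all token vectors, which is the source of the larger index.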

Compared with LFM2-ColBERT-350M, this release uses the newer LFM2.5 checkpoint, expands language coverage, and adds explicit multilingual and cross-lingual retrieval training. It also introduces a companion bi-encoder built on the same backbone and recipe.

Training and Data

Both models follow the same three-stage training recipe: (1) large-scale contrastive pretraining in English, (2) multilingual and cross-lingual distillation from a strong teacher (across all 11 supported languages), and (3) final fine-tuning on hard-mined negatives. The staged structure was inspired by LightOn’s LateOn and DenseOn release, which likewise separates broad contrastive pretraining from later specialization stages.
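
For readers unfamiliar with contrastive pretraining, the sketch below shows one common objective, InfoNCE with in-batch negatives. The post does not specify the exact loss, so the choice of InfoNCE, the temperature value, and the function name are assumptions for illustration.

```python
import math


def info_nce(sim, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    sim[i][j] is the similarity between query i and document j.
    Document i is the positive for query i; every other document in the
    batch acts as a negative.
    """
    losses = []
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Numerically stable log-sum-exp over the row.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy of picking the positive out of the batch.
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


aligned = [[1.0, 0.0], [0.0, 1.0]]    # each query is closest to its own document
confused = [[0.5, 0.5], [0.5, 0.5]]   # the model cannot tell documents apart

print(info_nce(aligned))
print(info_nce(confused))
```

The loss is low when each query scores its own document highest. Fine-tuning on hard-mined negatives, as in stage (3), amounts to adding difficult negatives to each row so that the model has to separate near-duplicates rather than random documents.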

LFM2.5-Embedding-350M receives slightly more cross-lingual data than LFM2.5-ColBERT-350M, since cross-lingual retrieval emerges more naturally in the late-interaction setup and benefits less from additional supervision.

The training data combines curated internal data with open-source English retrieval datasets. We use LLM-based translation of queries and documents to expand the multilingual and cross-lingual pairs used during the second and third training stages.

Benchmarks

We report fine-grained benchmark results across all 11 supported languages: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish. Our evaluation focuses on two capabilities: multilingual retrieval with NanoBEIR, and cross-lingual open-domain QA with MKQA-11. Together, they test whether the models can retrieve relevant documents within a language and across language boundaries.
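
As a reminder of what the two metrics measure, here is a minimal sketch of NDCG@k and Recall@k for a single query. These are common textbook definitions, not necessarily the exact evaluation code behind the tables; in particular, some QA benchmarks compute Recall@k as a binary "top k contains an answer" hit rather than a fraction. The function names and document IDs are hypothetical.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevance, k):
    """Normalized discounted cumulative gain for one query.

    relevance maps document ID to a graded relevance (missing means 0).
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0


ranking = ["d2", "d1", "d3"]
print(ndcg_at_k(ranking, {"d1": 1}, k=10))        # relevant doc ranked second
print(recall_at_k(ranking, {"d1", "d9"}, k=2))    # one of two relevant docs found
```

Benchmark scores such as those in the tables below are averages of per-query values like these.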

Overall, LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M post the two highest average scores among the models compared on both benchmarks: 0.605 and 0.577 NDCG@10 on NanoBEIR, and 0.694 and 0.691 Recall@20 on MKQA-11. Their results remain consistently competitive across all 11 supported languages, which shows that retrieval quality holds up beyond English.

NanoBEIR Multilingual Extended (NDCG@10)
Model	Type	AVG	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LiquidAI/LFM2.5-ColBERT-350M	colbert	0.605	0.551	0.606	0.687	0.607	0.622	0.606	0.614	0.590	0.570	0.613	0.586
LiquidAI/LFM2.5-Embedding-350M	bienc	0.577	0.529	0.581	0.644	0.581	0.592	0.583	0.575	0.563	0.557	0.581	0.566
Qwen/Qwen3-Embedding-0.6B	bienc	0.556	0.514	0.560	0.649	0.568	0.565	0.565	0.551	0.530	0.516	0.571	0.525
LiquidAI/LFM2-ColBERT-350M	colbert	0.540	0.491	0.563	0.661	0.563	0.564	0.543	0.557	0.527	0.449	0.547	0.480
Alibaba-NLP/gte-multilingual-base	bienc	0.528	0.477	0.523	0.624	0.537	0.542	0.528	0.511	0.494	0.516	0.534	0.526
lightonai/GTE-ModernColBERT-v1	colbert	0.489	0.309	0.499	0.680	0.525	0.546	0.516	0.459	0.368	0.465	0.530	0.483
lightonai/LateOn	colbert	0.484	0.307	0.505	0.690	0.531	0.537	0.514	0.442	0.326	0.465	0.533	0.475
lightonai/DenseOn	bienc	0.432	0.178	0.474	0.676	0.496	0.520	0.487	0.378	0.197	0.422	0.493	0.433
Alibaba-NLP/gte-modernbert-base	bienc	0.383	0.112	0.449	0.666	0.448	0.475	0.408	0.275	0.180	0.376	0.431	0.391
BAAI/bge-large-en-v1.5	bienc	0.359	0.059	0.419	0.642	0.445	0.475	0.431	0.198	0.132	0.358	0.434	0.353

Table 1: Per-language NanoBEIR Multilingual Extended (NDCG@10). Best score per model type in bold.

MKQA-11 (Recall@20)

Model	Type	AVG	ar	de	en	es	fr	it	ja	ko	no	pt	sv
LiquidAI/LFM2.5-ColBERT-350M	colbert	0.694	0.608	0.709	0.748	0.711	0.715	0.707	0.703	0.640	0.689	0.703	0.700
LiquidAI/LFM2.5-Embedding-350M	bienc	0.691	0.610	0.709	0.738	0.708	0.715	0.703	0.685	0.630	0.691	0.710	0.708
Alibaba-NLP/gte-multilingual-base	bienc	0.675	0.567	0.692	0.741	0.705	0.703	0.697	0.655	0.563	0.698	0.700	0.699
LiquidAI/LFM2-ColBERT-350M	colbert	0.646	0.554	0.696	0.754	0.711	0.710	0.667	0.658	0.558	0.541	0.669	0.589
Qwen/Qwen3-Embedding-0.6B	bienc	0.638	0.520	0.671	0.723	0.678	0.672	0.671	0.635	0.543	0.620	0.667	0.620
lightonai/GTE-ModernColBERT-v1	colbert	0.459	0.092	0.532	0.754	0.552	0.615	0.510	0.275	0.166	0.503	0.524	0.524
lightonai/LateOn	colbert	0.454	0.157	0.492	0.755	0.537	0.577	0.481	0.316	0.209	0.472	0.502	0.501
lightonai/DenseOn	bienc	0.435	0.165	0.482	0.751	0.491	0.553	0.457	0.325	0.222	0.438	0.443	0.453
BAAI/bge-large-en-v1.5	bienc	0.413	0.133	0.471	0.748	0.450	0.531	0.461	0.208	0.172	0.456	0.443	0.467
Alibaba-NLP/gte-modernbert-base	bienc	0.295	0.060	0.333	0.736	0.273	0.417	0.291	0.100	0.052	0.332	0.326	0.330

Table 2: Per-language MKQA-11 (Recall@20). Best score per model type in bold.

Unlike LightOn, we find that NanoBEIR English provides a sufficient evaluation signal. Across the models we evaluated, scores on NanoBEIR English and on the more expensive full BEIR remain highly correlated, with NanoBEIR scoring a near-constant ~15% higher. We therefore use NanoBEIR as a practical proxy for full BEIR when iterating across training runs.

Inference

We evaluate end-to-end latency in the retrieval regimes that matter in practice: query embedding with cached documents, query embedding plus MaxSim, and query embedding plus document embedding plus MaxSim when documents are not cached.

For portable deployment, we release LFM2.5-ColBERT-350M-GGUF and LFM2.5-Embedding-350M-GGUF for llama.cpp, so the models can run nearly anywhere (CPUs, laptops, and edge devices) at near-zero cost and with compelling latency.

Model	Setup	Docs Cache	p50	p95
LFM2.5-Embedding-350M	Query embedding	yes	7.3 ms	9.6 ms
LFM2.5-ColBERT-350M	Query embedding	yes	8.1 ms	8.5 ms
LFM2.5-ColBERT-350M	Query embedding + MaxSim	yes	8.2 ms	15.2 ms
LFM2.5-ColBERT-350M	Query + Doc embedding + MaxSim	no	34.3 ms	36.3 ms

Table 3: llama.cpp end-to-end latency on MacBook M4 Max at FP16. 32-token queries, 256-token documents.

For large-scale production-grade enterprise deployments, we also develop an internal GPU stack to deliver extremely low-latency serving under high inbound load.

Model	Setup	p50	p95	p99
LFM2.5-Embedding-350M	Query embedding	1.5 ms	1.6 ms	1.7 ms
LFM2.5-ColBERT-350M	Query embedding	1.3 ms	1.4 ms	1.5 ms
LFM2.5-ColBERT-350M	Query embedding + MaxSim	2.5 ms	2.7 ms	2.8 ms
LFM2.5-ColBERT-350M	Query + Doc Embedding + MaxSim	22.8 ms	24.1 ms	26.4 ms

Table 4: Internal inference stack, end-to-end latency on H100 at BF16. 32-token queries, 256-token documents.
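
The tables above report latency percentiles (p50, p95, p99). If you want to collect comparable numbers for your own deployment, a minimal timing harness could look like the sketch below. It is not the benchmarking code used for these tables: the helper names, the warmup and run counts, and the nearest-rank percentile method are all assumptions, and `encode_query` stands in for whatever call runs your model.

```python
import math
import time


def measure_latency_ms(fn, warmup=5, runs=100):
    """Call fn repeatedly and return per-call wall-clock latencies in ms."""
    for _ in range(warmup):  # warm caches before timing
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def encode_query():
    """Placeholder workload; replace with a real query-embedding call."""
    return sum(i * i for i in range(10_000))


latencies = measure_latency_ms(encode_query, warmup=2, runs=50)
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies, p):.3f} ms")
```

Reporting tail percentiles alongside the median matters for search workloads, since a small fraction of slow queries can dominate user-perceived latency.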

Training your own

While these models perform strongly out of the box, we encourage you to fine-tune either model on your own data for domain-specific retrieval. We especially recommend this for LFM2.5-Embedding-350M, for which our Hugging Face model card provides simple fine-tuning snippets with sentence-transformers.

Get Started

The LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M models are available today on Hugging Face. For teams looking to deploy retrieval at enterprise scale, contact us to learn more.

Citation

Please cite this article using the following reference or BibTeX citation:

Liquid AI, "LFM2.5 Retrievers: Bi-directional LFMs for Fast Multilingual Search", Liquid AI Blog, Jun 2026.

@article{liquidAI2026Retrievers,
  author = {Liquid AI},
  title = {LFM2.5 Retrievers: Bi-directional LFMs for Fast Multilingual Search},
  journal = {Liquid AI Blog},
  year = {2026},
  note = {www.liquid.ai/blog/lfm2-5-retrievers},
}