Today, we’re releasing two new multilingual retrieval models: LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M. Both are 350M-parameter models and the first bidirectional members of the LFM family, building on our LFM2.5-350M-Base from March. They are built for fast and reliable multilingual and cross-lingual search across 11 languages, with a footprint small enough to run almost anywhere.

They are especially well-suited for short-context search: product catalogs, FAQ knowledge bases, support docs, and other collections that need to be searched quickly, cost-effectively, and reliably across languages.

The two models suit different needs:

  • LFM2.5-Embedding-350M turns each document into a single vector. Pick it when you want the fastest search and the smallest, cheapest index.
  • LFM2.5-ColBERT-350M produces one vector per token rather than a single vector per document. This lets it match queries against documents token by token for higher accuracy and better generalization, at the cost of a larger index. Pick it when accuracy matters more than storage.
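To make the storage trade-off concrete, here is a back-of-the-envelope sketch. The embedding dimensions, average document length, and uncompressed FP16 storage are illustrative assumptions, not published specifications of either model. Real indexes are often compressed or quantized.

```python
def index_bytes(num_docs, vectors_per_doc, dim, bytes_per_value=2):
    """Raw size in bytes of an uncompressed vector index (FP16 by default)."""
    return num_docs * vectors_per_doc * dim * bytes_per_value

# All values below are hypothetical and chosen only for illustration.
NUM_DOCS = 1_000_000
DENSE_DIM = 1024       # assumed size of the single per-document vector
TOKEN_DIM = 128        # assumed size of each per-token vector
TOKENS_PER_DOC = 256   # assumed average document length in tokens

dense_index = index_bytes(NUM_DOCS, 1, DENSE_DIM)
late_interaction_index = index_bytes(NUM_DOCS, TOKENS_PER_DOC, TOKEN_DIM)
ratio = late_interaction_index / dense_index

print(f"single-vector index: {dense_index / 1e9:.2f} GB")
print(f"per-token index:     {late_interaction_index / 1e9:.2f} GB")
print(f"ratio:               {ratio:.0f}x")
```

Under these assumptions the per-token index is 32 times larger. The ratio scales with document length and shrinks with smaller token vectors or compression.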

Architecture Updates

Both models are built from LFM2.5-350M-Base, a mid-trained general-purpose checkpoint. We apply a small set of bidirectional patches to the LFM2 architecture, adapting it from a causal decoder to a bidirectional encoder.

In the causal setup, each token can only use information from itself and previous tokens, which is ideal for left-to-right generation but less natural for retrieval. We replace the causal attention mask with a bidirectional one (figure below, left side), so every token can attend to both left and right context. We also make the LFM2 short convolutions non-causal (figure below, right side), so they mix local information symmetrically around each token rather than only from the past. This preserves the efficiency of the LFM2 backbone while producing the full-context representations retrieval tasks need.
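The toy sketch below illustrates the two changes described above. It is not the LFM2 implementation: the kernel size, zero padding, and plain (ungated) convolution are simplifying assumptions made for illustration.

```python
def attention_mask(seq_len, causal):
    """mask[i][j] is True when token i may attend to token j."""
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]

def short_conv(x, kernel, causal):
    """Toy 1-D convolution over a list of floats with zero padding.

    causal=True:  output[t] only sees x[t-k+1 .. t] (the past and present).
    causal=False: the window is centered on t (assumes an odd kernel size).
    """
    k = len(kernel)
    # How far the window reaches into the past.
    left = k - 1 if causal else (k - 1) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            src = t - left + i
            if 0 <= src < len(x):
                acc += w * x[src]
        out.append(acc)
    return out

# A single impulse shows which output positions each variant can influence.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
causal_out = short_conv(impulse, [1.0, 1.0, 1.0], causal=True)
symmetric_out = short_conv(impulse, [1.0, 1.0, 1.0], causal=False)
print(causal_out)     # the impulse only reaches positions 2, 3, 4
print(symmetric_out)  # the impulse reaches positions 1, 2, 3
```

In the causal variant the impulse only affects later positions, while the symmetric variant spreads it to both neighbors, which is the behavior a bidirectional encoder wants.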

From this shared bidirectional encoder, the two models differ only in how they represent text. LFM2.5-Embedding-350M uses CLS-style pooling to produce a single dense embedding, while LFM2.5-ColBERT-350M keeps compact per-token embeddings for MaxSim late interaction.
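The two scoring schemes can be sketched on toy vectors as follows. This is a minimal illustration, not the released models' code. It assumes that CLS-style pooling means taking the vector at the first position and that all vectors are L2-normalized before comparison. In practice each model produces its own vectors, so the same toy inputs are reused here only for brevity.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dense_score(query_tokens, doc_tokens):
    """Bi-encoder: one pooled vector per text, compared by cosine similarity.

    Assumption: the pooled vector is the one at the first (CLS-like) position.
    """
    return dot(normalize(query_tokens[0]), normalize(doc_tokens[0]))

def maxsim_score(query_tokens, doc_tokens):
    """Late interaction: each query token takes its best-matching document
    token, and the per-token maxima are summed."""
    q = [normalize(v) for v in query_tokens]
    d = [normalize(v) for v in doc_tokens]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)

# Toy 2-dimensional token vectors (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [1.0, 1.0]]

print(dense_score(query, doc))
print(maxsim_score(query, doc))
```

The dense score depends on a single vector pair, while MaxSim lets the second query token find a partial match in the second document token, which is where the finer-grained matching comes from.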

Compared with LFM2-ColBERT-350M, this release uses the newer LFM2.5 checkpoint, expands language coverage, and adds explicit multilingual and cross-lingual retrieval training. It also introduces a companion bi-encoder built on the same backbone and recipe.

Training and Data

Both models follow the same three-stage training recipe: (1) large-scale contrastive pretraining in English, (2) multilingual and cross-lingual distillation from a strong teacher (across all 11 supported languages), and (3) final fine-tuning on hard-mined negatives. The staged structure was inspired by LightOn’s LateOn and DenseOn release, which also separates broad contrastive pretraining from later specialization stages.
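For readers unfamiliar with the building blocks of stages (1) and (3), here is a minimal sketch of a contrastive objective and of hard-negative mining. The recipe above does not name a loss function. The InfoNCE-style loss, the temperature value, and the top-k mining rule below are common choices used here as assumptions, not a description of the actual training code.

```python
import math

def info_nce_loss(similarities, positive_index, temperature=0.05):
    """Cross-entropy of picking the positive document among all candidates.

    similarities: similarity of one query to each candidate document.
    temperature:  hypothetical value; smaller values sharpen the softmax.
    """
    logits = [s / temperature for s in similarities]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[positive_index]

def mine_hard_negatives(similarities, positive_index, k):
    """Return the indices of the k highest-scoring non-positive candidates."""
    candidates = [i for i in range(len(similarities)) if i != positive_index]
    return sorted(candidates, key=lambda i: similarities[i], reverse=True)[:k]

# Hypothetical similarities of one query to four documents; index 0 is the positive.
sims = [0.90, 0.20, 0.85, 0.10]
hard = mine_hard_negatives(sims, positive_index=0, k=2)

easy_loss = info_nce_loss([0.90, 0.10], positive_index=0)
hard_loss = info_nce_loss([0.90, 0.85], positive_index=0)
print(hard, easy_loss, hard_loss)
```

A negative that scores close to the positive yields a much larger loss than an easy one, which is why mining such negatives is useful for a final specialization stage.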

LFM2.5-Embedding-350M receives slightly more cross-lingual data than LFM2.5-ColBERT-350M, since cross-lingual retrieval emerges more naturally in the late-interaction setup and benefits less from additional supervision.

The training data combines curated internal data with open-source English retrieval datasets. We use LLM-based translation of queries and documents to expand the multilingual and cross-lingual pairs used during the second and third training stages.
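As a rough illustration of how translations can expand one English pair into multilingual and cross-lingual pairs, consider the sketch below. The translations are assumed to be already available (the translation step itself is not shown), and pairing every language combination is an assumption. The actual sampling strategy is not described here.

```python
from itertools import product

def build_pairs(query_by_lang, doc_by_lang):
    """Split all query/document combinations into same-language
    (multilingual) and different-language (cross-lingual) pairs."""
    multilingual, cross_lingual = [], []
    for (q_lang, query), (d_lang, doc) in product(query_by_lang.items(),
                                                  doc_by_lang.items()):
        pair = (q_lang, d_lang, query, doc)
        if q_lang == d_lang:
            multilingual.append(pair)
        else:
            cross_lingual.append(pair)
    return multilingual, cross_lingual

# Hypothetical example: one query and one document, each in three languages.
queries = {
    "en": "wireless headphones",
    "de": "kabellose Kopfhörer",
    "fr": "casque sans fil",
}
docs = {
    "en": "Bluetooth headphones with 30 hours of battery life.",
    "de": "Bluetooth-Kopfhörer mit 30 Stunden Akkulaufzeit.",
    "fr": "Casque Bluetooth avec 30 heures d'autonomie.",
}

multilingual_pairs, cross_lingual_pairs = build_pairs(queries, docs)
print(len(multilingual_pairs), len(cross_lingual_pairs))
```

With n languages, one source pair yields n same-language pairs and n * (n - 1) cross-lingual pairs, so the cross-lingual share grows quickly as languages are added.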

Benchmarks

We report fine-grained benchmark results across all 11 supported languages: Arabic, German, English, Spanish, French, Italian, Japanese, Korean, Norwegian, Portuguese, and Swedish. Our evaluation focuses on two capabilities: multilingual retrieval with NanoBEIR, and cross-lingual open-domain QA with MKQA-11. Together, they test whether the models can retrieve relevant documents within a language and across language boundaries.
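For reference, the two metric families reported below can be computed as in this sketch. These are the standard textbook definitions. The exact variants used by the NanoBEIR and MKQA-11 evaluation harnesses (for example, answer-match recall for open-domain QA) may differ, and the document IDs and judgments are hypothetical.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant documents that appear in the top k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

def ndcg_at_k(ranked_ids, relevance, k):
    """Normalized discounted cumulative gain over the top k results.

    relevance maps document IDs to graded gains; missing IDs count as 0.
    """
    dcg = sum(relevance.get(doc_id, 0) / math.log2(rank + 2)
              for rank, doc_id in enumerate(ranked_ids[:k]))
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2)
               for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical ranking and relevance judgments for a single query.
ranking = ["d3", "d1", "d7"]
judgments = {"d1": 1, "d2": 1}

print(recall_at_k(ranking, judgments.keys(), k=2))
print(ndcg_at_k(ranking, judgments, k=3))
```

Recall only asks whether relevant documents made the cut, while NDCG also rewards placing them near the top of the list.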

Overall, LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M achieve the two highest average scores on both benchmarks among the models we evaluated. Their results remain competitive across all 11 supported languages, highlighting the robustness of their retrieval quality beyond English.

NanoBEIR Multilingual Extended (NDCG@10)
Model Type AVG ar de en es fr it ja ko no pt sv
LiquidAI/LFM2.5-ColBERT-350M colbert 0.605 0.551 0.606 0.687 0.607 0.622 0.606 0.614 0.590 0.570 0.613 0.586
LiquidAI/LFM2.5-Embedding-350M bienc 0.577 0.529 0.581 0.644 0.581 0.592 0.583 0.575 0.563 0.557 0.581 0.566
Qwen/Qwen3-Embedding-0.6B bienc 0.556 0.514 0.560 0.649 0.568 0.565 0.565 0.551 0.530 0.516 0.571 0.525
LiquidAI/LFM2-ColBERT-350M colbert 0.540 0.491 0.563 0.661 0.563 0.564 0.543 0.557 0.527 0.449 0.547 0.480
Alibaba-NLP/gte-multilingual-base bienc 0.528 0.477 0.523 0.624 0.537 0.542 0.528 0.511 0.494 0.516 0.534 0.526
lightonai/GTE-ModernColBERT-v1 colbert 0.489 0.309 0.499 0.680 0.525 0.546 0.516 0.459 0.368 0.465 0.530 0.483
lightonai/LateOn colbert 0.484 0.307 0.505 0.690 0.531 0.537 0.514 0.442 0.326 0.465 0.533 0.475
lightonai/DenseOn bienc 0.432 0.178 0.474 0.676 0.496 0.520 0.487 0.378 0.197 0.422 0.493 0.433
Alibaba-NLP/gte-modernbert-base bienc 0.383 0.112 0.449 0.666 0.448 0.475 0.408 0.275 0.180 0.376 0.431 0.391
BAAI/bge-large-en-v1.5 bienc 0.359 0.059 0.419 0.642 0.445 0.475 0.431 0.198 0.132 0.358 0.434 0.353

Table 1: Per-language NanoBEIR Multilingual Extended (NDCG@10). Best score per model type in bold.

MKQA-11 (Recall@20)
Model Type AVG ar de en es fr it ja ko no pt sv
LiquidAI/LFM2.5-ColBERT-350M colbert 0.694 0.608 0.709 0.748 0.711 0.715 0.707 0.703 0.640 0.689 0.703 0.700
LiquidAI/LFM2.5-Embedding-350M bienc 0.691 0.610 0.709 0.738 0.708 0.715 0.703 0.685 0.630 0.691 0.710 0.708
Alibaba-NLP/gte-multilingual-base bienc 0.675 0.567 0.692 0.741 0.705 0.703 0.697 0.655 0.563 0.698 0.700 0.699
LiquidAI/LFM2-ColBERT-350M colbert 0.646 0.554 0.696 0.754 0.711 0.710 0.667 0.658 0.558 0.541 0.669 0.589
Qwen/Qwen3-Embedding-0.6B bienc 0.638 0.520 0.671 0.723 0.678 0.672 0.671 0.635 0.543 0.620 0.667 0.620
lightonai/GTE-ModernColBERT-v1 colbert 0.459 0.092 0.532 0.754 0.552 0.615 0.510 0.275 0.166 0.503 0.524 0.524
lightonai/LateOn colbert 0.454 0.157 0.492 0.755 0.537 0.577 0.481 0.316 0.209 0.472 0.502 0.501
lightonai/DenseOn bienc 0.435 0.165 0.482 0.751 0.491 0.553 0.457 0.325 0.222 0.438 0.443 0.453
BAAI/bge-large-en-v1.5 bienc 0.413 0.133 0.471 0.748 0.450 0.531 0.461 0.208 0.172 0.456 0.443 0.467
Alibaba-NLP/gte-modernbert-base bienc 0.295 0.060 0.333 0.736 0.273 0.417 0.291 0.100 0.052 0.332 0.326 0.330

Table 2: Per-language MKQA-11 (Recall@20). Best score per model type in bold.

Unlike LightOn’s work, we find that NanoBEIR English provides a sufficient evaluation signal. Across the models we evaluated, NanoBEIR English and the more expensive full BEIR remain highly correlated, with NanoBEIR scores running a near-constant ~15% higher. We therefore use NanoBEIR as a practical proxy for full BEIR when iterating across training runs.
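A check of this kind can be sketched as follows. The scores are made-up placeholders, not our measurements, and the sketch assumes the ~15% gap is read as a relative ratio rather than an absolute difference.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-model scores (one entry per evaluated checkpoint).
full_beir = [0.400, 0.450, 0.500, 0.550]
nano_beir = [0.460, 0.520, 0.575, 0.630]

correlation = pearson(full_beir, nano_beir)
mean_ratio = sum(n / f for n, f in zip(nano_beir, full_beir)) / len(full_beir)

print(f"correlation: {correlation:.4f}")
print(f"mean NanoBEIR / BEIR ratio: {mean_ratio:.3f}")
```

A correlation near 1 with a stable ratio means the cheaper benchmark ranks checkpoints the same way as the full one, which is what makes it usable as a proxy during iteration.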

Inference

We evaluate end-to-end latency in the retrieval regimes that matter in practice: query embedding with cached documents, query embedding plus MaxSim, and query embedding plus document embedding plus MaxSim when documents are not cached.

For portable deployment, we release LFM2.5-ColBERT-350M-GGUF and LFM2.5-Embedding-350M-GGUF for llama.cpp, so the models can run nearly anywhere (CPUs, laptops, and edge devices) at near-zero cost and with compelling latency.

Model Setup Docs cached p50 p95
LFM2.5-Embedding-350M Query embedding yes 7.3 ms 9.6 ms
LFM2.5-ColBERT-350M Query embedding yes 8.1 ms 8.5 ms
LFM2.5-ColBERT-350M Query embedding + MaxSim yes 8.2 ms 15.2 ms
LFM2.5-ColBERT-350M Query + Doc embedding + MaxSim no 34.3 ms 36.3 ms

Table 3: llama.cpp end-to-end latency on a MacBook M4 Max at FP16, with a 32-token query and a 256-token document.
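To collect comparable p50 and p95 numbers on your own hardware, a small timing harness along these lines is enough. The nearest-rank percentile, the warmup and run counts, and the placeholder workload are assumptions. Replace `fake_query` with a call into your own retrieval pipeline.

```python
import math
import time

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def measure_latency_ms(fn, runs=100, warmup=10):
    """Time repeated calls to fn and return per-call latencies in ms."""
    for _ in range(warmup):  # warm caches before measuring
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def fake_query():
    """Hypothetical stand-in for embedding a query and scoring documents."""
    sum(i * i for i in range(10_000))

latencies = measure_latency_ms(fake_query)
print(f"p50: {percentile(latencies, 50):.2f} ms")
print(f"p95: {percentile(latencies, 95):.2f} ms")
```

Reporting tail percentiles alongside the median matters because occasional slow requests are invisible in an average.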

For large-scale, production-grade enterprise deployments, we also develop an internal GPU stack that delivers very low-latency serving under high inbound load.

Model Setup p50 p95 p99
LFM2.5-Embedding-350M Query embedding 1.5 1.6 1.7
LFM2.5-ColBERT-350M Query embedding 1.3 1.4 1.5
LFM2.5-ColBERT-350M Query embedding + MaxSim 2.5 2.7 2.8
LFM2.5-ColBERT-350M Query + Doc Embedding + MaxSim 22.8 24.1 26.4

Table 4: Internal inference stack, end-to-end latency on an H100 at FP16, with a 32-token query and a 256-token document.

Training your own

While these models perform strongly out of the box, we encourage you to fine-tune either model on your own data for domain-specific retrieval. We especially recommend this for LFM2.5-Embedding-350M, for which our Hugging Face model card provides simple fine-tuning snippets with sentence-transformers.

Get Started 

The LFM2.5-ColBERT-350M and LFM2.5-Embedding-350M models are available today on Hugging Face. For teams looking to deploy retrieval at enterprise scale, contact us to learn more.

Citation

Please cite this article as:

Liquid AI, "LFM2.5 Retrievers: Bi-directional LFMs for Fast Multilingual Search", Liquid AI Blog, Jun 2026.

Or use the BibTeX citation:

@article{liquidAI2026Retrievers,
  author = {Liquid AI},
  title = {LFM2.5 Retrievers: Bi-directional LFMs for Fast Multilingual Search},
  journal = {Liquid AI Blog},
  year = {2026},
  note = {www.liquid.ai/blog/lfm2-5-retrievers},
}