News and Updates

03.26.2026
The Key to State Reduction in Linear Attention: A Rank-based Perspective
By Philipp Nazari and T. Konstantin Rusch
Linear attention offers a computationally efficient yet expressive alternative to softmax attention, maintaining a recurrent state that functions as a linear associative memory. However, recent empirical results indicate that the associative memory of trained linear attention models often exhibits a low-rank structure, suggesting that these models underexploit their capacity in practice. To illuminate this phenomenon, we provide a theoretical analysis of the role of rank in linear attention, revealing that low effective rank can affect retrieval error by amplifying query noise, as well as poorly condition query gradients.
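To make the "linear associative memory" picture concrete, here is a minimal sketch in plain Python. It assumes the simplest unnormalized form of linear attention, in which the state is updated as S <- S + v k^T and read out as o = S q. The helper names (`write`, `read`) and the toy keys and values are illustrative assumptions, not taken from the paper.

```python
def outer(v, k):
    """Outer product v k^T, returned as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]


def add(a, b):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matvec(m, x):
    """Matrix-vector product m x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def write(state, key, value):
    """One recurrent step of (unnormalized) linear attention: S <- S + v k^T."""
    return add(state, outer(value, key))


def read(state, query):
    """Retrieval from the associative memory: o = S q."""
    return matvec(state, query)


# Hypothetical toy data: two orthonormal keys (dim 3), two values (dim 2).
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[2.0, -1.0], [0.5, 3.0]]

state = [[0.0] * 3 for _ in range(2)]  # value_dim x key_dim
for k, v in zip(keys, values):
    state = write(state, k, v)

# Querying with a stored key returns its value exactly (keys are orthonormal).
recalled = read(state, keys[1])

# A query overlapping both keys returns a blend of both stored values.
blended = read(state, [1.0, 1.0, 0.0])
```

With orthonormal keys the stored values are recovered exactly. Each write adds a rank-one term, so the state's rank is bounded by the number of linearly independent keys written and by the state's own dimensions.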
03.02.2026
AlphaQ: Calibration-Free Bit Allocation for Mixture-of-Experts Quantization
By Wanqi Yang, Yuexiao Ma, Alexander Conzelmann, Xiawu Zheng, Michael W. Mahoney, and 2 others
Mixture-of-Experts (MoE) architectures scale computation via sparse expert activations, yet they remain memory-bound because all expert weights must reside in memory. Mixed-precision quantization can substantially reduce this footprint, but existing quantization methods estimate expert importance and assign bits based on calibration data. For frontier MoE LLMs, however, the original training data (and thus its true training distribution) is proprietary and inaccessible. Thus, any calibration set is at best a surrogate and may yield a biased or incomplete view of expert utilization, leading to suboptimal bit allocation. To address these problems, we propose AlphaQ, a novel calibration-free bit-allocation method for MoE quantization.
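As a rough illustration of why per-expert bit widths matter for the memory footprint, the sketch below computes expert weight storage under a uniform and a mixed allocation. The expert size, the number of experts, and the bit widths are hypothetical values chosen for the arithmetic. They are not AlphaQ's allocation, and the calculation ignores quantization metadata such as scales and zero points.

```python
def expert_weight_bytes(params_per_expert, bits_per_expert):
    """Bytes needed to store expert weights, given one bit width per expert."""
    total_bits = sum(params_per_expert * bits for bits in bits_per_expert)
    return total_bits // 8


# Hypothetical MoE layer: 8 experts with 1M parameters each.
PARAMS_PER_EXPERT = 1_000_000

# Baseline: every expert stored at 16 bits.
uniform_16 = expert_weight_bytes(PARAMS_PER_EXPERT, [16] * 8)

# Mixed precision: arbitrary illustrative bit widths per expert.
mixed = expert_weight_bytes(PARAMS_PER_EXPERT, [8, 8, 4, 4, 4, 4, 2, 2])

# Fraction of the 16-bit footprint that the mixed allocation needs.
ratio = mixed / uniform_16
```

In this toy setting the mixed allocation needs 4.5 MB instead of 16 MB. How to choose the per-expert bit widths is the question the bit-allocation method addresses.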
03.02.2026
Low-Pass Flow Matching
By Francesco M. Ruscio and T. Konstantin Rusch
Flow Matching typically relies on white noise sources, a choice often misaligned with the power spectra of natural data, which tend to decay with frequency. To address this, we introduce Low-Pass Flow Matching, a variant of Flow Matching based on an operator-modulated interpolant. This formulation induces a time-varying spectral bias that transitions from the source spectrum to a frequency-decaying bias as the path approaches the data. We validate our method on unconditional image generation tasks, including the scientific Galaxy10 dataset. Empirically, we show that our method is particularly effective when paired with adaptive ODE solvers, where it improves or preserves sample quality while substantially reducing sampling cost compared to standard baselines.
01.26.2026
Zero-Overhead Introspection for Adaptive Test-Time Compute
By Rohin Manvi, Joey Hong, Tim Seyde, Maxime Labonne, Mathias Lechner, and 1 other
Large language models excel at reasoning but lack key aspects of introspection, including the ability to anticipate their own success and the computation required to achieve it. Humans use real-time introspection to decide how much effort to invest, when to make multiple attempts, when to stop, and when to signal success or failure. Without this ability, LLMs struggle to make intelligent meta-cognition decisions. Test-time scaling methods such as Best-of-N drive up cost and latency by using a fixed budget of samples regardless of the marginal benefit of each one at any point in generation, and the absence of confidence signals can mislead people, prevent appropriate escalation to better tools, and undermine trustworthiness.
01.26.2026
Learning residue level protein dynamics with multiscale Gaussians
By Mihir Bafna, Bowen Jing, and Bonnie Berger
Many methods have been developed to predict static protein structures; however, understanding the dynamics of protein structure is essential for elucidating biological function. While molecular dynamics (MD) simulations remain the in silico gold standard, their high computational cost limits scalability. We present DynaProt, a lightweight, SE(3)-invariant framework that predicts rich descriptors of protein dynamics directly from static structures.
01.26.2026
Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models
By Samuel J Paech, Allen G Roush, Judah Goldfeder, and Ravid Shwartz-Ziv
Repetitive lexical patterns in LLM output, termed "slop," degrade writing quality through over-use and make AI-generated text immediately recognizable. We present Antislop, a comprehensive framework providing tools to both detect and eliminate these overused patterns. Our approach combines three innovations: (1) the Antislop Sampler, which uses backtracking to suppress unwanted strings at inference time without destroying vocabulary; (2) an automated pipeline that profiles model-specific slop against human baselines and generates training data; and (3) Final Token Preference Optimization (FTPO), a novel fine-tuning method that operates in logit-space on individual tokens, surgically adjusting logits wherever a banned pattern has appeared in an inference trace.
01.26.2026
The Curious Case of In-Training Compression of State Space Models
By Makram Chahine, Philipp Nazari, Daniela Rus, and T. Konstantin Rusch
State Space Models (SSMs), developed to tackle long sequence modeling tasks efficiently, offer both parallelizable training and fast inference. At their core are recurrent dynamical systems that maintain a hidden state, with update costs scaling with the state dimension. A key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden. Control theory, and more specifically Hankel singular value analysis, provides a potent framework for the measure of energy for each state, as well as the balanced truncation of the original system down to a smaller representation with performance guarantees.
12.01.2025
LFM2 Technical Report
By Alexander Amini, Anna Banaszak, Harold Benoit, Arthur Böök, Tarek Dakhran, and 28 others
We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a small number of grouped query attention blocks, delivering up to 2x faster prefill and decode on CPUs compared to similarly sized models.
09.25.2025
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
By Kohsei Matsutani, Shota Takashiro, Gouki Minegishi, Takeshi Kojima, Yusuke Iwasawa, and 1 other
Large language models (LLMs) are typically trained by reinforcement learning (RL) with verifiable rewards (RLVR) and supervised fine-tuning (SFT) on reasoning traces to improve their reasoning abilities. However, how these methods shape reasoning capabilities remains largely elusive. Going beyond an accuracy-based investigation of how these two components sculpt the reasoning process, this paper introduces a novel analysis framework that quantifies reasoning paths and captures their qualitative changes under each training process (with models of 1.5B, 7B, and 14B parameters on mathematical domains).
06.24.2025
LFM-1B-Math: Can Small Models Be Concise Reasoners?
By Tim Seyde, Rohin Manvi, Maxime Labonne, and Liquid AI Team
We share insights based on our exploration of reasoning recipes for small language models, with the example of LFM-1.3B. We demonstrate how a general chat model, without any math-specific pre-training, can acquire concise reasoning abilities through a combination of extensive fine-tuning and short-horizon reinforcement learning. By sharing our findings with our model, pre-training mix, and architecture, we aim to broaden the understanding of how to effectively instill reasoning abilities into small language models suitable for edge deployment.
06.13.2025
Exploring Diffusion Transformer Designs via Grafting
By Keshigeyan Chandrasegaran and Michael Poli
Designing model architectures is a core part of building modern AI systems, alongside data, algorithms, compute, and benchmarks. Model architecture defines a learnable function and involves key choices—such as which operators to use (e.g., attention, convolution) and how to configure them (e.g., model depth, width). Despite its critical role, insight into architectures—what works and what doesn’t—is difficult to obtain, due to the prohibitive cost of training models from scratch, especially in today’s foundation model era. As a result, exploring new architectures remains a major challenge, particularly for generative models.
04.28.2025
How Effectively Does a Model Use Its Memory
By Rom Parnichkun, Neehal Tumma, Stefano Massaroli, Michael Poli, Armin Thomas, and 1 other
Even with the same state/cache size, models can differ significantly in how well they utilize memory—impacting recall, compression, and trainability. We introduce Effective State-Size (ESS): A proxy metric for memory utilization.
04.25.2025
Convolutional Multi-Hybrids for Edge Devices
By Armin Thomas, Stefano Massaroli, Michael Poli, and Liquid Edge Team
Today, we introduce a Liquid architecture called Hyena Edge, a convolution-based multi-hybrid model that not only matches but outperforms strong Transformer-based baselines in computational efficiency and model quality on edge hardware, benchmarked on the Samsung S24 Ultra smartphone. To design Hyena Edge, we use our recently proposed end-to-end automated model design framework.
12.02.2024
Automated Architecture Synthesis via Targeted Evolution
Today, we report advances in automated neural network architecture design and customization. We developed algorithms for the synthesis of tailored architectures (STAR), based on evolutionary algorithms applied to a numerical representation for model architectures derived from a new design theory. STAR automates the process of architecture discovery and optimization, turning it into an end-to-end process. With these methods, we have been able to tailor architectures to custom tasks, metrics, and hardware. We used STAR to synthesize hundreds of different designs that outperform strong Transformer and hybrid architectures in quality, with smaller caches and fewer parameters.
09.30.2024
From Liquid Neural Networks to Liquid Foundation Models
We invented liquid neural networks, a class of brain-inspired systems that can stay adaptable and robust to changes even after training [R. Hasani, PhD Thesis] [Lechner et al. Nature MI, 2020] [pdf] (2016-2020). We then analytically and experimentally showed they are universal approximators [Hasani et al. AAAI, 2021], expressive continuous-time machine learning systems for sequential data [Hasani et al. AAAI, 2021] [Hasani et al. Nature MI, 2022], parameter efficient in learning new skills [Lechner et al. Nature MI, 2020] [pdf], causal and interpretable [Vorbach et al. NeurIPS, 2021] [Chahine et al. Science Robotics 2023] [pdf], and when linearized they can efficiently model very long-term dependencies in sequential data [Hasani et al. ICLR 2023].