
Faster and more accurate function calling on the edge with .txt’s structured outputs and Liquid Foundation Models
Introduction
In practical edge applications like smart home systems, industrial IoT sensors, or mobile assistants, we need models to generate function calls as accurately and quickly as possible. These devices face three critical challenges:
- Limited processing power - The iPhone 15 Pro achieves 2.15 TFLOPS (FP32), roughly 1/31 of the FP32 throughput of a single H200 GPU
- Latency requirements - Users expect near-instant responses (less than 300 ms)
- Reliability demands - Function calls must work correctly the first time to meet the latency requirements.
Traditional approaches to function calling with LLMs often fail to meet these constraints. Models are too large, generation is too slow, and outputs can be inconsistent. This creates a significant barrier to deploying AI assistants that need to interact reliably with hardware and software systems at the edge.
The Solution: LFM2-350M + dotgrammar
LFM2-350M: AI That Fits on the Edge
LFM2-350M is an LLM developed by Liquid AI specifically for high-performance edge deployment with a custom lightweight architecture:
- Minimal footprint: It uses less than 1 GB of RAM without quantization, with only a minimal increase for long inputs
- Responsive performance: Delivers sub-100ms inference times on common edge hardware
- High quality: Achieves knowledge and reasoning performance on par with significantly larger models.
dotgrammar: Grammar-Based Generation
dotgrammar is a library developed by .txt for high-performance structured outputs using context-free grammars (CFGs). Complementing LFM2-350M's efficiency, dotgrammar ensures reliable function calling through:
- CFG constraints: It guarantees syntactically valid outputs every time
- Token-efficient function formats: It reduces generation time and improves responsiveness by using Pythonic function calls
- Zero runtime overhead: It enforces constraints without increasing inference latency
Let's explore how these technologies work together.
What is function calling?
Function calling is a technique where language models output structured function calls rather than natural language text. It enables AI assistants to interact with external tools, APIs, and hardware in a reliable, programmatic way.
For instance, if you want a model to control a media player, you need it to generate precise commands that your application can parse and execute:
User: "Play Spotify at high volume"
AI: play_media(device="speaker", source="spotify", volume=1)
This approach effectively makes LLMs speak your application's native language, bridging the gap between natural language understanding and actionable commands.
The traditional JSON approach
Most LLM providers implement function calling using JSON objects. While functional, this approach creates significant inefficiencies for edge deployment:
{
  "name": "play_media",
  "parameters": {
    "device": "speaker",
    "source": "spotify",
    "volume": 1
  }
}
This JSON representation takes 37 tokens to generate, and every extra token adds latency. Moreover, LLMs are not always reliable at producing properly formatted nested structures such as JSON.
The pythonic alternative
By contrast, a more compact Pythonic representation:
play_media(device="speaker", source="spotify", volume=1)
requires just 14 tokens. That's a 2.6x reduction that directly translates to faster generation without sacrificing expressivity. Because LLMs are trained on large amounts of code, it's also a more natural and reliable way to implement function calling.
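You can check these counts yourself by tokenizing both representations. Here is a minimal sketch, assuming the LFM2-350M tokenizer; exact numbers vary slightly with the tokenizer and whitespace:
from transformers import AutoTokenizer

# Token counts depend on the tokenizer and formatting; here we assume
# LFM2-350M's tokenizer, so your exact numbers may differ slightly.
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-350M")

json_call = '{"name": "play_media", "parameters": {"device": "speaker", "source": "spotify", "volume": 1}}'
pythonic_call = 'play_media(device="speaker", source="spotify", volume=1)'

print(len(tokenizer(json_call)["input_ids"]))      # JSON representation
print(len(tokenizer(pythonic_call)["input_ids"]))  # Pythonic representation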
The need for structured generation
Despite this training, LLMs can still make mistakes. These errors become especially problematic in edge applications where:
- Parameters have strict constraints (e.g., volume must be between 0 and 1)
- Parameter combinations must follow business logic (e.g., "speaker" and "netflix" is an invalid combination)
- Latency requirements leave no room for error handling and retries
For truly reliable function calls on the edge, we need structured generation that enforces output format in a deterministic way. This is where context-free grammars (CFGs) come in.
A context-free grammar (CFG) is a set of rules that defines exactly what combinations of characters or words are allowed, creating a "railroad track" that generation must follow.
Think of a CFG like a system for composing music with clear rules. Technically, a CFG consists of:
- Terminal symbols: The actual characters that appear in the final output
- Non-terminal symbols: Variables that get replaced according to rules
- Production rules: Instructions for replacing non-terminals with terminals or other non-terminals
For example, this grammar ensures correct play_media function calls:
Call ::= "play_media" "(" Arguments ")"
Arguments ::= "device=\"" Device "\"" ", " "source=\"" Source "\"" ", " "volume=" Volume
Device ::= "speaker" | "tv"
Source ::= "spotify" | "netflix" | "youtube"
Volume ::= "0" | "0.1" | "0.2" | ... | "1.0"
In this example:
- Terminal symbols include literal strings like "play_media", "(", and ")"
- Non-terminal symbols include Call, Arguments, Device, etc.
- The symbol "::=" means "can be replaced with"
- The symbol "|" means "OR" (indicating a choice)
Grammar-based structured generation provides the following guarantees:
- Security: Prevents injection attacks by strictly limiting what can be generated
- Reliability: Guarantees parseable output
- Validation: Enforces business rules at the generation stage
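To make these guarantees concrete, a grammar like the one above can be checked against candidate outputs with the open-source lark parser. Below is a minimal sketch, with the play_media grammar rewritten in the Lark-style syntax we use with dotgrammar later in this post, and the volume choices shortened to three values for brevity:
from lark import Lark, UnexpectedInput

play_media_grammar = r"""
?start: "play_media(" arguments ")"
?arguments: "device=\"" device "\", source=\"" source "\", volume=" volume
?device: "speaker" | "tv"
?source: "spotify" | "netflix" | "youtube"
?volume: "0" | "0.5" | "1"
"""

parser = Lark(play_media_grammar, start="start")

# A well-formed call parses without error.
parser.parse('play_media(device="speaker", source="spotify", volume=1)')

# An out-of-range value is rejected before it ever reaches the application.
try:
    parser.parse('play_media(device="speaker", source="spotify", volume=2)')
except UnexpectedInput:
    print("rejected: volume=2 is not allowed by the grammar")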
Demo: Smart Home Control
Let's see how LFM2-350M and dotgrammar work together in a practical edge application: a smart home assistant running locally on a hub device.
In this use case, the model will take in user queries and perform actions to set the state of various systems at home, like dimming lights, closing blinds, and playing media. Some functions like `theater_mode()` don't require any arguments, but others like `set_display()` need more fine-grained control.
We will implement this using .txt’s dotgrammar library, which provides latency-free grammar-based structured generation.
Example 1: Simple function with unconstrained arguments
Let's say a user just finished watching a movie, and they want to create a note about it. The smart home might have access to a `save_note(str)` function, which takes an arbitrary string as input.
In this case, we still want to constrain the output to ensure the function call is properly formatted, but we do not want to constrain the argument. This can be achieved in dotgrammar like this:
note_grammar = """
?start: "save_note(" UNESCAPED_STRING ")"
%import common.UNESCAPED_STRING
"""
Example 2: One function with constrained arguments
In many cases, the arguments will be constrained. For instance, a `set_display()` function might only accept one of a fixed set of devices, such as 'tv' or 'projector'. In Python, we express this with the `Literal` type; in a CFG, we enforce it with pipes ("|").
Here's what such a grammar could look like for a function with type signature `set_display(device: Literal['tv', 'projector'])`:
display_grammar = """
?start: "set_display(device = " device ")"
?device: "'tv'" | "'projector'"
"""
Note the double quotation marks wrapping `'tv'` and `'projector'`: the outer double quotes indicate that everything inside is a literal terminal in the CFG, while the inner single quotes ensure the value appears as a string in the Python function call.
A similar effect can be achieved for Boolean arguments. Let's add a `night_mode` argument to control how bright the display should be. For this new version of the function, with signature `set_display(device: Literal['tv', 'projector'], night_mode: bool)`, we'll generalize our notation a bit:
display_grammar = """
?start: "set_display(" arguments )"
?arguments: "device = " device ", night_mode = " night_mode
?device: "'tv'" | "'projector'"
?night_mode: "True" | "False"
"""
Example 3: One function with mixed constraints
Constrained generation also works with optional arguments. Suppose the smart home can close the blinds, either entirely or some fraction of the way. By default, calling `close_blinds()` closes the blinds completely, but the fraction can be controlled by an optional `percentage` argument, which can take values 0 through 100. We can create a grammar for this function, with signature `close_blinds(percentage: int = 100)`, as follows (although we would need something like a Pydantic schema to specify bounds on an arbitrary integer value):
percentages = [f'"{i}"' for i in range(101)]
percentages_options = ' | '.join(percentages)
blinds_grammar = f"""
?start: "close_blinds(" arguments ")"
?arguments: ("percentage = " percentage)?
?percentage: {percentages_options}
"""
Note the question mark after the parenthesized group in the `arguments` rule: in the CFG, the parentheses group symbols, and the trailing "?" marks that group as optional. Also note that each number is wrapped in double quotes so that it becomes a literal terminal, and that we have enumerated all allowed values for `percentage` because the cardinality of the set is relatively small (101 items). In practice, it is not always possible to perfectly constrain numerical values with CFGs, but they can be constrained when the specific values can be enumerated, or when you only care about the number of digits.
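For example, here is a sketch of a digit-based variant of the blinds grammar. It is looser than the enumeration above (it also admits strings like "07"), but it avoids listing every value. Because these grammars use Lark-style syntax, the open-source lark package can also sanity-check them offline:
from lark import Lark

blinds_grammar_digits = """
?start: "close_blinds(" arguments ")"
?arguments: ("percentage = " percentage)?
?percentage: "100" | DIGIT DIGIT | DIGIT
%import common.DIGIT
"""

parser = Lark(blinds_grammar_digits, start="start")
parser.parse("close_blinds()")                 # optional argument omitted: defaults to fully closed
parser.parse("close_blinds(percentage = 40)")  # two-digit value accepted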
Example 4: Multiple functions
Thus far, we’ve treated each function and its grammar in isolation, but as we mentioned above, our smart home AI assistant would likely need access to all of these systems simultaneously. Each of the functions we’ve discussed could be treated as a tool. We can write a grammar that encapsulates a set of tools by adding some lightweight logic to connect the dots. If we wanted to allow just one function call at a time, we could pipe the function options:
percentages = [f'"{i}"' for i in range(101)]
percentages_options = ' | '.join(percentages)
one_function_grammar = f"""
?start: "<|tool_call_start|>[" function "]<|tool_call_end|>"
?function: save_note | set_display | close_blinds
?save_note: "save_note(" UNESCAPED_STRING ")"
?set_display: "set_display(" display_arguments ")"
?display_arguments: "device = " device ", night_mode = " night_mode
?device: "'tv'" | "'projector'"
?night_mode: "True" | "False"
?close_blinds: "close_blinds(" blinds_arguments ")"
?blinds_arguments: ("percentage = " percentage)?
?percentage: {percentages_options}
%import common.UNESCAPED_STRING
"""
We can also accommodate multi-function-call responses in dotgrammar. As before, we wrap the (potentially multiple) function calls in the `<|tool_call_start|>` and `<|tool_call_end|>` tokens to simplify parsing:
percentages = [f'"{i}"' for i in range(101)]
percentages_options = ' | '.join(percentages)
multi_function_grammar = f"""
?start: "<|tool_call_start|>[" tool_calls "]<|tool_call_end|>"
?tool_calls: (function ", ")* function
?function: save_note | set_display | close_blinds
?save_note: "save_note(" UNESCAPED_STRING ")"
?set_display: "set_display(" display_arguments ")"
?display_arguments: "device = " device ", night_mode = " night_mode
?device: "'tv'" | "'projector'"
?night_mode: "True" | "False"
?close_blinds: "close_blinds(" blinds_arguments ")"
?blinds_arguments: ("percentage = " percentage)?
?percentage: {percentages_options}
%import common.UNESCAPED_STRING
"""
The line `?tool_calls: (function ", ")* function` uses parentheses to group symbols, just as we did for the optional argument earlier, with the "*" specifying that the group can repeat zero or more times.
Results
Let's see this system in action with two real-world examples. First, let's ask the model to prepare for movie night by setting the projector display to night mode. To run LFM2-350M with dotgrammar, we only need a few lines:
import outlines
from transformers import AutoModelForCausalLM, AutoTokenizer
from outlines.types import CFG
MODEL_NAME = "LiquidAI/LFM2-350M"
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto"),
    AutoTokenizer.from_pretrained(MODEL_NAME)
)
PROMPT = "It's movie time! Set the projector to night mode."
result = model(
    PROMPT,
    CFG(multi_function_grammar),
    max_new_tokens=64
)
Here is the output from the model, stored in `result`:
<|tool_call_start|>[set_display(device = 'projector', night_mode = True)]<|tool_call_end|>
Here, the model correctly called `set_display(device = 'projector', night_mode = True)`, as requested.
Now, let’s try a more free-form example. We’ll ask the model to create a note about a movie, but we won’t explicitly dictate what should be in the note. The model has to figure it out based on the context:
PROMPT = "Create a reminder for me to look into other movies by this person."
result = model(
    PROMPT,
    CFG(multi_function_grammar),
    max_new_tokens=64
)
This returns the following output:
<|tool_call_start|>[save_note("Please remember to check other movies by this person!")]<|tool_call_end|>
The model called the appropriate function and managed to summarize the message conveyed by the user.
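Finally, the application still needs to execute the calls the model produces. Since the output is valid Python syntax, a few lines with the standard ast module are enough to route each call to the right handler. Here is a minimal sketch; the handler functions are hypothetical stand-ins for the real smart-home backends:
import ast

# Hypothetical handler functions standing in for the real smart-home systems.
def save_note(note):
    print(f"note saved: {note}")

def set_display(device, night_mode=False):
    print(f"display: {device}, night mode: {night_mode}")

def close_blinds(percentage=100):
    print(f"blinds closed to {percentage}%")

HANDLERS = {"save_note": save_note, "set_display": set_display, "close_blinds": close_blinds}

def dispatch(output: str) -> None:
    # Strip the tool-call wrapper tokens, then parse the Pythonic call(s) inside.
    body = output.removeprefix("<|tool_call_start|>[").removesuffix("]<|tool_call_end|>")
    tree = ast.parse(body, mode="eval").body
    calls = tree.elts if isinstance(tree, ast.Tuple) else [tree]
    for call in calls:
        args = [ast.literal_eval(arg) for arg in call.args]
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
        HANDLERS[call.func.id](*args, **kwargs)

dispatch(result)  # for the first example, prints: display: projector, night mode: True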
Conclusion: Edge AI that just works
By combining LFM2-350M with .txt's structured outputs, we've created a solution that makes function calling on edge devices not just possible, but also practical:
- Ultra-efficient: Pythonic function calls use ~2.6x fewer tokens than JSON, making generation faster and less resource-intensive
- Completely reliable: Grammar-based structured generation ensures outputs match your application's requirements perfectly
- Edge-optimized: LFM2-350M is designed for on-device use. When paired with .txt's latency-free grammar constraints, it excels in edge applications.