Asking a Local LLM to Calculate Car Travel Costs

Posted on 11 June, 2026 at 16:24 UTC by strk

I asked a locally-downloaded LLM to compute the cost of a car trip using natural language. No calls to any cloud service, just a model queried by an inference tool running on my own machine.

The Setup

The machine used for this excercise is a LemurPro laptop from around 2020, has a 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz CPU and 16GB of RAM. No GPU.

The Procedure

I downloaded the unsloth/gemma-4-E4B-it-qat-UD-Q4_K_XL model from Hugging Face - about 4GB in size.
I built and installed llama-server from the llama.cpp project - version: 9570 (3ac3c20c9).
I ran llama-server -m gemma-4-E4B-it-qat-UD-Q4_K_XL.gguf
I built and installed ferrum from the Ferrum project - version 0.4.17.
I configured ferrum to use the local llama as provider
I invoked: time ferrum -p "<prompt>"

The Prompt (in Italian)

ho percorso 165.5km avendo un consumo medio di 5.9L/100 - il carburante costa 1.99€ al litro, quanto ho speso?

Translation: “I drove 165.5km with an average consumption of 5.9L/100 - fuel costs 1.99€ per liter, how much did I spend?”

The Response

The model answered:

Ho speso circa 19,43 €

Translation: “I spent about 19,43€”

And then provided a few details on how it computed that value.

I cross-checked with a specialized online service that calculates travel costs given start/destination addresses, and it gave the same figure (about 20 €). The model got it right.

Timing

real: 1m43.607s
user: 0m0.029s
sys:  0m0.005s

Note that the user and sys times don’t make much sense as the timed process was the client while most of the work was done by the server ( the inference tool ).

This experience shows:

That we can control the inference tool (llama.cpp is Free Software, distributed under the MIT License).
What timings we can expect with a 6-years old CPU-only laptop and a freely available LLM model.
How natural language (even if non-english) can be used to interact with the model.

What’s still to be found out:

I still don’t know how to control the LLM model itself

Running the same prompt a second time might produce a different output (this may be obtainable, although at this stage I still don’t know how to do it).
The output we get depends on the “procedure” used to create the LLM, that procedure is opaque to us (what is the source code of an LLM model?)

Comment welcome on the Fediverse: https://floss.social/@strk/116732586142201548