Report on running LLM inference on FreeBSD

This weekend, I installed FreeBSD on one of my homelabs and tried to run llama.cpp inference on it.

TL;DR: Almost there! Actually, with a recent GPU, it may already be usable.

My Hardware

I have two homelabs with strictly identical hardware, which makes comparing with Linux easy:

- MZ01-CE1 mobo
- AMD EPYC 7601 processor
- 4x Nvidia P40 GPUs
- 64GB DDR4 RAM
- 1TB M.2 SSD

My Software

My tests were made with a Llama-3.3 model, 70B Instruct, quantized as IQ4_NL, in GGUF format.

More importantly, I'm using llama.cpp as the inference engine. It is a stable C++ engine, which spares me from dealing with the ML Python ecosystem and its dependency graphs breaking every other day.

The Linux box runs Gentoo, with nvidia-drivers-580.95.05 and nvidia-cuda-toolkit-12.9.0.

On FreeBSD, I'm using the same model, and the same inference engine, but built with Vulkan support instead of CUDA.

The nvidia driver is nvidia-driver-580.119.02_1, and the Vulkan libs are vulkan-headers-1.4.336, vulkan-loader-1.4.336 and shaderc-2025.5_1. The only other specific thing I had to do was to build llama.cpp with -DGGML_VULKAN=1 instead of -DGGML_CUDA=1.

The Result

It works! I could run inference on pure FreeBSD. Token generation was a bit slower, averaging 5.1 tokens per second against 5.7 tokens per second on Linux/CUDA with the same prompts, so that's about 10% slower. I could live with that.
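
For anyone who wants to redo the comparison with their own numbers, the "about 10%" figure is just the relative drop in tokens per second. A minimal sketch, using the two averages quoted above; the function name is only illustrative:

```python
def relative_slowdown(baseline_tps, candidate_tps):
    """Fraction by which candidate throughput falls short of the baseline."""
    return (baseline_tps - candidate_tps) / baseline_tps


# Averages reported above: Linux/CUDA as baseline, FreeBSD/Vulkan as candidate.
slowdown = relative_slowdown(baseline_tps=5.7, candidate_tps=5.1)
print(f"FreeBSD/Vulkan generation is {slowdown:.1%} slower")
```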

… but the deal breaker was prompt processing. It's about 10 times slower with Vulkan. This is not going to work for me, because one of my most common tasks is feeding the model legal documents, RFCs, D&D campaign logs, etc., and asking it questions about them. And there, it grinds so slowly that it's not practical.
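
A back-of-the-envelope sketch of why the prompt-processing gap matters more than the generation gap for long documents. The generation speeds are the ones measured above. Everything else is a hypothetical placeholder, not a measurement: the prompt-processing speeds (100 and 10 tokens per second, chosen only to reflect the roughly 10x ratio), the prompt size, and the answer size.

```python
def response_time_s(prompt_tokens, output_tokens, pp_tps, tg_tps):
    """Seconds to ingest the prompt, then generate the answer."""
    return prompt_tokens / pp_tps + output_tokens / tg_tps


# Hypothetical workload: one long document, one short answer.
PROMPT_TOKENS = 20_000
OUTPUT_TOKENS = 300

# pp_tps values are placeholders; tg_tps values are the measured averages.
cuda = response_time_s(PROMPT_TOKENS, OUTPUT_TOKENS, pp_tps=100.0, tg_tps=5.7)
vulkan = response_time_s(PROMPT_TOKENS, OUTPUT_TOKENS, pp_tps=10.0, tg_tps=5.1)

print(f"CUDA:   {cuda / 60:.1f} min")
print(f"Vulkan: {vulkan / 60:.1f} min")
```

With a short prompt the two totals end up close, since generation dominates; with a long one, the prompt term takes over.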

When I asked about it, Gemini told me it's a Pascal problem (the architecture of the P40 cards), and that more recent cards don't have it. I won't take its word for it, but I guess we'll see when I'm rich enough to buy something else.

Still, it means that if you don't need big prompts and/or have more recent hardware, pure FreeBSD inference may just work.

For me… it's time to learn about Linuxulator. Apparently, I'm going to install Ubuntu. Which is quite a fail for me, having migrated from Gentoo to FreeBSD to get a less agitated system. 😂

PS: not sure in which category of the forum I should put this post, none really matches… Server would be the closest one, I guess, since homelabs usually serve inference to other machines on the local network. But then none of the subcategories matches. So I put it in off-topic.

Edit: Oh wow. So Linuxulator is a translation layer implementing the Linux syscall API and converting the calls to FreeBSD kernel syscalls, am I reading that correctly? That's… some WINE-level amount of work, both initially and at each kernel update, I imagine. Thanks a lot to FreeBSD developers for doing it.

Also, I'm totally installing Gentoo in that compat directory. :P The chroot method is basically begging for it. But let's install Ubuntu first to figure out everything there is to figure out in the documented way.
 
Let us know how it goes. It has been some time since somebody had CUDA running through the Linuxulator.

Good thinking about Vulkan instead of CUDA. I'll have to try that.
 
That's… some WINE-level amount of work

No, not exactly. Wine reconciles two completely incompatible OSes.
Linuxulator is one of many Unix-on-Unix compatibility layers. Early FreeBSD had System V Release 4 compatibility as far as I remember. Linux also had a number of binary-compatibility layers, as did commercial Unix.

For example, all Unix OSes use system calls, but they don't all follow the same naming or slot numbering, so the layer translates between them.
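
To make the slot-numbering point concrete, here is a toy sketch. It is not how Linuxulator is actually implemented (that lives in the kernel and dispatches to its own handlers); it only shows the same calls sitting in different slots. The numbers are a small subset as I recall them for Linux x86_64 and FreeBSD, so check them against the real syscall tables before relying on them. The `translate` helper is hypothetical.

```python
# Linux x86_64 slot number -> syscall name (small illustrative subset).
LINUX_X86_64 = {0: "read", 1: "write", 3: "close", 39: "getpid"}

# Syscall name -> FreeBSD slot number for the same calls.
FREEBSD = {"read": 3, "write": 4, "close": 6, "getpid": 20}


def translate(linux_nr):
    """Map a Linux syscall number to the FreeBSD number of the same call."""
    name = LINUX_X86_64.get(linux_nr)
    if name is None or name not in FREEBSD:
        # A real layer has to emulate calls with no native counterpart.
        raise NotImplementedError(f"no mapping for Linux syscall {linux_nr}")
    return FREEBSD[name]


for nr in sorted(LINUX_X86_64):
    print(f"Linux {nr:>2} ({LINUX_X86_64[nr]}) -> FreeBSD {translate(nr)}")
```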

Wine is the biggest-scoped reimplementation ever, I think nothing comes close.

Some time ago I picked up a box of the old SCO OpenDesktop system, which claims Windows compatibility. I found the hardware support for then-standard PC devices staggering. It's all supported: video, sound, networking. They have a 3rd-party DOS-on-Unix emulator and a ton of scaffolding around it to enable running Windows 95 apps on the same X desktop as native ones. It's all on a Motif "user friendly" GUI. I found the graphics capability of their DOS box excellent for the software of the age: on a Pentium host, it was able to run graphics as well as an early 486.

Take all the code SCO and 3rd parties wrote for that purpose, and add all the code of DOS and Windows 95.
I don't think it even reaches 10% of the scope of the Wine project.
 