LLM models on FreeBSD

ogogon · Apr 15, 2026

Colleagues, could you please tell me whether it's currently possible to deploy LLM models effectively on FreeBSD? That is, to use the GPU through native code rather than through emulators or hacks?
It seems to me that the sources I've read agree that FreeBSD is not suitable for use as an artificial intelligence server.

Is it really all that bad?

Ogogon.

cracauer@ · Apr 15, 2026

llama.cpp with Vulkan on NVidia's binary drivers works just fine for me. No Linuxulator involved.

ogogon · Apr 15, 2026

cracauer@ said:
llama.cpp with Vulkan on NVidia's binary drivers works just fine for me. No Linuxulator involved.

Could you please clarify a few points based on your experience:
Toolchain: Do you run llama.cpp directly (as a server/CLI) or have you managed to build the Ollama wrapper on FreeBSD with Vulkan support?
Drivers: Are you using the standard x11/nvidia-driver from ports, or did you need a specific version/branch for stable Vulkan compute?
Performance: How does the performance (Tokens/sec) compare to Linux/CUDA for mid-sized models (like Llama 3 8B)? Is the overhead of Vulkan on FreeBSD significant?
Stability: Have you encountered any issues with shader compilation or memory management when using large context windows (e.g., 8k+ or 32k tokens)?
Hardware: Does FreeBSD correctly handle Resizable BAR for NVIDIA cards in your setup, and is it mandatory for a smooth Vulkan experience?

I'm trying to decide if I should go the "FreeBSD + Vulkan" route or settle for a headless Linux distro. Any insights would be greatly appreciated.

rcbsdpge · Apr 15, 2026

Honestly, I'd rather the FreeBSD community developed our own technologies in this space.

You probably can, but last time I checked, GPU drivers weren’t of much interest in FreeBSD, afaik.

Plug-and-play on Linux would be my best bet, or fire up a virtual machine. Best of luck.

MG · Apr 15, 2026

rcbsdpge said:
Honestly, I'd rather the FreeBSD community developed our own technologies in this space.

You probably can, but last time I checked, GPU drivers weren’t of much interest in FreeBSD, afaik.

Plug-and-play on Linux would be my best bet, or fire up a virtual machine. Best of luck.

It makes me wonder what the elementary operations of the process are. Are they binary instructions on a very wide register, like GPUs have, which Nvidia keeps proprietary and obscure? It must be possible to replace that with simple logic. Do we really need a 3D perspective projection machine to get any significant results?
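For what it's worth, the bulk of the work in LLM inference is ordinary multiply-accumulate arithmetic, mostly matrix-vector and matrix-matrix products; the GPU's main contribution is doing a very large number of them in parallel. A minimal sketch in plain Python, with made-up toy weights and a hypothetical `matvec` helper (real engines such as llama.cpp run the same kind of operation over quantized weights, at a vastly larger scale):

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector using only multiply and add."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Toy 3x2 weight matrix and a 2-element input vector (illustrative values only).
weights = [
    [1.0, 2.0],
    [0.5, -1.0],
    [0.0, 3.0],
]
x = [2.0, 1.0]

y = matvec(weights, x)
print(y)  # [4.0, 0.0, 3.0]
```

Nothing here needs 3D hardware as such; what matters is how many of these multiply-adds can run per second.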

ogogon · Apr 15, 2026

rcbsdpge said:
Honestly, I'd rather the FreeBSD community developed our own technologies in this space.

What exactly do you mean?

cracauer@ · Apr 15, 2026

ogogon said:
Could you please clarify a few points based on your experience:
Toolchain: Do you run llama.cpp directly (as a server/CLI) or have you managed to build the Ollama wrapper on FreeBSD with Vulkan support?
Drivers: Are you using the standard x11/nvidia-driver from ports, or did you need a specific version/branch for stable Vulkan compute?
Performance: How does the performance (Tokens/sec) compare to Linux/CUDA for mid-sized models (like Llama 3 8B)? Is the overhead of Vulkan on FreeBSD significant?
Stability: Have you encountered any issues with shader compilation or memory management when using large context windows (e.g., 8k+ or 32k tokens)?
Hardware: Does FreeBSD correctly handle Resizable BAR for NVIDIA cards in your setup, and is it mandatory for a smooth Vulkan experience?

I'm trying to decide if I should go the "FreeBSD + Vulkan" route or settle for a headless Linux distro. Any insights would be greatly appreciated.

llama.cpp from pkg. It is compiled with Vulkan support.

NVidia drivers from ports. They have perfectly usable Vulkan.

On Linux, Vulkan is about 8% slower than CUDA. FreeBSD/Vulkan is 4% slower than Linux/Vulkan.
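To put those two figures side by side with Linux/CUDA, here is a rough sketch that assumes both percentages are relative throughput losses and that they compound multiplicatively. The helper name `combined_slowdown` is made up for illustration, and the result is only as good as the two rounded numbers going in.

```python
# Figures quoted in this post, read as relative throughput losses (an assumption).
VULKAN_VS_CUDA_ON_LINUX = 0.08   # Linux/Vulkan vs Linux/CUDA
FREEBSD_VS_LINUX_VULKAN = 0.04   # FreeBSD/Vulkan vs Linux/Vulkan

def combined_slowdown(*losses):
    """Compose relative slowdowns multiplicatively and return the total loss."""
    remaining = 1.0
    for loss in losses:
        remaining *= 1.0 - loss
    return 1.0 - remaining

total = combined_slowdown(VULKAN_VS_CUDA_ON_LINUX, FREEBSD_VS_LINUX_VULKAN)
print(f"FreeBSD/Vulkan vs Linux/CUDA: roughly {total:.1%} slower")
```

Under those assumptions, FreeBSD/Vulkan comes out a bit under 12% slower than Linux/CUDA.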

No problems observed. I run a Qwen model, Quant 6 something, with a 64k window size. It behaves the same as on Linux.

I didn't have to mess with the BAR and can use all of my 32 GB card.
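For anyone who wants to try something similar, a small sketch that builds a `llama-server` command line with a 64k context and all layers offloaded to the GPU. Assumptions here, none of them taken from this post: the package provides the `llama-server` binary, 64k means 65536 tokens, the model path is a placeholder, and `llama_server_command` is a made-up helper. `-m`, `-c`, `-ngl` and `--port` are regular llama.cpp options, but check `llama-server --help` for the version you have installed.

```python
import shlex

def llama_server_command(model_path, ctx_size=65536, gpu_layers=99, port=8080):
    """Build an argument list for llama-server (hypothetical defaults)."""
    return [
        "llama-server",
        "-m", model_path,         # path to a GGUF model file (placeholder)
        "-c", str(ctx_size),      # context window in tokens
        "-ngl", str(gpu_layers),  # layers to offload; a large value offloads all
        "--port", str(port),      # HTTP port to listen on
    ]

cmd = llama_server_command("/path/to/model.gguf")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd)` as is, or the printed line can be pasted into a shell.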

rcbsdpge · Apr 15, 2026

ogogon said:
What exactly do you mean?

Most LLMs compile from Python with CPython -> C/C++ down to the metal.

Why not develop our own technologies in this space? Nothing cookie-cutter. Real engineering applied.

rcbsdpge · Apr 15, 2026

MG said:
It makes me wonder what the elementary operations of the process are. Are they binary instructions on a very wide register, like GPUs have, which Nvidia keeps proprietary and obscure? It must be possible to replace that with simple logic. Do we really need a 3D perspective projection machine to get any significant results?

Most if not all of these technologies are not backward compatible with iOS 15, except for one LLM that semi-works. I’ve tested it myself on an old Apple device.

Cutting-edge 3D graphics would be awesome for FreeBSD, I think, though for many other applications outside of this space.