LLM models on FreeBSD

Colleagues, could you please tell me if it's currently possible to use the FreeBSD platform to effectively deploy LLM models? That is, use the GPU not through some emulators or hacks, but with the help of native code?
It seems to me that there is a consensus in the information sources that FreeBSD is not suitable for use as an artificial intelligence server.

Is it really all that bad?

Ogogon.
 
llama.cpp with Vulkan on NVIDIA's binary drivers works just fine for me. No Linuxulator involved.
Could you please clarify a few points based on your experience:
Toolchain: Do you run llama.cpp directly (as a server/CLI) or have you managed to build the Ollama wrapper on FreeBSD with Vulkan support?
Drivers: Are you using the standard x11/nvidia-driver from ports, or did you need a specific version/branch for stable Vulkan compute?
Performance: How does the performance (Tokens/sec) compare to Linux/CUDA for mid-sized models (like Llama 3 8B)? Is the overhead of Vulkan on FreeBSD significant?
Stability: Have you encountered any issues with shader compilation or memory management when using large context windows (e.g., 8k+ or 32k tokens)?
Hardware: Does FreeBSD correctly handle Resizable BAR for NVIDIA cards in your setup, and is it mandatory for a smooth Vulkan experience?

I'm trying to decide if I should go the "FreeBSD + Vulkan" route or settle for a headless Linux distro. Any insights would be greatly appreciated.
 
Honestly, I'd rather the FreeBSD community developed its own technologies in this space.

You probably can, but the last time I checked, GPU drivers weren't of much interest in FreeBSD, AFAIK.

Plug and play on Linux would be my best bet, or fire up a virtual machine. Best of luck.
 
Honestly, I'd rather the FreeBSD community developed its own technologies in this space.

You probably can, but the last time I checked, GPU drivers weren't of much interest in FreeBSD, AFAIK.

Plug and play on Linux would be my best bet, or fire up a virtual machine. Best of luck.
It makes me wonder what the elementary operations of the process are. Are they binary instructions on very wide registers, like GPUs have, that Nvidia keeps proprietary and obscure? It must be possible to replace that with simple logic. Do we really need a 3D perspective projection machine to get any significant results?
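For context on the "elementary operations" question: the bulk of LLM inference is matrix multiplication, which breaks down into a very large number of independent multiply-and-add steps. Below is a minimal pure-Python sketch of one matrix-vector product. The names `matvec`, `W` and `x` are made up for illustration, and real engines use quantized weights and heavily optimized kernels rather than anything like this loop.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector using plain multiply-accumulate."""
    out = []
    for row in weights:
        acc = 0.0
        # Each output element is an independent dot product:
        # a chain of multiply-and-add operations over one row.
        for w, xi in zip(row, x):
            acc += w * xi
        out.append(acc)
    return out


# Tiny illustrative example: 3 output values from 2 inputs.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [10.0, 1.0]
print(matvec(W, x))  # [12.0, 34.0, 56.0]
```

Every row can be processed without waiting for any other row, which is why hardware that runs many simple arithmetic units in parallel suits this workload. The 3D projection pipeline is not what is being used here.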
 
Could you please clarify a few points based on your experience:
Toolchain: Do you run llama.cpp directly (as a server/CLI) or have you managed to build the Ollama wrapper on FreeBSD with Vulkan support?
Drivers: Are you using the standard x11/nvidia-driver from ports, or did you need a specific version/branch for stable Vulkan compute?
Performance: How does the performance (Tokens/sec) compare to Linux/CUDA for mid-sized models (like Llama 3 8B)? Is the overhead of Vulkan on FreeBSD significant?
Stability: Have you encountered any issues with shader compilation or memory management when using large context windows (e.g., 8k+ or 32k tokens)?
Hardware: Does FreeBSD correctly handle Resizable BAR for NVIDIA cards in your setup, and is it mandatory for a smooth Vulkan experience?

I'm trying to decide if I should go the "FreeBSD + Vulkan" route or settle for a headless Linux distro. Any insights would be greatly appreciated.

Toolchain: llama.cpp from pkg. It is compiled with Vulkan support.

Drivers: NVIDIA drivers from ports. They have perfectly usable Vulkan.

Performance: On Linux, Vulkan is about 8% slower than CUDA. FreeBSD/Vulkan is about 4% slower than Linux/Vulkan.

Stability: No problems observed. I run a Qwen model, quant 6-something, with a 64k context window. It behaves the same as on Linux.

Hardware: I didn't have to mess with the BAR and can use all of my 32 GB card.
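
To put the two percentages together, here is a rough sketch of the arithmetic. It assumes "N% slower" means throughput is (1 - N) of the reference and that the two slowdowns compound multiplicatively. The helper name `combined_slowdown` and the 100 tokens/sec baseline are invented for illustration and are not measurements from this thread.

```python
def combined_slowdown(*slowdowns):
    """Combine successive relative slowdowns (as fractions) into one overall slowdown."""
    remaining = 1.0
    for s in slowdowns:
        # Each step keeps (1 - s) of the previous throughput.
        remaining *= 1.0 - s
    return 1.0 - remaining


# Linux/Vulkan vs Linux/CUDA: 8%; FreeBSD/Vulkan vs Linux/Vulkan: 4%.
total = combined_slowdown(0.08, 0.04)
print(f"FreeBSD/Vulkan vs Linux/CUDA: about {total:.1%} slower")

# Hypothetical baseline, for illustration only.
cuda_tokens_per_sec = 100.0
print(f"Estimated FreeBSD/Vulkan rate: {cuda_tokens_per_sec * (1.0 - total):.1f} tokens/sec")
```

Under those assumptions, FreeBSD/Vulkan ends up a little under 12% behind Linux/CUDA.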
 
It makes me wonder what the elementary operations of the process are. Are they binary instructions on very wide registers, like GPUs have, that Nvidia keeps proprietary and obscure? It must be possible to replace that with simple logic. Do we really need a 3D perspective projection machine to get any significant results?
Most, if not all, of these technologies are not backward compatible with iOS 15, except for one LLM that semi-works. I've tested it myself on an old Apple device.

That said, I think cutting-edge 3D graphics would be awesome for FreeBSD for many other applications outside of this space.
 