Hello,
Previously I tried to use Ollama on FreeBSD with my Nvidia GPU, and it did not work: it used the CPU only, both on the host and in a jail.
Recently I have been using llama.cpp (version 8182, 28 Feb 2026) in a jail with Nvidia GPU support without issue, so I will dump some
notes for experienced users who might need help setting up llama.cpp with an Nvidia GPU inside a jail.
I am not going into deep detail; this is not a guide.
My host:
- FreeBSD 15
- GPU: Nvidia 3060 (12 GB)
I am using BastilleBSD as my jail manager, but you can use whichever one you want.
Install and set up the official Nvidia driver on your host:
Code:
pkg install nvidia-driver
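On a fresh host the kernel modules also need to be loaded; a minimal sketch, assuming the stock nvidia-driver package and that you have not already configured this:

```
# Load the Nvidia kernel modules at boot (host only)
sysrc kld_list+="nvidia nvidia-modeset"
# Or load them immediately without rebooting
kldload nvidia-modeset
```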
Now there are a few things you need to do.
1. You need to edit /etc/devfs.rules on the host and unhide the Nvidia devices.
/etc/devfs.rules
Code:
# Allow GPU access
[bastille_gpu=14]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'nvidia*' unhide
add path 'dri*' unhide
add path 'drm*' unhide
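After editing the rules on the host, they have to be reloaded before jails can see the devices; a sketch, assuming the default devfs rc service:

```
service devfs restart
```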
2. The llama.cpp jail needs to be a THICK jail; this won't work in a thin jail.
3. Make sure the new jail is using the proper devfs rules:
jail.conf
Code:
devfs_ruleset = 14
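For context, here is roughly where that line sits in a jail definition; a minimal sketch where the jail name, paths, and interface are placeholders for your own setup:

```
llama {
    # ... your existing host.hostname, path, interface, etc. ...
    devfs_ruleset = 14;   # must match the [bastille_gpu=14] ruleset number above
}
```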
Boot up your new jail and install llama.cpp and the Nvidia driver inside it:
Code:
pkg install -y llama-cpp nvidia-driver
Add the llama.cpp service config to rc.conf inside the jail. You can adjust these as you need; obviously you need to download your own .gguf model.
Code:
sysrc llama_server_enable="YES"
sysrc llama_server_user="ai-user"
sysrc llama_server_args="--port 11434"
sysrc llama_server_model="/usr/home/ai-user/qwen2.5-coder-7b.gguf"
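Depending on the build and defaults, llama-server may not offload all layers to the GPU on its own; its -ngl / --n-gpu-layers option controls this, and you could add it to the args line. The value 99 here is just shorthand for "as many layers as fit" and is my assumption, adjust to taste:

```
sysrc llama_server_args="--port 11434 --n-gpu-layers 99"
```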
Start llama.cpp. If it silently fails, check that the permissions on the .gguf file are correct.
Code:
service llama-server start
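To verify the server actually responds before reaching for the web UI, you can hit its HTTP API; a sketch, assuming the port from the rc.conf example above (llama-server exposes a /health endpoint and an OpenAI-compatible chat endpoint):

```
# Basic health check
curl http://localhost:11434/health
# Send a one-off chat completion request
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello"}]}'
```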
To make sure the model is using my Nvidia 3060, I monitor GPU usage with this command (from the host or the jail):
Code:
nvidia-smi --loop=1
You should see VRAM usage, and GPU/power usage while the model is answering.
llama.cpp comes with its own web UI; you can connect to 'http://yourip:11434' to chat with the model.
I use nginx on my host to proxy the AI web interface over SSL to my network.
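A minimal sketch of the nginx side, assuming you already have a certificate; the server name, certificate paths, and jail IP are all placeholders for your own setup:

```
server {
    listen 443 ssl;
    server_name ai.example.lan;

    ssl_certificate     /usr/local/etc/ssl/ai.crt;
    ssl_certificate_key /usr/local/etc/ssl/ai.key;

    location / {
        # Forward to the llama-server port inside the jail
        proxy_pass http://jail-ip:11434;
        proxy_set_header Host $host;
    }
}
```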
I hope this is useful to someone; Google searches do not come up with much info on this topic.