llama.cpp AI working in a jail using Nvidia GPU

Hello,

Previously I tried to use Ollama on FreeBSD with my Nvidia GPU and it did not work: it used the CPU only, both on the host and in a jail.
Recently I have been running llama.cpp (version 8182, 28 Feb 2026) in a jail with Nvidia GPU support without issue, so I will dump some
notes for experienced users who might need some help setting up llama.cpp with an Nvidia GPU inside a jail.

I am not going into deep detail, this is not a guide.

My host:
- FreeBSD 15
- GPU: Nvidia RTX 3060, 12 GB

I am using BastilleBSD as my jail manager but you can use any that you want.
Install and setup the official Nvidia driver on your host.

Code:
pkg install nvidia-driver
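
The driver kernel module also needs to be loaded. One common way (module name taken from the nvidia-driver port) is to add it to kld_list so it loads at boot, then load it immediately:

Code:
sysrc kld_list+=nvidia
kldload nvidia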

Now there are a few things you need to do.

1. You need to edit /etc/devfs.rules on the host and unhide the Nvidia device nodes so the jail can see them.

/etc/devfs.rules
Code:
# Allow GPU access
[bastille_gpu=14]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'nvidia*' unhide
add path 'dri*' unhide
add path 'drm*' unhide
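
Rulesets in /etc/devfs.rules are only loaded when the devfs service runs, so after editing the file, reload it on the host (the jail will pick up the ruleset when it starts):

Code:
service devfs restart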

2. The llama.cpp jail needs to be a THICK jail; this won't work in a thin jail.
3. Make sure the new jail is using the proper devfs rules:

jail.conf
Code:
devfs_ruleset = 14
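
With Bastille, a thick jail is created with the -T flag (the jail name and IP below are just examples); the jail's jail.conf then lives under the Bastille prefix, /usr/local/bastille/jails/<name>/jail.conf by default, and that is where the devfs_ruleset line goes:

Code:
bastille create -T llama 15.0-RELEASE 192.168.1.50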

Boot up your new jail and install llama.cpp and the Nvidia driver:

Code:
pkg install -y llama-cpp nvidia-driver

Add the llama.cpp service configuration to rc.conf. You can adjust these as needed; obviously you need to download your own .gguf model.

Code:
sysrc llama_server_enable="YES"
sysrc llama_server_user="ai-user"
sysrc llama_server_args="--port 11434"
sysrc llama_server_model="/usr/home/ai-user/qwen2.5-coder-7b.gguf"
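
If the model runs but the GPU stays idle, you may need to tell llama.cpp to offload layers to the GPU. llama-server has a --n-gpu-layers (-ngl) option for this; a sketch of the same args line with offloading added (the value 99 simply means "as many layers as fit"):

Code:
sysrc llama_server_args="--port 11434 -ngl 99"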

Start llama.cpp. If it silently fails, check the permissions of the .gguf file and make sure the llama_server_user can read it.

Code:
service llama-server start
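
A quick way to check the permissions mentioned above, using the paths and user from the rc.conf example:

Code:
ls -l /usr/home/ai-user/qwen2.5-coder-7b.gguf
chown ai-user /usr/home/ai-user/qwen2.5-coder-7b.gguf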

To make sure the model is using my Nvidia 3060, I monitor GPU usage with this command (from the host or the jail):

Code:
nvidia-smi --loop=1

You should see VRAM usage at all times, and GPU/power usage climb while the model is generating.

llama.cpp comes with its own web UI; you can connect to 'http://yourip:11434' to chat with the model.
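
Besides the web UI, llama-server also exposes an OpenAI-compatible HTTP API on the same port, which is handy for a quick smoke test from the command line (replace yourip with your jail's address):

Code:
curl http://yourip:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'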

I use nginx on my host to proxy the AI web interface over SSL to my network.
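
For reference, a minimal sketch of such an nginx proxy block; the server name, certificate paths, and jail IP are placeholders you would replace with your own:

Code:
server {
    listen 443 ssl;
    server_name ai.example.lan;

    ssl_certificate     /usr/local/etc/ssl/ai.crt;
    ssl_certificate_key /usr/local/etc/ssl/ai.key;

    location / {
        proxy_pass http://192.168.1.50:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}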

I hope this is useful to someone; Google searches do not turn up much info on this topic.