Cuda now working on Freebsd with a Rocky Linux Podman container

I now have Cuda working on Freebsd using a Rocky Linux Podman container

This is the python command run inside the Podman container

python3 -c "import torch; print('CUDA Available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"

And here's the output

CUDA Available: True
Device: NVIDIA GeForce GTX 1650
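
For a slightly fuller check than the one-liner, something like the following could be run inside the container. This is only a sketch: the function name cuda_report is made up for illustration, and it returns an empty result if torch is not installed.

```python
def cuda_report():
    """Return a small dict describing what torch can see (sketch only)."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"torch": None, "cuda_available": False, "devices": []}

    report = {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(index)
            # total_memory is reported in bytes; convert to GiB for readability
            report["devices"].append(
                {"name": props.name, "vram_gib": round(props.total_memory / 2**30, 1)}
            )
    return report


print(cuda_report())
```

The VRAM figure is handy when deciding which whisper model will fit on the card.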

Output of nvidia-smi in the Podman container

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 595.58.03 Driver Version: 595.58.03 CUDA Version: 13.2 |
+-----------------------------------------+------------------------+----------------------+


I just installed Whisperx which uses Cuda in a Rocky Linux Podman container for the audio transcription

Here's the crazy part: there is no Nvidia driver installed in the Podman container

I map the Linuxulator libraries from Freebsd into the container,
use devfs.rules, and set up matching groups and ids so the container runs with the exact same user and permissions as on the Freebsd host

as well as some other tricks to get it all working

And persistent storage as well, so files are stored on the Freebsd host and aren't wiped when you restart the container

I'm using Rocky Linux 9.3, which is the same version as in the Linuxulator, so we have maximum compatibility


So far I have the following working in the containers

1) ffmpeg nvenc encoding

2) Firefox with widevine for drm playback with hardware accelerated video
wayland and pulseaudio sockets mounted from Freebsd into the container to create the window and audio with zero latency
downloads directory mounted from Freebsd inside the container

3) Whisperx using Cuda for audio transcription with speaker diarization


So the upshot of this is we now have Cuda on Freebsd using Podman

I can install any command line or gui application in a Rocky Linux Podman container
Including python applications that use torch


You can see the thread here, which is about getting Firefox working with widevine for drm playback in a Podman container



I've created a github repo for the podman containers

So I'm writing the theme tune and singing the theme tune
Little Britain reference

Screenshot of the subtitles created with Whisperx


input-[00:01:37.000]v1.jpg
 
I only have a GTX 1650 Nvidia card with 4 GB of VRAM

But I have managed to run the medium whisper model

Code:
whisperx input.wav \
  --device cuda \
  --model medium \
  --compute_type int8 \
  --batch_size 4 \
  --threads 1 \
  --diarize \
  --highlight_words True \
  --language en

which is more accurate than the small model
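
If you want to run the same command over several files from a script, one way is to build the argument list in Python. This is a sketch, not part of the setup described here: the function name whisperx_command is made up, and it only uses the flags from the command above.

```python
import shlex


def whisperx_command(audio, model="medium", compute_type="int8", batch_size=4, language="en"):
    """Build the argv list for the whisperx CLI call shown above."""
    return [
        "whisperx", audio,
        "--device", "cuda",
        "--model", model,
        "--compute_type", compute_type,
        "--batch_size", str(batch_size),
        "--threads", "1",
        "--diarize",
        "--highlight_words", "True",
        "--language", language,
    ]


cmd = whisperx_command("input.wav")
# shlex.join quotes arguments so the printed line can be pasted into a shell
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then run the transcription. On a 4 GB card, lowering batch_size is the first thing to try if the model runs out of memory.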

[SPEAKER_01]: Mr. Macbeth is a naughty man.
[SPEAKER_01]: Do, do, do, do, do.
[SPEAKER_01]: He gone and killed another man.
[SPEAKER_01]: Do, do, do, do.
[SPEAKER_01]: I hath a good idea.
[SPEAKER_01]: Just thou keep me near, I'll be so good for the Scottish play.
 
Can you elaborate on that?

I use an env file to map the username and userid from the Freebsd host to the Podman container
So the container runs as the same user as on Freebsd and is in the same groups
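
As a rough illustration of the env file idea, the host user's name and ids can be collected like this. It is a sketch under assumptions: the variable names HOST_USER, HOST_UID, HOST_GID and HOST_GROUPS are made up for the example and are not necessarily the ones used in the actual env file.

```python
import os
import pwd


def host_identity_env():
    """Collect the current user's name, uid, gid and group ids as env-file lines."""
    uid = os.getuid()
    gid = os.getgid()
    try:
        name = pwd.getpwuid(uid).pw_name
    except KeyError:
        # no passwd entry for this uid, fall back to the numeric id
        name = str(uid)
    groups = sorted(set(os.getgroups()) | {gid})
    return [
        f"HOST_USER={name}",      # hypothetical variable names
        f"HOST_UID={uid}",
        f"HOST_GID={gid}",
        "HOST_GROUPS=" + ",".join(str(g) for g in groups),
    ]


print("\n".join(host_identity_env()))
```

The output can be written to a file and handed to podman run with --env-file, and the container entrypoint can then create a user with the same name and ids.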

i also set the HOST_DBUS_SESSION_BUS_ADDRESS from Freebsd as an env variable
which provides access to dbus so things like desktop notifications work in firefox running in a podman container

devfs.rules to access the gpu

Podman containers are actually oci jails which use the devfsrules_jail ruleset (ruleset 4)

Code:
[localrules=5]
add path 'da*' mode 0660 group operator
add path 'dri/*' mode 0660 group video
add path 'drm/*' mode 0660 group video
add path 'input/*' mode 0660 group video
add path 'input/event*' mode 0660 group video
add path 'nvidia*' mode 0660 group video

[devfsrules_jail=4]
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'mixer*' unhide
add path 'dsp*' unhide
add path 'dri*' unhide
add path 'drm*' unhide
add path 'nvidia*' unhide
add path 'speaker*' unhide

Freebsd uses group 44 for video but the container uses group 39

so by adding the podman user to both video groups
and using devfs.rules which allow access to the video group

the podman user has access to the gpu
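
A quick way to confirm from inside the container that the group setup took effect is to look at the device nodes themselves. This is a minimal sketch with made-up function names; it only checks the group bits, which is what the devfs rules above set (mode 0660), and ignores owner and other permissions.

```python
import glob
import os


def group_can_rw(path, gids):
    """True if the node's group is one of gids and the mode grants group read+write."""
    st = os.stat(path)
    return st.st_gid in gids and (st.st_mode & 0o060) == 0o060


def check_nodes(pattern="/dev/nvidia*"):
    """Map each matching device node to whether the current user's groups can use it."""
    gids = set(os.getgroups()) | {os.getegid()}
    return {path: group_can_rw(path, gids) for path in sorted(glob.glob(pattern))}


# An empty dict means the nodes are still hidden by the devfs ruleset
print(check_nodes())
```

If a node shows up with False, the user is probably missing from the group that owns it.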

the xdg runtime dir which contains the wayland socket is mounted in the container as read only
and then i create another directory which i set as the xdg runtime dir in the container

and symlink the wayland socket, pulseaudio and dbus directories into that directory
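
The private runtime directory step could look roughly like this. It is a sketch only: the function name and the entry names wayland-0, pulse and dbus are assumptions for illustration, and the real names depend on the compositor and how the host directories are laid out.

```python
import os


def link_sockets(host_runtime, private_runtime, names=("wayland-0", "pulse", "dbus")):
    """Symlink selected entries from the read-only host runtime dir into a private one."""
    # XDG_RUNTIME_DIR is expected to be private to the user, hence mode 0700
    os.makedirs(private_runtime, mode=0o700, exist_ok=True)
    linked = []
    for name in names:
        source = os.path.join(host_runtime, name)
        target = os.path.join(private_runtime, name)
        # only link what exists on the host side and is not already linked
        if os.path.lexists(source) and not os.path.lexists(target):
            os.symlink(source, target)
            linked.append(name)
    return linked
```

The container would then export XDG_RUNTIME_DIR pointing at the private directory, so applications can still write their own files there while the host sockets stay read only.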

pulseaudio on freebsd using default.pa creates a pulseaudio.socket in /tmp which is then mounted into the container
and the container has a client.conf pulseaudio config that sets the pulseaudio server to /tmp/pulseaudio.socket
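
For reference, a client.conf along those lines can be generated like this. It is a sketch: the function name is made up, and the autospawn line is an extra assumption of mine, not something confirmed from the actual config.

```python
def pulse_client_conf(socket_path="/tmp/pulseaudio.socket"):
    """Render a PulseAudio client.conf that points clients at a unix socket."""
    return (
        f"default-server = unix:{socket_path}\n"
        # assumption: stop clients from starting a second daemon in the container
        "autospawn = no\n"
    )


print(pulse_client_conf(), end="")
```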

i set the Linuxulator rocky linux libraries as a volume in the podman container
/compat/linux/usr/lib64 which is then mounted to /usr/lib64/nvidia-host

then i append that location to the LD_LIBRARY_PATH in the container

export LD_LIBRARY_PATH=/usr/lib64:/usr/lib64/nvidia-host

that way we have the original library path in the container
and the mounted Linuxulator library path that contains the nvidia libraries
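
The ordering matters here: the container's own /usr/lib64 comes first, so the mounted host directory is only searched for libraries the container does not already have. A small sketch of appending without duplicating entries (the function name is made up for the example):

```python
import os


def append_library_path(current, extra):
    """Append extra to a colon-separated search path, keeping order and skipping duplicates."""
    # empty entries are dropped; the loader would treat them as the current directory
    parts = [part for part in current.split(":") if part]
    if extra not in parts:
        parts.append(extra)
    return ":".join(parts)


env = dict(os.environ)
env["LD_LIBRARY_PATH"] = append_library_path(
    env.get("LD_LIBRARY_PATH", "/usr/lib64"), "/usr/lib64/nvidia-host"
)
print(env["LD_LIBRARY_PATH"])
```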

I also use the dummy-uvm.so in the container

using the devfs rules and unhiding the nvidia devices means the podman container can see the hardware
then because we are mounting the Linuxulator nvidia directory from freebsd into the container it can see the nvidia libraries

i also mount a directory from the freebsd host like the download directory into the container
and because the podman user shares the same username, id and group id all the permissions are correct

so with firefox running in the podman container you have access to the mapped download directory from freebsd
and because the permissions are the same, when you download files using firefox in the container

they are then owned by your user on the freebsd side

that's a brief overview and there is some more witchcraft to get it all working

i have to write up some of the install documentation
but i have a full guide on setting up podman

then i can push the repo to github
and do a long techy video explaining how it all works

i have only been back on freebsd for about 2 and a half weeks
and working on this for about 5 days

i have managed to get the following working

1) ffmpeg hardware accelerated encoding using nvenc

2) firefox with widevine drm and hardware accelerated playback

3) Cuda working with python and torch with whisperx

writing all the documentation is the boring bit

but just like audio is half the picture

documentation is half the project
 
Regarding the shim written by shkhln against the UVM ioctl interface of nvidia driver higher than 525/535 :

python3 -c "import torch; print('CUDA Available:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None')"

With the nvidia driver from the 550 series onward, this command gives an error. The PyTorch version isn't the problem, and neither is the Python version. Error 304 at cudaGetDeviceCount is the UVM shim failing, not a wheel mismatch.

Root cause

dummy-uvm.so (and the equivalent code in libc6-shim) was written by shkhln against the UVM ioctl interface of driver 525/535. NVIDIA changed the UVM ioctl layout starting with the 550 series, and every driver since (550, 565, 570, 575) breaks PyTorch's CUDA init the same way, regardless of which torch wheel you install. Python using the linuxulator doesn't work with CUDA except with the 535 driver. The exact same error 304 has been reported by other users on 570.124.04 + CUDA 12.8, and on the RTX 5070/Blackwell with driver 575 the trap is even more explicit: shim_ioctl_impl(-1, 0x27, _) is not implemented coming straight out of libnvidia-ml.so through libc6-shim. Even the latest libc6-shim release, 20251025 (October 2025), doesn't fix it for the post-535 UVM ABI.

nvidia-smi keeps printing a clean table on 570 because it only needs NVML, not UVM — so a green nvidia-smi is misleading. The first thing PyTorch does is touch UVM, and that's where it dies.
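
To separate "the driver library fails to initialise" from "PyTorch is misconfigured", the driver API can be probed directly with ctypes, without torch in the picture. This is a sketch, assuming the library is visible as libcuda.so.1; the function name is made up, and I would expect the same 304 seen at cudaGetDeviceCount to show up here, though I have not confirmed that.

```python
import ctypes

CUDA_SUCCESS = 0
CUDA_ERROR_OPERATING_SYSTEM = 304  # the "error 304" discussed above


def probe_cuda_driver(library="libcuda.so.1"):
    """Call cuInit(0) directly; return the CUresult code, or None if the library cannot be loaded."""
    try:
        libcuda = ctypes.CDLL(library)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


code = probe_cuda_driver()
if code is None:
    print("libcuda.so.1 not found on the library path")
elif code == CUDA_SUCCESS:
    print("cuInit succeeded")
else:
    print("cuInit failed with CUresult", code)
```

A clean nvidia-smi table together with a non-zero result here points at the driver/shim layer rather than at the Python stack.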

Why changing PyTorch won't help

Switching from nightly cu128 to 2.4.0+cu121, or downgrading Python from 3.10 to 3.9, or trying 1.12.1+cu113 — none of that changes the syscall path. They all call into libcuda.so → UVM ioctls → linuxulator → shim. The shim is the bottleneck. Verm's tutorial worked in 2022 because driver 525 was current and the shim matched it; the wheel version is incidental.

Two practical options

  1. Downgrade to nvidia-driver 535.x (and matching linux-nvidia-libs-535.x). The known-good stack confirmed by multiple users is PyTorch 2.4.0 py3.10_cuda12.1_cudnn9.1.0, torchvision 0.19.0 py310_cu121, torchaudio 2.4.0 py310_cu121, Python 3.10, driver 535.146.02. Don't bother with nightly cu128; cu121 is what works.
  2. Stay on 570 and accept PyTorch GPU is broken via Linuxulator until someone updates uvm_ioctl_override.c for the post-535 UVM ABI. Native FreeBSD CUDA apps that don't go through UVM (some Blender Cycles paths, ffmpeg NVENC) may still function; PyTorch/TensorFlow/JAX won't.

Sundry

  • nvcc not found is expected and unrelated — FreeBSD has no native CUDA toolkit port, and you don't need nvcc to run prebuilt PyTorch wheels, only to compile CUDA code yourself.
  • The GTX 1060 3GB is too tight for SDXL anyway (3 GB VRAM), but the RTX 2080 Ti / 11 GB is fine — once the driver situation is sorted.

What dummy-uvm.so really does :

The original source is 60 lines. It doesn't implement UVM — it lies to CUDA by saying UVM isn't supported, so the userland library takes the non-managed-memory path:

Code:
if (request == NV_UVM_INITIALIZE) {
    params->status = NV_ERR_NOT_SUPPORTED; // <-- the "lie"
    return 0;
}

Plus, it redirects open("/dev/nvidia-uvm") to /dev/null (so the open doesn't fail) and silences /proc/self/task/*/comm (linux-only path).

Why it breaks on 550s and up

We don't know exactly without a trace from your system. There are three plausible hypotheses, in order of probability:

1) New unhandled ioctls. Since 550, libcuda/libnvidia-ml calls additional ioctls (on the UVM file descriptor or /dev/nvidiactl) that the old shim doesn't recognize. When the shim passes them to the kernel via libc_ioctl(), the file descriptor is actually /dev/null (for UVM) or the real device node doesn't handle the new request → error 304.

2) libcuda no longer respects NOT_SUPPORTED. The modern driver treats UVM as mandatory for certain APIs even if the app doesn't use managed memory.

3) Ioctl encoding changed. FreeBSD bug 287895 shows that shim_ioctl_impl(-1, 0x27, _) is not implemented by libnvidia-ml on 575 — request 0x27 raw (not _IOR-encoded, outside the 0x46xx range). This means that NVIDIA has added new escape commands with different encodings.
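
To make the encoding point in (3) concrete, here is a small decoder for Linux ioctl request numbers. It is a sketch assuming the generic Linux _IOC bit layout used on x86 (8 bits nr, 8 bits type, 14 bits size, 2 bits direction); the function name is made up, and the second value in the usage lines is an invented example, not one taken from a real trace.

```python
def decode_ioctl(request):
    """Split a Linux ioctl request number into its _IOC fields (generic/x86 layout)."""
    return {
        "nr": request & 0xFF,              # command number within the type
        "type": (request >> 8) & 0xFF,     # "magic" byte, e.g. 0x46 is ASCII 'F'
        "size": (request >> 16) & 0x3FFF,  # size of the argument struct in bytes
        "dir": (request >> 30) & 0x3,      # 0 none, 1 write, 2 read, 3 read+write
    }


# Raw 0x27 carries no type, size or direction at all
print(decode_ioctl(0x27))
# A made-up request in the 0x46xx range, for comparison
print(decode_ioctl(0xC020462A))
```

A request like 0x27 decodes to type 0 and size 0, which is why it does not look like a normal _IOR/_IOWR-encoded call in the 0x46xx range.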

If the cause is (1) or (3) and the new ioctls are "informative" (they ask for data, not perform DMA operations), they can be spoofed. If they require real UVM operations (page faulting, GPU page tables, memory pinned to BAR1), no user-space patch will suffice—we need the nvidia-uvm kmod ported to FreeBSD, which is hundreds of kilobytes of GPL/proprietary code that's not easily portable. Realistic probability of a user-space patch fixing the issue: 40-60%

How to fix it.

3-Step Strategy

1) Compile an instrumented version of dummy-uvm.so that logs every ioctl/open/close around /dev/nvidia-uvm and /dev/nvidiactl
2) Run python3 -c 'import torch; torch.cuda.is_available()' under it.
3) Based on the log, we can write targeted handlers for the ioctls that fail

To Be Continued...
 
The shim has already been fixed by North_Promise_9835 on reddit

so you don't need to worry about all that


works on Driver Version: 595.58.03

here's the shim


these work in the podman containers, nvenc encoding with ffmpeg, hardware acceleration for video playback in firefox
and cuda working with python torch with whisperx

if the new shim didn't work then none of those things would work
and they do, so it's not an issue

 
Install podman so you are ready to go when i release freebsd-podman


follow along until this section and run the test which will print hello world


here's the video to go with the notes
you don't need to install searxng


here are my config files for reference



 