Solved: running the quarterly branch now, using NVIDIA 570.124.04 and CUDA 12.8.

Can you explain what worked and what didn't work for you?
When the system boots and has already loaded the 570.124 nvidia driver, it refuses to load the 570.144 nvidia-drm module and complains that the two modules are incompatible. You need to run "dmesg -a | less" to see this error message.
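
For anyone hitting the same thing, a quick way to spot that kind of mismatch (just a sketch; package names may differ slightly on your system) is to compare what the kernel actually loaded against what is installed:

Code:
# kldstat | grep -i nvidia
# dmesg -a | grep -i nvidia
# pkg info -x nvidia

If dmesg still reports the old version while pkg info shows the new one, the stale module from the previous driver is still loaded.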
 
It didn't happen to me. On 14.2 I just upgraded from 570.124 to 570.144 without any complaints from the system. What did I do? This easy procedure:

Code:
# cd /usr/ports/x11/linux-nvidia-libs
# nano Makefile
DISTVERSION?=   570.144

# make deinstall
# make makesum
# make
# make install

# cd /usr/ports/x11/nvidia-driver
# nano Makefile
DISTVERSION?=   570.144

# make deinstall
# make makesum
# make
# make install

REBOOT

marietto# dmesg -a | grep drm
[drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[drm] Initialized nvidia-drm 0.0.0 20160202 for nvidia0 on minor 0

marietto# nvidia-smi
Thu May 15 17:17:35 2025      
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144                Driver Version: 570.144        CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 3GB    Off |   00000000:01:00.0  On |                  N/A |
| 56%   39C    P8              9W /  120W |     270MiB /   3072MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            5174      G   /usr/local/libexec/Xorg                 104MiB |
|    0   N/A  N/A            5324      G   firefox                                 161MiB |
+-----------------------------------------------------------------------------------------+

marietto# nv-sglrun nvidia-smi
/usr/local/lib/libc6-shim/libc6.so: shim init
Thu May 15 17:18:17 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144                Driver Version: 570.144        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 3GB    Off |   00000000:01:00.0  On |                  N/A |
| 56%   39C    P8              7W /  120W |     252MiB /   3072MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
 

Yes, that's what I should have done, but initially I did not replace the 570.124 nvidia-driver with the 570.144 version, so there was a version mismatch.
 
I'm closing this thread.

I don't think it's a good idea. The problem is unfixed, so "we" should keep going (at least I will), trying and trying until we find a solution to achieve the goal. That's our passion and our "mission", the power to serve.

Do you want to take a look at this thread?


especially where he says:

Same environment but setting visible devices to only 1 works fine (e.g. export CUDA_VISIBLE_DEVICES=0; python train.py …). Seems to error out in DDP? I can get past the original error by specifying up to 4 GPUs in CUDA_VISIBLE_DEVICES, but then I get a "CUDA error: an illegal memory access was encountered" error for 2 or more GPUs w/DDP.

Smoke test (another data point during debugging):
export CUDA_VISIBLE_DEVICES=0,1,2,3; python -c 'import torch; torch.cuda.is_available()'
works fine, but then adding more than 4 GPUs fails:
export CUDA_VISIBLE_DEVICES=0,1,2,3,4; python -c 'import torch; torch.cuda.is_available()'
/home/adaboost/miniconda3/envs/mustango/lib/python3.10/site-packages/torch/cuda/__init__.py:181: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0

This is interesting:

Code:
export CUDA_VISIBLE_DEVICES=0,1,2,3; python -c 'import torch; torch.cuda.is_available()'
works fine

So, this could be the reason for the error, and point towards the solution...
 
I now believe that it's /boot/modules/dmabuf.ko that is somehow not getting rebuilt when rebuilding the two /usr/ports/graphics packages:

nvidia-drm-61-kmod
nvidia-drm-kmod

When installing from scratch, dmabuf.ko finds its way to /boot/modules,
but where does it come from?
 
% pkg which -o /boot/modules/dmabuf.ko
/boot/modules/dmabuf.ko was installed by package graphics/drm-61-kmod
Thanks. Yes, that's what it says on my machine too,
but for some reason reinstalling a new version of drm-61-kmod (the update from 570.124 to 570.144)
does not reinstall dmabuf.ko.
This happened on one of my machines that I have since reinstalled to 14.3-STABLE, so I can't test it again.
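
If anyone still has a machine showing the problem, a sketch of what could be tried (assuming, as the pkg which output above says, that dmabuf.ko belongs to graphics/drm-61-kmod) is to force-reinstall that package and check whether the file comes back:

Code:
# pkg which -o /boot/modules/dmabuf.ko
# pkg install -f drm-61-kmod
# ls -l /boot/modules/dmabuf.ko

or, from ports:

Code:
# cd /usr/ports/graphics/drm-61-kmod
# make deinstall reinstall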
 
I still haven't tested this argument:

export CUDA_VISIBLE_DEVICES=0,1,2,3;

but if it works, isn't adding that prefix alone enough, instead of making some heavy changes to a system that works great?
 
Not sure, as I never tried CUDA, but be careful about which shell you're configuring.
The syntax differs depending on the shell; the syntax you noted is for /bin/sh or any other POSIX-compliant sh.
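
For example (just to illustrate the difference, using the same variable discussed above):

Code:
# POSIX sh / bash:
export CUDA_VISIBLE_DEVICES=0,1,2,3

# csh / tcsh:
setenv CUDA_VISIBLE_DEVICES 0,1,2,3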
 
It does not work:
Code:
(pytorch) marietto# export CUDA_VISIBLE_DEVICES=0,1; LD_PRELOAD="/mnt/da0p2/CG/Tools/Stable-Diffusion/dummy-uvm.so" python3 -c 'import torch; torch.cuda.is_available()'

/home/marietto/miniconda3/envs/pytorch/lib/python3.12/site-packages/torch/cuda/__init__.py:181: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
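
One more thing that might be worth trying (just an idea, not tested here): run the same one-liner through nv-sglrun, the libc6-shim wrapper that already makes nvidia-smi report CUDA 12.8 above, to rule the shim in or out:

Code:
# export CUDA_VISIBLE_DEVICES=0,1
# nv-sglrun python3 -c 'import torch; print(torch.cuda.is_available())'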
 

A good starting point for being able to debug what's broken between pytorch, CUDA and the nvidia driver...


NVIDIA CUDA Support

If you want to compile pytorch with CUDA support, select a supported version of CUDA from our support matrix, then install the following:



Note: You could refer to the cuDNN Support Matrix for cuDNN versions with the various supported CUDA, CUDA driver and NVIDIA hardware


If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.


If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here:

Source:

 
I downloaded the CUDA toolkit
cuda-repo-ubuntu2404-13-0-local_13.0.0-580.65.06-1_amd64.deb
and unpacked all packages into the same folder.

nvidia-smi fails when run as root:

# usr/bin/nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system

The latest nvidia-driver-580.76.05.1403505 is installed on 14.3 and the NVidia card works fine.

What might be causing this error, and how can I make it run?
 
Could these tutorials be useful for you?


 
Is CUDA support actually present in the FreeBSD NVidia driver for sure?

I have the latest NVidia driver and a not-too-old NVidia card that supports CUDA, but I am getting the error "Failed to initialize NVML: GPU access blocked by the operating system".

Perhaps some sysctl variable is needed?
 

The version of the nvidia driver installed inside the Linuxulator should be the same as the version of the nvidia driver installed on FreeBSD.
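
A quick way to check whether they match (a sketch; the exact package names on your system may differ slightly):

Code:
# pkg info -x '^nvidia-driver'
# pkg info -x '^linux-nvidia-libs'
# nv-sglrun nvidia-smi

The versions reported by pkg info and the Driver Version shown by the Linux nvidia-smi (run through the shim) should all be the same.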
 