T-Aoki, please have a look below:
Code:
root@marietto:/compat/linux # nv-sglrun nvidia-smi
/usr/local/lib/libc6-shim/libc6.so: shim init
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.154
@marietto:/compat/linux # nvidia-smi
Thu May 8 14:54:13 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: N/A |
marietto:/compat # ./start-noble-bash-no-jail
marietto@marietto:/$ nvidia-smi
Thu May 8 14:55:45 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: 12.3 |
For some reason, when I install a new NVIDIA driver, part of the old one is not updated. That could be why newer versions of the NVIDIA driver are never detected by pytorch+cu... Or, for some reason, are the files of the 535 driver installed by default somehow, and do they stick there forever?
And it's no coincidence that the leftover part is 535, which is exactly the last driver version that works.
I cannot know exactly what's happening in your environment (I don't have an environment that matches yours 100%, not only in hardware but also in the installation/upgrade history of software and firmware), but some educated guesses are possible.
The possibilities I can think of right now are:
- Deinstalling the previous version was somehow incomplete (interrupted, crashed, ...) and later installs were done on top of it.
- Deinstalling and reinstalling went fine for x11/nvidia-driver and x11/linux-nvidia-libs, but whatever depends on them (libc6-shim here?) was not rebuilt against the new versions. The old libraries are preserved in /usr/local/lib/compat/pkg/, and libc6-shim, for example, still requires the preserved ones.
Unfortunately, recent software in ports often installs its libraries into its own subdirectory under /usr/local/lib/, and these subdirectories appear later than /usr/local/lib/compat/pkg/ in the library search path. As a result, the preserved (old) libraries are mistakenly loaded and linked instead of the new ones.
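As a rough illustration of why the order matters, here is a minimal Python model of first-match library lookup. It is a simplification, not how FreeBSD's rtld(1) or the ldconfig hints actually work internally; the library name libexample.so.1 and the subdirectory /usr/local/lib/example are hypothetical, and only /usr/local/lib/compat/pkg/ comes from the discussion.

```python
# Simplified model of first-match shared-library lookup.
# NOT the real rtld/ldconfig logic; directory contents are hypothetical.


def resolve(soname, search_path, contents):
    """Return the first directory in search_path that provides soname, or None."""
    for directory in search_path:
        if soname in contents.get(directory, set()):
            return directory
    return None


contents = {
    "/usr/local/lib": set(),
    # Old copy preserved after an upgrade (hypothetical library name).
    "/usr/local/lib/compat/pkg": {"libexample.so.1"},
    # New copy installed by a port into its own subdirectory (hypothetical path).
    "/usr/local/lib/example": {"libexample.so.1"},
}

# compat/pkg is searched before the per-port subdirectory.
current_order = ["/usr/local/lib", "/usr/local/lib/compat/pkg", "/usr/local/lib/example"]
# compat/pkg moved to the very end of the search path.
compat_last_order = ["/usr/local/lib", "/usr/local/lib/example", "/usr/local/lib/compat/pkg"]

print(resolve("libexample.so.1", current_order, contents))      # /usr/local/lib/compat/pkg
print(resolve("libexample.so.1", compat_last_order, contents))  # /usr/local/lib/example
```

With compat/pkg searched last, the newer copy wins, which is what moving /usr/local/lib/compat/pkg/ to the end of the search path would achieve.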
The "workaround" for the latter is to remove the preserved libraries that cause the misbehaviour (or move them somewhere that isn't searched).
Fixing this properly could be non-trivial and should be done in base, not in ports.
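Finding candidates for that workaround could be done with a dry-run helper like the following Python sketch. It assumes that a preserved library is suspect when a file with the same name also exists elsewhere under /usr/local/lib/; that heuristic is my own assumption, not something pkg reports, and the function name find_shadowing is hypothetical. It only prints matches and never moves or deletes anything.

```python
import os
import sys


def find_shadowing(preserved_dir, lib_root):
    """Map each file name in preserved_dir to other same-named files under lib_root.

    Heuristic only (an assumption): a same-named file elsewhere suggests the
    preserved copy may be loaded instead of the newer one. Nothing is moved.
    """
    preserved_dir = os.path.abspath(preserved_dir)
    if not os.path.isdir(preserved_dir):
        return {}
    preserved = {
        name
        for name in os.listdir(preserved_dir)
        if os.path.isfile(os.path.join(preserved_dir, name))
    }
    matches = {}
    for dirpath, dirnames, filenames in os.walk(lib_root):
        if os.path.abspath(dirpath) == preserved_dir:
            dirnames[:] = []  # don't compare the preserved directory with itself
            continue
        for name in sorted(filenames):
            if name in preserved:
                matches.setdefault(name, []).append(os.path.join(dirpath, name))
    return matches


if __name__ == "__main__":
    preserved_dir = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/lib/compat/pkg"
    lib_root = sys.argv[2] if len(sys.argv) > 2 else "/usr/local/lib"
    found = find_shadowing(preserved_dir, lib_root)
    if not found:
        print("no same-named copies found (or directory missing)")
    for name, others in sorted(found.items()):
        print(f"{name}: also in {', '.join(others)}")
```

Anything it reports still needs a manual check (for example with ldd on the affected binary) before it is moved aside.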
The directory for preserved libraries (/usr/local/lib/compat/pkg/, named that way for historical reasons) should always be at the end of the library search path, but I'm not sure there's a way to do that "always, sanely, all over the system, at exactly the same time". If the library search path is cached by something outside base, how can we ensure that all of those caches are updated at once?
/usr/local/lib/compat/pkg/ worked perfectly fine when it was first introduced (IIRC, at about the first import of ports-mgmt/portupgrade; at that time the ports-mgmt category didn't exist yet, and sysutils was probably used), because all libraries from ports were installed directly under /usr/local/lib/.