nvidia-drm: Machine crashed in Xfce

Hey guys,
please bear with me: I'm not using FreeBSD on a daily basis, but I would love to, and I'd like to help debug this issue.

My machine is a Lenovo P1 Gen2 (Intel Xeon) with an NVIDIA GPU, running 14.1-RELEASE:

Code:
vendor     = 'NVIDIA
device     = 'TU117GLM [Quadro T2000 Mobile / Max-Q]'

I use the nvidia-drm module and driver.
Just as a test, I upgraded to latest and ran
Code:
time tree /
in Xfce Terminal, in an Xfce session started via startxfce4. The machine froze completely after about a second and rebooted automatically after roughly 10-15 s. I was not able to ssh into it during that time.

Please, what steps can one take to a) investigate the problem and b) report it? I want to learn how to deal with this kind of situation.
Sorry for such a newbie question, but it might also help others.
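For a) and b), a reasonable first step is to collect the basic facts people will ask for in any report. The sketch below writes them into one text file; the command list is my own selection, not an official checklist, and collect_report_info is a hypothetical name. A freeze followed by an automatic reboot after about 15 seconds may well be a kernel panic; in that case, setting dumpdev="AUTO" in /etc/rc.conf lets savecore(8) keep a crash dump under /var/crash on the next boot, which is usually the most useful thing to attach.

```shell
#!/bin/sh
# Sketch: collect basic system facts into one text file for a bug report.
# collect_report_info is a hypothetical helper; the command list is a choice,
# not an official checklist. Commands missing on the current system are
# noted and skipped, so this also runs (with less output) outside FreeBSD.
collect_report_info() {
    out=${1:-./nvidia-report.txt}
    : > "$out"                              # start with an empty file
    for cmd in "freebsd-version -kru" "uname -a" "kldstat" \
               "pciconf -lv" "pkg info -x nvidia" "sysrc dumpdev" \
               "ls -l /var/crash"; do
        set -- $cmd                         # split into words; $1 = program name
        printf '### %s\n' "$cmd" >> "$out"  # section header for each command
        if command -v "$1" >/dev/null 2>&1; then
            "$@" >> "$out" 2>&1             # keep stdout and stderr
        else
            printf '(%s not found, skipped)\n' "$1" >> "$out"
        fi
    done
}

# Usage: collect_report_info /tmp/nvidia-report.txt
```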
 
I use the nvidia-drm module and driver.
Just as a test, I upgraded to latest and ran
Did you install the package? Keep in mind that the packages in the repositories are still built for 14.0; kernel modules in particular are likely to cause problems on 14.1. Build them from ports.
 
Did you install the package? Keep in mind that the packages in the repositories are still built for 14.0; kernel modules in particular are likely to cause problems on 14.1. Build them from ports.
Yes, the package; 99% of the time it just works fine. I'm typing this message in Xfce. Interesting, I did not know about this discrepancy. That's very unpredictable. Logical and clear now, yes, but unpredictable. Out of curiosity: not rebuilding such important kernel modules for a new RELEASE, is that a matter of computing power, or a deliberate decision that packages are built at a mostly independent pace?
 
That's very unpredictable. Logical and clear now, yes, but unpredictable.
It's predictable. This always happens during the three-month transition period, while the previous minor release (here 14.0) is still supported and packages are built for it. There's one repository (two if you count quarterly and latest) for each major version.
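To illustrate why 14.0 and 14.1 get the same packages: pkg picks the repository by ABI, and the ABI string (what pkg config ABI prints, e.g. FreeBSD:14:amd64) carries only the major version. The sketch below derives such an ABI from a release name and builds a URL following the pattern of the default /etc/pkg/FreeBSD.conf (pkg+http://pkg.FreeBSD.org/${ABI}/quarterly); abi_for_release and repo_url are hypothetical helpers, not pkg commands.

```shell
#!/bin/sh
# Sketch: different minor releases of one major version map to the same
# package repository. abi_for_release and repo_url are hypothetical helpers.

# "14.1-RELEASE" + "amd64" -> "FreeBSD:14:amd64" (only the major version is kept)
abi_for_release() {
    printf 'FreeBSD:%s:%s\n' "${1%%.*}" "$2"
}

# Follows the default url pattern pkg+http://pkg.FreeBSD.org/${ABI}/<branch>,
# where <branch> is quarterly or latest.
repo_url() {
    printf 'pkg+http://pkg.FreeBSD.org/%s/%s\n' "$1" "$2"
}

for rel in 14.0-RELEASE 14.1-RELEASE; do
    printf '%s -> %s\n' "$rel" "$(repo_url "$(abi_for_release "$rel" amd64)" quarterly)"
done
```

Both lines print the same URL, which is why a 14.1 system receives kmod packages that were built on 14.0.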
 
It's predictable. This always happens during the three-month transition period, while the previous minor release (here 14.0) is still supported and packages are built for it. There's one repository (two if you count quarterly and latest) for each major version.
Thanks so much! Great to know. The module built from ports works like a charm so far.
 
It's predictable. This always happens during the three-month transition period, while the previous minor release (here 14.0) is still supported and packages are built for it. There's one repository (two if you count quarterly and latest) for each major version.
Well, even with nvidia-drm-kmod compiled from ports against releng/14.1, I still experience random reboots. I can't see any possible root cause other than the NVIDIA driver. Any suggestions on how to tackle that? They're random: a few minutes ago my P1 restarted while idling in Xfce. Colleagues with the very same P1 experienced issues with the 550 drivers even on Arch Linux etc. Is there any way to build a module against, e.g., v535?
 
Note that graphics/nvidia-drm-kmod is just a metaport that determines the matching implementation and installs it as a dependency.

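The actual implementations are one of the following (they conflict with each other):
graphics/nvidia-drm-510-kmod
graphics/nvidia-drm-515-kmod
graphics/nvidia-drm-61-kmod
and you should rebuild/reinstall the one actually installed on your system now.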

(The selection logic is outdated, and the author/maintainer (ashafer) has review D45400 open in preparation for the next major update of the driver.)
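To find out which implementation is actually installed and rebuild it, something like the following could work. It is a sketch assuming the ports tree lives in /usr/ports and that the packages are named nvidia-drm-<N>-kmod (e.g. nvidia-drm-61-kmod); the helper names are hypothetical. It only prints the make command, so you can review it before running it as root.

```shell
#!/bin/sh
# Sketch only: list installed graphics/nvidia-drm-*-kmod implementation(s)
# and print (not run) the command that would rebuild each from ports.
# Assumes /usr/ports and package names matching nvidia-drm-<N>-kmod.
# installed_nvidia_drm_origins and rebuild_cmd_for are hypothetical helpers.

# pkg query -g treats the name as a shell glob; %o prints the port origin,
# e.g. graphics/nvidia-drm-61-kmod. Prints nothing if nothing matches.
installed_nvidia_drm_origins() {
    pkg query -g '%o' 'nvidia-drm-*-kmod' 2>/dev/null
}

# make -C runs make(1) in the given port directory.
rebuild_cmd_for() {
    printf 'make -C /usr/ports/%s reinstall clean\n' "$1"
}

for origin in $(installed_nvidia_drm_origins); do
    rebuild_cmd_for "$origin"
done
```

These kmod ports also need kernel sources (normally in /usr/src) that match the running release, as in the releng/14.1 build mentioned above.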
 
Also, the current versions of the graphics/nvidia-drm-[510|515|61]-kmod ports are intended to be built with x11/nvidia-driver, which is now at 550.54.14.

Starting with the 550 series (the current production branch) of the drivers, ashafer's DRM support code is incorporated into the driver itself. But I heard from him that it is intentionally NOT built by default, and that the graphics/nvidia-drm-*-kmod ports should keep doing the job, because that code requires code from outside the driver (graphics/drm-*-kmod) to build and run.

So sticking with the old 535 series drivers is a bad idea and strongly discouraged.
 
So sticking with the old 535 series drivers is a bad idea and strongly discouraged.
Valuable insight, thanks for that! Interestingly, Ubuntu 24.04 LTS, for example, stuck with 535 by default; that's where the idea came from. Btw, is the 61 kmod still considered experimental for 14.1? I've seen the if in the metaport that enables 61 only for FreeBSD >= 15.
 
Where's that?

graphics/drm-kmod/Makefile defaulted to 6.1 on 14-STABLE before 14.1 was branched.
Yes, it is. But graphics/nvidia-drm-kmod has not caught up with it yet.
Review D45400, which I mentioned earlier, includes the update for that too.
I've reviewed and tested the diffs.
It works fine for me (no new issues found) with both the 550 series (production) and the 555 series (beta), using the [nvidia-]drm-61-kmod ports.
Tested on stable/14 and main with driver versions 550.54.14, 550.67, 550.78, 555.42.02 and 555.52.04.

Unfortunately, I'm not a committer. I'm also not listed as a reviewer on the review (and cannot add myself as one), so I cannot mark it as accepted.

Drivers other than 550.54.14 (the version currently in the tree) were tested by overriding DISTVERSION in x11/nvidia-driver/Makefile.version and building with NO_CHECKSUM=YES.
Note that I've tested with x11/mate DE, not x11-wm/xfce4.
 
Sorry, I lost sight of the NVIDIA context. I see it now:

Code:
…
.  elif ${OSVERSION} >= 1500008
RUN_DEPENDS+=  ${KMODDIR}/nvidia-drm.ko:graphics/nvidia-drm-61-kmod
.  endif
…
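To make the condition concrete: OSVERSION is the __FreeBSD_version value the port is built for, and 14.1-RELEASE reports 1401000, which is below 1500008, so on 14.x this branch does not pick nvidia-drm-61-kmod. The sketch below mirrors only the branch shown above; the other branches are elided in the excerpt and are not reproduced. selects_61_kmod is a hypothetical name.

```shell
#!/bin/sh
# Sketch mirroring only the visible branch of the metaport excerpt:
#   elif ${OSVERSION} >= 1500008 -> graphics/nvidia-drm-61-kmod
# selects_61_kmod is a hypothetical helper, not part of the ports tree.
selects_61_kmod() {
    # $1: a __FreeBSD_version number, e.g. 1401000 for 14.1-RELEASE.
    # Non-numeric input makes [ fail, which counts as "no".
    [ "$1" -ge 1500008 ] 2>/dev/null
}

# On FreeBSD, `uname -U` prints the userland __FreeBSD_version;
# fall back to 0 where that option is unsupported (e.g. off FreeBSD).
osversion=$(uname -U 2>/dev/null) || osversion=0

if selects_61_kmod "$osversion"; then
    echo "OSVERSION $osversion: this branch selects nvidia-drm-61-kmod"
else
    echo "OSVERSION $osversion: this branch does not apply (see the elided branches)"
fi
```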
 