Solved 14.0 - AMDGPU Hard Crash / Reboots drm-515-kmod

Hi thedaemon

I believe I have the same issue here. Would you be so kind as to share the steps you took to fix this? Also, how did you confirm this was a GPU issue?

Thanks a lot!

Edit: Please ignore the second question. I see you've used the core dump to check the problem. :)
 
I've changed from version drm-515-kmod-5.15.118_4 to version drm-510-kmod-5.10.163_9 and so far it has been stable, though I'm not sure why this is happening. Still, this might be a quick fix for anyone having the same problem as me.

Cheers!
 
The problem with this workaround is that it makes my life a pain every time I want to run pkg autoremove.

pkg then tries to remove all the amdgpu-firmware-... packages because they're linked to drm-515-kmod-5.15.118_4 :(

Any suggestions are highly appreciated 👍
 
… all amdgpu-firmware-... away because they're linked to drm-515-kmod-5.15.118_4

For the latter:

[Attached image: 1715938975282.png]


Try temporarily locking graphics/gpu-firmware-kmod
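
A minimal sketch of what temporary locking could look like, assuming the package installed from graphics/gpu-firmware-kmod is named gpu-firmware-kmod and that pkg autoremove skips locked packages (I have not verified either on the affected system). The function name autoremove_with_lock and the PKG variable are hypothetical helpers for illustration; only pkg lock, pkg unlock and pkg autoremove are real commands.

```shell
#!/bin/sh
# PKG can be overridden for a dry run, e.g. PKG=echo (hypothetical knob).
PKG=${PKG:-pkg}

# Lock one package, run autoremove, then unlock it again.
# Returns the exit status of the autoremove step.
autoremove_with_lock() {
    name=$1
    "$PKG" lock -y "$name" || return 1
    "$PKG" autoremove
    status=$?
    # Always unlock, even if autoremove failed or was aborted.
    "$PKG" unlock -y "$name"
    return "$status"
}
```

As root this would be called as autoremove_with_lock gpu-firmware-kmod. Running it with PKG=echo first only prints the three pkg invocations without touching the package database.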

Postscript: see below for the better approach.
 
This makes pkg try to remove all amdgpu-firmware-... away because they're linked to drm-515-kmod-5.15.118_4
You don't need them all, and they are not linked to a specific drm version.

Check which firmware is loaded

Example:
Code:
 # kldstat | grep amdgpu

0    1 0xffffffff83600000   4129b8 amdgpu.ko
17    1 0xffffffff83a9f000     64e0 amdgpu_renoir_sdma_bin.ko
18    1 0xffffffff83aa6000    2c2e0 amdgpu_renoir_asd_bin.ko
19    1 0xffffffff83ad3000     7560 amdgpu_renoir_pfp_bin.ko
20    1 0xffffffff83adb000     6560 amdgpu_renoir_me_bin.ko
21    1 0xffffffff83ae2000     4560 amdgpu_renoir_ce_bin.ko
22    1 0xffffffff83ae7000     bcd8 amdgpu_renoir_rlc_bin.ko
23    1 0xffffffff83af3000    43800 amdgpu_renoir_mec_bin.ko
24    1 0xffffffff83b37000    43800 amdgpu_renoir_mec2_bin.ko
25    1 0xffffffff83b7b000    1fbe8 amdgpu_renoir_dmcub_bin.ko
26    1 0xffffffff83b9b000    645a0 amdgpu_renoir_vcn_bin.ko

Afterwards, check whether that firmware is on the pkg autoremove list.

If it is listed, change the package from automatic to non-automatic to keep autoremove from removing it:
Code:
 # pkg set -A 0 gpu-firmware-amd-kmod-renoir

Replace 'renoir' with the GPU name of your system.

For documentation see pkg-set(8)
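
The check described above can be sketched roughly as follows. The helper gpu_families is hypothetical; it assumes the firmware modules are named amdgpu_<family>_<block>_bin.ko as in the listing, and that the family is a single word (multi-word families would need a manual look at the kldstat output).

```shell
#!/bin/sh
# Read kldstat output on stdin and print the distinct GPU family names
# found in firmware module names (amdgpu_<family>_<block>_bin.ko).
# The driver itself (amdgpu.ko) does not match and is skipped.
gpu_families() {
    sed -n 's/.*[[:space:]]amdgpu_\([a-z0-9]*\)_.*_bin\.ko$/\1/p' | sort -u
}

# Intended use on the FreeBSD host (not executed here):
#   kldstat | gpu_families                          # prints the family name(s)
#   pkg autoremove -n                               # dry run, removes nothing
#   pkg query '%n %a' gpu-firmware-amd-kmod-renoir  # 1 = automatic, 0 = not
```

If the dry run lists the firmware package for your family, the pkg set -A 0 command above is the step that takes it off the list.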
 
I suspect that he is using graphics/drm-kmod, which sets the gpu-firmware as a dependency.

So when he removes drm-515, the metapackage gets removed too, and then the gpu-firmware with it.
Not sure what is happening on jcamos's system; I can't reproduce the issue described.

On a VM I installed graphics/drm-kmod, which pulled in graphics/drm-515-kmod and all the firmware of graphics/gpu-firmware-amd-kmod (the host has an AMD 'Lucienne' GPU).

Then:
Code:
 # pkg del -f drm-515-kmod
 # pkg install drm-510-kmod

Running pkg autoremove doesn't list anything.

By the way, on the host I'm running drm-510-kmod instead of drm-515-kmod. With drm-515-kmod, Xorg becomes sluggish over time, to the point of locking up. I might try the version in the PR you linked. Thanks for the suggestion.
 
How exactly do you 'use' this branch? I have the same issue: on the console there is no problem, but in GNOME it locks up and reboots within minutes.

i9 13900k on asus B760 prime with RX580.
I cloned it, switched to the required branch, and ran make and make install. At first make failed because I did not have the system sources installed. After installing them, everything was fine. Though I was running 14.1-BETA1.
 
Thanks, I was able to compile it and the driver loads, but after only a few minutes the system locks up and reboots again.
 
Nice, thanks! Setting the firmware package to non-automatic is a nice workaround for the moment :)
 
I too just git cloned that repo, built it, then located the installed files and overwrote them manually. I'm on 14 with the latest package branch, not quarterly, and I'm not sure whether the fix has been upstreamed yet. I do know that I still sometimes come back to find my screen off and cannot wake it, but not all the time.
 
I booted into single user mode to do so, btw. Not sure if it's 100% required, but I don't think you can overwrite the kernel modules while they are in use.
 