amdgpu: error results in black screen

Today the screen on my workstation was off (black) and wouldn't wake up when I moved the mouse or pressed keys. I ssh'ed in from another machine and found this in /var/log/messages:
Code:
Mar 12 00:33:22 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=136693003, emitted seq=136693005
Mar 12 00:33:22 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 100287 thread  pid 100287
Mar 12 00:33:22 kg-core2 kernel: [drm] GPU recovery disabled.
Mar 12 19:28:12 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=136693003, emitted seq=136693005
Mar 12 19:28:12 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 102059 thread  pid 102059
Mar 12 19:28:12 kg-core2 kernel: [drm] GPU recovery disabled.
GPU recovery is set to auto, but that is apparently not enough:
Code:
root@kg-core2:~ # sysctl  hw.amdgpu.gpu_recovery
hw.amdgpu.gpu_recovery: -1
root@kg-core2:~ # sysctl -d hw.amdgpu.gpu_recovery
hw.amdgpu.gpu_recovery: Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)
root@kg-core2:~ # sysctl compat.linuxkpi.amdgpu_gpu_recovery
compat.linuxkpi.amdgpu_gpu_recovery: -1
root@kg-core2:~ # sysctl -d compat.linuxkpi.amdgpu_gpu_recovery
compat.linuxkpi.amdgpu_gpu_recovery: Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)
In the end I had to shut down and reboot the machine.
Details:
Code:
root@kg-core2:~ # freebsd-version -ku
13.2-RELEASE-p10
13.2-RELEASE-p10
root@kg-core2:~ # uname -a
FreeBSD kg-core2.kg4.no 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64
Relevant packages:
Code:
root@kg-core2:~ # pkg info drm\*
drm-fbsd13-kmod-5.4.191.g20220604_1
root@kg-core2:~ # pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20230210_1,1
 
It happened again. This time I was using the machine, and the display just froze. I could ssh into the machine from another one, and everything except the display was working. From /var/log/messages:
Code:
Mar 24 20:56:45 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=708697721, emitted seq=708697723
Mar 24 20:56:45 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 100293 thread  pid 100293
Mar 24 20:56:45 kg-core2 kernel: [drm] GPU recovery disabled.
A reboot wasn't enough; I had to physically power the machine off and on to bring the display back.
The recovery sysctls are set to auto:
Code:
root@kg-core2:~ # sysctl hw.amdgpu.gpu_recovery; sysctl compat.linuxkpi.amdgpu_gpu_recovery
hw.amdgpu.gpu_recovery: -1
compat.linuxkpi.amdgpu_gpu_recovery: -1
even though I have this in /boot/loader.conf:
Code:
root@kg-core2:~ # cat /boot/loader.conf
hw.amdgpu.gpu_recovery=1
Still running:
Code:
root@kg-core2:~ # freebsd-version -ku
13.2-RELEASE-p10
13.2-RELEASE-p10
root@kg-core2:~ # uname -a
FreeBSD kg-core2.kg4.no 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64
 
Looks like this happens a lot these days. Sorry mate, I can't help you, but could you share the model of your graphics card, please?
 
Sure. It is a Sapphire Radeon R7 240 with 4 GB of DDR3 memory. It shows up like this in the pciconf output:
Code:
root@kg-core2:~ # pciconf -lv | grep -B 4 VGA
vgapci0@pci0:7:0:0:    class=0x030000 rev=0xc7 hdr=0x00 vendor=0x1002 device=0x6617 subvendor=0x1da2 subdevice=0xe263
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Oland LE [Radeon R7 240]'
    class      = display
    subclass   = VGA
 
I've looked through the bug tracker and mailing lists without finding anything. I don't think I'll be useful, mate; I hope someone will be able to help you.
 
relevant packages
Code:
root@kg-core2:~ # pkg info drm\*
drm-fbsd13-kmod-5.4.191.g20220604_1
root@kg-core2:~ # pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20230210_1,1
A lot of time has passed since ... I don't know if your issue is fixed now, but I was looking for something recently and found out that the drm-fbsd13-kmod package was removed a long time ago; that could be part of your problem.
It can be replaced by drm-510-kmod (if you are still on 13) and/or (I am still not sure) by drm-kmod. Personally, I have both installed:

Code:
~ > freebsd-version -u ; uname -rms
13.2-RELEASE-p11
FreeBSD 13.2-RELEASE-p11 amd64
~ >
~ > pkg info drm\*
drm-510-kmod-5.10.163_9
drm-kmod-20220907_3
~ >
~ > pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20240401,1
~ >
 
It hasn't happened again (knock on wood...). The gpu-firmware package has been updated:
Code:
root@kg-core2:~ # pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20240401,1
 
And it happened again:
Code:
May  6 06:32:48 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=1100102395, emitted seq=1100102397
May  6 06:32:48 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 100273 thread  pid 100273
May  6 06:32:48 kg-core2 kernel: drmn0: GPU recovery disabled.
Currently running:
Code:
root@kg-core2:~ # freebsd-version -ku
13.4-RELEASE-p3
13.4-RELEASE-p5
root@kg-core2:~ # uname -a
FreeBSD kg-core2.kg4.no 13.4-RELEASE-p3 FreeBSD 13.4-RELEASE-p3 GENERIC amd64
Packages:
Code:
root@kg-core2:~ # pkg info drm\*
drm-510-kmod-5.10.163.1304000_11
root@kg-core2:~ # pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20241114,1
 
Hello,

Can you explain in more detail what desktop you are using and what you were doing in the last seconds before (#2) happened? That might help find a potential trigger.

BTW, it might be interesting to see whether this timeout is Mesa-related and perhaps already fixed in newer versions.

Install graphics/mesa-devel, then reboot or restart your session. If the timeout doesn't happen again after that, it's fixed.

Note: if your session doesn't start, you need to apply the workaround below.

graphics/mesa-devel can coexist with the regular Mesa packages, but there is still an issue with that somewhere.

The workaround is:
Code:
rm -r -d /usr/local/lib/dri && cp -r '/usr/local/lib/dri-devel' '/usr/local/lib/dri'

To revert the workaround:
Code:
pkg install -f mesa-dri
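The directory swap can also be wrapped in a small shell function that refuses to delete the regular drivers when the mesa-devel copy is missing, so an incomplete install doesn't leave the system with no DRI drivers at all. This is a minimal sketch, not part of the original advice: the function name use_dri_devel and its optional directory argument are hypothetical, and it assumes mesa-devel puts its drivers in /usr/local/lib/dri-devel, as the workaround implies.

```sh
#!/bin/sh
# Hypothetical helper: replace <libdir>/dri with a copy of <libdir>/dri-devel.
# Assumes graphics/mesa-devel installed its drivers in <libdir>/dri-devel;
# libdir defaults to /usr/local/lib. Revert with: pkg install -f mesa-dri
use_dri_devel() {
    libdir="${1:-/usr/local/lib}"
    # Do not remove the regular drivers if there is nothing to replace them with
    if [ ! -d "$libdir/dri-devel" ]; then
        echo "no $libdir/dri-devel; is graphics/mesa-devel installed?" >&2
        return 1
    fi
    # -f tolerates a missing dri directory; cp -R then creates dri from dri-devel
    rm -rf "$libdir/dri" && cp -R "$libdir/dri-devel" "$libdir/dri"
}
```

Run it as root (plain use_dri_devel), then restart the session; pkg install -f mesa-dri still reverts the change.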
 
Can you explain in more detail what desktop you are using and what you were doing in the last seconds before (#2) happened? That might help find a potential trigger.
The machine info is here. I wasn't actively using the machine at the time. It is my main workstation; it's on 24/7 (well, as long as mains power is) and always has Xfce, Firefox and my other applications running.
If the timeout doesn't happen again after that, it's fixed.
Yes, the problem is that this happens very infrequently (you can read more in the machine notes, under FreeBSD, if you are interested).
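Because the hang is rare, a quick way to list every occurrence recorded so far can help correlate it with package or firmware updates. This is a minimal sketch, assuming the kernel messages land in /var/log/messages in the format quoted in this thread; the function name list_gfx_timeouts is hypothetical, and rotated or compressed logs are not searched.

```sh
#!/bin/sh
# Hypothetical helper: list amdgpu gfx ring timeouts found in a syslog file.
# Assumes the "[drm ERROR :amdgpu_job_timedout] ring gfx timeout" format
# quoted in this thread; defaults to /var/log/messages.
list_gfx_timeouts() {
    log="${1:-/var/log/messages}"
    # -F matches the text literally, so the "]" needs no escaping
    grep -F 'amdgpu_job_timedout] ring gfx timeout' "$log" |
        awk '{
            ts = $1 " " $2 " " $3                  # syslog timestamp, e.g. Mar 12 00:33:22
            msg = $0
            sub(/.*ring gfx timeout, /, "", msg)   # keep the signaled/emitted seq part
            print ts, msg
        }'
}
```

For example, list_gfx_timeouts /var/log/messages prints one line per timeout with its timestamp and the signaled/emitted sequence numbers.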
 