amdgpu: error results in black screen

Today the screen on my workstation was off (black) and wouldn't turn on when I moved the mouse or hit keys on the keyboard. I ssh'ed in from another machine and found this in /var/log/messages:
Code:
Mar 12 00:33:22 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=136693003, emitted seq=136693005
Mar 12 00:33:22 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 100287 thread  pid 100287
Mar 12 00:33:22 kg-core2 kernel: [drm] GPU recovery disabled.
Mar 12 19:28:12 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=136693003, emitted seq=136693005
Mar 12 19:28:12 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 102059 thread  pid 102059
Mar 12 19:28:12 kg-core2 kernel: [drm] GPU recovery disabled.
Recovery is set to auto, but that is apparently not enough:
Code:
root@kg-core2:~ # sysctl  hw.amdgpu.gpu_recovery
hw.amdgpu.gpu_recovery: -1
root@kg-core2:~ # sysctl -d hw.amdgpu.gpu_recovery
hw.amdgpu.gpu_recovery: Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)
root@kg-core2:~ # sysctl compat.linuxkpi.amdgpu_gpu_recovery
compat.linuxkpi.amdgpu_gpu_recovery: -1
root@kg-core2:~ # sysctl -d compat.linuxkpi.amdgpu_gpu_recovery
compat.linuxkpi.amdgpu_gpu_recovery: Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)
In the end I just had to shut down and reboot the machine.
Details:
Code:
root@kg-core2:~ # freebsd-version -ku
13.2-RELEASE-p10
13.2-RELEASE-p10
root@kg-core2:~ # uname -a
FreeBSD kg-core2.kg4.no 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64
Relevant packages:
Code:
root@kg-core2:~ # pkg info drm\*
drm-fbsd13-kmod-5.4.191.g20220604_1
root@kg-core2:~ # pkg info gpu-firmware-kmod\*
gpu-firmware-kmod-20230210_1,1
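
To see how often the timeout recurs, the log can be summarised per day. This is only a minimal sketch: the function name `count_gfx_timeouts` is made up for illustration, and it assumes the usual syslog layout where the first two fields are the month and the day.

```shell
#!/bin/sh
# Count amdgpu ring timeouts per day in a syslog-style file.
# count_gfx_timeouts is a hypothetical helper, not part of any package.
count_gfx_timeouts() {
    logfile="${1:-/var/log/messages}"
    # Keep only the "ring ... timeout" lines, not the follow-up
    # "Process information" or "GPU recovery disabled" lines.
    grep 'amdgpu_job_timedout.*ring .* timeout' "$logfile" |
        # Group by the first two syslog fields (month and day).
        awk '{ count[$1 " " $2]++ } END { for (d in count) print d, count[d] }' |
        sort
}
```

Running `count_gfx_timeouts /var/log/messages` prints one line per day that had at least one timeout, followed by the number of timeouts logged on that day.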
 
It happened again. This time I was using the machine, and the display just froze. I could ssh into the machine from another one, and everything except the display was still working. From /var/log/messages:
Code:
Mar 24 20:56:45 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] ring gfx timeout, signaled seq=708697721, emitted seq=708697723
Mar 24 20:56:45 kg-core2 kernel: [drm ERROR :amdgpu_job_timedout] Process information: process  pid 100293 thread  pid 100293
Mar 24 20:56:45 kg-core2 kernel: [drm] GPU recovery disabled.
A reboot wasn't enough; I had to physically power the machine off and on again to bring the display back.
The recovery sysctls are set to auto
Code:
root@kg-core2:~ # sysctl hw.amdgpu.gpu_recovery; sysctl compat.linuxkpi.amdgpu_gpu_recovery
hw.amdgpu.gpu_recovery: -1
compat.linuxkpi.amdgpu_gpu_recovery: -1
even though I have this in /boot/loader.conf:
Code:
root@kg-core2:~ # cat /boot/loader.conf
hw.amdgpu.gpu_recovery=1
Still running:
Code:
root@kg-core2:~ # freebsd-version -ku
13.2-RELEASE-p10
13.2-RELEASE-p10
root@kg-core2:~ # uname -a
FreeBSD kg-core2.kg4.no 13.2-RELEASE-p10 FreeBSD 13.2-RELEASE-p10 GENERIC amd64
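
Since the sysctl still reports -1 despite the loader.conf entry, one thing worth ruling out is that the file is parsed differently than expected. The sketch below prints the last value assigned to a given name in a loader.conf-style file. `loader_conf_value` is a hypothetical helper written for this post, and it only handles simple `name=value` or `name="value"` lines.

```shell
#!/bin/sh
# Print the last value assigned to a tunable in a loader.conf-style file.
# loader_conf_value is a hypothetical helper, not a FreeBSD tool.
loader_conf_value() {
    key="$1"
    file="${2:-/boot/loader.conf}"
    # Strip comments first, then split each line at the first "=".
    sed -e 's/#.*//' "$file" |
        awk -v k="$key" '
            {
                i = index($0, "=")
                if (i == 0) next
                name = substr($0, 1, i - 1)
                value = substr($0, i + 1)
                # Trim surrounding blanks, and quotes around the value.
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                gsub(/^[ \t"]+|[ \t"]+$/, "", value)
                if (name == k) { val = value; found = 1 }
            }
            # Exit non-zero when the name is never assigned.
            END { if (found) print val; else exit 1 }'
}
```

The result of `loader_conf_value hw.amdgpu.gpu_recovery` can then be compared with `kenv hw.amdgpu.gpu_recovery` (what the loader passed to the kernel) and with the sysctl values shown above. Whether the driver reads `hw.amdgpu.gpu_recovery` or `compat.linuxkpi.amdgpu_gpu_recovery` as its boot-time tunable is not confirmed here, so trying both names is an assumption worth testing rather than a known fix.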
 
Looks like this happens a lot these days. Sorry mate, I can't help you, but could you share the model of your graphics card, please?
 
Sure. It is a Sapphire Radeon R7 240 with 4 GB of DDR3 memory. It shows up like this in the pciconf output:
Code:
root@kg-core2:~ # pciconf -lv | grep -B 4 VGA
vgapci0@pci0:7:0:0:    class=0x030000 rev=0xc7 hdr=0x00 vendor=0x1002 device=0x6617 subvendor=0x1da2 subdevice=0xe263
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Oland LE [Radeon R7 240]'
    class      = display
    subclass   = VGA
 
I've looked through the bug tracker and the mailing lists without finding anything. I don't think I can be of much use, mate; I hope someone will be able to help you a bit.
 