AMD Vega crashing

Hi all,

I moved my desktop to FreeBSD 14.0 a few weeks ago, but the system keeps crashing every few days. It used to happen more often when I had a 1440p monitor plugged in, and the most recent crash happened while I was trying to get a llama file to use GPU acceleration, so I believe the cause is my graphics card.

I have an AMD Vega 64 and installed the graphics drivers according to the FreeBSD Handbook. I can load a desktop and run any apps fine, but after a few days of uptime the machine just hangs: I can no longer SSH into it, and when I manage to get physical access it does not respond to any input and has to be hard power cycled.

I have run sysrc kld_list+=amdgpu and can see that the drm and relevant Vega drivers are being loaded OK.

Code:
Id Refs Address                Size Name
 1  120 0xffffffff80200000  1d34598 kernel
 2    1 0xffffffff81f35000    1c3a8 geom_eli.ko
 3    1 0xffffffff81f52000     7718 cryptodev.ko
 4    1 0xffffffff81f5a000   5d51c8 zfs.ko
 5    1 0xffffffff83200000   504958 amdgpu.ko
 6    2 0xffffffff83110000    7c050 drm.ko
 7    1 0xffffffff8318d000     22b8 iic.ko
 8    3 0xffffffff83190000     3080 linuxkpi_hdmi.ko
 9    3 0xffffffff83194000     6350 dmabuf.ko
10    3 0xffffffff8319b000     3378 lindebugfs.ko
11    1 0xffffffff8319f000     b360 ttm.ko
12    1 0xffffffff831ab000     2220 amdgpu_vega10_gpu_info_bin.ko
13    1 0xffffffff831ae000     64e0 amdgpu_vega10_sdma_bin.ko
14    1 0xffffffff831b5000     64e0 amdgpu_vega10_sdma1_bin.ko
15    1 0xffffffff831bc000    2ac70 amdgpu_vega10_sos_bin.ko
16    1 0xffffffff83705000    2c2e0 amdgpu_vega10_asd_bin.ko
17    1 0xffffffff831e7000     7560 amdgpu_vega10_pfp_bin.ko
18    1 0xffffffff831ef000     6560 amdgpu_vega10_me_bin.ko
19    1 0xffffffff831f6000     4560 amdgpu_vega10_ce_bin.ko
20    1 0xffffffff83732000     63e0 amdgpu_vega10_rlc_bin.ko
21    1 0xffffffff83739000    43800 amdgpu_vega10_mec_bin.ko
22    1 0xffffffff8377d000    43800 amdgpu_vega10_mec2_bin.ko
23    1 0xffffffff837c1000    423e0 amdgpu_vega10_acg_smc_bin.ko
24    1 0xffffffff83804000    5f080 amdgpu_vega10_uvd_bin.ko
25    1 0xffffffff83864000    2c800 amdgpu_vega10_vce_bin.ko
26    1 0xffffffff831fb000     3390 acpi_wmi.ko
27    1 0xffffffff83891000     3220 intpm.ko
28    1 0xffffffff83895000     2178 smbus.ko
29    1 0xffffffff83898000     2260 pflog.ko
30    1 0xffffffff8389b000    4d038 pf.ko
31    1 0xffffffff838e9000     3360 uhid.ko
32    1 0xffffffff838ed000     4364 ums.ko
33    1 0xffffffff838f2000     33c0 usbhid.ko
34    1 0xffffffff838f6000     3380 hidbus.ko
35    1 0xffffffff838fa000     3360 wmt.ko
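
For anyone who wants to script this check, here is a minimal sketch. The helper name has_module is made up for illustration (it is not a FreeBSD tool); it reads kldstat-style output like the listing above on standard input and succeeds only if the named module appears in the Name column.

```shell
#!/bin/sh
# has_module NAME
# Hypothetical helper: exits 0 if NAME.ko is listed in the last column of
# kldstat-style output read from stdin, and non-zero otherwise.
has_module() {
    awk -v want="$1.ko" '$NF == want { found = 1 } END { exit !found }'
}

# Example use on a FreeBSD machine:
#   kldstat | has_module amdgpu && echo "amdgpu is loaded"
```

The match is exact on the module name, so amdgpu does not also match the amdgpu_vega10_*_bin.ko firmware modules.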

These are the DRM package versions currently installed:

Code:
drm-515-kmod-5.15.160          DRM drivers modules
drm-kmod-20220907_3            Metaport of DRM modules for the linuxkpi-based KMS components
gpu-firmware-kmod-20240401,1   Firmware modules for the drm-kmod drivers
libdrm-2.4.122,1               Direct Rendering Manager library and headers

Any suggestions on how to fix or help diagnose this would be much appreciated.

Here is my dmesg output, as well as my current Xorg log.
 

Attachments

  • dmesg.txt (16.6 KB)
  • pciconf.txt (36.7 KB)
  • xorgLog.txt (26.6 KB)

Hi,

RX Vega 64 owner here too, and your situation indeed doesn't sound right.
I can't say for sure that this will work on 14.1-RELEASE, because I currently run 13.3-RELEASE, but I think it might help a lot to build graphics/gpu-firmware-kmod, graphics/drm-kmod and graphics/drm-515-kmod directly from ports instead of installing them with pkg.
Why? Hopefully someone more experienced than me will jump in and explain it better than I can, but in short: the packaged versions are outdated.
So at this point there are two choices:
_ use the ports collection and compile and install the ports yourself;
_ use a builder called ports-mgmt/poudriere, which creates packages that can then be installed via pkg.
Personally, I use the second option.
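
As a rough sketch of the two options above, the script below only prints the commands instead of running them. The three port origins are the ones named in this post; everything else is an assumption: the jail name 141amd64 and ports tree name default stand in for a poudriere jail and tree created beforehand, the deinstall step assumes the pkg versions are still installed, and building kernel modules from ports assumes kernel sources matching the running kernel are present in /usr/src.

```shell
#!/bin/sh
# Dry-run sketch: prints build commands, does not execute them.
# Origins taken from the post; all other names are placeholders.
PORTS="graphics/gpu-firmware-kmod graphics/drm-515-kmod graphics/drm-kmod"
PORTSDIR="${PORTSDIR:-/usr/ports}"   # default location of the ports tree

# Option 1: build each port in the ports tree, then swap it in for the
# already-installed package (build first, so a failed build changes nothing).
ports_cmds() {
    for origin in $PORTS; do
        echo "make -C $PORTSDIR/$origin build deinstall install clean"
    done
}

# Option 2: let poudriere build packages that pkg can install afterwards.
# $1 = jail name, $2 = ports tree name (both hypothetical here).
poudriere_cmd() {
    echo "poudriere bulk -j $1 -p $2 $PORTS"
}

ports_cmds
poudriere_cmd 141amd64 default
```

Review the printed lines and run the ones you want as root. With the poudriere route, the resulting packages are then installed with pkg from the repository poudriere produces.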
 