Hi, I'm a computer science student, and in the upcoming semester I'm enrolled in the advanced OS course at my university. To get a little practice, I thought I'd tackle an issue that has annoyed me for a while on FreeBSD: the amdgpu module. There is a pkg build (https://www.freshports.org/graphics/drm-510-kmod/), so presumably it works for someone, but the question is for which type of GPU: Intel, Radeon or AMD... In my testing it doesn't work with amdgpu. I tested drm-510-kmod built from source and from pkg, and drm-61-kmod built from source, with multiple GPUs of different generations, and got the same failure mode on all of them. To be clear, I don't need this to work. It would be nice, but I could just run Linux, which works fine on the same hardware; this is more for fun and education.
HD7750
hdac0: <ATI (0xaab0) HDA Controller> mem 0x80060000-0x80063fff irq 1044473 at device 0.1 numa-domain 0 on pci1
hdac0: hdac_get_capabilities: Invalid corb size (0)
device_attach: hdac0 attach returned 6
<6>[drm] amdgpu kernel modesetting enabled.
drmn0: <drmn> numa-domain 0 on vgapci0
vgapci0: child drmn0 requested pci_enable_io
vgapci0: child drmn0 requested pci_enable_io
<6>[drm] initializing kernel modesetting (VERDE 0x1002:0x683F 0x1043:0x0459 0x00).
drmn0: Trusted Memory Zone (TMZ) feature not supported
<6>[drm] register mmio base: 0x00000000
<6>[drm] register mmio size: 262144
<6>[drm] add ip block number 0 <si_common>
<6>[drm] add ip block number 1 <gmc_v6_0>
<6>[drm] add ip block number 2 <si_ih>
<6>[drm] add ip block number 3 <gfx_v6_0>
<6>[drm] add ip block number 4 <si_dma>
<6>[drm] add ip block number 5 <si_dpm>
<6>[drm] add ip block number 6 <dce_v6_0>
<6>[drm] add ip block number 7 <uvd_v3_1>
<6>[drm] BIOS signature incorrect 26 cc
drmn0: Fetched VBIOS from ROM BAR
lkpi_iicbb0: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb0: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb0
iicbus5: <Philips I2C bus> on iicbb0 addr 0x0
iic5: <I2C generic I/O> on iicbus5
lkpi_iicbb1: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb1: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb1
iicbus6: <Philips I2C bus> on iicbb1 addr 0x0
iic6: <I2C generic I/O> on iicbus6
lkpi_iicbb2: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb2: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb2
iicbus7: <Philips I2C bus> on iicbb2 addr 0x0
iic7: <I2C generic I/O> on iicbus7
lkpi_iicbb3: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb3: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb3
iicbus8: <Philips I2C bus> on iicbb3 addr 0x0
iic8: <I2C generic I/O> on iicbus8
lkpi_iicbb4: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb4: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb4
iicbus9: <Philips I2C bus> on iicbb4 addr 0x0
iic9: <I2C generic I/O> on iicbus9
lkpi_iicbb5: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb5: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb5
iicbus10: <Philips I2C bus> on iicbb5 addr 0x0
iic10: <I2C generic I/O> on iicbus10
lkpi_iicbb6: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb6: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb6
iicbus11: <Philips I2C bus> on iicbb6 addr 0x0
iic11: <I2C generic I/O> on iicbus11
lkpi_iicbb7: <LinuxKPI I2CBB> numa-domain 0 on drmn0
iicbb7: <I2C bit-banging driver> numa-domain 0 on lkpi_iicbb7
iicbus12: <Philips I2C bus> on iicbb7 addr 0x0
iic12: <I2C generic I/O> on iicbus12
<6>[drm] vm size is 512 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
drmn0: successfully loaded firmware image 'amdgpu/verde_mc.bin'
drmn0: VRAM: 1024M 0x000000F400000000 - 0x000000F43FFFFFFF (1024M used)
drmn0: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
<6>[drm] Detected VRAM RAM=1024M, BAR=256M
<6>[drm] RAM width 128bits GDDR5
[drm ERROR :amdgpu_ttm_init] failed initializing buffer object driver(-12).
[drm ERROR :amdgpu_device_ip_init] sw_init of IP block <gmc_v6_0> failed -12
drmn0: amdgpu_device_ip_init failed
drmn0: Fatal error during GPU init
drmn0: amdgpu: finishing device.
iic5: detached
iicbus5: detached
iicbb0: detached
lkpi_iicbb0: detached
fatal kernel trap:
exception = 0x300 (data storage interrupt)
virtual address = 0x68
dsisr = 0x40000000
srr0 = 0xc0080001126c7888 (0x8000110097888)
srr1 = 0x9000000000009033
current msr = 0x9000000000009033
lr = 0xc00800011217d1d8 (0x800010fb4d1d8)
frame = 0xc00800011164f6b0
curthread = 0xc0080000e749a140
pid = 1742, comm = kldload
panic: data storage interrupt trap
cpuid = 13
time = 1756665681
KDB: stack backtrace:
#0 0xc000000002f5f784 at kdb_backtrace+0x84
#1 0xc000000002eee468 at vpanic+0x1b8
#2 0xc000000002eee290 at panic+0x40
#3 0xc0000000034c3a18 at trap+0x308
#4 0xc0000000034b7134 at powerpc_interrupt+0x1b4
Uptime: 25m13s
Dumping 116 out of 130727 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete
So I thought this would be pretty straightforward: pull out the kernel debugger and find the issue. However, the dump seems to be corrupted:
root@blackbird:/var/crash # kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 13.2 [GDB v13.2 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "powerpc64le-portbld-freebsd14.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Failed to open vmcore: invalid corefile
(kgdb)
panic: data storage interrupt trap
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Failed to open vmcore: invalid corefile
(kgdb) No stack.
(kgdb) (kgdb) Python Exception <class 'AttributeError'>: 'NoneType' object has no attribute 'switch'
Error occurred in Python: 'NoneType' object has no attribute 'switch'
(kgdb)
------------------------------------------------------------------------
ps -axlww
ps: invalid corefile
------------------------------------------------------------------------
vmstat -s
vmstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
vmstat -m
vmstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
vmstat -z
vmstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
vmstat -i
vmstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
pstat -T
pstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
pstat -s
pstat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
iostat
iostat: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
ipcs -a
ipcs: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
ipcs -T
ipcs: kvm_openfiles: invalid corefile
------------------------------------------------------------------------
netstat -s
netstat: kvm not available: invalid corefile
------------------------------------------------------------------------
netstat -m
netstat: kvm not available: invalid corefile
netstat: kvm not available: invalid corefile
------------------------------------------------------------------------
netstat -anA
netstat: kvm not available: invalid corefile
------------------------------------------------------------------------
netstat -aL
netstat: kvm not available: invalid corefile
------------------------------------------------------------------------
fstat
fstat: kvm_openfiles(): invalid corefile
fstat: procstat_open()
------------------------------------------------------------------------
dmesg
dmesg: invalid corefile
------------------------------------------------------------------------
kernel config
options CONFIG_AUTOGENERATED
ident GENERIC
So I'm pretty much out of options, and I don't feel like debugging the kernel debugger...
If anybody has suggestions, they'd be appreciated.
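For what it's worth, the -12 in the two amdgpu error lines looks like a negated errno value, which is the usual convention in Linux driver code, and 12 is ENOMEM on both Linux and FreeBSD. So amdgpu_ttm_init appears to fail on some allocation before the trap happens. A minimal sketch for decoding such codes, assuming the value really is a negated errno (the helper name describe_kernel_error is made up for illustration, and the message text comes from whatever host runs the script):

```python
import errno
import os


def describe_kernel_error(code: int) -> str:
    """Turn a negated errno such as -12 into its symbolic name plus the local message text."""
    number = abs(code)
    # errno.errorcode maps numeric values to names like 'ENOMEM' on the host platform.
    name = errno.errorcode.get(number, "unknown")
    return f"{name} ({os.strerror(number)})"


if __name__ == "__main__":
    # -12 is the value printed by amdgpu_ttm_init and amdgpu_device_ip_init in the log.
    print(describe_kernel_error(-12))
```

This only names the error; it says nothing about which allocation failed or why the cleanup path then dereferences address 0x68.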