Hi,
I have a Radeon 6750 graphics card and am running FreeBSD 15.0-RELEASE. I could use some help looking into a crash that happens reliably when llama.cpp is stopped and I try to restart it. The kernel log from the crash is below.
I tried adjusting the amdgpu tunables listed after the log, but none of them seemed to help.
The reset method never changed with these settings, so apparently the card decides it. llama.cpp worked without any issue for some time, and now I get this behavior; I have no idea what changed. I notice it whenever llama-server stops, for example when I stop it to change models or when the service crashes. I run llama.cpp with the Vulkan backend. I have tried many different versions of llama.cpp, including 8182 from ports, and all of them behave the same way. Only a reboot seems to help.
What am I missing?
Code:
kernel: drmn0: [gfxhub] page fault (src_id:0 ring:40 vmid:1 pasid:32777, for process pid 102859 thread pid 102859)
kernel: drmn0: in page starting at address 0x00008001001f7000 from client 0x1b (UTCL2)
kernel: drmn0: GCVM_L2_PROTECTION_FAULT_STATUS:0x00140A50
kernel: drmn0: Faulty UTCL2 client ID: CPC (0x5)
kernel: drmn0: MORE_FAULTS: 0x0
kernel: drmn0: WALKER_ERROR: 0x0
kernel: drmn0: PERMISSION_FAULTS: 0x5
kernel: drmn0: MAPPING_ERROR: 0x0
kernel: drmn0: RW: 0x1
kernel: [drm ERROR :amdgpu_job_timedout] ring comp_1.1.0 timeout, signaled seq=5546, emitted seq=5548
kernel: [drm ERROR :amdgpu_job_timedout] Process information: process pid 102859 thread pid 102859
kernel: drmn0: GPU reset begin!
kernel: drmn0: MODE1 reset
kernel: drmn0: GPU mode1 reset
kernel: drmn0: GPU smu mode1 reset
kernel: hdac0: Unexpected unsolicited response from address 0: 00000000
syslogd: last message repeated 7 times
kernel: drmn0: GPU mode1 reset failed
kernel: drmn0: ASIC reset failed with error, -60 for drm dev, drmn0
kernel: drmn0: GPU reset succeeded, trying to resume
kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
kernel: [drm] VRAM is lost due to GPU reset!
kernel: [drm] PSP is resuming...
kernel: [drm ERROR :psp_hw_start] PSP create ring failed!
kernel: [drm ERROR :psp_resume] PSP resume failed
kernel: [drm ERROR :amdgpu_device_fw_loading] resume of IP block <psp> failed -60
kernel: drmn0: GPU reset(1) failed
kernel: drmn0: GPU reset end with ret = -60
kernel: [drm ERROR :amdgpu_job_timedout] GPU Recovery Failed: -60
kernel: [drm ERROR :amdgpu_job_timedout] ring comp_1.1.0 timeout, signaled seq=5548, emitted seq=5548
kernel: [drm ERROR :amdgpu_job_timedout] Process information: process pid 102859 thread pid 102859
kernel: drmn0: GPU reset begin!
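For reading the page fault at the top of the log: the raw GCVM_L2_PROTECTION_FAULT_STATUS value (0x00140A50) packs the fields the kernel prints on the following lines. Below is a minimal Python sketch that unpacks it. It assumes the bit layout from the gfx10.3 register headers in the Linux amdgpu driver (RX 6000 series); the FIELDS table and decode_fault_status are hypothetical helpers, not part of any existing tool.

```python
# Bit layout of GCVM_L2_PROTECTION_FAULT_STATUS, ASSUMED from the gfx10.3
# register headers in the Linux amdgpu driver: name -> (shift, mask).
FIELDS = {
    "MORE_FAULTS": (0, 0x1),
    "WALKER_ERROR": (1, 0x7),
    "PERMISSION_FAULTS": (4, 0xF),
    "MAPPING_ERROR": (8, 0x1),
    "CID": (9, 0x7F),  # UTCL2 client ID; 0x5 is reported as CPC in the log
    "RW": (18, 0x1),   # 1 = the faulting access was a write
    "VMID": (20, 0xF),
}


def decode_fault_status(status: int) -> dict[str, int]:
    """Split a raw fault status word into its named bit fields (hypothetical helper)."""
    return {name: (status >> shift) & mask for name, (shift, mask) in FIELDS.items()}


if __name__ == "__main__":
    # Value copied from the log above.
    for name, value in decode_fault_status(0x00140A50).items():
        print(f"{name}: {value:#x}")
```

Under this assumed layout the decoded values agree with what the kernel printed (PERMISSION_FAULTS 0x5, CID 0x5, RW 0x1) and with the vmid:1 in the first fault line.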
Code:
hw.amdgpu.vm_fault_stop="1"
hw.amdgpu.lockup_timeout="10000,10000,10000,10000"
hw.amdgpu.bad_page_threshold="-1"
hw.amdgpu.reset_method="2"
hw.amdgpu.enforce_isolation="1"
hw.amdgpu.runpm="0"
hw.amdgpu.timeout_fatal_disable="1"
hw.amdgpu.sched_hw_submission="1"
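For reference when reading hw.amdgpu.reset_method, here is a small Python lookup of what the numeric values mean. It assumes drm-kmod follows the amd_reset_method enum in the Linux amdgpu driver, which can differ between driver versions. RESET_METHODS and reset_method_name are hypothetical helpers used only for illustration.

```python
# Meaning of hw.amdgpu.reset_method values, ASSUMED to follow the
# amd_reset_method enum in the Linux amdgpu driver (may vary by version).
RESET_METHODS = {
    -1: "auto (default)",
    0: "legacy",
    1: "mode0",
    2: "mode1",
    3: "mode2",
    4: "baco",
    5: "pci",
}


def reset_method_name(value: str) -> str:
    """Map a quoted tunable value such as "2" to a reset method name."""
    return RESET_METHODS.get(int(value), "unknown")


if __name__ == "__main__":
    print(reset_method_name("2"))
```

Under this assumption, the value 2 in the list above selects mode1, the same method named in the MODE1 reset lines of the log.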