Hello,
I am trying to get OpenCL running on my FreeBSD 15 PC. I have two AMD MI50 GPUs (Vega 20, Radeon Pro VII BIOS). They work great running llama.cpp with the Vulkan backend for general LLM use, so I know the hardware works.
I would also like to run some OpenCL code on it. This PC is only used headless; I do not use the video output.
I installed Clover and the opencl-headers and compiled a few examples from https://github.com/rsnemmen/OpenCL-examples, but every one I try to run just hangs (the process becomes unkillable). FreeBSD keeps running normally, but those processes stay stuck.
/var/log/messages shows errors when the test OpenCL program starts (log further down).
Any ideas? Pointers on what should be happening here and where to look to debug this would also help. I once (maybe 18 years ago, haha) wrote kernel modules for a virtualized sound card, and more recently (10 years ago) worked on some OpenGL drivers for macOS, so I am not a total stranger to kernel debugging. Still, it would help to have some idea of what is going on, where to look, and how deep the rabbit hole goes before I embark on this.
Thanks for any and all help, cheers
Code:
Mar 20 23:27:31 bigboss kernel: [drm ERROR :amdgpu_job_timedout] ring comp_1.2.0 timeout, signaled seq=2, emitted seq=3
Mar 20 23:27:31 bigboss kernel: [drm ERROR :amdgpu_job_timedout] Process information: process pid 0 thread pid 0
Mar 20 23:27:31 bigboss kernel: drmn0: GPU reset begin!
Mar 20 23:27:31 bigboss kernel: [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
Mar 20 23:27:31 bigboss kernel: drmn0: BACO reset
Mar 20 23:27:33 bigboss kernel: drmn0: GPU reset succeeded, trying to resume
Mar 20 23:27:33 bigboss kernel: [drm] PCIE GART of 512M enabled.
Mar 20 23:27:33 bigboss kernel: [drm] PTB located at 0x0000008000000000
Mar 20 23:27:33 bigboss kernel: [drm] VRAM is lost due to GPU reset!
Mar 20 23:27:33 bigboss kernel: [drm] PSP is resuming...
Mar 20 23:27:33 bigboss kernel: [drm] reserve 0x400000 from 0x83fec00000 for PSP TMR
Mar 20 23:27:33 bigboss kernel: drmn0: HDCP: optional hdcp ta ucode is not available
Mar 20 23:27:33 bigboss kernel: drmn0: DTM: optional dtm ta ucode is not available
Mar 20 23:27:33 bigboss kernel: drmn0: RAP: optional rap ta ucode is not available
Mar 20 23:27:33 bigboss kernel: drmn0: SECUREDISPLAY: securedisplay ta ucode is not available
Mar 20 23:27:33 bigboss kernel: [drm] kiq ring mec 2 pipe 1 q 0
Mar 20 23:27:33 bigboss kernel: [drm] UVD and UVD ENC initialized successfully.
Mar 20 23:27:33 bigboss kernel: [drm] VCE initialized successfully.
Mar 20 23:27:33 bigboss kernel: drmn0: ring gfx uses VM inv eng 0 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring gfx_low uses VM inv eng 1 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring gfx_high uses VM inv eng 4 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.0.0 uses VM inv eng 5 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.1.0 uses VM inv eng 6 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.2.0 uses VM inv eng 7 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.3.0 uses VM inv eng 8 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.0.1 uses VM inv eng 9 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.1.1 uses VM inv eng 10 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.2.1 uses VM inv eng 11 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring comp_1.3.1 uses VM inv eng 12 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
Mar 20 23:27:33 bigboss kernel: drmn0: ring sdma0 uses VM inv eng 0 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring page0 uses VM inv eng 1 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring sdma1 uses VM inv eng 4 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring page1 uses VM inv eng 5 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_0 uses VM inv eng 6 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_1 uses VM inv eng 9 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring vce0 uses VM inv eng 12 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring vce1 uses VM inv eng 13 on hub 8
Mar 20 23:27:33 bigboss kernel: drmn0: ring vce2 uses VM inv eng 14 on hub 8
Mar 20 23:27:34 bigboss kernel: drmn0: recover vram bo from shadow start
Mar 20 23:27:34 bigboss kernel: drmn0: recover vram bo from shadow done
Mar 20 23:27:34 bigboss kernel: drmn0: GPU reset(1) succeeded!
Mar 20 23:28:34 bigboss kernel: [drm ERROR :amdgpu_job_timedout] ring comp_1.2.0 timeout, signaled seq=4, emitted seq=4
Mar 20 23:28:34 bigboss kernel: [drm ERROR :amdgpu_job_timedout] Process information: process pid 0 thread pid 0
Mar 20 23:28:34 bigboss kernel: drmn0: GPU reset begin!
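
For reference, the timeout lines in the log above can be pulled out with a short script, which makes it easier to see which ring stalls and how far the signaled sequence number lags the emitted one. This is only a sketch: the regular expression and the name `find_ring_timeouts` are my own assumptions based on the message format in this log, not anything defined by the amdgpu driver, and other driver versions may word the message differently.

```python
import re

# Assumed pattern, written against the "amdgpu_job_timedout" lines in the log
# above; it is not an official format and may need adjusting.
TIMEOUT_RE = re.compile(
    r"amdgpu_job_timedout\] ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return a (ring, signaled_seq, emitted_seq) tuple for each timeout line."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
    return hits
```

It could be fed the system log directly, for example `find_ring_timeouts(open("/var/log/messages"))`, assuming the file is readable by the current user.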