Vulkan Driver Only Loaded on Some GPUs

I am experimenting with local AI models on FreeBSD, and I must say so far it has been a joy!

There is but one key issue. My test server has four Radeon Instinct MI50 GPUs with 32 GB each, all identical models, all reporting the same BIOS version and device ID.

I'm running 15.0-RELEASE with amdgpu from drm-kmod, which attaches just fine, and I have four /dev/dri/cardX and /dev/dri/renderX devices. I can see all four cards in pciconf. However, when I run any Vulkan application, only two devices have drivers attached.

I tried updating to drm-latest-kmod, but then none of the GPUs gets a Vulkan driver attached, and I am stuck with llvmpipe.

I have tried running Fedora Server on the same hardware, and it correctly provides four Vulkan devices, so the hardware seems fine. I see that a similar problem has plagued NVIDIA in the past, but I have not found anything to correct the issue.

Any thoughts on how I might coerce Vulkan into loading drivers for all the cards would be greatly appreciated.

Bash:
$ MESA_VK_DEVICE_SELECT=list vulkaninfo
WARNING: [Loader Message] Code 0 : Path to given binary /usr/local/lib/libvulkan_intel.so was found to differ from OS loaded path /usr/local/lib/libvulkan_intel-devel.so
WARNING: [Loader Message] Code 0 : Path to given binary /usr/local/lib/libvulkan_radeon.so was found to differ from OS loaded path /usr/local/lib/libvulkan_radeon-devel.so
WARNING: [Loader Message] Code 0 : Path to given binary /usr/local/lib/libvulkan_intel_hasvk.so was found to differ from OS loaded path /usr/local/lib/libvulkan_intel_hasvk-devel.so
'DISPLAY' environment variable not set... skipping surface info
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
selectable devices:
  GPU 0: 1002:66a0 "AMD Radeon Graphics (RADV VEGA20)" discrete GPU 0000:83:00.0
  GPU 1: 1002:66a0 "AMD Radeon Graphics (RADV VEGA20)" discrete GPU 0000:c6:00.0
  GPU 2: 10005:0 "llvmpipe (LLVM 19.1.7, 256 bits)" CPU 0000:00:00.0
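(For reference, both the Vulkan loader and Mesa's device-select layer can be steered with environment variables. A sketch of what can be tried; the manifest path is an assumption, so check where your Mesa package installs its ICD JSON files:)

```shell
# Restrict the loader to the RADV ICD only (path is an assumption --
# verify where your Mesa package installs its ICD manifests):
export VK_DRIVER_FILES=/usr/local/share/vulkan/icd.d/radeon_icd.x86_64.json
# Older loaders use the now-deprecated VK_ICD_FILENAMES instead.

# Ask Mesa's device-select layer to prefer a device by vendorID:deviceID:
export MESA_VK_DEVICE_SELECT=1002:66a0
vulkaninfo --summary
```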

Bash:
$ pciconf -l -vvv |grep -B 4  VGA
vgapci0@pci0:198:0:0:   class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x66a0 subvendor=0x1002 subdevice=0x081e
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Vega 20 [Radeon Pro/Radeon Instinct]'
    class      = display
    subclass   = VGA
--
vgapci1@pci0:200:0:0:   class=0x030000 rev=0x41 hdr=0x00 vendor=0x1a03 device=0x2000 subvendor=0x1a03 subdevice=0x2000
    vendor     = 'ASPEED Technology, Inc.'
    device     = 'ASPEED Graphics Family'
    class      = display
    subclass   = VGA
--
vgapci2@pci0:131:0:0:   class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x66a0 subvendor=0x1002 subdevice=0x081e
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Vega 20 [Radeon Pro/Radeon Instinct]'
    class      = display
    subclass   = VGA
--
vgapci3@pci0:67:0:0:    class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x66a0 subvendor=0x1002 subdevice=0x081e
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Vega 20 [Radeon Pro/Radeon Instinct]'
    class      = display
    subclass   = VGA
--
vgapci4@pci0:3:0:0:     class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x66a0 subvendor=0x1002 subdevice=0x081e
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'Vega 20 [Radeon Pro/Radeon Instinct]'
    class      = display
    subclass   = VGA
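One detail worth noting when matching the two outputs: pciconf prints the bus number in decimal, while vulkaninfo prints the PCI address in hex, so the buses line up like this:

```shell
# pciconf bus numbers are decimal; vulkaninfo's are hex.
printf '%02x\n' 198   # c6 -> vgapci0 is vulkaninfo's GPU 1 at 0000:c6:00.0
printf '%02x\n' 131   # 83 -> vgapci2 is vulkaninfo's GPU 0 at 0000:83:00.0
printf '%02x\n' 67    # 43 -> vgapci3, bus 0x43, missing from vulkaninfo
printf '%02x\n' 3     # 03 -> vgapci4, bus 0x03, missing from vulkaninfo
```

So the two cards Vulkan does pick up are vgapci0 and vgapci2; the cards at buses 67 and 3 are the missing pair.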
 
I am very interested in your setup, as I will have the opportunity to experiment with AMD MI50 cards and LLMs (llama in my case) as well.
Could you detail whether you solved the issue and, in general, what you installed and what you had to change or configure to get Vulkan working on those GPUs on FreeBSD?
 
Have you installed gpu-firmware-amd-kmod?
P.S. I have a Volta GPU, and I always run into issues if I use the latest drm-kmods.
Are your kernel modules loaded if you check kldstat?
P.P.S. How do bigger LLM models load onto multiple GPUs over the PCIe lanes? Maybe your model only requires two of the GPUs? Maybe some of the layers stay in RAM?
Did you try loading a model that fits into one GPU?
It's finicky with LLMs and FreeBSD... mine, with ollama and NVIDIA, crashes no matter whether it's an 8B or a 30B model.
 
I am very interested in your setup, as I will have the opportunity to experiment with AMD MI50 cards and LLMs (llama in my case) as well.
Could you detail whether you solved the issue and, in general, what you installed and what you had to change or configure to get Vulkan working on those GPUs on FreeBSD?
I'm running stock 15.0-RELEASE with the amdgpu driver from drm-kmod and all the Vulkan packages.

llama-cpp runs from pkg using the Vulkan backend, and it runs well. I have compiled stable-diffusion.cpp from source, and using it to run a number of DiT models works well too. drm-latest-kmod fails to provide any Vulkan interfaces.

ROCm can be made to work via the Linux compatibility layers, but I have not tried it. My interest is not really in LLMs, so I have limited information.
Have you installed gpu-firmware-amd-kmod?
P.S. I have a Volta GPU, and I always run into issues if I use the latest drm-kmods.
Are your kernel modules loaded if you check kldstat?
P.P.S. How do bigger LLM models load onto multiple GPUs over the PCIe lanes? Maybe your model only requires two of the GPUs? Maybe some of the layers stay in RAM?
Did you try loading a model that fits into one GPU?
It's finicky with LLMs and FreeBSD... mine, with ollama and NVIDIA, crashes no matter whether it's an 8B or a 30B model.
Thanks for your suggestions.

I have installed the gpu-firmware-amd-kmod package. Two of the four cards work well, so I don't think any firmware or drivers are missing.

kldstat shows amdgpu loaded, and /dev/dri has entries for card0, card1, card2, and card3, so the hardware is recognized. Interestingly, sysctl entries only exist for hw.dri.0 and hw.dri.1; for the other two I get an error that a leaf cannot be re-used. vulkaninfo also shows only two of the four devices.

I have tried running glm-4.7 using llama-cpp, and it seems to run inference across both the available cards; it is the only LLM I have tried. The DiT models (Flux, QwenImage, ZTurbo, WAN2.2, etc.) all seem to run on a single GPU only and spill over directly into system RAM. I have not tried ollama.
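On the split question: llama.cpp exposes flags to control how a model is divided across cards. A sketch of a multi-GPU invocation (the model path is hypothetical; flag names are as in recent llama.cpp builds and may differ in older packages):

```shell
# -ngl / --n-gpu-layers : how many layers to offload to the GPUs
# -sm  / --split-mode   : none | layer | row
# -ts  / --tensor-split : proportion of the model to place on each GPU
llama-cli -m ./model.gguf -ngl 99 --split-mode layer --tensor-split 1,1
```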
 
Thanks for the suggestion. I tried adapting vk_swiftshader_icd.json to point to /usr/local/lib/libvulkan_radeon.so, but alas it makes no difference.
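(For reference, those ICD manifests follow the Vulkan loader's driver-manifest format. A minimal one pointing at the RADV library would look roughly like the following; the api_version value here is an assumption:)

```json
{
    "file_format_version": "1.0.0",
    "ICD": {
        "library_path": "/usr/local/lib/libvulkan_radeon.so",
        "api_version": "1.3.0"
    }
}
```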
 