Hello,

I am trying to accelerate machine vision computations on an AMD GPU using OpenCL via ROCm, since this is what many machine vision frameworks use for AMD GPUs.

I spoke with the OpenCL group on their IRC channel on Freenode (Libera). They told me that I would need to install ROCm.
They also told me that I can run CUDA code on AMD GPUs thanks to AMD's HIP.
I couldn't find any way to install ROCm on FreeBSD 13.1.

From my understanding, ROCm is what makes AMD's own OpenCL implementation available.
I know that ROCm is supported on Linux distros.

Also, clinfo reports the AMD Radeon RX 580 as an OpenCL 1.1 device, when the card actually supports OpenCL 2.1:

OpenCV supports only OpenCL 1.2, not OpenCL 1.1, and my card is capable of OpenCL 2.1.
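
To make the version mismatch concrete, here is a minimal Python sketch that pulls the major/minor numbers out of a version string of the kind clinfo prints and compares them against a minimum. The helper names and the (1, 2) threshold are assumptions for illustration, based only on the OpenCL 1.2 requirement stated above; this is not an OpenCV API.

```python
import re

# Assumption from the post: the framework needs OpenCL 1.2 or newer.
MIN_REQUIRED = (1, 2)


def parse_opencl_version(version_string):
    """Return (major, minor) from a string such as 'OpenCL 1.1 Mesa 21.3.8'."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string.strip())
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_string, minimum=MIN_REQUIRED):
    """True if the reported version is at least `minimum` (tuple comparison)."""
    return parse_opencl_version(version_string) >= minimum


if __name__ == "__main__":
    # The two platform version strings from the clinfo output in this post.
    for reported in ("OpenCL 1.1 Mesa 21.3.8", "OpenCL 2.0 pocl 1.8"):
        print(reported, "->", "ok" if meets_minimum(reported) else "too old")
```

With the strings from this machine, the Clover platform fails the check and the pocl platform passes it, which matches the symptom described: the GPU is only exposed through the 1.1 platform.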

Do I really need ROCm to get OpenCL GPU acceleration working for OpenCV or dlib?

What are some options since I am using FreeBSD?

Here is the output of clinfo:
Code:
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 21.3.8
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Portable Computing Language
  Platform Vendor                                 The pocl project
  Platform Version                                OpenCL 2.0 pocl 1.8  Unix, Release+Asserts, RELOC, LLVM 13.0.1, SLEEF, DISTRO, POCL_DEBUG
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_pocl_content_size
  Platform Extensions function suffix             POCL

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 13.1-RELEASE-p1, LLVM 13.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 21.3.8
  Device Numeric Version                          0x401000 (1.1.0)
  Driver Version                                  21.3.8
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               36
  Max clock frequency                             1100MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple (kernel)     64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              4294967296 (4GiB)
  Error Correction support                        No
  Max memory allocation                           3435973836 (3.2GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max number of constant args                     16
  Max constant buffer size                        67108864 (64MiB)
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    ILs with version                              (n/a)
  Built-in kernels with version                   (n/a)
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_extended_versioning
  Device Extensions with Version                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_extended_versioning                                       0x400000 (1.0.0)
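
The "Device Numeric Version" and the per-extension values in the listing above are packed integers. A minimal sketch of decoding them, assuming the bit layout defined by the cl_khr_extended_versioning extension (10 bits major, 10 bits minor, 12 bits patch); decode_cl_version is a hypothetical helper name, not part of any library:

```python
def decode_cl_version(packed):
    """Split a packed OpenCL version integer into (major, minor, patch).

    Assumed layout (cl_khr_extended_versioning): the top 10 bits hold the
    major number, the next 10 bits the minor, the low 12 bits the patch.
    """
    major = packed >> 22
    minor = (packed >> 12) & 0x3FF
    patch = packed & 0xFFF
    return major, minor, patch


if __name__ == "__main__":
    # Values copied from the clinfo listing in this post.
    print(decode_cl_version(0x401000))  # device numeric version
    print(decode_cl_version(0x400000))  # extension version
```

This agrees with the human-readable values clinfo prints next to each number: 0x401000 is 1.1.0 and 0x400000 is 1.0.0.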

Thanks for any info.
 
I've been looking into this recently, and have been going through the ROCm code. I'm going to have to wait a week or two to borrow an AMD card from the university before I can look into it any deeper, though. TensorFlow builds with OpenCL support, but I have no hardware to test or experiment with. Are you installing OpenCL from ports or building it yourself?

I installed OpenCL from ports with pkg install devel/opencl. If it's better to build it myself, I will do that.

I'm a little confused. I read that OpenCL is installed by default with the GPU driver (mesa?), but tutorials floating around the web say I need to install "clover" (OpenCL). Now the OpenCL group on Libera IRC is telling me to install ROCm, since it is the official AMD OpenCL implementation for GPU acceleration used by machine vision frameworks, and they said there is no way to run it on FreeBSD. If you ever do start porting ROCm to FreeBSD, I would like to help and can test code if needed.

Here is where I read about AMD GPU users who use their cards for GPU-accelerated parallel computation and had to do some "patching" on their own, since AMD ROCm dropped support for all AMD GPUs with the GFX803 architecture, which covers the "Arctic Islands" and "Polaris" GPU families using AMD's "GCN 4" ISA:

I also read that the GFX803 architecture used in AMD's consumer GPUs is also used in AMD's professional data center cards, such as the "Instinct MI6" and "Instinct MI8":


Thanks, will look into it.