OpenCL with AMD Radeon RX580 segfaults

fhajji

#1
Hello,

I'm trying to run some OpenCL programs but I keep getting segfaults. This is on FreeBSD 11.2,
using an AMD Radeon RX580 with the amdgpu driver and drm-next-kmod.

I have no problems with this GPU running Xorg and OpenGL programs, but OpenCL only
works partially: programs print their results and then segfault. I have no idea what's
wrong. Here are some examples:

Code:
$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     Radeon RX 580 Series (POLARIS10, DRM 3.8.0, 11.2-BETA2, LLVM 6.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.1.3
  Driver Version                                  18.1.3
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               36
  Max clock frequency                             1430MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                
    char                                                16 / 16     
    short                                                8 / 8      
    int                                                  4 / 4      
    long                                                 2 / 2      
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4      
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              17091704832 (15.92GiB)
  Error Correction support                        No
  Max memory allocation                           11964193382 (11.14GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        2147483647 (2GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                               
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                         
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.8.0, 11.2-BETA2, LLVM 6.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.8.0, 11.2-BETA2, LLVM 6.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
    NOTE:    your OpenCL library declares to support OpenCL 2.2,
        but it seems to support up to OpenCL 2.1 only.
Segmentation fault

With benchmarks/clpeak:

Code:
$ clpeak

Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.8.0, 11.2-BETA2, LLVM 6.0.1)
    Driver version  : 18.1.3 (FreeBSD)
    Compute units   : 36
    Clock frequency : 1430 MHz

    Global memory bandwidth (GBPS)
      float   : 216.07
      float2  : 220.49
      float4  : 223.63
      float8  : 214.87
      float16 : 124.73

    Single-precision compute (GFLOPS)
      float   : 6273.42
      float2  : 6324.87
      float4  : 6348.79
      float8  : 6309.87
      float16 : 6249.02

    half-precision compute (GFLOPS)
      half   : 6359.56
      half2  : 6319.32
      half4  : 6350.19
      half8  : 6327.64
      half16 : 6276.21

    Double-precision compute (GFLOPS)
      double   : 409.68
      double2  : 409.66
      double4  : 409.06
      double8  : 408.41
      double16 : 406.55

    Integer compute (GIOPS)
      int   : 1306.39
      int2  : 1304.44
      int4  : 1306.26
      int8  : 1303.58
      int16 : 1302.69

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 5.18
      enqueueReadBuffer          : 5.59
      enqueueMapBuffer(for read) : 9686.44
        memcpy from mapped ptr   : 5.93
      enqueueUnmap(after write)  : 9552.86
        memcpy to mapped ptr     : 5.17

    Kernel launch latency : 167.84 us

Segmentation fault

Or with simple programs like https://github.com/boostorg/compute/blob/master/example/list_devices.cpp:

Code:
$ ./list_devices
Platform 'Clover'
  GPU Device: Radeon RX 580 Series (POLARIS10, DRM 3.8.0, 11.2-BETA2, LLVM 6.0.1)
Segmentation fault

Or with the GPU-based variant of net-p2p/xmrig from https://github.com/xmrig/xmrig-amd:

Code:
$ ./xmrig-amd --config=config.json
* VERSIONS     XMRig/2.7.3-beta libuv/1.22.0 OpenCL/2.0 clang/6.0.0
* CPU          AMD Ryzen Threadripper 1950X 16-Core Processor  x64 AES
* ALGO         cryptonight, donate=5%
* POOL #1      pool.supportxmr.com:5555 variant 1
* POOL #2      pool.monero.hashvault.pro:3333 variant 1
* COMMANDS     hashrate, pause, resume
[2018-07-20 18:23:52] compiling code and initializing GPUs. This will take a while...
[2018-07-20 18:23:52] No AMD OpenCL platform found. Possible driver issues or wrong vendor driver.
[2018-07-20 18:23:52] Selected OpenCL platform index -1 doesn't exist.
[2018-07-20 18:23:52] Failed to start threads
Segmentation fault

Not sure how to interpret this.
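
One possible reading of the xmrig-amd log: the miner appears to pick the OpenCL platform by its vendor string, and Clover reports "Mesa" as Platform Vendor (see the clinfo output above), so nothing matches and the platform index stays at -1. A minimal sketch of that kind of lookup, assuming a substring test against "Advanced Micro Devices"; the function name and the match string are assumptions for illustration, not copied from the xmrig-amd source:

```python
# Hypothetical sketch of vendor-based platform selection.
# AMD_VENDOR and find_amd_platform are illustrative names (assumptions).
AMD_VENDOR = "Advanced Micro Devices"


def find_amd_platform(vendors):
    """Return the index of the first platform whose vendor string
    contains AMD_VENDOR, or -1 if none matches."""
    for index, vendor in enumerate(vendors):
        if AMD_VENDOR in vendor:
            return index
    return -1


# Clover reports "Mesa" as Platform Vendor, so nothing matches here.
clover_only = ["Mesa"]
print(find_amd_platform(clover_only))
```

If that is what happens, the "No AMD OpenCL platform found" message is about the vendor string only, and the segfault that follows is likely a separate problem.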

Installed ports:

clinfo-2.1.16.01.12
clpeak-1.0g20170524
drm-next-kmod-4.11.g20180619_1
gpu-firmware-kmod-g20180319_1
libdrm-2.4.92,1
ocl-icd-2.2.12
opencl-2.2
xf86-video-amdgpu-1.3.0_1

Loaded modules:

Code:
$ kldstat
Id Refs Address            Size     Name
1  109 0xffffffff80200000 2036448  kernel
2    1 0xffffffff82239000 af98     aesni.ko
3    1 0xffffffff82244000 1e0d8    geom_eli.ko
4    1 0xffffffff82266000 381080   zfs.ko
5    2 0xffffffff825e8000 a380     opensolaris.ko
6    1 0xffffffff825f3000 15da0    fuse.ko
7    1 0xffffffff82821000 155a50   amdgpu.ko
8    1 0xffffffff82977000 714f0    drm.ko
9    4 0xffffffff829e9000 edc8     linuxkpi.ko
10    3 0xffffffff829f8000 d470     linuxkpi_gplv2.ko
11    2 0xffffffff82a06000 6b8      debugfs.ko
12    1 0xffffffff82a07000 8148     amdgpu_polaris10_mc_bin.ko
13    1 0xffffffff82a10000 4400     amdgpu_polaris10_pfp_bin.ko
14    1 0xffffffff82a15000 4400     amdgpu_polaris10_me_bin.ko
15    1 0xffffffff82a1a000 2400     amdgpu_polaris10_ce_bin.ko
16    1 0xffffffff82a1d000 5f30     amdgpu_polaris10_rlc_bin.ko
17    1 0xffffffff82a23000 40400    amdgpu_polaris10_mec_bin.ko
18    1 0xffffffff82a64000 40400    amdgpu_polaris10_mec2_bin.ko
19    1 0xffffffff82aa5000 3318     amdgpu_polaris10_sdma_bin.ko
20    1 0xffffffff82aa9000 3320     amdgpu_polaris10_sdma1_bin.ko
21    1 0xffffffff82aad000 5bc00    amdgpu_polaris10_uvd_bin.ko
22    1 0xffffffff82b09000 28d20    amdgpu_polaris10_vce_bin.ko
23    1 0xffffffff82b32000 1fe18    amdgpu_polaris10_smc_bin.ko
24    1 0xffffffff82b52000 3698     ng_ubt.ko
25    5 0xffffffff82b56000 9a20     netgraph.ko
26    1 0xffffffff82b60000 8e78     ng_hci.ko
27    3 0xffffffff82b69000 95c      ng_bluetooth.ko
28    1 0xffffffff82b6a000 2328     ums.ko
29    1 0xffffffff82b6d000 1780     uhid.ko
30    1 0xffffffff82b6f000 bc0e     ng_l2cap.ko
31    1 0xffffffff82b7b000 176a8    ng_btsocket.ko
32    1 0xffffffff82b93000 1d40     ng_socket.ko
33    1 0xffffffff82b95000 1070     cpuctl.ko
34    1 0xffffffff82b97000 32d048   vmm.ko
35    1 0xffffffff82ec5000 a54      nmdm.ko
36    1 0xffffffff82ec6000 5fb8     if_bridge.ko
37    1 0xffffffff82ecc000 3b78     bridgestp.ko
38    1 0xffffffff82ed0000 24a0     if_tap.ko
39    1 0xffffffff82ed3000 11a0     amdtemp.ko
40    1 0xffffffff82ed5000 628      amdsmn.ko

Interesting dmesg output:

Code:
pid 11541 (xmrig-amd), uid 1001: exited on signal 11
pid 11545 (clinfo), uid 1001: exited on signal 11
pid 11551 (xmrig-amd), uid 0: exited on signal 11
pid 11550 (sudo), uid 0: exited on signal 11
pid 11585 (clinfo), uid 1001: exited on signal 11
drmn0: failed to get a new IB (-22)
[drm:amdgpu_gem_va_update_vm] Couldn't update BO_VA (-22)
pid 12492 (clpeak), uid 1001: exited on signal 11
pid 18970 (clinfo), uid 1001: exited on signal 11
pid 22560 (clinfo), uid 1001: exited on signal 11
drmn0: failed to get a new IB (-22)
[drm:amdgpu_gem_va_update_vm] Couldn't update BO_VA (-22)
pid 23240 (clpeak), uid 1001: exited on signal 11
pid 23287 (list_devices), uid 1001: exited on signal 11
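
Signal 11 is SIGSEGV, so every "exited on signal 11" line above is one segfault. To see at a glance which programs crash and how often, the lines can be tallied. A minimal sketch, assuming the dmesg text has been saved to a string; count_crashes is a made-up helper, and the drm lines are simply ignored:

```python
import re
from collections import Counter

# Matches lines such as:
#   pid 11545 (clinfo), uid 1001: exited on signal 11
CRASH_RE = re.compile(r"^pid (\d+) \(([^)]+)\), uid (\d+): exited on signal (\d+)$")


def count_crashes(dmesg_text):
    """Count 'exited on signal' lines per (program, signal number)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = CRASH_RE.match(line.strip())
        if match:
            counts[(match.group(2), int(match.group(4)))] += 1
    return counts


# The dmesg excerpt from this post.
sample = """\
pid 11541 (xmrig-amd), uid 1001: exited on signal 11
pid 11545 (clinfo), uid 1001: exited on signal 11
pid 11551 (xmrig-amd), uid 0: exited on signal 11
pid 11550 (sudo), uid 0: exited on signal 11
pid 11585 (clinfo), uid 1001: exited on signal 11
drmn0: failed to get a new IB (-22)
[drm:amdgpu_gem_va_update_vm] Couldn't update BO_VA (-22)
pid 12492 (clpeak), uid 1001: exited on signal 11
pid 18970 (clinfo), uid 1001: exited on signal 11
pid 22560 (clinfo), uid 1001: exited on signal 11
drmn0: failed to get a new IB (-22)
[drm:amdgpu_gem_va_update_vm] Couldn't update BO_VA (-22)
pid 23240 (clpeak), uid 1001: exited on signal 11
pid 23287 (list_devices), uid 1001: exited on signal 11
"""

crashes = count_crashes(sample)
for (program, signal_number), n in sorted(crashes.items()):
    print(f"{program}: {n} x signal {signal_number}")
```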

Any help appreciated.

Thanks.
 
fhajji

#2
I could really use some help here. After upgrading the ports tree, drm-next-kmod, xorg, mesa, etc., clinfo doesn't even show the GPU anymore. Xorg is still working, but no luck with OpenCL at all.

Anyone here successfully using OpenCL on FreeBSD with the amdgpu driver?

Thanks.

Edit: it may be related to this issue. I'm also getting error messages, e.g. when starting blender (but blender still runs):

Code:
amdgpu_device_initialize: amdgpu_get_auth (1) failed (-1)
amdgpu: amdgpu_device_initialize failed.
do_winsys_init: DRM version is 3.10.0 but this driver is only compatible with 2.12.0 (kernel 3.2) or later.
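
The last line looks contradictory, since 3.10.0 is newer than 2.12.0. It seems to come from Mesa's older radeon winsys, which is tried once the amdgpu winsys has failed and which expects DRM major version 2 (the radeon kernel driver), whereas amdgpu reports major version 3. A minimal sketch of that kind of check; the exact condition is an assumption, not a copy of the Mesa source, and both function names are made up:

```python
def parse_drm_version(text):
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def radeon_winsys_accepts(major, minor):
    """Assumed check: the radeon winsys wants DRM 2.x with x >= 12."""
    return major == 2 and minor >= 12


major, minor, _ = parse_drm_version("3.10.0")
# amdgpu's 3.10.0 fails the major-version test, hence the odd message.
print(radeon_winsys_accepts(major, minor))
```

Read this way, the message is a side effect of the amdgpu_get_auth failure on the first line rather than a real version mismatch.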
 

roccobaroccoSC

#3
Hello fhajji,

I am seeing the same segmentation faults. Even calling Xorg -configure dumps core.
However, OpenCL seems to be working in general. For example, I am able to run GPU code in Java using java/aparapi. There are a number of examples on Maven Central in "aparapi-examples"; you just need to use your installed aparapi and not the one downloaded from Maven Central (I packed the installed JAR into a custom artifact in my local repo).

I assume that the segmentation faults happen mostly when querying the capabilities of the OpenCL devices and listing them. I recall something about an improperly reported OpenCL version. In general, the RX 580 should support OpenCL 1.0, 1.1 and 1.2 without any problems, but possibly due to incomplete development (clover/gallium compute are under active development at the moment) a higher version is reported although not all features are actually implemented.
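
In the clinfo output earlier in the thread, the platform and the device both report OpenCL 1.1, while the 2.2 comes from the ICD loader (ocl-icd), so it helps to compare the strings directly. A minimal sketch, assuming the usual "OpenCL <major>.<minor> <vendor info>" layout of these version strings; parse_cl_version is a made-up helper:

```python
import re

# "OpenCL", a space, major.minor, then optional vendor-specific text.
VERSION_RE = re.compile(r"^OpenCL (\d+)\.(\d+)(?: (.*))?$")


def parse_cl_version(text):
    """Return (major, minor, vendor_info) from an OpenCL version string."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not an OpenCL version string: {text!r}")
    major, minor, info = match.groups()
    return int(major), int(minor), info or ""


platform = parse_cl_version("OpenCL 1.1 Mesa 18.1.3")  # Platform Version
loader = parse_cl_version("OpenCL 2.2")                # ICD loader Profile

# Tuples compare element by element, so this compares (major, minor).
print(platform[:2], loader[:2], platform[:2] < loader[:2])
```

So the higher number belongs to the loader library, not to what Clover claims for the card.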

My advice is just to try it out and see what features work for you.
For most purposes, OpenCL 1.0 is enough. I was able to demonstrate the RX 580's peak performance using benchmarks/clpeak.
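
To put "peak performance" into numbers, the clpeak figures from the first post can be compared with a back-of-the-envelope theoretical peak. A rough sketch; the compute-unit count and clock come from the clinfo output, while the 64 lanes per compute unit, the 2 FLOPs per fused multiply-add and the 1/16 double-precision rate are assumptions about the Polaris architecture that are not stated in this thread:

```python
# Figures from the clinfo/clpeak output earlier in the thread.
compute_units = 36
clock_ghz = 1.430
measured_fp32 = 6348.79  # clpeak, float4, GFLOPS
measured_fp64 = 409.68   # clpeak, double, GFLOPS

# Assumptions (not from the thread): 64 lanes per GCN compute unit,
# an FMA counted as 2 FLOPs, and a 1/16 double-precision rate.
lanes_per_cu = 64
flops_per_fma = 2
fp64_ratio = 1 / 16

peak_fp32 = compute_units * lanes_per_cu * flops_per_fma * clock_ghz
peak_fp64 = peak_fp32 * fp64_ratio

print(f"fp32: peak {peak_fp32:.2f} GFLOPS, measured {measured_fp32 / peak_fp32:.1%}")
print(f"fp64: peak {peak_fp64:.2f} GFLOPS, measured {measured_fp64 / peak_fp64:.1%}")
```

Under those assumptions the compute kernels themselves run close to the theoretical limit, which fits the impression that the crashes are not in kernel execution.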
 
fhajji

#4
Thank you for the feedback.

As I said, right now OpenCL is completely broken for me. It used to work partially (see above), but after upgrading, devel/clinfo doesn't detect the card at all:

Code:
$ sudo clinfo
amdgpu_device_initialize: amdgpu_get_auth (1) failed (-1)
amdgpu: amdgpu_device_initialize failed.
do_winsys_init: DRM version is 3.10.0 but this driver is only compatible with 2.12.0 (kernel 3.2) or later.
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.1.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Clover
  clCreateContext(NULL, ...) [default]            No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
        NOTE:   your OpenCL library declares to support OpenCL 2.2,
                but it seems to support up to OpenCL 2.1 only.

Some weird permissions problem perhaps? I'm a member of the video group, and nothing changed here except for updating graphics/drm-next-kmod etc. Oh, wait, isn't that supposed to run only on CURRENT? I'm on STABLE right now... but graphics/drm-stable-kmod uses an older Linux kernel API (4.11 instead of 4.15). I'm confused now. Maybe I should try downgrading to graphics/drm-stable-kmod and see how it goes. I will update this entry as soon as I get a chance to try it.
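
A quick way to rule out plain file permissions on the DRM device nodes is to check whether the current user can open them for reading and writing. A minimal sketch; the node paths are assumptions (adjust them to whatever exists under /dev/dri on your system), and check_nodes is a made-up helper:

```python
import os

# Assumed device node locations; these may differ on your system.
CANDIDATE_NODES = ["/dev/dri/card0", "/dev/dri/renderD128"]


def check_nodes(paths):
    """Return {path: status}, status being 'missing', 'no access' or 'ok'."""
    result = {}
    for path in paths:
        if not os.path.exists(path):
            result[path] = "missing"
        elif os.access(path, os.R_OK | os.W_OK):
            result[path] = "ok"
        else:
            result[path] = "no access"
    return result


for path, status in check_nodes(CANDIDATE_NODES).items():
    print(f"{path}: {status}")
```

Since the failure above also shows up under sudo, an "ok" here would point away from group membership and toward the DRM authentication step (amdgpu_get_auth).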

Thanks again.

Edit: Permissions problem fixed, patch in PR 230967. With that patch, it's again like in the first post:

Code:
$ clinfo
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.1.5
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     Radeon RX 580 Series (POLARIS10, DRM 3.10.0, 11.2-STABLE, LLVM 6.0.1)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 18.1.5
  Driver Version                                  18.1.5
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               36
  Max clock frequency                             1430MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8589934592 (8GiB)
  Error Correction support                        No
  Max memory allocation                           6441053184 (5.999GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       32768 bits (4096 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        2147483647 (2GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_byte_addressable_store cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.10.0, 11.2-STABLE, LLVM 6.0.1)
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   Radeon RX 580 Series (POLARIS10, DRM 3.10.0, 11.2-STABLE, LLVM 6.0.1)

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2
    NOTE:    your OpenCL library declares to support OpenCL 2.2,
        but it seems to support up to OpenCL 2.1 only.
Segmentation fault

And clpeak:

Code:
$ clpeak

Platform: Clover
  Device: Radeon RX 580 Series (POLARIS10, DRM 3.10.0, 11.2-STABLE, LLVM 6.0.1)
    Driver version  : 18.1.5 (FreeBSD)
    Compute units   : 36
    Clock frequency : 1430 MHz

    Global memory bandwidth (GBPS)
      float   : 203.36
      float2  : 206.76
      float4  : 211.86
      float8  : 203.26
      float16 : 120.09

    Single-precision compute (GFLOPS)
      float   : 6327.29
      float2  : 6390.31
      float4  : 6345.46
      float8  : 6271.07
      float16 : 6244.71

    half-precision compute (GFLOPS)
      half   : 6356.17
      half2  : 6354.09
      half4  : 6343.73
      half8  : 6320.72
      half16 : 6271.89

    Double-precision compute (GFLOPS)
      double   : 409.84
      double2  : 409.45
      double4  : 409.21
      double8  : 408.40
      double16 : 406.72

    Integer compute (GIOPS)
      int   : 1306.14
      int2  : 1306.08
      int4  : 1305.88
      int8  : 1303.67
      int16 : 1304.30

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 5.62
      enqueueReadBuffer          : 6.67
      enqueueMapBuffer(for read) : 4885.09
        memcpy from mapped ptr   : 6.68
      enqueueUnmap(after write)  : 4834.50
        memcpy to mapped ptr     : 5.84

    Kernel launch latency : 172.84 us

Segmentation fault
 