Is OpenCL possible with the current Nvidia driver?

devel/ocl-icd says it will work with a non-free ICD, and I've seen people on Linux get nvidia.icd from... wherever (presumably some NVIDIA binary driver package)... so perhaps the mechanism is the same. I couldn't find any specific examples of this working, though.
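For reference, this is the kind of minimal check I'd try to see whether an ICD actually gets picked up (just a sketch, assuming the OpenCL headers and an ICD loader such as devel/ocl-icd are installed; built with something like cc list_platforms.c -o list_platforms -lOpenCL):

Code:
/* list_platforms.c -- minimal sketch: enumerate OpenCL platforms via the
 * ICD loader and print their names.  It only reports whatever ICDs the
 * loader can find; there is nothing NVIDIA-specific in it. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_uint n = 0;
    if (clGetPlatformIDs(0, NULL, &n) != CL_SUCCESS || n == 0) {
        puts("no OpenCL platforms found");
        return 1;
    }

    cl_platform_id ids[16];
    clGetPlatformIDs(n > 16 ? 16 : n, ids, NULL);
    for (cl_uint i = 0; i < n && i < 16; i++) {
        char name[256];
        clGetPlatformInfo(ids[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        printf("platform %u: %s\n", i, name);
    }
    return 0;
}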
 
Nvidia's OpenCL is implemented through CUDA.
I guess that means no. Then I'll either use virtualization, or sell this Nvidia card and buy an AMD one. Using WebGL appears to be an alternative too, but I guess it is very slow.
 
WebGL has nothing to do with general-purpose GPU computing. Vulkan could be used for that purpose in theory, but you'll have to implement everything from scratch yourself, which is probably not what you are looking for.

I'm kind of curious what kind of small project you have in mind. You obviously didn't bother to do any research on the topic, so there's no reason for me to believe you actually need GPU anything.
 
Well, TensorFlow can run on the CPU too; I just have a spare GPU I haven't sold, so I thought I'd use that instead of stressing the CPU. It is a sequential pattern mining project: I want to compare classical algorithms like SPADE or PrefixSpan to machine learning.

Looks like you are not up to date either:

https://github.com/tensorflow/tfjs
A WebGL accelerated JavaScript library for training and deploying ML models.
 
Due to constant problems with my Radeon RX 580 I swapped it for an NVIDIA GTX 1060, using binary driver version 550, and all the problems are gone. Now I'm wondering how to use OpenCL/CUDA with that card; any hints welcome :-) The Graphics/OpenCL wiki (https://wiki.freebsd.org/Graphics/OpenCL) does not even mention nvidia.
 
I don't use OpenCL or CUDA directly myself, but
Code:
% pkg info -l x11/nvidia-driver | grep -n icd                                                                       
124:    /usr/local/share/vulkan/icd.d/nvidia_icd.json
% pkg info -l x11/linux-nvidia-libs | grep -n icd
2:    /compat/linux/etc/OpenCL/vendors/nvidia.icd
3:    /compat/linux/etc/vulkan/icd.d/nvidia_icd.json
%
could the above be of any help?
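If it's of any use: those .icd / .json files are what the OpenCL and Vulkan ICD loaders look up, so a quick sanity check could be (just a sketch; paths taken from the pkg listing above):

Code:
% ls /compat/linux/etc/OpenCL/vendors/
nvidia.icd
% ls /usr/local/share/vulkan/icd.d/
nvidia_icd.json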
 
I'm currently running version 565.77 of nvidia-driver by overriding the version.
The new beta branch driver, 570.86.16, turned out to require some work on the ports (X cannot find a screen with a simple override).

I'm planning to work on it, together with updates for the latest production branch driver, once I can find enough time.

The Linux version of 570.86.16 has more changes/additions, but as I'm not actually using x11/linux-nvidia-libs myself, I wouldn't be able to determine which files additionally need to be installed, and to which directory, especially the new JSON files. So, unlike usual, my next round of work won't include this part.
 
Yeah, I have Debian 12 debootstrapped and was trying to run some applications there on the binary nvidia-driver 550. It turns out Debian has all its libraries built for 535, so nothing works anymore after installing the nvidia packages. I had full 3D acceleration with AMDGPU. Also, the 535 driver from the NVIDIA website does not build on FreeBSD 14 :-)

I am a bit impressed that FreeBSD has a newer driver than Debian, although this makes the nvidia drivers self-incompatible between systems; maybe we should stick to the same versions across different systems to keep things compatible :-P
 
Also, the 535 driver from the NVIDIA website does not build on FreeBSD 14 :-)
Which minor version of 535? I've tried the ones listed below before.
  • 535.43.02
  • 535.54.03
  • 535.86.05
  • 535.98
  • 535.104.05
  • 535.113.01
  • 535.129.03
  • 535.146.02
  • 535.154.05
As seen in PR 282312, 550.127.05 of the driver contains a security fix, so using 535 should be discouraged.

If you need the 535 series of the driver anyway, do you also want graphics/nvidia-drm-[510|515|61]-kmod?
If so, since official support for it started with the 550 series of drivers, you need to obtain Austin's private distfiles corresponding to the version you want and follow his procedure. See the diff of the distinfo part in commit 71e92b26bd43763a7b82208625e628f043858fa7.

If you don't need the DRM part of the driver, you can override the version with something like the following in your /etc/make.conf.

Code:
NVIDIA_OVERRIDE_VERSION= 535.146.02

.if ${.CURDIR:M/usr/ports/x11/nvidia-driver} && defined(NVIDIA_OVERRIDE_VERSION)
  DISTVERSION=    ${NVIDIA_OVERRIDE_VERSION}
  NO_CHECKSUM=    YES
.endif

.if ${.CURDIR:M/usr/ports/x11/linux-nvidia-libs} && defined(NVIDIA_OVERRIDE_VERSION)
  DISTVERSION=    ${NVIDIA_OVERRIDE_VERSION}
  NO_CHECKSUM=    YES
.endif

Unfortunately, overriding versions like this does not work for graphics/nvidia-drm-[510|515|61]-kmod, even if the same logic is in /etc/make.conf for it. You also need to modify x11/nvidia-driver/Makefile.version to point to the wanted version; otherwise it picks the wrong version (the one pointed to in Makefile.version) and thus doesn't work. NO_CHECKSUM is still needed for the override, though.


I am a bit impressed that FreeBSD has a newer driver than Debian.
Maybe that is because of Debian's philosophy of disliking proprietary software.
But the FreeBSD port x11/nvidia-driver is already behind the official NVIDIA production branch too, as I work on it only when something is needed to support a new feature and/or beta branch of the drivers, and nobody else updates it to the latest version.

If you're OK with c7 on the Linuxulator, you can use x11/linux-nvidia-libs for Linux apps running on it. I'm not sure it works with rl9.

It has turned out that the latest beta, 570.86.16, requires some work, but I can't find enough time to investigate for now. And as I no longer use x11/linux-nvidia-libs myself, and the Linux version of the driver seems to have more changes than the FreeBSD version, x11/linux-nvidia-libs possibly won't work even after I file a PR.
 
Code:
% nv-sglrun clpeak
shim init

Platform: NVIDIA CUDA
  Device: NVIDIA GeForce GTX 1660
    Driver version  : 550.127.05 (FreeBSD)
    Compute units   : 22
    Clock frequency : 1830 MHz

    Global memory bandwidth (GBPS)
      float   : 153.98
      float2  : 160.37
      float4  : 164.72
      float8  : 160.68
      float16 : 159.13

    Single-precision compute (GFLOPS)
      float   : 5580.16
      float2  : 5519.13
      float4  : 5503.80
      float8  : 5451.26
      float16 : 5396.63

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 176.49
      double2  : 176.27
      double4  : 175.76
      double8  : 175.04
      double16 : 173.29

    Integer compute (GIOPS)
      int   : 4860.36
      int2  : 4770.71
      int4  : 4774.76
      int8  : 4799.89
      int16 : 4777.37

    Integer compute Fast 24bit (GIOPS)
      int   : 4628.80
      int2  : 4761.41
      int4  : 4770.80
      int8  : 4744.14
      int16 : 4692.37

    Integer char (8bit) compute (GIOPS)
      char   : 4041.64
      char2  : 3993.92
      char4  : 4049.56
      char8  : 4063.33
      char16 : 3389.27

    Integer short (16bit) compute (GIOPS)
      short   : 4011.38
      short2  : 3832.12
      short4  : 3959.47
      short8  : 3474.62
      short16 : 3339.47

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 12.55
      enqueueReadBuffer               : 12.75
      enqueueWriteBuffer non-blocking : 11.90
      enqueueReadBuffer non-blocking  : 12.14
      enqueueMapBuffer(for read)      : 3.88
        memcpy from mapped ptr        : 19.61
      enqueueUnmap(after write)       : 13.13
        memcpy to mapped ptr          : 19.86

    Kernel launch latency : 6.49 us
% pkg which -p nv-sglrun
/usr/local/bin/nv-sglrun was installed by package libc6-shim-20240512
% pkg info | grep nvidia
linux-nvidia-libs-550.127.05   NVidia graphics libraries and programs (Linux version)
nvidia-driver-550.127.05.1402000 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-settings-535.146.02_1   Display Control Panel for X NVidia driver
nvidia-xconfig-525.116.04      Tool to manipulate X configuration files for the NVidia driver

AMD supports OpenCL / Clover.
Clover is entirely unusable: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19385.
 
If you are happy with "poor-man's" compute (generally misusing OpenGL for generic compute use-cases), then for desktop OpenGL (i.e. not WebGL/OpenGL ES) you can make use of compute shaders:

https://learnopengl.com/Guest-Articles/2022/Compute-Shaders/Introduction

Apparently TensorFlow does have support for this.

That said, I have not tested it personally. I am still happy with OpenGL 2.1+ "ultra-poor man's" compute (passing general data via a sampler).
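For the curious, a compute-shader dispatch looks roughly like this (again untested by me, just a sketch: it assumes an OpenGL 4.3+ context is already current and function pointers are loaded, e.g. glewInit() has already been called, and it simply doubles a buffer of floats on the GPU):

Code:
/* Minimal compute-shader sketch: double 1024 floats held in an SSBO.
 * Context creation and glewInit() are assumed to have happened already. */
#include <GL/glew.h>
#include <stdio.h>

static const char *src =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "layout(std430, binding = 0) buffer Data { float v[]; };\n"
    "void main() { uint i = gl_GlobalInvocationID.x; v[i] *= 2.0; }\n";

void run_compute(void)
{
    /* Compile and link the compute shader into a program object */
    GLuint sh = glCreateShader(GL_COMPUTE_SHADER);
    glShaderSource(sh, 1, &src, NULL);
    glCompileShader(sh);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, sh);
    glLinkProgram(prog);

    /* Upload input data into a shader storage buffer bound at binding = 0 */
    float data[1024];
    for (int i = 0; i < 1024; i++) data[i] = (float)i;
    GLuint ssbo;
    glGenBuffers(1, &ssbo);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
    glBufferData(GL_SHADER_STORAGE_BUFFER, sizeof(data), data, GL_DYNAMIC_COPY);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);

    /* Dispatch 1024 / 64 = 16 work groups */
    glUseProgram(prog);
    glDispatchCompute(1024 / 64, 1, 1);

    /* Make the shader writes visible to glGetBufferSubData, then read back */
    glMemoryBarrier(GL_BUFFER_UPDATE_BARRIER_BIT);
    glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, sizeof(data), data);
    printf("data[2] = %f\n", data[2]); /* expect 4.0 */
}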
 
Alright, with x11/nvidia-driver (550) installed and x11/linux-nvidia-libs built with DEFAULT_VERSIONS+=linux=rl9 set in /etc/make.conf, CUDA and OpenCL are now working with the Linux RL9 layer (the default port still uses C7). Programs that are supposed to run OpenCL/CUDA applications should be wrapped with nv-sglrun, which comes with the libc6-shim-20240512 package. For instance, nvidia-smi (bundled with the driver), which shows GPU capabilities, will not reveal CUDA until it is run as nv-sglrun nvidia-smi (something like padsp wrapping OSS audio over PulseAudio). I can now see that computations can be shortened from 6 weeks to 4 days on a relatively old NVIDIA GTX 1060 GPU :-) !!BIG THANK YOU!! :-)
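To recap the steps for anyone else trying this (a rough sketch only; package and port names as used in this thread, adjust versions to whatever is current):

Code:
# /etc/make.conf -- switch the Linuxulator userland to Rocky Linux 9
DEFAULT_VERSIONS+= linux=rl9

# driver plus the shim package that provides nv-sglrun
pkg install nvidia-driver libc6-shim

# build the Linux NVIDIA libs against the rl9 userland
cd /usr/ports/x11/linux-nvidia-libs && make install clean

# verify -- both need the wrapper to see CUDA/OpenCL
nv-sglrun nvidia-smi
nv-sglrun clpeak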

Code:
# nv-sglrun nvidia-smi
shim init
Tue Feb  4 11:16:02 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
| 29%   50C    P0             26W /  120W |    1220MiB /   6144MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

Code:
# nv-sglrun clpeak
shim init

Platform: NVIDIA CUDA
  Device: NVIDIA GeForce GTX 1060 6GB
    Driver version  : 550.127.05 (FreeBSD)
    Compute units   : 10
    Clock frequency : 1708 MHz

    Global memory bandwidth (GBPS)
      float   : 138.25
      float2  : 142.37
      float4  : 147.13
      float8  : 147.19
      float16 : 98.19

    Single-precision compute (GFLOPS)
      float   : 4108.05
      float2  : 4312.85
      float4  : 4283.72
      float8  : 4249.30
      float16 : 4226.22

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 140.94
      double2  : 140.18
      double4  : 139.88
      double8  : 138.68
      double16 : 139.25

    Integer compute (GIOPS)
      int   : 1441.66
      int2  : 1421.13
      int4  : 1431.67
      int8  : 1316.01
      int16 : 1299.19

    Integer compute Fast 24bit (GIOPS)
      int   : 1428.16
      int2  : 1394.27
      int4  : 1415.95
      int8  : 1411.34
      int16 : 1388.43

    Integer char (8bit) compute (GIOPS)
      char   : 3871.67
      char2  : 4115.16
      char4  : 4133.75
      char8  : 4086.24
      char16 : 4042.09

    Integer short (16bit) compute (GIOPS)
      short   : 3798.35
      short2  : 3925.07
      short4  : 4028.50
      short8  : 4101.96
      short16 : 4036.17

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.14
      enqueueReadBuffer               : 5.82
      enqueueWriteBuffer non-blocking : 4.99
      enqueueReadBuffer non-blocking  : 5.46
      enqueueMapBuffer(for read)      : 5.86
        memcpy from mapped ptr        : 4.16
      enqueueUnmap(after write)       : 5.93
        memcpy to mapped ptr          : 4.18

    Kernel launch latency : 7.88 us
 
Some additional notes. (I've noted this in Comment 7 of PR 284537.)
USES= linux in a port's Makefile makes it depend on the default Linuxulator userland (currently still c7); it can be overridden to rl9 with USES= linux:rl9 on a per-port basis. This is up to the maintainer of each port.

For a system-wide configuration, you can specify DEFAULT_VERSIONS+= linux=rl9 in /etc/make.conf, as shown below. This is up to the admin of each computer.

What can be specified is defined in /usr/ports/Mk/bsd.default-versions.mk.
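For example, the two places this can be set (just a sketch of the lines mentioned above):

Code:
# system-wide, in /etc/make.conf (admin's choice)
DEFAULT_VERSIONS+= linux=rl9

# or per port, in the port's Makefile (maintainer's choice)
USES= linux:rl9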

Anyway, thanks to Comment 2 of PR 284537 by Dima Panov, it turned out that there is no need to adapt x11/linux-nvidia-libs for rl9.
 
Hi

Yes, I can get this Linux shim to work as well; VAAPI using libva-nvidia-driver also seems to work with Firefox.
This is done on a workstation with "pkg install"ed packages, no local compiling.

Code:
xxxxx@w680ace:~ $ nv-sglrun nvidia-smi
shim init
Tue Feb  4 23:30:36 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     Off |   00000000:01:00.0  On |                  N/A |
|  0%   36C    P8              4W /  160W |     662MiB /   8188MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
ltu@w680ace:~ $ nv-sglrun clpeak
shim init

Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 4060 Ti
    Driver version  : 550.127.05 (FreeBSD)
    Compute units   : 34
    Clock frequency : 2685 MHz

    Global memory bandwidth (GBPS)
      float   : 250.84
      float2  : 258.27
      float4  : 262.82
      float8  : 265.79
      float16 : 268.13

    Single-precision compute (GFLOPS)
      float   : 23131.70
      float2  : 23055.74
      float4  : 22980.27
      float8  : 22821.29
      float16 : 22674.35

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 374.97
      double2  : 374.69
      double4  : 373.07
      double8  : 357.72
      double16 : 350.53

    Integer compute (GIOPS)
      int   : 11718.43
      int2  : 11761.22
      int4  : 11730.86
      int8  : 11276.48
      int16 : 10451.94

    Integer compute Fast 24bit (GIOPS)
      int   : 10994.94
      int2  : 11013.73
      int4  : 11461.28
      int8  : 11458.52
      int16 : 11211.60

    Integer char (8bit) compute (GIOPS)
      char   : 10321.13
      char2  : 10123.83
      char4  : 9848.45
      char8  : 8251.40
      char16 : 7871.05

    Integer short (16bit) compute (GIOPS)
      short   : 10279.86
      short2  : 9728.00
      short4  : 9967.57
      short8  : 8892.56
      short16 : 7616.51

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 8.65
      enqueueReadBuffer               : 8.45
      enqueueWriteBuffer non-blocking : 8.74
      enqueueReadBuffer non-blocking  : 9.83
      enqueueMapBuffer(for read)      : 11.74
        memcpy from mapped ptr        : 12.52
      enqueueUnmap(after write)       : 12.93
        memcpy to mapped ptr          : 12.82

    Kernel launch latency : 3.84 us


$ pkg info | grep nvidia
libva-nvidia-driver-0.0.13     NVDEC-based backend for VAAPI
linux-nvidia-libs-550.127.05   NVidia graphics libraries and programs (Linux version)
nvidia-driver-550.127.05.1401000 NVidia graphics card binary drivers for hardware OpenGL rendering
nvidia-drm-61-kmod-550.127.05.1401000_1 NVIDIA DRM Kernel Module
nvidia-drm-kmod-550.127.05     NVIDIA DRM Kernel Module
nvidia-settings-535.146.02_1   Display Control Panel for X NVidia driver
nvidia-xconfig-525.116.04      Tool to manipulate X configuration files for the NVidia driver
 
Hi everyone,
I've been trying this on my -current, and it works.
However, I can only compile OpenCL source code using Clang, and I need to use the nv-sglrun command to run the binaries.

It seems that Nvidia doesn't invest time or resources in creating a native FreeBSD port.
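In case it's useful, this is roughly what that looks like (a sketch; it assumes the OpenCL headers and the ocl-icd loader from ports are installed, and hello_cl.c is any small OpenCL host program):

Code:
% cc -I/usr/local/include -L/usr/local/lib -o hello_cl hello_cl.c -lOpenCL
% nv-sglrun ./hello_cl
shim init
...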
 
Some additional notes. (I've noted this in Comment 7 of PR 284537.)
USES= linux in a port's Makefile makes it depend on the default Linuxulator userland (currently still c7); it can be overridden to rl9 with USES= linux:rl9 on a per-port basis. This is up to the maintainer of each port.

For a system-wide configuration, you can specify DEFAULT_VERSIONS+= linux=rl9 in /etc/make.conf. This is up to the admin of each computer.

What can be specified is defined in /usr/ports/Mk/bsd.default-versions.mk.

Anyway, thanks to Comment 2 of PR 284537 by Dima Panov, it turned out that there is no need to adapt x11/linux-nvidia-libs for rl9.

I think the best way to offer x11/linux-nvidia-libs in packages would be to just add FLAVORS for c7 and rl9, so it builds for both c7 and rl9 and it's still a single port that's easy to maintain? :-)
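Something along these lines, perhaps (purely a hypothetical sketch of what the Makefile bits might look like, not an actual patch; whether it would be acceptable is up to the maintainers):

Code:
# hypothetical flavor setup for x11/linux-nvidia-libs
FLAVORS=	rl9 c7
FLAVOR?=	${FLAVORS:[1]}
USES=		linux:${FLAVOR}
# give the non-default flavor a distinct package name, e.g. linux-nvidia-libs-c7
.if ${FLAVOR} != ${FLAVORS:[1]}
PKGNAMESUFFIX=	-${FLAVOR}
.endif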
 
Hi everyone,
I've been trying this on my -current, and it works.
However, I can only compile OpenCL source code using Clang, and I need to use the nv-sglrun command to run the binaries.

Yes, exactly: nv-sglrun is a wrapper that enables an application to access the underlying hardware acceleration (something like padsp passing audio from OSS applications to PulseAudio).

It seems that Nvidia doesn't invest time or resources in creating a native FreeBSD port.

I was initially looking for that too; there's been no change in 15 years, so it's not gonna happen :-P It turns out FreeBSD's Linuxulator is already so good it can do this sort of trick :-) And from my initial experimentation it is very important that the NVIDIA driver and libs have the same version, so having all of this in ports keeps things in sync :-)
 
I think the best way to offer x11/linux-nvidia-libs in packages would be to just add FLAVORS for c7 and rl9, so it builds for both c7 and rl9 and it's still a single port that's easy to maintain? :-)
It would be quite trivial if a pkg name like rl9-linux-nvidia-libs or linux-nvidia-libs-rl9 is OK, but I'm not sure it would be accepted by the group maintainers (x11@); I'm not one of them.
Are there any ports doing such a thing? I assume none.

Note that, as far as I remember, ports named foo-c7-bar or foo-rl9-bar are parts of the upstream distributions (CentOS 7 or Rocky Linux 9), and linux-nvidia-libs is not part of them (its upstream is NVIDIA itself).
 