NVIDIA GPU gets very hot while Xorg is not running

I have a server with NVIDIA RTX A2000 or NVIDIA T400 video cards installed (the problem appears on any NVIDIA GPU). If Xorg is not started at boot via xdm_enable="YES" in /etc/rc.conf, the video cards warm up to 50 degrees Celsius at idle after boot.

If I run startx (or start Xorg by starting XDM), the temperature drops to 30C within 2-3 minutes.

Xorg is present on the server because the machine is periodically used as a workstation or, in dual boot, as a Linux GPU server. I would like to remove the Xorg server and the other graphical software (browsers, etc.) from FreeBSD, but then the video card will get very hot.

Related lines in /etc/rc.conf:
Code:
kld_list="nvidia-modeset"
xdm_enable="YES"

/usr/local/etc/X11/xorg.conf.d/nvidia.conf
Code:
Section "Device"
    BusID          "PCI:129:0:0"
    Driver         "nvidia"
    Identifier     "Device0"
    VendorName     "NVIDIA Corporation"
EndSection

Is this a normal NVIDIA GPU temperature in FreeBSD? Most of the time the server runs FreeBSD 13.2, which means the video card will always be hot.
 
That is normal. The power management for the GPU is in the driver. Without the driver it runs at higher power. Unfortunately that means that you have to accept the Xorg security risks or live with the increased power draw.
 
Wouldn't loading the nvidia kld via loader.conf or the rc.conf kld_list do the same, so that you don't need to wait for X to load?
 
When the system does not have X loaded and the GPU is running at a high temperature, is nvidia.ko loaded? What does kldstat show? If it is not loaded, try `kldload nvidia`. I'd assume nvidia-modeset is loading it; otherwise I'm not sure how you're getting those temperature readings other than through nvidia-smi (which needs the normal driver). Otherwise I'd say that's a bug in the driver... Have you tried asking on the NVIDIA developer forum? https://forums.developer.nvidia.com/c/gpu-graphics/freebsd-solaris/147
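For what it's worth, the check could be scripted roughly like this. It is only a sketch: it assumes kldstat(8) from the FreeBSD base system, and the function names module_loaded and report_modules are made up for illustration.

```sh
#!/bin/sh
# Sketch: report whether the NVIDIA kernel modules are currently loaded.
# Assumes FreeBSD's kldstat(8); the function names are hypothetical.

module_loaded() {
    # "kldstat -q -n FILE" prints nothing and exits 0 only when a
    # kernel file with that name is loaded.
    kldstat -q -n "$1"
}

report_modules() {
    for mod in nvidia.ko nvidia-modeset.ko; do
        if module_loaded "$mod"; then
            echo "$mod: loaded"
        else
            echo "$mod: not loaded"
        fi
    done
}

report_modules
```

If nvidia.ko shows up as not loaded, `kldload nvidia` (as root) would be the next thing to try.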
 
Interesting question. On my old desktop system I also see the power usage idle at ~30W but go down to ~11W when X is started. I'm assuming you are also measuring with nvidia-smi.

All in all this is normal, I don't think you need to worry about it being "too hot". I don't think there's a downside to not loading X and letting it idle like that, but if you are paranoid or want to save the power you could always blacklist the device in loader.conf to keep it powered off all the time assuming you truly will never use it.
 
When the system does not have X loaded and the GPU is running at a high temperature, is nvidia.ko loaded? What does kldstat show? If it is not loaded, try `kldload nvidia`. I'd assume nvidia-modeset is loading it; otherwise I'm not sure how you're getting those temperature readings other than through nvidia-smi (which needs the normal driver). Otherwise I'd say that's a bug in the driver... Have you tried asking on the NVIDIA developer forum? https://forums.developer.nvidia.com/c/gpu-graphics/freebsd-solaris/147
I have carried out the experiments you are talking about. I turned off Xorg (and XDM) and tried loading nvidia.ko and nvidia-modeset.ko, monitoring the sensors through ipmitool sensors (or nvidia-smi). The temperature remained 44C before and after loading the modules. After turning on X, the temperature immediately dropped to 30C.

According to the nvidia-driver port's recommendations, we should load only nvidia-modeset.ko (which in turn loads nvidia.ko) via /etc/rc.conf and not use loader.conf:
Code:
To use these drivers, make sure that you have loaded the NVidia kernel
module, by running

        # kldload nvidia-modeset

on the command line, or by putting ``nvidia-modeset'' on the ``kld_list''
variable in /etc/rc.conf, either manually or by running

        # sysrc kld_list+=nvidia-modeset
 
The temperature remained 44 degrees Celsius before and after loading the modules. After turning on X, the temperature immediately dropped to 30C.
If you kill X again, does the temperature rise again? Or perhaps X has sent the GPU the "down scale" instruction and it remains in that state. If so, it should be feasible to send it ourselves (perhaps it is a sysctl tunable?).
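One way to answer that would be to log the readings while starting and stopping X. Below is a rough sketch, assuming nvidia-smi is installed and that the card reports these query fields (some cards may show N/A for power draw). The NVSMI variable and the function names sample_gpu and log_gpu are hypothetical.

```sh
#!/bin/sh
# Sketch: sample GPU temperature, power draw and performance state.
# NVSMI is a made-up override so another command can stand in for nvidia-smi.
NVSMI=${NVSMI:-nvidia-smi}

sample_gpu() {
    # One CSV line per GPU: timestamp, temperature (C), power draw, P-state.
    $NVSMI --query-gpu=timestamp,temperature.gpu,power.draw,pstate \
           --format=csv,noheader
}

log_gpu() {
    # $1 = number of samples, $2 = seconds between samples
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        sample_gpu
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Run only when called as: script COUNT INTERVAL
if [ "$#" -eq 2 ]; then
    log_gpu "$1" "$2"
fi
```

For example, saved under some name of your choosing and run with the arguments 120 5, it would take a sample every 5 seconds for 10 minutes. Killing X partway through should show whether the temperature and P-state go back up.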

This used to be more common when Xorg userspace modesetting drivers were the norm. But since everyone switched to kernel modesetting, power management has tended to get simpler (outside of the NVIDIA blobs, obviously).
 