Solved Xorg Doesn't Detect nVidia on Supermicro X11DAC

mvnathan


Hello all,

Apologies for the long post, but I wanted to provide enough detail on the problem I'm facing, as well as on everything I've tried so far to resolve it.

The Problem

I used to have a desktop built around a Gigabyte Z170X-SOC FORCE motherboard and a Core i7-6700K processor. Back when I built it, FreeBSD did not yet support Skylake graphics. So, I used an nVidia Quadro K1200 graphics card connected to three monitors. I started off with FreeBSD 11.0 and upgraded it over the years to 11.1, 11.2, and, finally, 12.0.

Things were working fine until a few weeks ago, when the Gigabyte motherboard died. So I built a new desktop centered around a Supermicro X11DAC motherboard and two Xeon Gold 5222 processors. I kept the graphics card and disks from the old desktop.

FreeBSD boots up on the new desktop, but the Xorg server doesn't detect the nVidia card.

System Info

Hardware

As mentioned above, the desktop has a Supermicro X11DAC motherboard, which features an on-board VGA controller provided by its ASPEED AST2500 BMC.

The UEFI has a setting, viz., Advanced -> PCIe/PCI/PnP Configuration -> VGA Priority, to select the primary graphics device for system boot. I have set this to "Offboard." AFAICT there is no setting to completely disable the on-board VGA.

FreeBSD

Here is the output of freebsd-version -kru:

Code:
    12.0-RELEASE-p7
    12.0-RELEASE-p7
    12.0-RELEASE-p7

The kernel definitely detects both graphics devices, as can be seen from these dmesg lines:

Code:
    vgapci0: <VGA-compatible display> port 0x2000-0x207f mem 0x9c000000-0x9cffffff,0x9d000000-0x9d01ffff irq 17 at device 0.0 numa-domain 0 on pci3
    vgapci1: <VGA-compatible display> port 0xf000-0xf07f mem 0xfa000000-0xfaffffff,0x39ffe0000000-0x39ffefffffff,0x39fff0000000-0x39fff1ffffff irq 96 at device 0.0 numa-domain 1 on pci19
    vgapci1: Boot video device
    nvidia1: <Quadro K1200> numa-domain 1 on vgapci1
    vgapci1: child nvidia1 requested pci_enable_io
    vgapci1: child nvidia1 requested pci_enable_io
    nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  390.87  Tue Aug 21 15:53:31 PDT 2018

If it helps, the full dmesg log is available here.

Here are the kernel modules I have loaded (output of kldstat):

Code:
    Id Refs Address                Size Name
     1   64 0xffffffff80200000  243d260 kernel
     2    1 0xffffffff8263f000   3a9a10 zfs.ko
     3    2 0xffffffff829e9000     a4f0 opensolaris.ko
     4    1 0xffffffff829f7000     c7e0 aesni.ko
     5    1 0xffffffff82a04000    1ead0 geom_eli.ko
     6    1 0xffffffff83112000     1a20 fdescfs.ko
     7    1 0xffffffff83114000    f4ef0 nvidia-modeset.ko
     8    1 0xffffffff83209000   c42ae8 nvidia.ko
     9    2 0xffffffff83e4c000    39970 linux.ko
    10    3 0xffffffff83e86000     2e28 linux_common.ko
    11    3 0xffffffff83e89000    53140 vboxdrv.ko
    12    1 0xffffffff83edd000     7078 ioat.ko
    13    1 0xffffffff83ee5000     1800 uhid.ko
    14    1 0xffffffff83ee7000     23a8 ums.ko
    15    2 0xffffffff83eea000     2ce0 vboxnetflt.ko
    16    2 0xffffffff83eed000     a020 netgraph.ko
    17    1 0xffffffff83ef8000     1710 ng_ether.ko
    18    1 0xffffffff83efa000     3f30 vboxnetadp.ko
    19    1 0xffffffff83efe000      acf mac_ntpd.ko
    20    1 0xffffffff83eff000    16a78 ext2fs.ko
    21    1 0xffffffff83f16000     7bd0 ipmi.ko
    22    1 0xffffffff83f1e000      b10 smbus.ko
    23    1 0xffffffff83f1f000      b98 coretemp.ko

PCI Devices

Here are the relevant lines from pciconf -lvbe:

Code:
    vgapci0@pci0:3:0:0:	class=0x030000 card=0x20001a03 chip=0x20001a03 rev=0x41 hdr=0x00
        vendor     = 'ASPEED Technology, Inc.'
        device     = 'ASPEED Graphics Family'
        class      = display
        subclass   = VGA
        bar   [10] = type Memory, range 32, base 0x9c000000, size 16777216, enabled
        bar   [14] = type Memory, range 32, base 0x9d000000, size 131072, enabled
        bar   [18] = type I/O Port, range 32, base 0x2000, size 128, enabled
    vgapci1@pci0:216:0:0:	class=0x030000 card=0x114010de chip=0x13bc10de rev=0xa2 hdr=0x00
        vendor     = 'NVIDIA Corporation'
        device     = 'GM107GL [Quadro K1200]'
        class      = display
        subclass   = VGA
        bar   [10] = type Memory, range 32, base 0xfa000000, size 16777216, enabled
        bar   [14] = type Prefetchable Memory, range 64, base 0x39ffe0000000, size 268435456, enabled
        bar   [1c] = type Prefetchable Memory, range 64, base 0x39fff0000000, size 33554432, enabled
        bar   [24] = type I/O Port, range 32, base 0xf000, size 128, enabled
      PCI-e errors = Correctable Error Detected
                     Unsupported Request Detected
         Corrected = Advisory Non-Fatal Error

If it helps, the full output of pciconf -lvbe is available here.

I also ran nvidia-smi -a. It reports the card's bus ID as 00000000:D8:00.0; 0xD8 is 216 in decimal, which matches the pci0:216:0:0 selector above. The full output of nvidia-smi -a is here.
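
Note that pciconf reports the bus number in decimal (pci0:216:0:0, i.e. domain:bus:slot:function), while nvidia-smi prints it in hexadecimal, and Xorg's BusID option expects decimal. Here is a minimal POSIX sh sketch of the conversion, assuming the device is in PCI domain 0 (as it is here); the function name nvsmi_to_xorg_busid is made up for illustration:

```shell
#!/bin/sh
# Convert an nvidia-smi bus ID (domain:bus:device.function, hexadecimal),
# e.g. 00000000:D8:00.0, into the decimal PCI:bus:device:function form
# accepted by Xorg's BusID option. The domain field is dropped, which is
# only correct for devices in PCI domain 0.
nvsmi_to_xorg_busid() {
    id=$1
    bus=${id#*:}        # strip the domain:  D8:00.0
    bus=${bus%%:*}      # keep the bus:      D8
    devfn=${id##*:}     # device.function:   00.0
    dev=${devfn%%.*}
    fn=${devfn#*.}
    # printf accepts C-style 0x constants for %d and prints them in decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

nvsmi_to_xorg_busid 00000000:D8:00.0    # PCI:216:0:0
```

The result is the value that goes into the BusID line of device.conf.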

Xorg

On my old desktop, I had configured X to use the proprietary nVidia driver in /usr/local/etc/X11/xorg.conf.d/device.conf:

Code:
    Section "Device"
        Identifier "K1200"
        Driver "nvidia"
    EndSection

Here is the relevant line from /var/log/Xorg.0.log when I run startx with the above config on the new desktop. Xorg fails because it detects only one graphics device, i.e., the on-board VGA:

Code:
    [  1806.956] (--) PCI:*(0:3:0:0) 1a03:2000:1a03:2000 rev 65, Mem @ 0x9c000000/16777216, 0x9d000000/131072, I/O @ 0x00002000/128, BIOS @ 0x????????/65536
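
To see at a glance which devices the server actually found, the probe lines can be pulled out of the log. Xorg marks probed values with "(--)", and the device it picked as primary gets a "*" right after "PCI:". A minimal sketch; the helper names are hypothetical and the log path defaults to /var/log/Xorg.0.log:

```shell
#!/bin/sh
# Print the PCI devices an X server reported while probing. Xorg tags
# probed values with "(--)"; the device it chose as primary has a '*'
# right after "PCI:". The log path defaults to /var/log/Xorg.0.log.
xorg_probed_gpus() {
    grep -F '(--) PCI:' "${1:-/var/log/Xorg.0.log}"
}

# Number of probed devices (wc -l pads its output on FreeBSD, so strip it).
xorg_probed_gpu_count() {
    xorg_probed_gpus "$1" | wc -l | tr -d ' '
}
```

Based on the excerpts in this post, the count would be 1 for the FreeBSD log and 2 for the Ubuntu log quoted further down.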

Attempted Solutions

Sanity Check With VESA Driver

To confirm that the X11DAC isn't some sort of server-only motherboard that won't support graphics at all (illogical and unlikely, but worth confirming nevertheless), I replaced nvidia with vesa in device.conf. When I ran startx after this change, I did indeed get my regular GUI desktop.

However, it was low resolution and one monitor only (connected, of course, to the on-board VGA port). This makes sense as the on-board VGA is meant to be used for remote management via IPMI. Obviously, that is not a viable solution for everyday use as a development workstation.

Updating Xorg Config

As a first stab at the problem, I added BusID "PCI:216:0:0" to device.conf. No effect; Xorg still only detected the on-board VGA and complained about not finding any devices. The Xorg log is here.

Next, I added a ServerFlags section and explicitly disabled AutoAddGPU in it to see if the server would pick the card with the above PCI bus ID. No luck.

Thinking that, perhaps, with GPU detection disabled, I needed to provide an explicit ServerLayout and other sections, I proceeded to create a complete Xorg config with everything spelled out about devices, monitors, etc. All to no avail.

Finally, I ran nvidia-xconfig, hoping that it would do some magic hardware probing and generate a usable config. Sadly, it is devoid of magic.
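
For reference, the device.conf variant described above (a BusID plus a ServerFlags section with AutoAddGPU disabled) can be written out as in the following sketch. The helper name write_device_conf is hypothetical; it takes the output path as an argument so it can be tried on a scratch file first. Presumably BusID can only select among devices the server has already probed, which would explain why it made no difference here:

```shell
#!/bin/sh
# Write an Xorg config fragment that binds the nvidia driver to one PCI
# bus ID and tells the server not to auto-add other GPUs.
# $1: output file (e.g. a scratch copy of device.conf)
# $2: bus ID in Xorg's decimal form, e.g. PCI:216:0:0
write_device_conf() {
    cat > "$1" <<EOF
Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection

Section "Device"
    Identifier "K1200"
    Driver     "nvidia"
    BusID      "$2"
EndSection
EOF
}

# Usage: write_device_conf /tmp/device.conf PCI:216:0:0
```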

Ubuntu 19.04 Installer

My next line of attack was to see if Linux would work. So, I downloaded the Ubuntu 19.04 installer ISO, wrote it to a USB stick, and booted the computer from it. The Ubuntu installer detected both the on-board VGA and the nVidia card. It configured and launched the Xorg server with the nouveau driver. Here is the Xorg log file.

All three monitors came alive. However, there was excessive and frequent flashing and flickering. Presumably, this can be solved with the proprietary nVidia driver.

The important point is that Ubuntu's Xorg server detected both graphics devices:

Code:
    [    28.976] (--) PCI: (3@0:0:0) 1a03:2000:1a03:2000 rev 65, Mem @ 0x9c000000/16777216, 0x9d000000/131072, I/O @ 0x00002000/128
    [    28.976] (--) PCI:*(216@0:0:0) 10de:13bc:10de:1140 rev 162, Mem @ 0xfa000000/16777216, 0x39ffe0000000/268435456, 0x39fff0000000/33554432, I/O @ 0x0000f000/128, BIOS @ 0x????????/131072

Upgrading x11-servers/xorg-server to Version 1.20.4

Since Ubuntu was running version 1.20.4 of the Xorg server while FreeBSD 12 is on 1.18.4, I thought upgrading might help. While looking into that, I came across PR 196678; its fourth attachment is a patch set for building version 1.20.4 of the x11-servers/xorg-server port.

I applied the above patch set to my local ports tree and then ran make package in /usr/ports/x11-servers/xorg-server. Instead of installing this newly built package, I simply unpacked its tarball in my home directory. I then made the executable SUID root:

Code:
    sudo chown root:wheel /path/to/bin/Xorg
    sudo chmod 4755 /path/to/bin/Xorg

I also copied nvidia_drv.so and libglx.so from their usual locations under /usr/local to the corresponding locations where I'd unpacked the package. And I reverted device.conf to its original state (i.e., without my ServerFlags, ServerLayout, etc. experiments) but with the BusID still in place.

With everything in place, I ran the newly built Xorg 1.20.4, and, just like 1.18.4, it failed to detect the nVidia card. Here is the log file.

Project Trident Installer

As a last-ditch effort, I tried Project Trident. Since it is a desktop-oriented FreeBSD derivative, I thought they might have patched the Xorg server to get this working. I downloaded the installer ISO for its latest STABLE release and tried it out (the latest CURRENT release doesn't boot on my machine).

However, their GUI installer simply uses the VESA driver. So I escaped to a shell and ran startx from there, but, as before, /var/log/Xorg.0.log showed that only the on-board VGA was detected.

Next Steps

It looks to me like the Xorg server on FreeBSD doesn't probe the PCI bus properly. I had a quick look at its code, but, naturally, couldn't really tell much. Is anyone on the forum familiar with this code? I'd be willing to patch and test.

Or maybe there's no need to futz around with the X server code. Maybe I'm simply missing some kernel or other config needed to get things working. Has anyone else faced a similar problem (especially someone using a Supermicro motherboard as a workstation rather than a headless, remotely managed server)?

Well, I'm out of ideas. Any further pointers on how to proceed would be deeply appreciated. TIA.
 

Phishfry


I think the "numa-domain 1" in your output is all the hint I need.
You have the graphics card on the CPU1 bus. Try your video card in another slot, preferably a slot tied to CPU0.
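
The hint is in the dmesg lines above: the ASPEED controller (vgapci0) attaches in numa-domain 0, while vgapci1 and nvidia1 attach in numa-domain 1, which on a dual-socket board typically means a slot wired to the second CPU. A small sketch that pulls the NUMA domain out of each display-device attach line; it assumes the attach-line format shown above, and gpu_numa_domains is a made-up name:

```shell
#!/bin/sh
# Print "<device> <numa-domain>" for every vgapci/nvidia attach line read
# on stdin. Assumes the FreeBSD 12 attach-line format shown above, e.g.
#   vgapci0: <VGA-compatible display> ... numa-domain 0 on pci3
gpu_numa_domains() {
    sed -n -E 's/^[[:space:]]*((vgapci|nvidia)[0-9]+): <.*numa-domain ([0-9]+) on .*/\1 \3/p'
}

# Usage (FreeBSD):  gpu_numa_domains < /var/run/dmesg.boot
```

Reading /var/run/dmesg.boot rather than live dmesg output avoids missing the attach lines if the message buffer has wrapped. Which physical slot maps to which CPU is board-specific, so the motherboard manual's block diagram is the authority there.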
 

toorski


Your dmesg indicates nvidia-modeset 390.87:

Code:
    nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.87 Tue Aug 21 15:53:31 PDT 2018

Nvidia suggests the 430.40 driver for the Quadro K1200, which you'll need to build from Nvidia's tarball.

I'd start by loading just the basic kernel modules.

But first, I'd update FreeBSD 12 to the latest patch level (-p9):

Code:
    freebsd-update fetch
    freebsd-update install

Then add this to /boot/loader.conf:

Code:
    linux_load="YES"

After installing the correct nvidia driver and nvidia-xconfig, also add these lines to /boot/loader.conf:

Code:
    nvidia_load="YES"
    nvidia-modeset_load="YES"

Reboot, then run:

Code:
    nvidia-xconfig
    startx

And then hope for the best ;)
Good luck!
 
mvnathan

Phishfry said:
I think NUMA1 in your output is all the hint I need.
You have the graphics card on the CPU1 bus. Try your video card in another slot.
Preferably a slot tied to CPU0

This worked! The graphics card was in slot 3, which is connected to CPU1. I moved it to slot 1 and all is well with the world again. Many thanks for the help.

For my own edification, why was Xorg unable to detect the card when it was in slot 3? And since the Ubuntu LiveCD worked, I'm guessing this is something at the kernel level?
 