nvme "attach returned 12"

I just installed a dual-slot carrier for NVMe (IcyDock ToughArmor MB834M2K-B) in my workstation (running 13.0-RELEASE) and connected it via a Supermicro AOC-SLG3-2E4 PLX-based PCIe switch "HBA".
The PCIe switch is correctly recognized and NVMes are also detected upon insertion in either slot, but they can't be picked up by the nvme driver:

Code:
# lspci | grep PLX
01:00.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
02:01.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
02:02.0 PCI bridge: PLX Technology, Inc. PEX 8718 16-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ab)
[..]
pci8: <PCI bus> on pcib3
nvme2: <Generic NVMe Device> at device 0.0 on pci8
nvme2: 0x4000 bytes of rid 0x10 res 3 failed (0, 0xffffffffffffffff).
nvme2: unable to allocate pci resource
device_attach: nvme2 attach returned 12

The drive inserted is a Samsung SSD 980 (MZ-V8V250), pulled from a NUC that boots from that disk (i.e. "known working").

I'm using the nda(4) driver, so hotplugging should work (it does on two other systems with NVMe backplanes); but AFAIK the nvme driver hands the device over to either nvd(4) or nda(4), so with nvme(4) failing to attach, nda never gets involved.
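As a side note on that hand-off: which block front-end nvme(4) uses is selected by a loader tunable. A minimal /boot/loader.conf sketch (only relevant once nvme(4) itself attaches, which is exactly what fails here):

Code:
```
# /boot/loader.conf (sketch): choose the block front-end that nvme(4)
# hands namespaces to: 0 = nda(4) via CAM (what I'm running), 1 = nvd(4).
hw.nvme.use_nvd="0"
```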

Is this some kind of limitation of desktop hardware (i.e. not supporting PCIe switches)? For reference, the "attach returned 12" is errno 12, ENOMEM, which matches the failed BAR allocation above.
 
I am using the same HBA with Samsung PM983 drives and it works fine. I am using the 4 bay Icy MB699VP.
Earlier I had tried NVMe hotswapping and it did not work well: the drive would become corrupted on eject, even when bringing the drive down with devctl and unmounting manually first.
I can't remember if insertion worked right.
 
I can't determine which part of your results depends on hot-plugging, or on actions that exercise that capability.

I'm no specialist, but for something to be hot-pluggable, every hardware part in the chain has to be able to support it, and the circuitry has to be designed so that hot-plugging is actually available. Once all of that is done correctly, the software has to support it as well.

Does FreeBSD support a hot-pluggable NVMe device over PCIe?
  1. Supermicro Add-on Card AOC-SLG3-2E4
  2. Supermicro AOC-SLG3-2E4R and Supermicro AOC-SLG3-2E4 Differences
  3. 4 solutions tested: Add 2.5″ SFF NVMe to your current system
  4. PEX 8718 (see: Product Brief PEX8718)

The PEX8718 product brief does mention:
- 1 Hot-Plug port with native HP Signals
- All ports Hot-Plug capable thru I2C
Supermicro, however, doesn't mention anything that hints at hot-plug capability. The article at 2. states that it is not hot-pluggable. For the AOC-SLG3-2E4R it mentions "Supports Hotswap"; that is the version without the switch chip. Supermicro itself, however, does not hint at anything hotswap or hot-plug: Add-on Card AOC-SLG3-2E4R

I'm rather uncertain that your setup is hot-pluggable.
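One way to check this from FreeBSD itself: pciconf -lc prints the PCI-Express capability (including slot information) for every bridge. A small sketch of a filter for that output; note that the exact "HotPlug" wording in pciconf's capability line is an assumption here, so adjust the pattern to whatever pciconf(8) actually prints on your system:

Code:
```shell
# hotplug_slots: filter `pciconf -lc` output and print the bridges whose
# PCI-Express capability line mentions HotPlug. The "HotPlug" token is an
# assumption about pciconf's wording; adapt it to your system's output.
hotplug_slots() {
    awk '
        /^[a-z]/ { dev = $1; sub(/:$/, "", dev) }  # selector line, e.g. pcib3@pci0:2:1:0:
        /PCI-Express/ && /HotPlug/ { print dev }   # capability line of a hot-plug bridge
    '
}
# on the affected system:
#   pciconf -lc | hotplug_slots
```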
 
One of the differences I have found between consumer drives and server drives is OptionROM.
Samsung ships some consumer M.2 drives with an OptionROM installed so they can be booted on Legacy BIOS.

I know this has nothing to do with hotswapping on SLG3-2E4 but as a participant in many NVMe threads I do remember that there were some Samsung M.2 consumer drives that would not work on FreeBSD.
Thinking back it makes me wonder if these drives had an OptionROM.
Also how did they act under LegacyBIOS versus EFI modes. That's the stuff I would have tried.
Myself, having paddle cards around, I would try it bare instead of through the SLG3-2E4, just to eliminate culprits.

Is this M.2 drive a PCIe 4.0 one? I ask because ${mz-v6v250} is not a valid part number.
The 980 was PCIe 3.0 and the 980 Pro is PCIe 4.0, I think.
 
So does this drive work at all? Just troubles hotswapping?

The drive does not work: no device nodes are created. I haven't tested yet whether it would work if plugged in during boot.
The HBA and drive carrier support hotplugging, but I'm not sure about that Samsung drive. I should have an older Intel drive and a Transcend lying around somewhere; I'll try those on Monday.

Is this M.2 drive a PCIe 4.0 one? I ask because ${mz-v6v250} is not a valid part number.
The 980 was PCIe 3.0 and the 980 Pro is PCIe 4.0, I think.
Sorry, that was a typo - it is an MZ-V8V250, so PCIe3.0.

Myself, Having paddle cards around I would try it bare instead of with SLG3-2E4. Just to eliminate culprits.
I would have liked to use a simple 1:1 adapter as well, but ASUS has a stupid, non-standard-conformant policy (AFAIK the PCIe 4.0 standard requires full hotplug support) of not offering PCIe bifurcation options in the UEFI on that board (Pro B560M-C/CSM), so a PCIe switch HBA it is...
 
When I did my experiments I did not compile a custom kernel, so maybe that is needed.
 
Closer inspection shows that Supermicro does not mention the hot-plug capability here: Supermicro Add-on Card AOC-SLG3-2E4 but, notably, it does describe it in its User's Guide. It specifies special OS-specific drivers (Windows and various Linuxes; no FreeBSD driver is mentioned) whereby the hot-plug ejection preparation ends with a certain LED pattern on the HBA, signalling the user that the NVMe device may be physically removed. The LED pattern is of course not the relevant part in and of itself; what matters are all the actions that are not externally visible but are necessary before the physical hot-unplugging.

Looking at the Product Brief PEX8718:
- 1 Hot-Plug port with native HP Signals
- All ports Hot-Plug capable thru I2C
More at "Hot-Plug for High Availability":
[...] One downstream port includes a Standard Hot Plug Controller.
[...] Every port on the PEX8718 is equipped with a Hot-plug control/status register to support Hot-Plug capability through external logic via the I2C interface.
Taking it that the notion "native HP Signals" refers to the PCIe HP (= hot plug) signals, as for example referred to as PCI_HP in pci(4), the description in the 8718 product brief (second quote) leads me to conclude that, other than the main port, the other ports have to be controlled through I2C signals as far as hot-plugging is concerned. I presume that each port of the 8718 is connected to one NVMe SSD device. Obviously the Supermicro drivers are 8718-aware.

The question therefore:
Is the FreeBSD driver* capable of controlling the I2C signals of the 8718, necessary for hot plugging?

* probably nda(4) as mentioned here by olli@ and referred to in this thread.
 
I just did some more testing after reading through this thread and finding this and this.

When booting with the nvme inserted, it is properly detected, attached and accessible (nvme and nda device nodes in /dev are present):
Code:
# nvmecontrol devlist
 nvme0: Samsung SSD 980 250GB
    nvme0ns1 (238475MB)
 nvme1: Samsung SSD 970 EVO 1TB
    nvme1ns1 (953869MB)
 nvme2: Samsung SSD 970 EVO 1TB
    nvme2ns1 (953869MB)
(the device in question is the 250GB Samsung 980)

Upon physically removing the disk, the devices are destroyed:
Code:
# dmesg | tail
[...]
nda0 at nvme0 bus 0 scbus7 target 0 lun 1
nda0: <Samsung SSD 980 250GB 1B4QFXO7 S64BNJ0R305630J>
 s/n S64BNJ0R305630J detached
(nda0:nvme0:0:0:1): Periph destroyed

But inserting it again doesn't produce any dmesg output. However, pciconf -lv shows the device:
Code:
nvme0@pci0:3:0:0:       class=0x010802 rev=0x00 hdr=0x00 vendor=0x144d device=0xa809 subvendor=0x144d subdevice=0xa801
    vendor     = 'Samsung Electronics Co Ltd'
    device     = 'NVMe SSD Controller 980'
    class      = mass storage
    subclass   = NVM
# pciconf -r pci0:3:0:0 0:128
a809144d 00100000 01080200 00000000
00000004 00000000 00000000 00000000
00000000 00000000 00000000 a801144d
00000000 00000040 00000000 000001ff
00035001 00000008 00000000 00000000
008a7005 00000000 00000000 00000000
00000000 00000000 00000000 00000000
0002b010 10008fc1 00002810 00477843

Following the procedure shown by akuroger in this post doesn't bring it back:
Code:
# devctl reset pci0:3:0:0
# devctl rescan pci0:3:0:0
devctl: Failed to rescan pci0:3:0:0: Device not configured
# devctl set driver -f pci0:3:0:0 nvme
# dmesg | tail -n4
nda0 at nvme0 bus 0 scbus7 target 0 lun 1
nda0: <Samsung SSD 980 250GB 1B4QFXO7 S64BNJ0R305630J>
 s/n S64BNJ0R305630J detached
(nda0:nvme0:0:0:1): Periph destroyed
# devctl attach pci0:3:0:0
devctl: Failed to attach pci0:3:0:0: Device busy
# nvmecontrol devlist
 nvme1: Samsung SSD 970 EVO 1TB
    nvme1ns1 (953869MB)
 nvme2: Samsung SSD 970 EVO 1TB
    nvme2ns1 (953869MB)

The PCI device is recognized and configured, but that's as far as it gets. So it seems we're "almost there" with NVMe hot-plug/swap support, but there are still some pieces missing to finally make it work.
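One more thing that might be worth trying (a sketch of an idea, not a verified fix): since the PCI device is visible but won't re-attach, disabling and re-enabling the downstream bridge with devctl(8) could force FreeBSD to re-probe the port and its children. The bridge name pcib4 and the selector pci0:3:0:0 below are examples; find the right ones on your system with pciconf -l or devinfo:

Code:
```shell
# rescan_bridge: hypothetical workaround sketch, not a verified fix.
# Idea: power-cycle the PCIe switch's downstream port via devctl(8) so
# the newly inserted drive gets probed again, then rescan its slot.
rescan_bridge() {
    bridge=$1 selector=$2
    if ! command -v devctl >/dev/null 2>&1; then
        echo "devctl not found (FreeBSD only); nothing done" >&2
        return 1
    fi
    devctl disable "$bridge" &&    # detach the downstream port...
    devctl enable "$bridge" &&     # ...and re-probe it, children included
    devctl rescan "$selector"      # then rescan the drive's slot
}
# example (names from this thread, adjust to yours):
#   rescan_bridge pcib4 pci0:3:0:0
```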


edit:
For clarification: the systems where I've observed working nvme hotplugging are using LSI 9400 tri-mode HBAs; so I suspect hotplugging is solved differently by those HBAs and the mpr driver.
 