Unreliable hotplug support with Intel Patsburg AHCI SATA controller

I'm seeing unreliable hotplug behaviour with the Intel Patsburg AHCI SATA controller. A complete hardware listing is in thread 43137, but the most relevant part is the mainboard: Supermicro X9SRL-F (Intel C602 chipset). The drives I'm having trouble with are connected to this mainboard's SATA3 ports. All ports are configured as HOTPLUG=ENABLED in the BIOS.

I'm running FreeBSD 10.0-BETA3 amd64.
Code:
# uname -a
FreeBSD homer.brkg.me 10.0-BETA3 FreeBSD 10.0-BETA3 #0 r257580: Sun Nov  3 19:43:01 UTC 2013     root@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64


Example output when hotplug works:
Code:
<remove from hotplug bay>
Nov 15 03:06:12 homer kernel: ada1 at ahcich1 bus 0 scbus3 target 0 lun 0
Nov 15 03:06:12 homer kernel: ada1: <SAMSUNG MZ7PD120HAFV-000DA DXM02W1Q> s/n xxx detached
Nov 15 03:06:12 homer kernel: (ada1:ahcich1:0:0:0): Periph destroyed
<insert into hotplug bay>
Nov 15 03:06:27 homer kernel: ada1 at ahcich1 bus 0 scbus3 target 0 lun 0
Nov 15 03:06:27 homer kernel: ada1: <SAMSUNG MZ7PD120HAFV-000DA DXM02W1Q> ATA-9 SATA 3.x device
Nov 15 03:06:27 homer kernel: ada1: Serial Number xxx
Nov 15 03:06:27 homer kernel: ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Nov 15 03:06:27 homer kernel: ada1: Command Queueing enabled
Nov 15 03:06:27 homer kernel: ada1: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
Nov 15 03:06:27 homer kernel: ada1: Previously was known as ad6

Example output when hotplug fails:
Code:
<remove from hotplug bay>
Nov 15 03:16:40 homer kernel: ada1 at ahcich1 bus 0 scbus3 target 0 lun 0
Nov 15 03:16:40 homer kernel: ada1: <SAMSUNG MZ7PD120HAFV-000DA DXM02W1Q> s/n xxx detached
<insert into hotplug bay>
Nov 15 03:20:32 homer kernel: cam_periph_alloc: attempt to re-allocate valid device ada1 rejected flags 0x118 refcount 2
Note the absence of the "Periph destroyed" line after the detach in the output above.
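
To spot this pattern without reading the log by eye, something like the following could be used. It is only a minimal sketch in Python: the function name `find_stuck_detaches` is made up for illustration, and the regular expressions assume the kernel messages look exactly like the ones quoted above.

```python
import re

# Matches detach lines such as:
#   "... kernel: ada1: <MODEL FIRMWARE> s/n xxx detached"
DETACHED = re.compile(r"kernel: (\w+): <.*> s/n .* detached\s*$")
# Matches destroy lines such as:
#   "... kernel: (ada1:ahcich1:0:0:0): Periph destroyed"
DESTROYED = re.compile(r"kernel: \((\w+):[^)]*\): Periph destroyed\s*$")


def find_stuck_detaches(lines):
    """Return the devices that were detached but never reported as destroyed.

    Hypothetical helper: it assumes the message formats shown in this post.
    """
    pending = []
    for line in lines:
        match = DETACHED.search(line)
        if match:
            # Remember the device until a matching destroy message shows up.
            if match.group(1) not in pending:
                pending.append(match.group(1))
            continue
        match = DESTROYED.search(line)
        if match and match.group(1) in pending:
            pending.remove(match.group(1))
    return pending
```

Fed the lines of the kernel log (for example from `/var/log/messages`), it should return an empty list for the working case and a list containing `ada1` for the failing case.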

A port can be hotplugged perfectly fine multiple times in a row, but after a reboot the same port fails. After yet another reboot, it may or may not work properly. This applies to both connected ports.

Hotplugging seems to work fine on the other controller (LSI 2308-based, using the mps(4) driver).

Relevant dmesg output:
Code:
ahci0: <Intel Patsburg AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xfb921000-0xfb9217ff irq 18 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
ahciem0: <AHCI enclosure management bridge> on ahci0

ada0 at ahcich0 bus 0 scbus2 target 0 lun 0
<several da* devices connected to another controller>
ada0: <SAMSUNG MZ7PD120HAFV-000DA DXM02W1Q> ATA-9 SATA 3.x device
ada0: Serial Number xxx
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus3 target 0 lun 0
ada1: <SAMSUNG MZ7PD120HAFV-000DA DXM02W1Q> ATA-9 SATA 3.x device
ada1: Serial Number xxx
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 114473MB (234441648 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6

Submitted PR

Any ideas?

Edit: I've tried to reproduce the problem with FreeBSD 9.2-RELEASE, but have been unable to do so in 20 tries spread over 5 reboots.
 
The missing "Periph destroyed" line may mean that some consumer is still holding the device open. For example, active swap on the disk keeps the underlying device from being released, so that the system does not crash immediately if kernel memory has been swapped out to it. Please tell us more about what is on that disk, and show the "Mode: " line from geom disk list for it and for anything higher on the stack.

I don't expect this issue to be related to specific hardware.
 
mav@ said:
The missing "Periph destroyed" line may mean that some consumer is still holding the device open. For example, active swap on the disk keeps the underlying device from being released, so that the system does not crash immediately if kernel memory has been swapped out to it. Please tell us more about what is on that disk, and show the "Mode: " line from `geom disk list` for it and for anything higher on the stack.

I don't expect this issue to be related to specific hardware.

You're absolutely right. Hotplugging started working as expected once I disabled swap. To the best of my knowledge, no swap was in use at the time of the tests: top showed all swap as free, and since the box had plenty of free RAM (read: dozens of GB), I would be surprised if anything had been swapped out.
I also realize I had not set up swap when testing in FreeBSD 9.2, which explains why I failed to reproduce the bug there.
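
For anyone hitting the same thing, the check suggested above can be scripted. This is a rough Python sketch, not an official tool: the name `open_counts` is made up for illustration, and it assumes the usual `geom disk list` layout, where each provider has a `Name:` line followed by a `Mode: rNwNeN` line giving the read, write and exclusive open counts.

```python
import re

# Mode strings look like "r1w1e0": read, write and exclusive open counts.
MODE = re.compile(r"r(\d+)w(\d+)e(\d+)")
# Provider entries look like "1. Name: ada1" (assumed layout).
NAME = re.compile(r"(?:\d+\.\s+)?Name:\s+(\S+)")


def open_counts(geom_output, provider):
    """Return (read, write, exclusive) open counts for a provider, or None.

    Hypothetical helper: geom_output is the text printed by `geom disk list`.
    """
    current = None
    for raw in geom_output.splitlines():
        line = raw.strip()
        name = NAME.match(line)
        if name:
            # Track which provider the following lines belong to.
            current = name.group(1)
        elif line.startswith("Mode:") and current == provider:
            mode = MODE.search(line)
            if mode:
                return tuple(int(count) for count in mode.groups())
    return None


def is_held_open(counts):
    """True if any open count is non-zero, i.e. some consumer holds the disk."""
    return counts is not None and any(counts)
```

A non-zero count for the disk before pulling it would suggest that something (swap, in my case) still holds it open, which matches the missing "Periph destroyed" line.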

Although the problem is not what I initially thought, I still think it needs solving: an HDD suddenly vanishing for whatever reason (power failure, drive crash) shouldn't disable that port for some arbitrary period of time, or until the next reboot. I'll update the PR with these findings.
 