SATA hot swap problems

Hi everybody,
I have a FreeBSD machine at home with a 4-disk RAID-Z pool. I occasionally put another SATA disk into a hot-swap bay and transfer snapshots to it via zxfer.

After a somewhat longer pause (I last did this in December 2012), I decided to transfer the latest snapshots this week. After inserting a disk into the hot-swap bay, I get these messages in dmesg:

Code:
# dmesg
[...]
(ada0:ahcich0:0:0:0): lost device
(pass0:ahcich0:0:0:0): passdevgonecb: devfs entry is gone
(ada3:ahcich3:0:0:0): lost device
(pass3:ahcich3:0:0:0): passdevgonecb: devfs entry is gone
(ada0:ahcich0:0:0:0): removing device entry
(ada3:ahcich3:0:0:0): removing device entry
ada0 at ahcich3 bus 0 scbus3 target 0 lun 0
ada0: <WDC WD20EARX-00PASB0 51.0AB51> ATA-8 SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad10
ada3 at ahcich0 bus 0 scbus0 target 0 lun 0
ada3: <WDC WD20EARS-00MVWB0 51.0AB51> ATA-8 SATA 2.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad4
ada5 at ahcich6 bus 0 scbus6 target 0 lun 0
ada5: <WDC WD10EARS-00MVWB0 51.0AB51> ATA-8 SATA 2.x device
ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
ada5: Previously was known as ad16

This leads to the zpool becoming UNAVAIL:

Code:
# zpool status
  pool: storage
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool
clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: scrub repaired 0 in 12h38m with 0 errors on Fri Dec 28 04:34:06 2012
config:

	NAME                     STATE     READ WRITE CKSUM
	storage                  UNAVAIL      0     0     0
	  raidz1-0               UNAVAIL      0     0     0
	    5607441266446840499  REMOVED      0     0     0  was /dev/ada0
	    3577502903318221364  REMOVED      0     0     0  was /dev/ada3
	    ada1                 ONLINE       0     0     0
	    ada2                 ONLINE       0     0     0

Strangely, camcontrol still sees all installed disks:

Code:
# camcontrol devlist
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus0 target 0 lun 0 (ada3,pass3)
<WDC WD30EFRX-68AX9N0 80.00A80>    at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD20EARS-00MVWB0 51.0AB51>    at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD20EARX-00PASB0 51.0AB51>    at scbus3 target 0 lun 0 (ada0,pass0)
<WDC WD3200BPVT-00ZEST0 01.01A01>  at scbus4 target 0 lun 0 (pass4,ada4)
<WDC WD10EARS-00MVWB0 51.0AB51>    at scbus6 target 0 lun 0 (ada5,pass5)

I tried this with a factory-new hard disk, with the same effect. The hot-swap bay was connected to the eSATA port of my mainboard. Since I thought this might somehow be the source of my troubles, I installed a RocketRAID 620 controller and connected the hot-swap bay to one of its SATA ports: still the same effect.

Since the last time I inserted a hard disk, I have upgraded from FreeBSD 9.1-RC2 to FreeBSD 9.1-RELEASE. I also used to have the RAID-Z disks in an ICY Box backplane, but have since connected them directly to the power supply and the mainboard's SATA ports.

When I start the machine with a disk already in the hot-swap bay, everything works fine and I can import the zpool on that disk.

Does anybody have an idea what causes this behaviour and how it can be changed?

Thanks in advance for any help.
 
As I understand it, adding the new disk caused some of the existing disks to disconnect. That should not happen because of the driver, since AHCI ports are quite independent. My best guess is that your power supply is not powerful enough to handle the disk spin-up on hot-plug properly. On some of my test hardware I sometimes hear existing disks doing an emergency head park when a new disk's power connector is plugged in. With a bigger power spike, I guess it could cause disks to drop out entirely.
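
To confirm which existing disks dropped out at the moment of hot-plug, the kernel messages can be filtered for "lost device" lines. The following is only a rough sketch based on the message format shown in the first post; the function name lost_devices is made up for illustration and is not an existing tool.

```shell
# List the device names the kernel reported as lost, one per line, de-duplicated.
# Reads dmesg-style text on stdin, so it works on live `dmesg` output or a saved log.
# Assumption: messages look like "(ada0:ahcich0:0:0:0): lost device", as in the
# output quoted in this thread.
lost_devices() {
    # The pass(4) peers report "passdevgonecb" instead, so they are not matched.
    sed -n 's/^(\([a-z]*[0-9]*\):[^)]*): lost device.*$/\1/p' | sort -u
}

# Usage on the live system (assumes the messages are still in the kernel buffer):
#   dmesg | lost_devices
```

Run against the dmesg output quoted above, this would list ada0 and ada3, the same two members that zpool status shows as REMOVED. If disks that were not touched show up here right after plugging in a new one, that points towards a power problem rather than a driver problem.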

If your disks support it, you could try enabling the "Power-Up In Standby" (PUIS) feature on them to delay spin-up until the OS explicitly requests it. That should reduce the power spike on hot-plug.
 
It looks like there really was a voltage drop: I installed a more powerful power supply and it works like a charm now.
 