Losing devices now and then

My "mainserver" loses devices now and then (on 9.1-RELEASE and now on 9.2-RELEASE too). After a reboot everything is back and fine (zpool scrub also reports no errors).

So I don't really know where to search to find out what is causing the problems.

Bad sectors on three devices at the same time? A controller problem? But every controller has 4 drives attached, so all 4 should drop out.
A bad backplane? Then 4 drives should be affected as well (I use 4-drive backplanes).

Code:
Oct 11 18:05:16 mainserver kernel: (ada5:mvsch5:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada5:mvsch5:0:0:0): removing device entry
Oct 11 18:05:16 mainserver kernel: (ada1:mvsch1:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada1:mvsch1:0:0:0): removing device entry
Oct 11 18:05:16 mainserver kernel: (ada2:mvsch2:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada2:mvsch2:0:0:0): removing device entry
Oct 11 18:05:26 mainserver kernel: (ada3:mvsch3:0:0:0): lost device
Oct 11 18:05:26 mainserver kernel: (ada3:mvsch3:0:0:0): removing device entry
Oct 11 18:06:39 mainserver kernel: ada1 at mvsch1 bus 0 scbus1 target 0 lun 0
Oct 11 18:06:39 mainserver kernel: ada1: <ST4000DM000-1F2168 CC52> ATA-8 SATA 3.x device
Oct 11 18:06:39 mainserver kernel: ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Oct 11 18:06:39 mainserver kernel: ada1: Command Queueing enabled
Oct 11 18:06:39 mainserver kernel: ada1: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
Oct 11 18:06:39 mainserver kernel: ada1: quirks=0x1<4K>
Oct 11 18:06:39 mainserver kernel: ada1: Previously was known as ad6
Oct 11 18:06:39 mainserver kernel: ada2 at mvsch3 bus 0 scbus3 target 0 lun 0
Oct 11 18:06:39 mainserver kernel: ada2: <ST4000DM000-1F2168 CC52> ATA-8 SATA 3.x device
Oct 11 18:06:39 mainserver kernel: ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Oct 11 18:06:39 mainserver kernel: ada2: Command Queueing enabled
Oct 11 18:06:39 mainserver kernel: ada2: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
Oct 11 18:06:39 mainserver kernel: ada2: quirks=0x1<4K>
Oct 11 18:06:39 mainserver kernel: ada2: Previously was known as ad10
Oct 11 18:06:39 mainserver kernel: ada3 at mvsch5 bus 0 scbus7 target 0 lun 0
Oct 11 18:06:39 mainserver kernel: ada3: <ST4000DM000-1F2168 CC52> ATA-8 SATA 3.x device
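
To check whether the drops line up with one controller or backplane, it can help to group the "lost device" lines by timestamp and list the channel each drive sat on. This is only a minimal sketch: the function name group_lost_devices and the regular expression are my own assumptions, fitted to the line format shown in the log above, and the sample input is copied from that log.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log excerpt:
# "Oct 11 18:05:16 mainserver kernel: (ada5:mvsch5:0:0:0): lost device"
LOST_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"\((\w+):(\w+):\d+:\d+:\d+\):\s+lost device\s*$"
)


def group_lost_devices(lines):
    """Return {timestamp: [(device, channel), ...]} for 'lost device' lines."""
    events = defaultdict(list)
    for line in lines:
        match = LOST_RE.match(line.strip())
        if match:
            timestamp, device, channel = match.groups()
            events[timestamp].append((device, channel))
    return dict(events)


# Sample input copied from the kernel log in this thread.
SAMPLE = """\
Oct 11 18:05:16 mainserver kernel: (ada5:mvsch5:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada5:mvsch5:0:0:0): removing device entry
Oct 11 18:05:16 mainserver kernel: (ada1:mvsch1:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada1:mvsch1:0:0:0): removing device entry
Oct 11 18:05:16 mainserver kernel: (ada2:mvsch2:0:0:0): lost device
Oct 11 18:05:16 mainserver kernel: (ada2:mvsch2:0:0:0): removing device entry
Oct 11 18:05:26 mainserver kernel: (ada3:mvsch3:0:0:0): lost device
Oct 11 18:05:26 mainserver kernel: (ada3:mvsch3:0:0:0): removing device entry
"""

if __name__ == "__main__":
    for timestamp, devices in group_lost_devices(SAMPLE.splitlines()).items():
        print(timestamp, ", ".join(f"{dev} on {chan}" for dev, chan in devices))
```

Fed with the whole of /var/log/messages instead of the sample, the same grouping would show whether the same channels drop together every time, which is what would point at a shared controller, backplane or power feed.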

And zpool status after the loss (before the reboot):

Code:
        tank                      UNAVAIL      0     0     0
          raidz2-0                UNAVAIL      0     0     0
            ada0.eli              ONLINE       0     0     0
            10058800660867328065  REMOVED      0     0     0  was /dev/ada1.eli
            2192216948537936452   REMOVED      0     0     0  was /dev/ada2.eli
            ada3.eli              ONLINE       0     0     0
            ada4.eli              ONLINE       0     0     0
            13988282619329651790  REMOVED      0     0     0  was /dev/ada5.eli
            ada6.eli              ONLINE       0     0     0
            ada7.eli              ONLINE       0     0     0
            ada9.eli              ONLINE       0     0     0
            ada10.eli             ONLINE       0     0     0
            ada11.eli             ONLINE       0     0     0
            ada12.eli             ONLINE       0     0     0
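
The pool is UNAVAIL rather than DEGRADED because a raidz2 vdev survives the loss of at most two members, and three are marked REMOVED here. A minimal sketch for pulling the removed members out of saved zpool status text follows; the function name removed_members is hypothetical, and the column layout is assumed to match the output above.

```python
RAIDZ2_PARITY = 2  # a raidz2 vdev tolerates the loss of at most two members


def removed_members(status_text):
    """Return the former device paths of vdev members marked REMOVED."""
    removed = []
    for line in status_text.splitlines():
        fields = line.split()
        # Assumed row layout: NAME STATE READ WRITE CKSUM was /dev/...
        if len(fields) >= 7 and fields[1] == "REMOVED" and fields[5] == "was":
            removed.append(fields[6])
    return removed


# Sample input copied from the zpool status output in this thread (shortened).
SAMPLE_STATUS = """\
        tank                      UNAVAIL      0     0     0
          raidz2-0                UNAVAIL      0     0     0
            ada0.eli              ONLINE       0     0     0
            10058800660867328065  REMOVED      0     0     0  was /dev/ada1.eli
            2192216948537936452   REMOVED      0     0     0  was /dev/ada2.eli
            ada3.eli              ONLINE       0     0     0
            13988282619329651790  REMOVED      0     0     0  was /dev/ada5.eli
"""

if __name__ == "__main__":
    lost = removed_members(SAMPLE_STATUS)
    print("removed:", ", ".join(lost))
    if len(lost) > RAIDZ2_PARITY:
        print("more members removed than raidz2 can tolerate")
```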
 
Check your power supply. There are many methods described, but it must be tested while under load. If you have a spare power supply that you are sure is healthy, swap it in and watch whether the errors persist.

A failing power supply unit can't cope with peak power demands from the devices and simply cuts the power to a device. At that point the system loses the device, because it temporarily has no power.
 
I'll swap the power supply, though I doubt it's the reason for the problems.

So far I have never had these device losses while a scrub was running, and since my ZFS runs on geli, a scrub means the server runs under really heavy load for ~14 hours straight.

But thanks.... as I don't really know anything else to do :D it's at least worth a try.
 
Check the power connectors too. The quality of some connectors is really poor, and they fail under the combination of current and vibration. One bad power connector is what made me realize the importance of keeping my data on ZFS. I had a similar problem: I swapped the power connectors among the hard drives and the problem disappeared.
HTH
 