ZFS: Why I use redundant storage

This morning I got an automated email from my ZFS server that's going to demand my attention for a little while:
Code:
[sherman.149] # zpool status zroot
  pool: zroot
 state: DEGRADED
status: One or more devices has been removed by the administrator.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Online the device using 'zpool online' or replace the device with
    'zpool replace'.
  scan: scrub repaired 0B in 00:04:56 with 0 errors on Thu Nov 10 03:22:38 2022
config:

    NAME                      STATE     READ WRITE CKSUM
    zroot                     DEGRADED     0     0     0
      mirror-0                DEGRADED     0     0     0
        gpt/236009L240AGN:p3  REMOVED      0     0     0
        gpt/410008H400VGN:p3  ONLINE       0     0     0

errors: No known data errors
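For context, such automated mails typically come from the periodic(8) daily scripts on FreeBSD; a minimal sketch of the relevant knobs, assuming a stock install (the scrub knob is optional and shown only as an example):

```shell
# /etc/periodic.conf -- include "zpool status" output in the daily status mail
daily_status_zfs_enable="YES"

# optionally, also scrub pools on a periodic schedule
daily_scrub_zfs_enable="YES"
```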
The SSD just went offline without any data errors or any other warning:
Code:
Nov 15 03:25:23 sherman kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
Nov 15 03:25:23 sherman kernel: ada1: <INTEL SSDSC2BB240G7 N2010121> s/n BTDV7236009L240AGN detached
Then all the associated vdevs went offline. Then it came back a few seconds later. Then it went away a minute later. Then it came back four seconds later, and stayed online. It's been back online for 5 hours.

All the SMART data look fine. Extended offline tests just completed without error:
Code:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     44223         -
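A sketch of how such a test can be run and checked, assuming smartctl from sysutils/smartmontools and the ada1 device name from this thread (the pass/fail check below just pattern-matches the log line format shown above):

```shell
#!/bin/sh
# Start an extended (long) offline self-test; it runs inside the drive:
#   smartctl -t long /dev/ada1
# After the drive's estimated duration has elapsed, read the self-test log:
#   smartctl -l selftest /dev/ada1
# Crude pass/fail check against a captured log line (sample from this thread):
latest='# 1  Extended offline    Completed without error       00%     44223         -'
case "$latest" in
  *"Completed without error"*) echo "self-test passed" ;;
  *)                           echo "self-test FAILED" ;;
esac
```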

This is a premium Intel enterprise-class SSD. It's had about 5 years of light use, and it has failed without warning.

I'm hoping that I have a data cable problem. I'm planning to shut the system down, clean the data and power contacts on the SSD, re-seat the cables, and run the SMART extended tests again. But my instincts are urging me to get in a spare...
 
I think I've read that before (but might be making it up!) - when an SSD fails, it fails hard. Not yet had the experience but it will be my turn one day.

So will be interested in what you find - if "just" a cabling issue or something more severe. Good luck.
 
I shut the system down, cleaned the data and power contacts on the ada1 SSD, firmly re-seated all the cables at both ends, and rebooted.

The zroot came back to full health all by itself, resilvering "677M in 00:00:02 with 0 errors".

A new set of SMART extended offline tests on ada1 completed without error.

I had also lost some vdevs on the root SSDs which are used by the large tank (special and separate ZIL). These also came back "resilvered 16.4M in 00:00:00 with 0 errors", and even though every vdev used by the tank was ONLINE, I had to perform a "zpool clear tank" to clear error messages in the "status:" summary.

I have scrubbed the zroot pool without any problems.
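Pulling the recovery steps together, a sketch using the pool and device names from this thread (the zpool commands are commented out because they act on live pools; the health check at the bottom is fed a canned "state:" line rather than live `zpool status` output):

```shell
#!/bin/sh
# Steps from the recovery above, run manually one at a time:
#   zpool online zroot gpt/236009L240AGN:p3   # re-attach a REMOVED device
#   zpool clear tank                          # clear stale error counters
#   zpool scrub zroot                         # re-verify every block
# A tiny health check one might run from cron: flag any pool not ONLINE.
state_line=' state: DEGRADED'
state=$(echo "$state_line" | awk '{print $2}')
if [ "$state" = "ONLINE" ]; then
  echo "pool healthy"
else
  echo "pool needs attention: $state"
fi
```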

Everything is back to normal. However, I just ordered a spare enterprise class SSD...
 
I sometimes see the same thing with a Seagate HDD. I have two drives mirrored, and it was always the same drive that went offline. Everything was fine after issuing a 'zpool online' command. I suspect it's a cable problem but haven't been able to prove it conclusively.
 
gpw928 You might validate it's not a power supply issue. We are on the edge of the city and the power is all buried. Often when it rains a lot, or during extreme weather (cold or hot), we will see the power dip below 120V and LED light bulbs strobe while it's happening. I have two APC UPSs providing power to two different PDUs in each rack.

On multiple occasions the UPS chirps going to battery and right back to mains. On several of these occasions I've seen drives fall out of the array and come back, almost like the firmware was restarted on the SAS disk. It's never been more than 2 of the 24 disks in the array, and it's always been the flash disks, not the spinning metal, which I thought was interesting. I will admit that the data vdev flash disks are just commodity SanDisk brand 1TB SSDs.

I use Optane PCIe storage for SLOG, so it has always been a non-event.
 
In my experience, issues with cabling are more frequent than actual drive issues. My experience is based on a system using 4 SATA-connected drives ... I guess the SATA connectors are a very flawed design.
 
In my experience, issues with cabling are more frequent than actual drive issues.
I would confirm that for amateur-built systems. In professional systems (with burn-in and well-designed cable harnesses), internal wiring issues are rare; there, disk failures dominate wiring issues.

I guess the SATA connectors are a very flawed design.
From a hardware viewpoint, SATA connectors (both power and data) are mostly identical to SAS connectors; SAS connectors even have a second port on the data connector. And SAS connectors can be highly reliable. So I don't think it is a design flaw. My suspicion is rather that SATA accessories (cables, splitters, connectors) are made by a cut-throat industry that is pinching every penny, and for the hardware that is sold to amateurs, there is very little quality control or testing.
 
gpw928 You might validate its not a power supply issue.
That's a good point. However, I have a UPS and I log all events (which are frequent in my rural location). There were no glitches associated with that particular event.

My lived experience is that SATA cables do routinely cause problems with consumer-grade kit -- more often than disk failures, which is why it's so important to use a good troubleshooting guide.

Also, I always make a conscious effort to get SATA cables that have retention clips. They are vastly superior to the "slip on" variety.
 
gpw928 You might validate it's not a power supply issue. We are on the edge of the city and the power is all buried. Often when it rains a lot, or during extreme weather (cold or hot), we will see the power dip below 120V and LED light bulbs strobe while it's happening. I have two APC UPSs providing power to two different PDUs in each rack.

On multiple occasions the UPS chirps going to battery and right back to mains. On several of these occasions I've seen drives fall out of the array and come back, almost like the firmware was restarted
So what's that UPS then good for?
I am running my server (Xeon with 18 disks) from a stock 350W PSU (the $25 kind). That is calculated, and there is still a 23W reserve. I reworked the wiring, but there is no tolerance for weakening connectors. And occasionally I get exactly that effect of disks disappearing for a fraction of a second, with no record in SMART. It's only the spinning ones here -- probably because this is an amperage issue, not a voltage issue. And usually the connectors just need some cleaning or readjusting.
But I have the good feeling that at least I didn't spend money on expensive server stuff.

I guess the SATA connectors are a very flawed design.
Welcome to the club. I'm quite certain they are crap, and this goes undetected only because the PSUs are usually heavily oversized.

I would confirm that for amateur-built system. In professional systems (with burn-in and well-designed cable harnesses), internal wiring issues are rare; there disk failures dominate wiring issues.
Okay, so let's put some flesh on it: how do we build reliable wiring?
I've worked as a stagehand occasionally, so I know how equipment looks when it is designed not to create trouble in rugged environments. I looked for SATA connectors with a metal casing to begin with, but there are none (maybe that would create a problem with the inductance). The only difference I find is the retention clip - and that doesn't improve the connection, it just makes it impossible to unplug in narrow places when I need to.
 
So what's that UPS then good for?
UPSs are really just surge protectors which can fail over to battery: you're on mains power, and when it falls under a nominal level, the UPS transitions to battery power. Even in good (non-commercial) UPSs there is a transition from mains to battery power that causes the amperage to dip for just a split second. The UPS's conversion circuit and power supplies, when provided a solid 120-128V of power, can serve as a buffer during this period. However, browning power (not enough amperage to maintain 120V) in an underserved area will likely suffer voltage drop. There really isn't a fix for this other than to buy extremely expensive power conditioning systems, which directly convert amperage to higher voltage and recreate the 50/60Hz sine wave for AC. Last I looked at one, these were way outside of my budget.

The long story is that lack of power does not do great things to the lower-voltage components in your system. SATA actually provides 3.3V, 5V, and 12V to most drives. My guess is it's a combination of several things, at least in my situation. The SanDisk SSDs I'm using probably run on the 3.3V rail (similar to RAM) and have no power regulation circuit in them, hence they get reset when the power browns. The spinning rust consumes a lot of power (>5x) and, due to its mechanical nature, most likely has some power regulation to provide consistent power so the motors are accurate in their ability to locate data on disk. Purely theory, but it stands to reason.

The fact that I've never lost the Intel Optanes, due to them being capacitor-backed, somewhat solidifies this in my head.

Okay, so let's put some flesh on it: how do we build reliable wiring?

I worked in oil and gas for a long while (20+ years), and we often had to buy hardware that was mechanically rugged, meaning it could withstand constant vibration. A lot of these computer systems were OTS single-board computers with wiring harnesses that screwed down or had locking multi-pin DIN whips on them. In this case, instead of a Molex-style connector or a friction-fit connector like SAS/SATA, a custom cable harness's wiring would be soldered directly to the board or screwed down to field termination points and covered in epoxy or sealant. This is a very expensive but semi-permanent way to affix connections between boards.

This is in fact the balance engineers face: solving the need while minimizing the cost. In the majority of cases, SATA/SAS connectors far exceed the mechanical requirements even though they are friction-fit systems. Cheaper cables, though, have lower quality assurance standards and possibly a higher likelihood of failing from regular use. The same issue exists with HDMI.

I can say I have not once seen a server/appliance in a datacenter, or any of my rigs at the house, have issues due to internal cabling. Granted, in most servers the disks attach to the board via a midplane and are locked to their SAS attachment with a mechanical cage/slide on the front of the server. Maybe I've just been lucky on the cabling front with SATA/SAS. I tend to overspend on cabling because it's generally not that much more to buy a good SATA/SAS cable than a cheap one. I also tend to buy higher-end power supplies, as they are the foundation the system runs on, and a bad power supply can burn up CPUs, RAM, etc. by not providing enough power, causing unnecessary heat output on components which drives down their lifetime.

As I'm typing this, I'm really feeling like a power conditioner is worth the $2k 🤣
 
UPSs are really just surge protectors which can fail over to battery: you're on mains power, and when it falls under a nominal level, the UPS transitions to battery power. Even in good (non-commercial) UPSs there is a transition from mains to battery power that causes the amperage to dip for just a split second. The UPS's conversion circuit and power supplies, when provided a solid 120-128V of power, can serve as a buffer during this period. However, browning power (not enough amperage to maintain 120V) in an underserved area will likely suffer voltage drop. There really isn't a fix for this other than to buy extremely expensive power conditioning systems, which directly convert amperage to higher voltage and recreate the 50/60Hz sine wave for AC. Last I looked at one, these were way outside of my budget.
Okay, that figures. I occasionally contemplated whether I should buy a UPS (and the problem started when I didn't find any docs on how they would integrate with a diesel generator). This now sounds rather like a DIY project to do properly ;)

The long story is that lack of power does not do great things to the lower-voltage components in your system. SATA actually provides 3.3V, 5V, and 12V to most drives. My guess is it's a combination of several things, at least in my situation. The SanDisk SSDs I'm using probably run on the 3.3V rail (similar to RAM) and have no power regulation circuit in them,
I doubt that. Molex to SATA-SSD splitters have only two wires, GND and +5. I cut off the Molex, attach them to the +5 rail, and things work.

I worked in oil and gas for a long while (20+ years), and we often had to buy hardware that was mechanically rugged, meaning it could withstand constant vibration. A lot of these computer systems were OTS single board computers with wiring harnesses that screwed down or had locking multi-pin DIN whips on them. In this case, instead of a molex-style connector or friction fit connector like SAS/SATA, a custom cable harness' wiring would be soldered direct to the board or screwed down to field termination points and covered in epoxy or sealant.
Yep, that sounds sensible. :) It is nothing one couldn't build in the garage if one really wants to; the only problem is with the SAS/SATA signal cables - that signal speed is actually UHF, and treating UHF mechanically is nontrivial.

I can say, I have not once seen in a server/appliance in a Datacenter or any of my rigs at the house have issues due to internal cabling. Granted, in most servers the disks attach to the board via a midplane and are locked to their SAS attachment with mechanical cage/slide on the front of the server.
They usually have a constant temperature, they have no dirt in the air, and they don't get bent or frequently reassembled...
 
I doubt that. Molex to SATA-SSD splitters have only two wires, GND and +5. I cut off the Molex, attach them to the +5 rail, and things work.
Interesting observation. It is odd that the spec supports 3.3V, 5V, and 12V... guessing storage doesn't generally use anything but 5V? Could just be that my cheapo SanDisk SSDs don't handle browned power well.
 
Interesting observation. It is odd the the spec supports 3.3V, 5V, and 12V... guessing storage doesn't generally use anything but 5V?
This should be in the spec sheet. Mechanical disks need +5 and +12; SSDs, as far as I know, only +5. And there is an allowed tolerance, often ±10%.
 