ZFS: Replacing drive hangs zpool

Hi,
I have a server, dual Xeon with 256 GB of memory, with a RAID-Z2 array of 6 disks plus two spares.
On this machine I run 5 VMs, one of which seems to be hard on the array (Windows Server with SQL Server installed; it hangs when I try to scrub the pool).
One of the disks was reporting errors, so I had to replace it. I followed the procedure of adding the new drive and launched the replace operation.
Resilvering is abnormally slow on my machine, just like scrubbing... After two hours, the pool hung. Querying the pool status still works but takes a few minutes.
Stopping the VMs leaves dead processes in memory; IO seems to stall everything.
I had to reboot, and now the whole system is hung while ZFS resilvers the array.

Does anyone have some kind of clue as to what might be happening on this system?
Also, I truly wish that when ZFS is resilvering on startup I would get some kind of status or message, anything; a prompt would be nice! This emptiness is killing me...

tcn
 
To make the resilvering finish faster, stop all workload. And obviously stop any scrubbing. I know this is not a good long-term suggestion (in the long term, both resilvering and scrubbing should be able to run together with the workload), but that at least gets your machine back to running faster.
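
A minimal sketch of what I mean, assuming the pool is called tank (substitute your own pool name):

# zpool scrub -s tank     <- cancels a scrub if one is in progress
# zpool status tank       <- confirms that only the resilver is still running

If no scrub is active, the first command just complains that there is nothing to cancel, which is harmless.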

Then, when only resilvering is running, measure the IO activity, for example with the iostat or "zpool iostat" commands (read the man pages to see how to use them). Ideally, you should see that at least one disk is very busy (if these are rotating disks, you should see 100 MByte/s plus or minus a factor of two), even better would be all disks busy. If you can't get to that, there is some problem. You might have to look at the various ZFS tuning parameters in sysctl and in the loader. At least your machine has enough CPU power and memory.
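
Concretely, something like this (tank is a placeholder for your pool name; the 5-second interval is just an example):

# zpool iostat -v tank 5     <- per-vdev and per-disk throughput, refreshed every 5 seconds
# iostat -x 5                <- per-device busy %, queue length and service times from the OS side

If one disk in the raidz2 vdev shows much lower throughput or much higher service times than its siblings, that disk (or its cable/controller port) is a likely suspect.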

Once you have your machine back up and fully redundant, it's time to investigate why scrubbing makes it so slow. A little slowdown is fine and to be expected (ZFS is not super good at scheduling IOs to minimize the impact of maintenance operations such as scrub or resilver on the foreground workload). I've noticed that on my home system too (which is much smaller: 3 GB of memory, a single 32-bit CPU, and only 2 disks in the largest zpool). But getting so slow that it de facto hangs is not acceptable. Again, measure IO throughput and workload intensity, and check the tunables.
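
The tunables I have in mind live under the vfs.zfs sysctl tree; the exact names depend on your FreeBSD/ZFS version, so treat the following as things to look for rather than a definitive list:

# sysctl vfs.zfs | grep -E 'scrub|resilver|scan'

On the older FreeBSD ZFS versions there are, for example, vfs.zfs.scrub_delay and vfs.zfs.resilver_delay, which control how much scrub/resilver IO backs off when the pool is busy; the same names can go into sysctl.conf or loader.conf, depending on whether they are runtime-tunable on your version.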

Another question: do you know what kind of workload your applications are generating? If you are running an intense transaction-processing workload on SQL Server, it might be doing lots of small writes (updates) into large files. That is very inefficient on parity-based RAID layouts like RAIDZ2. You might be better off moving that part of the data to a mirrored layout.
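
For illustration only (the pool name and device names here are made up, and this would be a new pool rather than a conversion of your existing raidz2), a mirrored layout for the database data might look like:

# zpool create fasttank mirror da0 da1 mirror da2 da3

With mirrors, random IOPS scale with the number of mirror pairs, whereas a raidz2 vdev delivers roughly the random IOPS of a single disk, plus parity overhead for every small block.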
 
Well, the resilvering took 2 hours. Really no fun having to wait in the dark like this...

I know I have lots of fragmentation in my pool; ZFS reports 27%. I am reviewing my snapshot methods and will fix this in the near future.
I am running on a degraded RAID right now; this will have to do until tonight, when I will put another disk back into the pool. I will shut down all VMs and jails and look at the IO statistics.
I have to agree; I didn't think we would need an SQL Server machine, so I had to use the pool I had at the time. That server will probably move to another machine soon and will sit on a mirror.
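
For anyone following along, the fragmentation figure and the snapshot situation I keep referring to come from the standard commands:

# zpool list tank                             <- the FRAG and CAP columns come from here
# zfs list -H -t snapshot -r tank | wc -l     <- rough count of snapshots in the pool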

Not too sure about the load from the SQL server, but it lives on a ZVOL with 512-byte sectors. The IO statistics in the running environment (which is not that busy) give:
zpool iostat
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank        10.2T  11.2T  1.52K    373  8.19M  2.86M
----------  -----  -----  -----  -----  -----  -----
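
For reference, the volume's block size can be checked like this (tank/sqlvol is just a placeholder for whatever the ZVOL backing the SQL VM is actually called):

# zfs get volblocksize,volsize tank/sqlvol

If that really is a 512-byte volblocksize (rather than just a 512-byte emulated sector size presented to the VM), every block stored on the raidz2 carries two parity sectors of overhead, which is hard on both space efficiency and IOPS.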
 
I am resilvering again in single-user mode. However, I can look at the IO statistics now.
This is what I am looking at right now: bandwidth is 198M read, 1.44M write.
The next step will be to scrub with a single VM running and see what the degradation is.
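
When I do, I plan to kick the scrub off and watch it with something like:

# zpool scrub tank
# zpool status tank          <- shows the scan rate and estimated time to go
# zpool iostat -v tank 10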

I have also encrypted my partitions (ZFS on top of GELI); I think this would slow me down a bit, but would it slow me down this much?
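
One thing I still want to check (assuming AES, which is the GELI default): whether the crypto is being done in hardware, since software-only AES can easily become the bottleneck during a resilver:

# kldstat | grep aesni       <- is the AES-NI driver loaded?
# geli list | grep -i crypto <- GELI reports hardware or software crypto per provider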
 
The read throughput is reasonable but not great for a 6-disk RAIDZ2 array: to resilver, you're reading from five disks and writing to one, so 198 MByte/s of reads amounts to about 40 MByte/s per disk, which is well below the hardware capability (100-200 MByte/s) but reasonable, within a factor of "2 for large values of 2". The write throughput is a little worrying; ideally it should match the per-disk read throughput, about 40 MByte/s, and you are getting roughly 1.5. Perhaps resilvering is doing a lot of defragmentation at the same time and has to do many reads to build up one write. Or perhaps something else is going on.
 
My array has been rebuilt; no more snapshots for now. I also removed the encryption, as it is useless on my system. Of course, fragmentation is now at 0%.
I still see a constant 1 MB/s read and 2 MB/s write in the IO statistics; not sure where that is coming from. This constant read/write will slow down scrubs and resilvering...
I will try a scrub next weekend and see how it goes. In the meantime, I have to review my periodic snapshots.
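
To track down that constant background IO, I will probably look at it from both sides with something like:

# zpool iostat -v tank 10    <- which vdev/disk the traffic lands on
# top -m io -o total         <- per-process IO on FreeBSD, sorted by total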
 