This pool is in quite a state: four 2TB WD RE3 drives, of which two were failing (now three), plus two 4TB RE4 drives (both healthy). Remote hands put two new drives in on Friday, and while that went well, last night the resilver/scan "finished" but the two mirrors were still marked DEGRADED, perhaps because a single error was reported (one file in a snapshot). On running a "zpool clear zroot" to get rid of the error, the resilver/scan started over... Ugh.
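In hindsight I probably should have identified the damaged file before clearing anything; as I understand it, "zpool status -v" names the affected files, and destroying the snapshot that references the one bad file would clear the error without resetting the scan the way "zpool clear" did. Roughly (the snapshot name is made up for illustration):
Code:
# show which file(s) the reported error refers to
zpool status -v zroot
# if it's just one file in a snapshot, destroying that snapshot
# (name hypothetical) removes the last reference to the bad block
zfs destroy zroot/data@2018-09-10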
So help me out with a few things here that I'm not following:
Code:
  pool: zroot
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Sep 17 03:01:18 2018
        2.28T scanned out of 6.82T at 1.46K/s, (scan is slow, no estimated time)
        1.24T resilvered, 33.50% done
config:

        NAME                       STATE     READ WRITE CKSUM
        zroot                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            gpt/zdisk0             ONLINE       0     0     0
            replacing-1            DEGRADED     0     0     0
              3337922232420531863  REMOVED      0     0     0  was /dev/gpt/zdisk1/old
              gpt/zdisk1           ONLINE       0     0     0  block size: 512B configured, 4096B native
          mirror-1                 DEGRADED     0     0     0
            replacing-0            DEGRADED     0     0     0
              8989923517117392608  REMOVED      0     0     0  was /dev/gpt/zdisk2/old
              gpt/zdisk2           ONLINE       0     0     0  block size: 512B configured, 4096B native
            gpt/zdisk3             ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            gpt/zdisk5             ONLINE       0     0     0
            gpt/zdisk4             ONLINE       0     0     0
        logs
          gpt/zil0                 ONLINE       0     0     0

errors: No known data errors
I see that we're now warned about doing something stupid (putting 4K-native drives into a pool created with 512B sectors), and that's nice.
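If I ever do get to rebuild, my understanding is that on FreeBSD you can force newly created vdevs to use 4K sectors (ashift=12) via a sysctl, assuming the release is new enough to have it. It won't help the existing 512B vdevs, since ashift is fixed per vdev at creation:
Code:
# force new vdevs to be created with 4K sectors (ashift=12);
# only affects vdevs created/added after this is set
sysctl vfs.zfs.min_auto_ashift=12
# make it stick across reboots
echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf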
What is confusing me here is that while the general status command says a resilver is in progress, the individual vdevs don't have "(resilvering)" next to them. I don't know if that indicates something is wrong or if it's just a change in ZFS that I've not noticed. Also the speed: zpool status claims 1.46K/s, yet if I look at gstat the drives being resilvered are certainly quite busy (and the drives that aren't resilvering are almost idle).
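For the record, this is roughly how I'm watching it: per-provider activity via gstat's regex filter, and per-vdev throughput via zpool iostat (the filter pattern is just a guess at how the labels show up here):
Code:
# per-disk activity, filtered to this pool's gpt labels
gstat -f 'zdisk'
# per-vdev read/write bandwidth, refreshed every 5 seconds
zpool iostat -v zroot 5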
Any pointers here? I don't have enough capacity anywhere to copy this all off elsewhere and recreate the pool (I wish I could, since once this finishes and I drop in two more new drives they'll all be 4K-sector drives). My feeling is that when the resilver hits a bad block on one of the dicey drives ("zdisk3"), ZFS gets really confused and, instead of skipping it and moving forward, never considers the resilver complete.
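Meanwhile I'm keeping an eye on the suspect drive's SMART counters with smartmontools from ports; something like this, assuming zdisk3 maps to ada3 on this box (check the label mapping first):
Code:
# map the gpt label to the underlying device
glabel status | grep zdisk3
# pending/reallocated sector counts on the suspect drive
# (/dev/ada3 is an assumption based on the mapping above)
smartctl -A /dev/ada3 | egrep 'Reallocated_Sector|Current_Pending|Offline_Uncorrectable'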
And a lesson learned - yeah, multiple drives bought at the same time certainly can all start showing bad blocks within a week of each other.