ZFS Zpool Degraded state

Hi all,
I've been roaming the different forums trying to understand my problem and find a solution.
I have a 7-disk raidz1 array.
Last week one disk failed completely; I replaced it and started a resilver.
When the resilver completed I saw that another disk had bad sectors, so I replaced that disk too and started another resilver.
That was when the problems started: after resilvering, the pool stayed in a degraded state, complaining about errors, and then it started resilvering again. And again, and again.
I have deleted all the corrupted files and have now started a scrub.
zpool status is showing:
Code:
root@freenas:~ # zpool status
  pool: Data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the entire pool from backup.

  scan: scrub in progress since Wed Jan 10 13:22:48 2018
    1.08T scanned at 2.37G/s, 114G issued at 250M/s, 11.5T total
    0 repaired, 0.97% done, 0 days 13:18:27 to go

config:
    NAME                                              STATE     READ WRITE CKSUM
    Data                                              DEGRADED     0     0    15
      raidz1-0                                        DEGRADED     0     0    30
        gptid/a6879536-6bde-11e3-a032-902b3435866c    ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/3af6039e-d33f-11e1-b2ae-902b3435866c    ONLINE       0     0     0
        gptid/4f852428-5929-11e4-a4fb-902b3435866c    ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/1d8721c5-f0a4-11e7-9ee8-902b3435866c    ONLINE       0     0     0  block size: 512B configured, 4096B native
        replacing-4                                   DEGRADED     0     0     0
          14769619753741120542                        UNAVAIL      0     0     0  was /dev/gptid/3c987157-d33f-11e1-b2ae-902b3435866c
          gptid/2e7ad5b1-f2fd-11e7-b78b-902b3435866c  ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/3d61414c-d33f-11e1-b2ae-902b3435866c    ONLINE       0     0     0  block size: 512B configured, 4096B native
        gptid/3e270f46-d33f-11e1-b2ae-902b3435866c    ONLINE       0     0     0  block size: 512B configured, 4096B native

errors: 9 data errors, use '-v' for a list
I was running FreeNAS 8.3 (yeah, I know, old) when the problems started; I have since updated to FreeNAS 11.1. I haven't upgraded the zpool, scared of losing my data.
I don't have a backup or anything, and not losing my data is the priority, of course.
I had over 160 errors and deleted all the affected files. Currently status -v displays:
Code:
        Data:<0xa521>
        Data:<0x10b30>
        Data:<0x12732>
        Data:<0x11a4a>
        Data:<0x3869>
        Data:<0x13770>
        Data:<0x2774>
        Data:<0x10286>
        Data:<0x3297>
I am unable to remove the old disk (from my second swap, the one with the bad sectors) from the zpool. When I try zpool offline Data 14769619753741120542 I get
Code:
cannot offline 14769619753741120542: no valid replicas
When I try zpool remove Data 14769619753741120542 I get
Code:
cannot remove 14769619753741120542: only inactive hot spares, cache, top-level, or log devices can be removed
I don't understand this, as the disk is not physically present anymore. I have 7 disks and have resilvered too many times already. I also don't understand this line:
Code:
replacing-4                                   DEGRADED     0     0     0

The message
Code:
512B configured, 4096B native
next to the disks is new since FreeNAS 11.1; I don't know what to make of it, or whether it is related.

I've read a lot and I am none the wiser. I'm pretty familiar with Unix systems, but still at a level where I need a step-by-step guide on what to do.
 
I have a 7 disk raidz1 array.
WHY??

Last week one disk failed completely; I replaced it and started a resilver.
When the resilver completed I saw that another disk had bad sectors.
So that other disk most likely already had errors, which were detected while your pool was resilvering (long time no scrub?). This may have caused data corruption due to lack of redundancy, as you effectively had 2 failed disks.

I replaced that disk too and started another resilver. That was when the problems started.
Yep, sounds like raidz1 was working as expected and couldn't handle the effectively simultaneous failure of 2 disks.

When I try zpool remove Data 14769619753741120542 I get
zpool(8)
zpool remove pool device ...

Removes the specified device from the pool. This command currently
only supports removing hot spares, cache, and log devices. A mirrored
log device can be removed by specifying the top-level mirror for the
log. Non-log devices that are part of a mirrored configuration can be
removed using the "zpool detach" command. Non-redundant and raidz
devices cannot be removed from a pool.

So try zpool detach, maybe first issuing a zpool offline on the device.
If it can't be detached and the pool stays in the DEGRADED state, it means the resilvering didn't finish successfully (due to lack of redundancy, as you effectively had 2 failed disks at some point) and your data is lost.
If it can be detached, you might want/have to reset the error counters with zpool clear.


Using a redundancy level of only 1 across 7 disks just cannot be considered safe, so you should have had backups of that data if it was important. At 7 disks you should use at least a redundancy level of 2 or - better - create multiple vdevs instead of putting all disks into one big vdev. A 2x3 raidz1 layout with 1 spare drive would have been much more reasonable and safe with 7 disks.
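To make the trade-off concrete, here is a small back-of-the-envelope sketch comparing the layouts mentioned above. The 2TB disk size and the per-vdev fault tolerance figures are assumptions, not measurements:

```python
# Rough comparison of 7-disk layout options discussed above.
# Assumed disk size: 2 TB. "tolerance" = disk failures each vdev survives.
DISK_TB = 2

# name: (data disks, parity disks, spare disks, failures tolerated per vdev)
layouts = {
    "1x 7-disk raidz1":         (6, 1, 0, 1),
    "2x 3-disk raidz1 + spare": (4, 2, 1, 1),
    "1x 6-disk raidz2 + spare": (4, 2, 1, 2),
}

for name, (data, parity, spares, tolerance) in layouts.items():
    print(f"{name}: {data * DISK_TB} TB usable, "
          f"tolerates {tolerance} failure(s) per vdev")
```

The single big raidz1 buys 4TB of extra space, but at the price of a rebuild window in which any second error can cost data.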


It seems the only advice left here is: nuke it from orbit, create a reasonable pool layout (with sufficient redundancy), and restore from backups. The fact that the pool was configured with the wrong block size for the disks (512B vs 4096B), which massively degrades performance due to read-modify-write amplification, is also a very good reason to start over.
 
The reason for 7 disks is that my case has room for 7 3.5 inch disks.
Except for the few files that were corrupted (which I was still able to play - .mkv files), all data is accessible and present. After the first disk failure the resilver went OK except for a few errors in files, some bad sectors. Then I replaced that other disk. I can still access all my files and play them. Why should my data be lost? That is not logical.
Do I need to wait for my scrub to finish before clearing and trying to detach?
Last time I did a clear, a resilver began automatically right away.
Is there any use in upgrading my pool?

edit: I can access all my data; it is present, so I don't want to lose it. I don't know if my data is currently protected if another disk fails. I want to fix the zpool so that my data is properly protected and error-free.

thanks
 
Except for the few files that were corrupted (which I was still able to play - .mkv files), all data is accessible and present.

OK, I must have missed that info in your first post. Then you may be in luck, and the resilvering finished before the second disk failed.

The disk is marked "UNAVAIL", so ZFS isn't doing any I/O on that device. Just try whether you can zpool detach it, and wait for the scrub to finish.

If the counters for all disks stay at 0, you can reset the counters for the pool and vdev with zpool clear.
If you still get errors and corrupted files reported after the scrub, MAKE A BACKUP and try to roll back the last transactions with zpool clear -F. Carefully read what this means in the zpool(8) manpage.

The reason for 7 disks is that my case has room for 7 3.5 inch disks.
You could still have gone for a 2x3 raidz1 + 1 spare layout, which is much more reasonable and performs better. ZFS spreads the load across all vdevs - putting all disks in one vdev essentially limits performance to the slowest disk, as all disks have to report success/failure on e.g. a write operation before ZFS acknowledges the write as complete and moves on. Also, as said, you can only survive a single disk failure across all 7 drives, which is _very_ optimistic.
 
A few years ago, the CTO of NetApp (you know what company NetApp is, right?) said in public that selling a storage system that can only tolerate a single disk failure amounts to professional malpractice. He's right, and he is to be applauded for having the courage to say in public that RAID is not a panacea, and still requires a sensible system configuration.

Let's analyze your system design a little bit. You built a system with 7 disks. I'm going to assume that they are 2TB disks (a reasonable guess, since the zpool output above says you have ~12TB of usable capacity), and that they are consumer-grade disks. The typical MTBF specified for consumer-grade disks is ~1M hours, and the reality is probably half of that. With 7 disks, that works out to an annual probability of ~12% that one disk fails completely. I don't know how long you've had your array, but with a 12% failure probability per year, the complete failure of one drive is not surprising. The specified uncorrectable error rate for your drives is probably 10^-14 per bit, and lacking real-world experience we'll go by the published specification here. With 7 2TB drives, you have 1.12*10^14 bits in production, which at an error probability of 10^-14 per bit gives an expected number of sector errors of about 1 when reading the whole array. That means that during a full rebuild after a disk failure, you are virtually guaranteed to get data loss due to a sector error. This scenario is called a "strip kill", where a double fault during RAID rebuild kills just one strip of data, and it is the more likely way for a single-fault-tolerant RAID to fail. The less likely version is a "RAID kill", where the complete failure of two disks kills the whole RAID array. By the way, even with enterprise-grade disks and an uncorrectable error rate of 10^-15, you would still be in trouble.
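The arithmetic in the paragraph above can be reproduced in a few lines; the MTBF and error-rate figures are the assumed spec-sheet values, not measurements of the actual drives:

```python
# Back-of-the-envelope reliability math for a 7x 2TB consumer-disk array.
# Assumptions: real-world MTBF ~500k hours (half the specified 1M hours),
# uncorrectable error rate 1e-14 per bit read.
HOURS_PER_YEAR = 8766
N_DISKS = 7
DISK_BYTES = 2e12          # 2 TB per disk
MTBF_HOURS = 500_000
URE_PER_BIT = 1e-14

# Expected complete-disk failures per year across the whole array (~12%).
annual_failure_rate = N_DISKS * HOURS_PER_YEAR / MTBF_HOURS

# Total bits in production and expected unrecoverable errors per full read.
total_bits = N_DISKS * DISK_BYTES * 8
expected_ures = total_bits * URE_PER_BIT

print(f"annual disk-failure rate: {annual_failure_rate:.0%}")   # ~12%
print(f"bits in production:       {total_bits:.2e}")            # 1.12e+14
print(f"expected UREs, full read: {expected_ures:.2f}")         # ~1.1
```

With an expected ~1 unrecoverable error per full-array read, a raidz1 rebuild (which must read every surviving disk with no redundancy left) is exactly the moment that error bites.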

So what you experienced is to be expected. Old joke: "Doctor, it hurts when I do this." "Well, then stop doing it."

Now, what to do in the future? First and most important, don't configure your system this way. I sort of like sko's suggested configuration of two 2+P RAIDZ arrays (each with 2 disks' worth of capacity, each able to handle one failure), plus one spare disk. That gives you four disks' worth of total capacity, but is still only single-fault tolerant. In such a configuration, I would be super religious about scrubbing, since you need to find sector errors early (you can't completely rely on RAID rebuild to save you, because of the math above). And run smartctl or smartd, look for the earliest signs of sick disks, and swap any sick disk for the spare - you can't afford to lose a disk. Personally, I would rather suggest taking 6 disks and configuring them as a RAIDZ2 (which can tolerate two failures, for example one complete disk failure and a sector error during the reconstruction after that failure). That also gives you 4 disks' worth of capacity, and you still keep one spare around. Because this configuration is not as fragile, you don't need to be as strict about scrubbing (more about that below). I would still run smartd or smartctl and actively look for trouble.

One problem with either setup is the scrubbing frequency. Modern disks are specified to only allow a certain amount of IO. One common figure is that disk vendors void the warranty on the drive if you perform more than 550 TB/year of IO, because doing that much IO will make the disk less reliable. And that includes reads, and therefore scrubbing. With 2TB disks, that means you can read the disk ~250 times per year, or a scrubbing frequency of roughly once every 1.5 days (assuming that normal IO is very little, which it tends to be for small personal or workgroup servers, which usually have mostly archival data, with a small amount of hot storage). What I would do: Set the scrubbing frequency to be safely below that (I'm running at home at once per week, because that's a convenient thing to code), but still do scrubbing as often as you can.
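The scrub-frequency arithmetic works out as follows; the 550 TB/year workload rating and 2TB disk size are the assumptions stated above, and real ratings vary per drive model:

```python
# How often can you scrub a 2TB disk without exceeding a 550 TB/year
# workload rating? (Figures as assumed above.)
WORKLOAD_TB_PER_YEAR = 550
DISK_TB = 2

full_reads_per_year = WORKLOAD_TB_PER_YEAR / DISK_TB   # 275
days_between_scrubs = 365 / full_reads_per_year        # ~1.3

print(f"full reads per year: {full_reads_per_year:.0f}")
print(f"days between scrubs: {days_between_scrubs:.1f} (at minimum)")
```

Scrubbing at that ceiling leaves no budget for normal I/O, which is why rounding down to ~250 reads/year is prudent, and a weekly scrub leaves comfortable headroom.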

Other hints: (a) Upgrade to a very recent version of FreeNAS (done). (b) Do not ask FreeNAS questions here; this is a FreeBSD forum, and FreeNAS is not welcome (you already violated that rule, too late). (c) Create a new zpool, in a sensible configuration, and migrate your data onto it. (d) When you recreate your zpool, make sure it is done correctly for disks with 4096-byte blocks. I think on modern versions of ZFS that is automatic; on older versions, or if you are creating a pool with the first disk being a 512-byte model, you may have to first issue this command: "sysctl vfs.zfs.min_auto_ashift=12", but read about that in the documentation before following that advice blindly.

Now, you may ask: "How do I create a new zpool, given that my disk enclosure (known as a JBOD in the storage business) is full - all 7 slots are occupied, maybe 1 is free because one disk is dead - and given that the 6 functioning disks are full of data?" Well, you put yourself into a tough situation, and depending on your appetite for risk, it's going to be hard to get out of. The ideal solution would be to acquire a second JBOD, put new disks in, and migrate completely without destroying the current disk array. That's going to cost you a lot of money, though. The question is, what's more valuable: your data, or the disks it's stored on? Another question: given that you have already determined that your current disks (the hardware itself) aren't very reliable, do you really want to continue using them?

Here are a few bad options. Borrow a tape drive, back up all your data (multiple times), then wipe and reformat the existing disks. Very, very risky, since it relies on being able to read the tape backup successfully. Or buy one 10TB disk, put it in the slot freed by the dead drive, copy all the data to it (with a non-redundant ZFS pool), then wipe and reformat the existing disks. Again risky, because for a while all your data is in a non-redundant spot.

Here would be my version, but it is a little bit of a hassle and costs some money: buy or borrow an external enclosure that lets you move 3 of your existing disks out of your enclosure. It doesn't have to be great quality, since it is not permanent. Now you have 4 open disk slots in your enclosure. Buy four 6TB drives, and set them up as RAIDZ2 (that gives you 12TB of capacity, with 2-fault tolerance). Migrate your data over, then take the old disks and put them into a shoe box for long-term storage. Then buy one more 6TB disk of the same model, and install it as a spare. I think this is a good compromise between cost and hassle.
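The capacity claim for the suggested replacement layout checks out, with drive count and size as proposed above:

```python
# Sanity check of the proposed target: 4x 6TB drives in RAIDZ2.
# RAIDZ2 spends two drives' worth of capacity on parity.
N_DRIVES = 4
DRIVE_TB = 6
PARITY_DRIVES = 2

usable_tb = (N_DRIVES - PARITY_DRIVES) * DRIVE_TB
print(f"usable capacity: {usable_tb} TB")  # 12 TB, just above the ~11.5T in use
```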
 

Thanks for the info. The detach gave this error:
Code:
cannot detach 14769619753741120542: no valid replicas
The scrub will be finished tomorrow; should I clear and then try another detach command?
Is my data currently protected against another failure?

What does
Code:
replacing-4                                   DEGRADED     0     0     0
mean? da4 (the fifth disk) was the second one replaced. Does this relate to disk 14769619753741120542? It is the only thing in a degraded state.

By making a backup, do you mean backing up all the data, i.e. that I need another 12TB of storage lying around?

I have another disk here; is it easy to turn this pool into a raidz2 array?
 
First of all, thanks for the extensive reply.
I set this up when FreeNAS 8.3 was the latest version, so I guess about 6-7 years ago. The 2 disks that failed were disks no. 3 and 4, so math-wise I cannot complain.
Thanks for the hints. Under (b): I did not know I couldn't ask FreeNAS questions here; I chose this forum because I expected the most experienced people here :) and it looks like I was right. Under (d): the pool was created with version 8.1, I believe - is this something I can fix now, or only when I make a new pool?
My main use for the NAS is a backup of private photos, the part I don't want to lose, and a movie/series library (most of the TBs), which would be very unfortunate to lose, but I could 'acquire' all those files again.
I don't have deep pockets, so I'm probably looking at the single-disk option if I recreate the zpool. Is it easy to add another drive and make my current pool raidz2?
My current FreeNAS boot disk would move from the HD controller to a motherboard SATA connector to make room for the 8th disk. You can see my current setup here: https://tweakers.net/gallery/1082/inventaris/ under NAS :)

Thanks again.
 
I have another disk here; is it easy to turn this pool into a raidz2 array?

RAIDZ vdevs are carved in stone once created - there is code on its way to allow some more flexibility, but I wouldn't expect to see that land in anything near-stable for another year or more, as it requires a rewrite of some major parts of the core logic.
Personally, I only use mirrors in my pools at home, as they are much more flexible and cheaper/easier to upgrade: just add another 2 disks, and keep a big enough one as a spare in the pool. With raidz vdevs you absolutely should only extend the pool with similarly configured raidz vdevs (e.g. extend a pool of 2x 4-drive RAIDZ2 with another 4-drive RAIDZ2). So instead of only 2, you have to buy 4 drives at once.

Thanks for the info. The detach gave this error:
Code:
cannot detach 14769619753741120542: no valid replicas
The scrub will be finished tomorrow; should I clear and then try another detach command?
Is my data currently protected against another failure?

If the pool currently consists of 7 known-good members in your raidz1 vdev, then it *might* be protected against another disk failure. But because some data corruption is already present, and more may still lurk in the dark, I would consider this pool to be at high risk of complete failure. So my recommendation still is: back up all important data ASAP.

By making a backup, do you mean backing up all the data, i.e. that I need another 12TB of storage lying around?
You'd only need a 12TB drive if your pool were completely filled up. You need a drive that can hold the data actually stored on that pool (+ some headroom).

I presume you have at least one unused SATA/SAS port in your system, as 7 is not a usual number of lanes for a controller. Often 8 ports are provided by an HBA or on-board controller, and 2-4 (sometimes slower, SATA-only) ports are provided by the chipset or CPU. A total of 2 free, usable connectors would be sufficient to make a fully redundant backup of your data to a new pool on a 2-way mirror.
These two disks (or the pool) could then be used for periodic backups after migrating the data to the new pool, so they are not money 'wasted' just on the migration.
 
Under (d): the pool was created with version 8.1, I believe - is this something I can fix now, or only when I make a new pool?
You cannot change the RAID level (RAIDZ -> RAIDZ2), nor can you fix the 4096-byte block alignment. You will have to recreate your pool.

I don't have deep pockets, so I'm probably looking at the single-disk option if I recreate the zpool. Is it easy to add another drive and make my current pool raidz2?
No, even adding another disk will NOT convert it to RAIDZ2. If you want to keep your old disks, you will have to perform a two-step operation: copy all the data off the current pool, then destroy and recreate the pool on the old disks. With a 10 or 12TB drive you can do that with a single temporary drive, but those drives still cost big $. As sko said, though, you will also need good backups, and that 10-12TB drive might be a good investment in that direction. Personally, I like backup to disk, and ZFS gives you nice features to do that quickly and efficiently (snapshots, send/receive, and that stuff). Once you have the data on that extra drive, you would have to perform the scary operation: destroy your current pool (at this moment, you are relying on a single drive - scary!), recreate a new pool from the old drives, and copy everything back. I don't know what your level of experience is, nor your risk tolerance. At home, I would do it, but only after copying the really important files (the ones that would be a lot of work to recreate) to another storage medium.

Another remark: Your current drives are not young any more. Unlike men, women, and wine, disks don't get better with age. Look at the numbers up there again: With 7 drives, you can expect a complete failure roughly every 5-10 years, and sector error rates will be climbing (you are probably already seeing that effect). I understand that you are not rolling in money (who is?), but replacement of the drives should go on your to-do list. This is another good option, different from buying a backup/temporary drive: Start buying your new production drives now, format the new pool correctly, and move the data. The transition is less scary (at no point are you non-redundant), but it may cause hardware (not enough ports, not enough space, not enough power) and financial issues.

My current FreeNAS boot disk would move from the HD controller to a motherboard SATA connector to make room for the 8th disk.
Fine, the motherboard SATA should work just fine for this.
 
And never, never use WD Greens on non-Windows OSes. (You might be able to patch them, but I wouldn't depend on that.)
May I ask out of curiosity: was it the Greens that failed?

Regarding the errors: old drives produce more of them, as ralphbsz correctly said.
This is the SMART output of my desktop computer's 6-year-old HGST da0:
Code:
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0    73961         0     73961     236284       3802.274           0
write:         0   450326         0    450326     357818       5691.472           0
verify:        0  4837197         0   4837197      57228     172401.497           0
So don't worry about a few harmless errors.
 
They were; Reds didn't exist at that time yet. I had enough sense to buy Greens of different ages and production batches to lessen the chance of a double failure at the same time.

I had 3 Greens fail completely and 1 with bad sectors; all are now replaced with 2TB Reds.
 
RAIDZ vdevs are carved in stone once created - there is code on its way to allow some more flexibility, but I wouldn't expect to see that land in anything near-stable for another year or more, as it requires rewriting some major parts of the core logic.
Personally I only use mirrors in my pools at home, as they are much more flexible and cheaper/easier to upgrade: just add another two disks, and keep a big enough disk as a spare in the pool. With raidz vdevs you really should only extend the pool with similarly configured raidz vdevs (e.g. extend a pool of 2x 4-drive RAIDZ2 with another 4-drive RAIDZ2). So instead of only 2 drives you have to buy 4 at once.
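For comparison, this is what growing a mirror-based pool looks like. Pool name "tank" and all device names are hypothetical:

```shell
# Grow the pool by adding one more 2-way mirror vdev (2 new disks):
zpool add tank mirror da8 da9

# Or turn an existing single-disk vdev into a mirror by attaching a disk:
zpool attach tank da2 da10
```

With raidz vdevs, neither of these small-step upgrades is possible; that's the flexibility argument in a nutshell.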



If the pool currently consists of 7 known good members in your raidz1 vdev, then it *might* survive another disk failure. But because some data corruption is already present at this point, and more may still lurk in the dark, I would consider this pool to be at high risk of complete failure. So my recommendation still is: back up all important data ASAP.


Only if your pool were completely filled up would you need a 12TB drive. You just need a drive that can hold the data actually stored on that pool (plus some headroom).
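You can check the actually allocated amount before buying anything; the pool name "Data" is taken from the zpool status above:

```shell
# Pool-level view: total size, allocated, free, percentage used
zpool list -o name,size,alloc,free,cap Data

# Dataset-level view, in case some datasets can be excluded from the backup
zfs list -o name,used,avail -r Data
```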

I presume you have at least one unused SATA/SAS port in your system, as 7 is not a usual port count for a controller. Often 8 ports are provided by some HBA or on-board controller, and 2-4 (sometimes slower and SATA-only) ports are provided by the chipset or CPU. A total of 2 free, usable connectors would be sufficient to make a fully redundant backup of your data to a new pool on a 2-way mirror.
These two disks (or the pool) could be used for periodic backups after migrating the data to the new pool, so these disks are not "wasted money" just for the migration.
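A sketch of that backup pool and its later periodic use. The pool name "Backup", the device names, and the snapshot names are all assumptions; adjust to taste:

```shell
# Create a 2-way mirror pool on the two new disks:
zpool create Backup mirror da8 da9

# Initial full replication of the production pool:
zfs snapshot -r Data@backup1
zfs send -R Data@backup1 | zfs recv -F Backup/Data

# Later, periodic incremental backups are cheap:
zfs snapshot -r Data@backup2
zfs send -R -i Data@backup1 Data@backup2 | zfs recv -F Backup/Data
```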
Thank you again; as you all suggested, I am going to look into recreating a good zpool.
However, in the short term, what should I do about:
Code:
        replacing-4                                   DEGRADED     0     0     0
          14769619753741120542                        UNAVAIL      0     0     0  was /dev/gptid/3c987157-d33f-11e1-b2ae-902b3435866c
Tomorrow the scrub will be finished. I will then clear the errors, but last time that triggered a resilver, I think because of the degraded "replacing-4" entry and the unavailable disk. How do I fix this, or will the scrub fix it for me and allow me to offline/remove/detach that ghost disk?
 
Let the scrub finish. I think that because the disk is unavail, you can just detach it. But I'm not 100% sure that will work for disks in a RAIDZ pool; I've never tried it myself. If it doesn't work, post here, and hopefully someone else will have a better idea.
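If you do try the detach, the ghost entry can be addressed by the numeric GUID exactly as zpool status prints it (this is the UNAVAIL member from your output above):

```shell
# Detach the unavailable ghost member of the replacing-4 vdev by its GUID:
zpool detach Data 14769619753741120542

# Verify the result:
zpool status Data
```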
 
For those looking to have periodic scrubs done, don't roll your own when it already exists:

Code:
$ grep scrub /etc/defaults/periodic.conf
# 800.scrub-zfs
daily_scrub_zfs_enable="NO"
daily_scrub_zfs_pools="" # empty string selects all pools
daily_scrub_zfs_default_threshold="35" # days between scrubs
#daily_scrub_zfs_${poolname}_threshold="35" # pool specific threshold

Override these in /etc/periodic.conf (starting with _enable, and others if needed) and sit back and relax.
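For example, a minimal /etc/periodic.conf override could look like this (the pool name and threshold are just suggestions):

```shell
# /etc/periodic.conf - enable the stock daily ZFS scrub check
daily_scrub_zfs_enable="YES"
daily_scrub_zfs_pools="Data"            # or leave empty to scrub all pools
daily_scrub_zfs_default_threshold="35"  # scrub at most every 35 days
```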
 
The scrub has finished, and the unavail disk magically disappeared. No errors. FreeNAS is complaining about the pool being too full (91% instead of below 80% of capacity), but that's it. Thanks all for the info.
Last question: is it smart to do a zpool upgrade now?
 
The magic disappearing makes sense, and I should have guessed this was going to happen: ZFS knows the disk is unavailable (duh), and the device visible in the zpool status is sort of a ghost, needed to do the accounting of metadata for things that still point at the unavailable disk and that still need to be resilvered elsewhere. Once the resilvering is done, nothing is "stored" on the unavailable disk, and there is no need to keep its ghost around.

Obviously, the capacity of your pool is now smaller (it has fewer disks), with resilvering having finished, and therefore it is more full; and we discussed above that there are various ways forward.

It seems to me that with your zpool now in good shape (albeit a little smaller), a zpool upgrade is fine. But that still won't fix the ashift problem (block size 512 configured versus 4096 native); you'll have to recreate the pool eventually anyway.
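For reference, the upgrade itself is a one-liner, with one caveat worth repeating: it is one-way, and older ZFS implementations can no longer import the pool afterwards. The second command is just a way to inspect the current ashift; depending on the FreeNAS version, zdb may need to be pointed at a non-default cache file.

```shell
# One-way upgrade of the pool to the current on-disk feature set:
zpool upgrade Data

# Inspect the ashift the vdevs were created with (12 = 4K-aligned):
zdb -C Data | grep ashift
```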
 