ZFS help! Permanent errors due to a single file, but when I remove that file, the system goes down

Here is my zpool status:
Code:
[root@suennas ~]# zpool status -v                                                                                                  
  pool: NewVol                                                                                                                     
state: ONLINE                                                                                                                     
status: One or more devices has experienced an error resulting in data                                                             
        corruption.  Applications may be affected.                                                                                 
action: Restore the file in question if possible.  Otherwise restore the                                                           
        entire pool from backup.                                                                                                   
   see: http://illumos.org/msg/ZFS-8000-8A                                                                                         
  scan: resilvered 3.20G in 0h7m with 3 errors on Wed Feb  3 22:18:56 2016                                                         
config:                                                                                                                            
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        NewVol                                          ONLINE       0     0     3                                                 
          raidz2-0                                      ONLINE       0     0     0                                                 
            gptid/66d6ac98-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
            gptid/67c78687-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
            gptid/68b5de0f-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
            gptid/6a0626d7-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
          raidz2-1                                      ONLINE       0     0     6                                                 
            gptid/6acd4172-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
            gptid/ff9c269c-9845-11e5-9ae5-009027e2e92c  ONLINE       0     0     1                                                 
            gptid/6cc589d3-95ca-11e5-8234-009027e2e92c  ONLINE       0     0     0                                                 
            gptid/7c93a77f-985e-11e5-9ae5-009027e2e92c  ONLINE       0     0     0                                                 
                                                                                                                                   
errors: Permanent errors have been detected in the following files:                                                                
                                                                                                                                   
        /mnt/NewVol/Video/Animation/2013/Gundam Build fighter/Gundam Build Fighters Special - Build Fighters TV 04 (SP 2013 BDrip x264 720p 2ch AAC)-SvM.mkv
                                                                                                                                   
  pool: freenas-boot                                                                                                               
state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Jan 13 03:45:21 2016                                                         
config:                                                                                                                            
                                                                                                                                   
        NAME                                          STATE     READ WRITE CKSUM                                                   
        freenas-boot                                  ONLINE       0     0     0                                                   
          gptid/3c5abead-e8ea-11e4-86f2-009027e2e92c  ONLINE       0     0     0                                                   
                                                                                                                                   
errors: No known data errors
As shown, the errors are caused by a single file.

So I went to remove it:
Code:
[root@suennas /mnt/NewVol/Video/Animation/2013/Gundam Build fighter]# ls Gundam*04*                                               
Gundam Build Fighters Special - Build Fighters TV 04 (SP 2013 BDrip x264 720p 2ch AAC)-SvM.mkv                                     
[root@suennas /mnt/NewVol/Video/Animation/2013/Gundam Build fighter]# rm -rf Gundam*04*                                           
[root@suennas /mnt/NewVol/Video/Animation/2013/Gundam Build fighter]#
The system shows a list of errors on the main console, too fast to read.

The system then reboots.

But the problem file is still there.

How can I solve this error?
 
Have a look at the link that's mentioned: http://illumos.org/msg/ZFS-8000-8A
Damaged files may or may not be able to be removed depending on the type of corruption. If the corruption is within the plain data, the file should be removable. If the corruption is in the file metadata, then the file cannot be removed, though it can be moved to an alternate location. In either case, the data should be restored from a backup source. It is also possible for the corruption to be within pool-wide metadata, resulting in entire datasets being unavailable. If this is the case, the only option is to destroy the pool and re-create the datasets from backup.
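A minimal sketch of the "move it, don't delete it" route from that page, written as a POSIX shell function. Everything beyond the standard commands is an assumption: the `quarantine_and_restore` name, the `DRY_RUN` switch, the `.quarantine` directory placed next to the damaged file (so the `mv` stays inside one dataset and is a plain rename rather than a copy), and the backup path you pass in are all hypothetical. With `DRY_RUN=1` it only prints the commands, which seems like a sensible first step on a box that already rebooted once.

```shell
#!/bin/sh
# Sketch only: move a damaged file aside, restore a good copy, then re-check the pool.
# Hypothetical names: quarantine_and_restore, the .quarantine directory, DRY_RUN.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# quarantine_and_restore POOL DAMAGED_FILE BACKUP_COPY
quarantine_and_restore() {
  local pool="$1" damaged="$2" backup_copy="$3" qdir
  # Same directory as the damaged file, so the mv below is a rename within one dataset.
  qdir="$(dirname "$damaged")/.quarantine"

  run mkdir -p "$qdir" || return 1
  # Per ZFS-8000-8A, a file with damaged metadata may refuse removal but can be moved.
  run mv "$damaged" "$qdir/" || return 1
  # Put a known-good copy back under the original name.
  run cp -p "$backup_copy" "$damaged" || return 1
  # Reset the error counters, then scrub so every block is re-read and verified.
  run zpool clear "$pool" || return 1
  run zpool scrub "$pool"
}
```

For example, `DRY_RUN=1 quarantine_and_restore NewVol "$damaged" "$backup"` prints the five commands without running them. Note that the quarantined file still lives on the pool, so after the scrub `zpool status -v` will likely keep listing it, just under its new path.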
 
This actually happened to me not too long ago. I did in fact follow the advice from the illumos link and not remove the file (I was paranoid enough not to do so anyway). Thankfully I did have a backup pool that was on another machine. Since the second machine was live, it was simply easier (and faster) for me to switch to it than to try and mitigate any disk-related issues on the first one. Having a backup pool with regularly scheduled backups is very helpful.
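Regularly scheduled backups to a second pool can be as simple as a recursive snapshot piped through `zfs send`/`zfs recv`. The sketch below assumes a hypothetical second machine reachable as `backuphost` over SSH, receiving into a hypothetical `backup/NewVol` dataset; the `backup_pool` name, the `@backup-<timestamp>` naming and the `DRY_RUN` switch are made up too. The first run sends a full replication stream; later runs should pass the previous snapshot so only the changes since then are sent (`zfs send -I`), because a full stream will not overwrite a target that already has snapshots. FreeNAS also offers periodic snapshot and replication tasks in its GUI that cover the same ground.

```shell
#!/bin/sh
# Sketch only: snapshot a pool recursively and replicate it to another machine.
# Hypothetical: backup_pool, DRY_RUN, the @backup-<timestamp> naming, host and target.

# backup_pool SRC_POOL DEST_HOST DEST_DATASET [PREVIOUS_SNAPSHOT_SUFFIX]
backup_pool() {
  local src="$1" dest_host="$2" dest_dataset="$3" prev="$4" snap
  snap="$src@backup-$(date +%Y%m%d-%H%M%S)"

  # Build the zfs send arguments in this function's own positional parameters.
  if [ -n "$prev" ]; then
    # Incremental: everything since a snapshot that the target already has.
    set -- -R -I "$src@$prev" "$snap"
  else
    # First run: full replication stream (all descendant datasets and properties).
    set -- -R "$snap"
  fi

  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ zfs snapshot -r $snap"
    echo "+ zfs send $* | ssh $dest_host zfs recv -F $dest_dataset"
    return 0
  fi

  zfs snapshot -r "$snap" || return 1
  # recv -F rolls the target back to its most recent snapshot before receiving.
  zfs send "$@" | ssh "$dest_host" zfs recv -F "$dest_dataset"
}
```

For example, `backup_pool NewVol backuphost backup/NewVol` for the first run, then on later runs add the suffix of the last replicated snapshot (the part after `@`) as a fourth argument, typically from cron.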
 
You could try "overwriting" the file with something else. I say "overwriting" because, as you may or may not know, ZFS is a copy-on-write filesystem, so you're never actually overwriting anything. But doing so may free up the data blocks that are currently tied to the corrupt file: unless they're referenced by clones or snapshots, the data blocks themselves would be freed. Hopefully this will clear up the corruption too.

Be prepared to spend a night restoring backups though. It's corrupt and mucking about with it may in fact make things worse.
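A sketch of that overwrite idea, here done by truncating the file to zero length, which is one way to drop its old blocks. It assumes the damage is in the file's data blocks rather than its metadata; the OP's `rm` already took the box down, which suggests it may not be, so this could fail the same way. It first lists the dataset's snapshots, since any snapshot or clone still referencing the old blocks keeps them allocated, then truncates the file in place and starts a scrub. The function name, the `DRY_RUN` switch and the dataset argument are hypothetical.

```shell
#!/bin/sh
# Sketch only: release a damaged file's blocks by truncating it in place.
# Hypothetical: free_damaged_file, DRY_RUN, and the dataset name you pass in.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# free_damaged_file POOL DATASET DAMAGED_FILE
free_damaged_file() {
  local pool="$1" dataset="$2" damaged="$3"
  # Blocks still referenced by any snapshot (or a clone of one) will not be freed,
  # so review this list before expecting the error to go away.
  run zfs list -H -t snapshot -o name -r "$dataset" || return 1
  # Copy-on-write: truncation drops this file's references to its old blocks
  # instead of rewriting them in place.
  run truncate -s 0 "$damaged" || return 1
  run zpool scrub "$pool"
}
```

With the damaged path in `$damaged`, `DRY_RUN=1 free_damaged_file NewVol NewVol/Video "$damaged"` prints the three commands; dropping `DRY_RUN=1` is only worth it once everything else on the pool is known to be backed up.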
 
May I ask how a single file's metadata can be corrupted in a double-parity RAIDZ2 while the scrub detected no further errors that were corrected by redundancy???
 
Are you using ECC RAM? If not, I can easily imagine a scenario where the metadata gets corrupted in memory and ZFS happily writes nonsense back to the disk (with redundancy!!!), and later, when it reads it back and tries to use the nonsense, you get exactly what is seen in your case.

To put it another way, ZFS is designed to protect the data only from errors on the storage media; any other error outside the storage media might slip past the radar and cause irreversible data corruption.
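A toy illustration of that point, using `cksum` as a stand-in for ZFS's block checksums (ZFS itself uses fletcher4 by default or SHA-256, and its code is not structured like this). What it models is timing: the checksum is computed from whatever is in RAM when the block is written, so a bit flipped on disk afterwards is caught, while a bit flipped in memory beforehand is faithfully checksummed and later verifies as "good". The function names and strings are made up for the example.

```shell
#!/bin/sh
# Toy model only, not ZFS internals: a "block" is a string, and its checksum
# is whatever cksum(1) computes over the bytes that are in RAM at write time.

checksum() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# verify DATA STORED_SUM: compare a freshly computed checksum with the stored one.
verify() {
  if [ "$(checksum "$1")" = "$2" ]; then
    echo OK
  else
    echo CKSUM_ERROR
  fi
}

good="directory entry for file 04"

# Case 1: bit rot on the disk, after the checksum was taken.
stored_sum=$(checksum "$good")
read_back="directory entry for file 05"   # one bit flipped on the platter
verify "$read_back" "$stored_sum"         # prints CKSUM_ERROR (detected; redundancy can repair it)

# Case 2: a bad RAM bit, before the checksum was taken.
in_ram="directory entry for file 05"      # same flip, but while still in memory
stored_sum=$(checksum "$in_ram")          # the bad bytes get checksummed...
verify "$in_ram" "$stored_sum"            # prints OK (...so the garbage looks valid)
```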
 
And it's for that reason that, until I buy/build a machine with ECC, I will continue to use UFS.
 
With UFS, you don't even know the file is corrupted. Using UFS without ECC RAM doesn't mean your data will be safe either; it can be worse than ZFS.

Even if ZFS reports the error as unrecoverable, you have still been alerted. If you have a backup, the file can be restored manually. With UFS, you know nothing: corruption just happens silently, so eventually you may lose more data than you would with ZFS.
 
That's fine, the risk is in my favour.
I mean, even on a non-ECC machine, ZFS is still far superior to UFS. ECC is highly recommended for ZFS but not mandatory.

Storage without ECC RAM means you might be at risk of corruption, regardless of which OS or file system you have chosen.
 
bachmarc said:
May I ask how a single file's metadata can be corrupted in a double-parity RAIDZ2 while the scrub detected no further errors that were corrected by redundancy???

ZFS will protect data on-disk, but if a process altered that data in any way, and was suddenly interrupted during that alteration, corruption can still occur. Limitations of physical hardware mean there's still going to be a minute time delay between writes happening in RAM, writes happening in one disk on a mirror/array, and writes happening to the rest of the mirror/array. ZFS will try to rebuild the data if it can, and will tell you if it can't, but it offers no guarantee that nothing will ever go wrong. Really, the fact that only a single file was corrupted while others could have been, and that the pool continued working after the problem occurred, is what makes ZFS better than the alternatives.

I mean, even on a non-ECC machine, ZFS is still far superior to UFS. ECC is highly recommended for ZFS but not mandatory.

That's pretty subjective, dependent on the importance of the data and the use case. ZFS has a higher standard of data protection, but most of its features only make it "superior" if you have a reason to use them.

PacketMan: ECC RAM and ZFS aren't bundled together in any way, and (as far as I can tell) the recommendation to use it is only intended for enterprise environments. Using ZFS doesn't require ECC RAM, and using ZFS without ECC RAM simply means that your data is still susceptible to a particular risk: that it can become corrupted in memory and written back to disk in a mangled state. All filesystems, including UFS, face this problem. The only difference is that ZFS will realize your data is trashed and tell you so, while UFS will not.

If you ask me (a total non-expert), ECC memory is really only vitally useful in instances where losing a file for a couple hours could cost you someone's business. In a case such as the OP's, a single corrupted video file can just be restored from a backup at leisure, like always.
 
Even if you do not have ECC memory, you are at least protected by parity memory and would know if you had a bad memory location, because it would be reported. The advantage of ECC is that it corrects single-bit errors, but a double-bit error would be reported the same way as a parity error, as "critical". Anyway, either kind of memory will detect corrupt data and report it to the OS.
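To make the parity half of that comparison concrete, here is a toy even-parity check over a single byte, assuming a module that actually stores a parity bit per byte (plain non-ECC desktop modules usually have no extra bits at all). It shows the limit of one parity bit: any odd number of flipped bits is detected, but an even number cancels out and goes unnoticed. The SECDED codes used by ECC memory add enough check bits to locate and correct a single flipped bit and to detect a double one; that part is not modeled here. The `parity` and `check` names are illustrative only.

```shell
#!/bin/sh
# Toy even-parity check over one 8-bit value. Real hardware does this in the
# memory controller; this only shows which errors a single parity bit can see.

# parity VALUE: prints 1 if VALUE has an odd number of 1 bits, else 0.
parity() {
  local v="$1" p=0
  while [ "$v" -gt 0 ]; do
    p=$(( p ^ (v & 1) ))
    v=$(( v >> 1 ))
  done
  echo "$p"
}

# check VALUE STORED_PARITY: recompute parity on read and compare.
check() {
  if [ "$(parity "$1")" -eq "$2" ]; then
    echo "no error seen"
  else
    echo "parity error"
  fi
}

byte=$(( 0xA5 ))            # 1010 0101, four 1 bits
stored=$(parity "$byte")    # parity bit stored alongside the byte at write time

check $(( byte ^ 0x01 )) "$stored"   # one flipped bit:  prints "parity error"
check $(( byte ^ 0x03 )) "$stored"   # two flipped bits: prints "no error seen"
```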

I have several FreeNAS servers and they are used for business. Two of them have 6-core Xeon processors, 64GB of ECC RAM and eight 2TB SAS drives. When I first started using FreeNAS on the higher-end hardware, I would occasionally get auto-generated e-mails saying the system had some ECC single-bit errors, but a few months ago I upgraded to the latest FreeNAS and there have been no more errors. My other FreeNAS servers were just old Windows desktop computers; I simply installed a pair of 4TB SATA drives to serve as repositories for exported video files and SQL Server database backup files.
 