Unstoppable resilver

Discussion in 'Storage' started by simplex, Apr 30, 2012.

  1. simplex

    simplex New Member

    Messages:
    9
    Thanks Received:
    0
    Hi, I have a problem with my ZFS pool on FreeBSD 8.3-RELEASE. The pool is version 15 and consists of four disks arranged as two mirrors. This was the situation: one disk in the second mirror was faulty, and before I could replace it the other disk in that mirror started having problems ("Already active DMA on this device"). I fixed that by disabling DMA (there are now other errors in dmesg, but that's a separate problem).

    After booting without DMA, the system was able to mount the ZFS pool and the data looked OK. I replaced the originally dead disk and started the resilver. This is the situation now:
    Code:
      pool: pr0nserv
     state: DEGRADED
    status: One or more devices is currently being resilvered.  The pool will
    	continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
      scan: resilver in progress since Mon Apr 30 10:29:09 2012
            284G scanned out of 1.18T at 277M/s, 0h56m to go
            16.6G resilvered, 23.51% done
    config:
    
    	NAME                       STATE     READ WRITE CKSUM
    	pr0nserv                   DEGRADED     0     0   108
    	  mirror-0                 ONLINE       0     0     0
    	    ad4                    ONLINE       0     0     0
    	    ad6                    ONLINE       0     0     0
    	  mirror-1                 DEGRADED     0     0   648
    	    replacing-0            DEGRADED   648     0     0
    	      6530854401941125969  OFFLINE      0     0     0  was /dev/ad8/old
    	      ad8                  ONLINE       0     0   648  (resilvering)
    	    ad10                   ONLINE       0     0   648
    
    errors: Permanent errors have been detected in the following files:
    
            <metadata>:<0x87>
    

    The problem:
    I have a metadata error that I can't get rid of, and the resilver keeps restarting. If I reboot, it resilvers again; if I run # zpool clear pr0nserv or # zpool clear pr0nserv mirror-1, the resilver restarts as well. I've removed two files that were corrupted, but I can't "fix" the metadata error. I think a scrub could fix it, but I can't scrub because the resilver just starts again :(
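    One way to confirm that the resilver is restarting, rather than just running slowly, is to compare the "in progress since" timestamp between two samples of `zpool status`. The following is a minimal sketch, assuming the scan lines are formatted exactly as in the output quoted above; the function names `parse_scan` and `restarted` are made up for illustration and are not part of any ZFS tool.

```python
import re

# Matches the "scan:" line printed while a resilver or scrub is running.
SCAN_RE = re.compile(r"scan:\s+(resilver|scrub) in progress since (.+)")
# Matches the progress figure, e.g. "23.51% done".
DONE_RE = re.compile(r"([\d.]+)% done")

def parse_scan(status_text):
    """Return (kind, started, percent) for a running scan, or None."""
    m = SCAN_RE.search(status_text)
    if m is None:
        return None
    pct = DONE_RE.search(status_text)
    percent = float(pct.group(1)) if pct else None
    return m.group(1), m.group(2).strip(), percent

def restarted(previous, current):
    """True if both samples show the same kind of scan with a different start time."""
    return (
        previous is not None
        and current is not None
        and previous[0] == current[0]
        and previous[1] != current[1]
    )

# Scan lines copied from the status output above.
SAMPLE = """\
  scan: resilver in progress since Mon Apr 30 10:29:09 2012
        284G scanned out of 1.18T at 277M/s, 0h56m to go
        16.6G resilvered, 23.51% done
"""

print(parse_scan(SAMPLE))
```

    Feeding it the text of `zpool status pr0nserv` before and after a `zpool clear` would show whether the start time moved.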

    If someone knows how to fix it, please tell me.

    I think a brute-force way to fix this would be to copy off all the files that are on the second mirror, remove it, re-create it, and copy the files back, but I'd like to avoid that if possible.

    Thanks.
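    For readers who want to pull the non-zero error counters out of a status listing like the one in this post, here is a small sketch. It assumes the config table uses the NAME / STATE / READ / WRITE / CKSUM column layout shown above and that the counters are plain integers (abbreviated values are not handled). The helper name `devices_with_errors` is hypothetical.

```python
import re

# One row of the "config:" table: name, state, then READ / WRITE / CKSUM counters.
ROW_RE = re.compile(
    r"^[ \t]*(\S+)[ \t]+(ONLINE|DEGRADED|OFFLINE|FAULTED|UNAVAIL|REMOVED)"
    r"[ \t]+(\d+)[ \t]+(\d+)[ \t]+(\d+)",
    re.MULTILINE,
)

def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    rows = []
    for name, _state, rd, wr, ck in ROW_RE.findall(status_text):
        counts = (int(rd), int(wr), int(ck))
        if any(counts):
            rows.append((name,) + counts)
    return rows

# A few rows copied from the status output in this post.
SAMPLE = """\
    NAME                       STATE     READ WRITE CKSUM
    pr0nserv                   DEGRADED     0     0   108
      mirror-0                 ONLINE       0     0     0
        ad4                    ONLINE       0     0     0
        replacing-0            DEGRADED   648     0     0
          6530854401941125969  OFFLINE      0     0     0  was /dev/ad8/old
          ad8                  ONLINE       0     0   648  (resilvering)
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```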
     
  2. simplex

    Looks like I've solved it with # zpool detach pr0nserv 6530854401941125969
    Now I'm scrubbing to see if that fixes the metadata error.
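    The sequence described in this post (detach the stale half of replacing-0, scrub, then check the status) can be written down as a dry-run script. This is only a sketch of the steps as the poster reports them, not a recommended recovery procedure; the helper names `recovery_commands` and `run` are invented for illustration, and nothing is executed unless `dry_run` is set to False.

```python
import subprocess

def recovery_commands(pool, stale_device):
    """Build the zpool invocations described in this post, in order."""
    return [
        ["zpool", "detach", pool, stale_device],  # drop the old half of replacing-0
        ["zpool", "scrub", pool],                 # start a scrub (returns immediately)
        ["zpool", "status", "-v", pool],          # show progress and any listed errors
    ]

def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

# Pool name and device GUID are the ones quoted in this thread.
run(recovery_commands("pr0nserv", "6530854401941125969"))
```

    Because the scrub runs in the background, the final status call shows a scrub in progress rather than its result; the status has to be checked again once the scrub completes.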
     
  3. simplex

    Looks like it's not finished...
    I've upgraded the pool to version 28, scrubbed again, and cleared the errors, but the metadata error is still there:
    Code:
    [root@pr0nserv ~]# zpool status -v
      pool: pr0nserv
     state: ONLINE
    status: One or more devices has experienced an error resulting in data
            corruption.  Applications may be affected.
    action: Restore the file in question if possible.  Otherwise restore the
            entire pool from backup.
       see: http://www.sun.com/msg/ZFS-8000-8A
      scan: scrub repaired 389K in 16h37m with 1 errors on Wed May  2 04:15:07 2012
    config:
    
            NAME        STATE     READ WRITE CKSUM
            pr0nserv    ONLINE       0     0     0
              mirror-0  ONLINE       0     0     0
                ad4     ONLINE       0     0     0
                ad6     ONLINE       0     0     0
              mirror-1  ONLINE       0     0     0
                ad8     ONLINE       0     0     0
                ad10    ONLINE       0     0     0
    
    errors: Permanent errors have been detected in the following files:
    
            <metadata>:<0x87>
    


    Does anyone know how to fix this without destroying and re-creating the pool?
    Thanks.
     
  4. simplex

    I've rebooted the machine and the resilver started again :(