ZFS performs well in a degraded state but resilvering is horribly slow

Hello,

I'm using ZFS version 28 on FreeBSD 8.2 with raidz1 over 8x 1 TB drives,
with zpool version 26. The pool went degraded because one drive was
faulty. I replaced the faulted drive with a new one and resilvering started.

Now the resilver is in progress, but it is running at a horribly slow rate of ~6 MB/sec
and is estimated to take 340 hours to complete, even though the system is otherwise idle.
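
For reference, the rate and the time-to-go above are what zpool status reports on the scan/resilver line; the pool name "tank" below is just a placeholder:

    # shows the resilver progress line with the current rate and estimated time to go
    zpool status -v tank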

In the meantime I copied the whole pool with rsync to another raidz1
pool on the same machine at a transfer rate of ~200 MB/sec.
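
A copy like that can be done with a plain recursive rsync between the two pools' mountpoints; the paths below are only placeholders:

    # copy everything from the degraded pool to the healthy pool,
    # preserving permissions, ownership and timestamps
    rsync -a /tank/ /backup/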

This takes care of the data on the degraded pool, but not the issue of horribly
slow resilvering while the pool is degraded.

How can I speed up the resilvering of degraded pools?

Thanks for any help.

PS: at this time the pool is still degraded and the resilver is in progress, but the data is backed up.
 
It will take a while. Keep in mind it has to read 7 TB of data and write another 1 TB to 'rebuild' the replaced hard drive.
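
As a rough sanity check of that estimate (assuming the reported ~6 MB/sec scan rate has to cover the full ~7 TB of allocated data):

    # 7 TB scanned at 6 MB/sec, expressed in hours
    echo "7 * 1024 * 1024 / 6 / 3600" | bc    # ~339 hours, i.e. the ~340 hour estimate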
 
That's correct.

But why can I copy the whole pool (7 TB) within ~14 hours, where the lost data must also be restored from parity and new parity must be calculated on the target pool, while resilvering takes nearly 2 weeks?

There must be an issue which slows down the resilvering.
 
gonzo83 said:
But why can I copy the whole pool (7 TB) within ~14 hours, where the lost data must also be restored from parity and new parity must be calculated on the target pool, while resilvering takes nearly 2 weeks?
Yes, but in this case one entire pool is only reading while the other (destination) pool is only writing.

With resilvering it'll read, write, read, write, read, write, etc. on the same pool.

There must be an issue which slows down the resilvering.
That's also possible. I haven't needed to resilver anything yet, so I can't comment on exactly how long it should take. I can tell you that rebuilding a mirror set on Solaris takes forever, so I'm assuming it does on FreeBSD too.
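
One thing worth doing is watching the per-disk I/O while the resilver runs; if the disks are mostly idle instead of saturated, the resilver is being throttled rather than disk-bound. For example (device names will differ on your system):

    # live per-provider I/O statistics from GEOM
    gstat
    # or extended per-device statistics every 5 seconds
    iostat -x 5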
 
gonzo83 said:
I'm using ZFS version 28 on FreeBSD 8.2 with raidz1 over 8x 1 TB drives,
with zpool version 26. The pool went degraded because one drive was
faulty. I replaced the faulted drive with a new one and resilvering started.

Now the resilver is in progress, but it is running at a horribly slow rate of ~6 MB/sec
and is estimated to take 340 hours to complete, even though the system is otherwise idle.

In the meantime I copied the whole pool with rsync to another raidz1
pool on the same machine at a transfer rate of ~200 MB/sec.

This takes care of the data on the degraded pool, but not the issue of horribly
slow resilvering while the pool is degraded.
This might be related to an issue I encountered with scrub performance. On a ZFS v15 pool on 8-STABLE, scrub ran at approximately 600 MB/sec. An exact copy of the data on a ZFS v28 pool, also on 8-STABLE, identical hardware configuration, started out by reporting speeds between 8 and 20 MB/sec, with corresponding estimated completion times way in the future. With the very slow scrub going, I could still read the pool at 400-500 MB/sec.

After a day or so of scrubbing at the slow speed, it inexplicably started processing data at the expected rate, and completed in another 7 hours.

I have no idea why this is happening. Keep an eye on your resilver and see if it eventually speeds up.
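
If it doesn't, it might be worth looking at the scan/resilver tunables. The v28 code deliberately throttles scrubs and resilvers when it thinks the pool is busy. I'm not certain all of these sysctls exist on your exact build, so check what is actually there first:

    # list the scrub/resilver-related ZFS tunables present on this system
    sysctl vfs.zfs | grep -E 'scrub|resilver|scan'

    # examples of loosening the throttling (values are only illustrative);
    # if a knob is read-only at runtime it can be set in /boot/loader.conf instead
    sysctl vfs.zfs.scrub_delay=0
    sysctl vfs.zfs.resilver_delay=0
    sysctl vfs.zfs.resilver_min_time_ms=5000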
 
To my understanding it is not wise to use 8 or more drives in a single raidz{1,2,3} vdev;
I believe that 6 is the ideal number.
So you would be better off making two raidz vdevs of four drives each, as sketched below.
However, you will lose some space!
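
Such a layout would be created roughly like this (device names are placeholders, and of course this means recreating the pool and restoring from backup):

    # one pool built from two 4-drive raidz1 vdevs instead of a single 8-drive raidz1
    zpool create tank raidz1 da0 da1 da2 da3 raidz1 da4 da5 da6 da7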

In the archives I found a thread about this.
There it is also mentioned that resilvering could take weeks (that was a raidz vdev of 24 drives), but it explains why resilvering takes a long time.
The post from Phoenix mentions this.

http://forums.freebsd.org/archive/index.php/t-4641.html

Regards,
Johan Hendriks
 