ZFS+HAST Replication time after drive failure

I'm looking for more input on hastd(8) with ZFS, specifically replication link speeds. I have more than 40 2 TB drives, and when a disk fails a full resync is performed, meaning roughly 1.8 TB of data has to travel from one JBOD chassis to the other. The current speeds are pretty unacceptable, and I'm wondering whether there are any tunables in hast.conf(5) I could make use of.
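For context, my hast.conf is essentially the defaults. Below is a trimmed sketch of a single resource; the hostnames, addresses, and device path are placeholders rather than my exact config, and the checksum/compression/timeout lines are simply the per-resource knobs hast.conf(5) documents, not values I've settled on:

Code:
# /etc/hast.conf -- trimmed sketch, names/addresses/paths are placeholders
resource data6 {
        # replication fullsync   # as far as I know, fullsync is the only
                                 # mode actually implemented on 9.1
        checksum    none         # none | crc32 | sha256
        compression hole         # none | hole | lzf
        timeout     20           # connection timeout in seconds

        on headA {
                local  /dev/da6p1
                remote 10.0.0.2
        }
        on headB {
                local  /dev/da6p1
                remote 10.0.0.1
        }
}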

I have a lagg(4) bond of four 10 Gbit/s interfaces, 40 Gbit/s total, running in round-robin mode directly between the two heads (a sketch of the lagg setup is further down). gstat(8) shows the following speeds:

Code:
dT: 1.002s  w: 1.000s  filter: da6
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    172    152  19417    0.9     10   1198    0.7   24.1| da6
    0    172    152  19417    0.9     10   1198    0.7   24.3| da6p1

Suffice it to say, this is pretty darn slow. I've seen the read kBps reach as high as 32,000, but it usually lingers around 18,000-19,000. The zpool has zero other activity at the moment, so replicating 2 TB of disk should take minutes to hours, not (at this rate) weeks. As a rough back-of-the-envelope check: 2 TB over a 40 Gbit/s link (~5 GB/s) is on the order of 400 seconds at wire speed, so even with generous protocol and disk overhead the observed ~19 MB/s is orders of magnitude off.
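For completeness, the lagg is configured roughly like this in rc.conf; the interface names and address below are placeholders, not my exact setup:

Code:
# /etc/rc.conf (relevant lines) -- interface names and address are placeholders
ifconfig_ix0="up"
ifconfig_ix1="up"
ifconfig_ix2="up"
ifconfig_ix3="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto roundrobin laggport ix0 laggport ix1 laggport ix2 laggport ix3 10.0.0.1/24"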

I found the following thread: freebsd.devel.file-systems/9794. It points at proto_common.c back in 8-RELEASE; since I'm running 9.1-RELEASE, I'd be under the impression that this has changed since then. I should note, however, that I installed hastd with pkg_add, not from ports.

Any help getting these devices replicating much faster would be appreciated. The disks are connected via 6 Gbit/s SAS, and with the 40 Gbit/s lagg between the heads, I should be rocking that replication link.
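In case it's relevant, these are the network-side knobs I was planning to look at next on both heads. The commented-out values are untested guesses on my part, not recommendations:

Code:
# TCP/socket buffer sysctls to check on both heads
sysctl kern.ipc.maxsockbuf          # upper bound for socket buffers
sysctl net.inet.tcp.sendbuf_max     # send buffer auto-tuning ceiling
sysctl net.inet.tcp.recvbuf_max     # receive buffer auto-tuning ceiling
sysctl net.inet.tcp.sendbuf_auto net.inet.tcp.recvbuf_auto

# e.g. bumping the ceilings (placeholder values, untested):
# sysctl kern.ipc.maxsockbuf=16777216
# sysctl net.inet.tcp.sendbuf_max=16777216
# sysctl net.inet.tcp.recvbuf_max=16777216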
 