I just wanted to report a behaviour with a resilver I'm currently doing. It may help somebody else to remain calm and patient during a resilver.
I noticed my backup script had failed, then realized my backup pool was degraded: one of the 1TB drives had failed. If I had been in the office, I would have tried popping the drive out and in again. Occasionally a drive on my JBOD stops responding, and popping it out and in again, followed by a zpool online, usually resolves it. The drives are usually perfectly OK; I guess it's just a SAS controller or backplane hiccup.
Anyway, since I was at home, I decided to detach and re-attach the drive instead, which I could do because I'm using mirrors in all my vdevs, not raidz. That was a mistake! It would have been faster for me to cycle to the office and pop the drive!
I then decided to attach a spare so that I had redundancy until the next day, when I could (hopefully) get the original drive attached again.
So after attaching the spare drive, I notice it's very slow to resilver. I know from scrubbing that it can speed up if I wait a little while. But after 2 hours, it's still moving extremely slowly, and I'm frustrated because I just want it to mirror the blocks from one drive to another. Being a mirror, that should be simple. After more than 2 hours, it still says that it will take about 600 hours to complete, having resilvered only 1.91G (0.27%). Now I'm thinking, "what are my options?" I very nearly decided to update to FreeBSD 8.4 (from 8.3) to get feature flags (pool version 5000) and, hopefully, a faster resilver.
BUT THEN, after 10 more minutes whilst I'm reading up on slow ZFS resilvering, it seems to speed up dramatically!
Here is my output. The first zpool status was taken after 2 hours, the second about 10 minutes later:
Code:
zpool status -v backup
  pool: backup
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jun 4 23:44:18 2014
        15.7G scanned out of 5.67T at 2.77M/s, 595h46m to go
        1.91G resilvered, 0.27% done
config:
        NAME                    STATE     READ WRITE CKSUM
        backup                  ONLINE       0     0     0
          mirror-0              ONLINE       0     0     0
            label/array1disk8   ONLINE       0     0     0
            label/array1disk9   ONLINE       0     0     0
          mirror-1              ONLINE       0     0     0
            label/array1disk10  ONLINE       0     0     0
            label/array1disk11  ONLINE       0     0     0
          mirror-2              ONLINE       0     0     0
            label/array1disk12  ONLINE       0     0     0
            label/array1disk13  ONLINE       0     0     0
          mirror-3              ONLINE       0     0     0
            label/array1disk14  ONLINE       0     0     0
            label/array1disk15  ONLINE       0     0     0
          mirror-4              ONLINE       0     0     0
            label/array1disk16  ONLINE       0     0     0
            label/array1disk17  ONLINE       0     0     0
          mirror-5              ONLINE       0     0     0
            label/array1disk18  ONLINE       0     0     0
            label/array1disk19  ONLINE       0     0     0
          mirror-6              ONLINE       0     0     0
            label/array1disk20  ONLINE       0     0     0
            label/array1disk21  ONLINE       0     0     0
          mirror-7              ONLINE       0     0     0
            label/array1disk23  ONLINE       0     0     0
            label/array1disk7   ONLINE       0     0     0  (resilvering)
errors: No known data errors
[root@beastie1 /backup]# zpool status -v backup
  pool: backup
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jun 4 23:44:18 2014
        67.7G scanned out of 5.67T at 11.1M/s, 147h25m to go
        8.27G resilvered, 1.16% done
config:
        NAME                    STATE     READ WRITE CKSUM
        backup                  ONLINE       0     0     0
          mirror-0              ONLINE       0     0     0
            label/array1disk8   ONLINE       0     0     0
            label/array1disk9   ONLINE       0     0     0
          mirror-1              ONLINE       0     0     0
            label/array1disk10  ONLINE       0     0     0
            label/array1disk11  ONLINE       0     0     0
          mirror-2              ONLINE       0     0     0
            label/array1disk12  ONLINE       0     0     0
            label/array1disk13  ONLINE       0     0     0
          mirror-3              ONLINE       0     0     0
            label/array1disk14  ONLINE       0     0     0
            label/array1disk15  ONLINE       0     0     0
          mirror-4              ONLINE       0     0     0
            label/array1disk16  ONLINE       0     0     0
            label/array1disk17  ONLINE       0     0     0
          mirror-5              ONLINE       0     0     0
            label/array1disk18  ONLINE       0     0     0
            label/array1disk19  ONLINE       0     0     0
          mirror-6              ONLINE       0     0     0
            label/array1disk20  ONLINE       0     0     0
            label/array1disk21  ONLINE       0     0     0
          mirror-7              ONLINE       0     0     0
            label/array1disk23  ONLINE       0     0     0
            label/array1disk7   ONLINE       0     0     0  (resilvering)
errors: No known data errors
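For anyone wanting to sanity-check the numbers, here is a rough Python sketch of the arithmetic behind those two "scan:" lines. It assumes (this is my assumption, not something the output states) that the rate shown is an average since the resilver started and that the time to go is simply the remaining data divided by that average. The helper names to_bytes and parse_scan are made up for this illustration. Under that assumption, the throughput between the two samples works out far higher than either reported rate, which would explain why the estimate collapses so quickly once the resilver gets going.

```python
import re

# Binary multipliers for the unit suffixes seen in the zpool status output above.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Matches e.g. "15.7G scanned out of 5.67T at 2.77M/s"
SCAN_RE = re.compile(
    r"([\d.]+[KMGT]) scanned out of ([\d.]+[KMGT]) at ([\d.]+[KMGT])/s"
)


def to_bytes(text):
    """Convert a size such as '15.7G' into a number of bytes (hypothetical helper)."""
    return float(text[:-1]) * UNITS[text[-1]]


def parse_scan(line):
    """Return (scanned, total, rate) in bytes from a 'scanned out of' line."""
    match = SCAN_RE.search(line)
    if match is None:
        raise ValueError("not a scan progress line: " + line)
    return tuple(to_bytes(group) for group in match.groups())


# The two progress lines from the outputs above.
first = "15.7G scanned out of 5.67T at 2.77M/s, 595h46m to go"
second = "67.7G scanned out of 5.67T at 11.1M/s, 147h25m to go"

s1, total, r1 = parse_scan(first)
s2, _, r2 = parse_scan(second)

# ASSUMPTION: the rate is an average since the start, so scanned / rate
# gives the elapsed time (in seconds) at each sample.
elapsed1 = s1 / r1
elapsed2 = s2 / r2

# ASSUMPTION: time to go = remaining data / average rate so far.
eta1_hours = (total - s1) / r1 / 3600
eta2_hours = (total - s2) / r2 / 3600

# Throughput between the two samples, not averaged over the slow start.
interval_rate = (s2 - s1) / (elapsed2 - elapsed1)

print("ETA at first sample:  %.0f hours" % eta1_hours)
print("ETA at second sample: %.0f hours" % eta2_hours)
print("Minutes between samples: %.1f" % ((elapsed2 - elapsed1) / 60))
print("Rate between samples: %.0f MiB/s" % (interval_rate / 1024**2))
```

The computed estimates land within an hour or so of the 595h46m and 147h25m that zpool status reported (the inputs are rounded, so an exact match isn't expected), and the rate between the two samples comes out on the order of 100 MiB/s. So if the assumption holds, the early ETA is dragged down by the slow first couple of hours and says little about how fast the resilver is running right now.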
So, the moral of the story is: ZFS, with its scrubs and resilvers (replace/attach), is a complex beast. Perhaps wait just a bit longer before doing something drastic like rebooting or updating to get a newer ZFS.
Tip to stay relaxed: use 2-drive redundancy (raidz2 or 3-way mirrors). That way, losing one drive still leaves you with redundancy during a resilver or scrub.
On my data pool I use 3-way mirror vdevs, with cheap WD Blacks to offset the cost. The pool in question (above) is my backup pool, so I felt 2-way mirrors with WD Blues were good enough. I personally don't like raidz. I find that it (i) over-complicates things, (ii) is less flexible during expansion etc., (iii) reduces performance, (iv) has limitations, and (v) has more to go wrong. But I'm no ZFS expert, it's just my opinion.
...And yes, I know I should upgrade to 8.4 anyway, and I do plan to. It's just that everything is working so well and I don't want to risk it just now. I have 230 days of continuous uptime, barely needing to even restart a service. I want to enjoy that for a bit longer yet.