How long does ZFS wait for a disk to return from a read/write?

I've noticed that Western Digital have recently released a new range of "NAS optimised" hard disk drives, the Red series.

Among the features of these drives are adjustable error recovery control timeouts (TLER, in WD parlance).

This made me wonder how well these drives will interact with ZFS. How long does ZFS wait for a drive to return from an operation before considering the disk failed and marking it degraded or failed? What would be the ideal error recovery timeouts to set on these disks when using them in a zpool?
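
For reference, these drives expose that timeout through the SCT Error Recovery Control feature, which smartctl can at least read back. A rough sketch, assuming smartmontools is installed and the disk shows up as /dev/ada0 (adjust the device name to your system):

    # Show the drive's current SCT ERC read/write timeouts, if the drive supports the feature
    smartctl -l scterc /dev/ada0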

Thanks.
J
 
Not sure, but I had RAID controller issues with some WD Blacks (no TLER support on them); the same drives have been rock solid for six months so far under ZFS.

I don't believe ZFS depends on TLER support.
 
With ZFS, you want to let the drive do its thing instead of returning early because of TLER. That feature is only useful for RAID controllers that panic and do strange things (like dropping a drive prematurely) if a drive takes its time doing recovery.
When a drive is connected to a 'dumb' HBA, as recommended with ZFS, the FreeBSD kernel is in control of these timeouts, and ZFS will just wait for the drive to recover from any media failure without really caring how long that takes. I believe the maximum drive timeout is then dictated by the FreeBSD disk controller driver together with the default timeout of the drive itself.
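
If you want to see what the kernel side actually uses, the CAM layer exposes its command timeout and retry settings as sysctls. A sketch from memory for ATA disks on the ada(4) driver; the exact names can vary between FreeBSD versions, so verify against your own system:

    # Kernel-side defaults the CAM/ada layer applies to disk commands
    sysctl kern.cam.ada.default_timeout   # per-command timeout, in seconds
    sysctl kern.cam.ada.retry_count       # how many times a failed command is retried
    # If those names don't exist on your release, list the whole CAM subtree instead
    sysctl kern.cam | grep -E 'timeout|retry'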

So yes, I'd consider TLER irrelevant when not using a special RAID controller.
 
ZFS waits forever. There are no "I/O timeouts" within ZFS itself. It's entirely left up to the layers below ZFS (disk subsystem, device drivers, hardware). Which makes sense, as those are the layers that best understand the devices and what length of time makes sense for the different types of I/O.

However, that does lead to some interesting "lockups", where ZFS issues an I/O, and the layers below it "lose" the I/O request, so it never terminates/returns. ZFS can't issue any new I/O until the previous one returns or else it loses transaction consistency.
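
You can usually spot that situation from userland, because the pool stalls while one device sits there with a request outstanding forever. A quick sketch using gstat from the base system (the device filter is just an example):

    # Watch per-device queue length and latency once per second;
    # a lost request shows up as a non-zero L(q) that never drains on one disk
    gstat -I 1s -f '^ada'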

There's a new "deadman timer" feature added to the OSS ZFS via Illumos. If ZFS does not receive a response to an I/O within 1000 seconds, then it panics the system. See this thread for all the details. No idea if this has been imported into FreeBSD -CURRENT yet.
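
If you want to check whether the ZFS build you're running has it, the tunables should show up as sysctls; I'd guess at names along the lines of vfs.zfs.deadman_* on FreeBSD, but that's only an assumption on my part, so just grep for it:

    # See whether this ZFS build exposes any deadman tunables at all
    sysctl -a | grep -i deadman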
 
phoenix said:
There's a new "deadman timer" feature added to the OSS ZFS via Illumos. If ZFS does not receive a response to an I/O within 1000 seconds, then it panics the system. See this thread for all the details. No idea if this has been imported into FreeBSD -CURRENT yet.

I gather this does not include the case where the drive encounters a bad sector. Presumably the drive itself sends back a response to the effect of "bad block", and ZFS can then take appropriate action (fail the drive, etc.).

From what I understand, the only way an I/O can get hung up like that is due to a controller/driver bug?
 
If the Red disks perform as marketed, I would still prefer them to other consumer-grade disks. They claim a much better MTBF, though I haven't looked at the actual numbers. They are also supposedly designed for 24x7 use (relevant if you're planning a NAS or server).

Also, while I'm sure ZFS works fine with non-TLER disks, I thought the main advantage of TLER was basically to stop the array hanging when data can't be read. A normal disk will try its best to read the data, with ZFS waiting around for it while it does. On a standalone disk that is probably what you want to happen, but not in RAID (hardware or ZFS). You want the disk to fail the read fairly quickly so the system can sort itself out and carry on (in the case of ZFS, read the mirror or parity and rewrite the record somewhere else).
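
For what it's worth, on drives that support it the recovery limit can be set through SCT ERC with smartctl. The 7-second value and /dev/ada0 below are only examples, and on most drives the setting does not survive a power cycle, so it would have to be re-applied at boot:

    # Cap error recovery at 7.0 seconds for both reads and writes
    # (values are in tenths of a second)
    smartctl -l scterc,70,70 /dev/ada0
    # Read the setting back to confirm it took
    smartctl -l scterc /dev/ada0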
 
Myeah, I've had enterprise-marketed drives fail within months and desktop-marketed drives that have been spinning 24/7 for many years and are still going... but then again, we haven't seen many drives fail so far, so it could be a streak of bad luck with the high-MTBF drives (it has 'mean' in the term, after all).

Having at least a double redundancy setup and a good drive replacement policy probably does more good than buying enterprise drives and expecting them to last forever :p
 