Hello, my first post here. I've been a sysadmin for a Long Time (20+ years) but am just starting out with FreeBSD. I built a ZFS storage server with a 60-bay JBOD and dual HBAs, giving four paths to the storage, all managed by gmultipath. Everything was going really well, but then I became aware of a problem with gmultipath, and I'm wondering if it's a deal breaker that will force me to go with a different OS. Basically, gmultipath can't differentiate between a path failure and a drive failure. A failed drive causes it to queue the I/O and retry the paths forever, so from the perspective of ZFS the drive never actually fails.
DM Multipath on Linux has a setting called no_path_retry that can limit the number of retries, which would seem like an appropriate solution. As far as I can tell gmultipath has no such option. I'm not willing to set up a new storage system with a known issue that is likely to be a problem in the future. So I'm wondering how others have managed this, or if it's less of a problem in practice than it seems.
I have already added a comment to bug 178473 <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=178473> but it's been open for 4 years so I'm not expecting a fix in the short term. Just wondering if there is a good workaround. Of course I could write a script that would scrape the log files looking for long stretches of path retries and have gmultipath stop or fail the paths but I'd rather avoid a kludge on a new system if possible.
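For reference, the log-scraping kludge would look roughly like the sketch below. It only does the detection half: it flags multipath devices whose failures show up on more than one path, which points at the drive rather than a single path. The log message format in the regex is an assumption and needs to be checked against what actually appears in /var/log/messages. The function name and both thresholds are made up for illustration. What to do with a flagged device (gmultipath stop, gmultipath fail, or something else) is deliberately left out, since I don't know yet which of those would actually get the error through to ZFS.

```python
import re
from collections import defaultdict

# ASSUMPTION: the kernel logs path failures in roughly this form.
# Verify against your own /var/log/messages before relying on it.
FAIL_RE = re.compile(r"GEOM_MULTIPATH: Error (\d+), (\S+) in (\S+) marked FAIL")


def suspect_devices(lines, min_events=6, min_paths=2):
    """Return multipath device names that look like drive failures.

    A device is flagged when it has logged at least `min_events`
    failures spread over at least `min_paths` distinct providers.
    Both thresholds are hypothetical defaults, not tuned values.
    """
    events = defaultdict(int)   # device name -> number of failure lines
    paths = defaultdict(set)    # device name -> providers that failed
    for line in lines:
        m = FAIL_RE.search(line)
        if m is None:
            continue
        _errno, provider, device = m.groups()
        events[device] += 1
        paths[device].add(provider)
    return sorted(
        dev for dev in events
        if events[dev] >= min_events and len(paths[dev]) >= min_paths
    )
```

It would be fed the lines of the log file (for example an open file object) from cron or a small daemon loop. A real version would also need a time window so that old failures age out.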
Thanks, Richard