Solved FreeBSD sysctl vfs.zfs.dmu_offset_next_sync and openzfs/zfs issue #15526 (errata notice, FreeBSD bug 275308)

I recently updated to FreeBSD 14, and this thread and numerous links have confused me a bit. Could someone summarize what is going on, and whether I (as a user) should do something to protect my data from being corrupted?

After doing some searching, I found an advisory on one of the FreeBSD mailing lists. Is that what I should do (i.e. is it the official recommendation), and then wait for an update that fixes the problem? Why isn't anything showing up on https://www.freebsd.org/? Isn't this issue serious enough to demand a more visible notice/advisory for FreeBSD's users?
 
BTW, git: 2276e53940c2 - main - zfs: merge openzfs/zfs@688514e47, committed to 15-CURRENT, includes the fix for the bug.

#15566 688514e47 dmu_buf_will_clone: fix race in transition back to NOFILL
#15571 30d581121 dnode_is_dirty: check dnode and its data for dirtiness
 
Any idea if these patches completely fix the reported data corruption issues?
 
From what I read, the patches were tested by brute force, i.e. by trying to reproduce the issue. Apparently the bug was not fixed by logically understanding the full sequence of events and then devising a fix that would be guaranteed to solve it completely. And from my own glances into the code, I doubt the latter is possible at all anymore.
So, probably no, there is no assurance of completeness. Furthermore, from what I read, this bug lingered for quite a while, possibly right from the beginning. And I would bet that more such bugs are lingering, which may or may not surface under specific conditions. That is true not only of ZFS but of most of the stack of "modern" stuff we're sitting upon.

OTOH, this thing does not normally trigger with casual data usage. It does trigger with high-performance automated process chains. And in such scenarios you will usually notice that your data is crap, one way or the other.

Anyway, I'm still on 13.2 and I decided to do nothing, for now. It didn't hit me during the last 15 years, so why panic now?
 
… this thread and numerous links have confused me a bit.

More than a few people are (understandably) confused!

an advisory on one of the FreeBSD mailing lists. … Why isn't anything showing up on https://www.freebsd.org/? Isn't this issue serious enough to demand a more visible notice/advisory for FreeBSD's users?

Now at the home page:

[Screenshot of the FreeBSD home page]

Prior to today's errata notice: an email to the freebsd-announce list was suggested but not demanded ;-)



This topic is now Solved, but remains open for questions and answers.
 
Quoting, from Discord, the author of openzfs/zfs PRs 15566 and 15571 (with his permission):

… those two patches (<https://www.freebsd.org/security/patches/EN-23:16/openzfs.13.patch>, <https://security.freebsd.org/patches/EN-23:16/openzfs.14.patch>) referenced in the EN fix the issue with lseek that could lead to zeroes in copies

The 14/2.2 patch also fixes some bugs in block cloning (which didn't exist in 13/2.1, which is why they're not in the 13 patch). Block cloning is still disabled by sysctl because we're not sure it's stable yet, but that's unrelated to lseek (always was, we just didn't understand that at first)
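
As an aside for readers wondering how lseek can put zeroes into a copy: the sketch below is not the code of cp or of ZFS, only a minimal illustration of how a copy tool can use lseek with SEEK_DATA and SEEK_HOLE to skip holes in a sparse file. The function names data_extents and sparse_copy are made up for this example. The point is that such a tool trusts what lseek reports. If a range that really holds data is reported as a hole, the loop never reads it and the copy contains zeroes there.

```python
import errno
import os


def data_extents(fd, size):
    """Yield (start, end) byte ranges that lseek reports as data."""
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after this offset
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        if end <= start:  # defensive: never loop without making progress
            return
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the ranges reported as data; holes stay holes (zeroes)."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        for start, end in data_extents(src, size):
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:
                    break
                # pwrite may write fewer bytes than asked; advance by what it wrote
                pos += os.pwrite(dst, buf, pos)
        # Anything not written above reads back as zeroes in the copy.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
```

On a filesystem that reports holes correctly, the copy is byte-for-byte identical to the source. Comparing checksums of source and copy is a simple way to confirm that for files you care about.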

tl;dr:
  • if you can get a patch, take it, and go confidently into the world
  • if you can't, set the tunable, and go confidently into the world
  • if you can't do that either, take a deep breath and then go confidently into the world, because your chances of unknowingly and undetectably hitting this are basically non-existent
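
For anyone who wants to check the tunable from a script, here is a minimal sketch. It assumes the tunable meant in the tl;dr is the vfs.zfs.dmu_offset_next_sync sysctl named in the thread title, and that the workaround value is 0, which is how I remember the errata notice; please check EN-23:16 itself for the authoritative instructions. The helper names read_sysctl and workaround_status are made up for this example. The script only reads the value and changes nothing.

```python
import subprocess

# Assumption: this is the tunable the tl;dr refers to (name from the thread title).
TUNABLE = "vfs.zfs.dmu_offset_next_sync"


def read_sysctl(name):
    """Return the sysctl value as a string, or None if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # no sysctl binary on this system
        return None
    if result.returncode != 0:  # unknown OID, e.g. ZFS not loaded
        return None
    return result.stdout.strip()


def workaround_status(value):
    """Classify a value, assuming 0 is the workaround setting (see lead-in)."""
    if value is None:
        return "unknown"
    return "workaround set" if value == "0" else "workaround not set"


if __name__ == "__main__":
    current = read_sysctl(TUNABLE)
    print(f"{TUNABLE} = {current} ({workaround_status(current)})")
    # To apply the workaround by hand, as root (again, verify against the EN):
    #   sysctl vfs.zfs.dmu_offset_next_sync=0
    # and add the same assignment to /etc/sysctl.conf to keep it across reboots.
```

On a patched system the workaround should no longer be needed, so treat "workaround not set" as information rather than an alarm.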
 