Solved FreeBSD sysctl vfs.zfs.dmu_offset_next_sync and openzfs/zfs issue #15526 (errata notice, FreeBSD bug 275308)

JanBeh · Nov 29, 2023

I recently updated to FreeBSD 14, and this thread and numerous links have confused me a bit. Could someone summarize what is going on and if (as a user) I should do something to protect my data from being corrupted?

After doing some searching, I found an advisory on one of the FreeBSD mailing lists. Is that what I should do (i.e. the official recommendation) and wait for an update until the problem is fixed? Why isn't anything showing up on https://www.freebsd.org/? Isn't this issue serious enough to demand a more visible notice/advisory for FreeBSD's users?

monwarez · Nov 29, 2023

I guess they are waiting for all errata to be resolved (PR 275215).
Looking at https://cgit.freebsd.org/src/log/?h=stable/14, the patches that were merged in OpenZFS are already in the source tree.
The short version is: if you never enabled bclone you are probably fine. Setting dmu_offset_next_sync to 0 as a workaround will reduce the likelihood of hitting the bugs.
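
For reference, the workaround amounts to running `sysctl vfs.zfs.dmu_offset_next_sync=0` as root and, to keep it across reboots, putting the same assignment in `/etc/sysctl.conf`. Below is a minimal Python sketch of that second step, working on the file's text only. The helper name `with_sysctl_setting` is made up for illustration, and the sketch assumes the usual one `name=value` per line layout; it does not touch the running kernel.

```python
SYSCTL_NAME = "vfs.zfs.dmu_offset_next_sync"


def with_sysctl_setting(conf_text, name=SYSCTL_NAME, value="0"):
    """Return sysctl.conf text in which `name=value` is set exactly once.

    Hypothetical helper: comment lines and unrelated settings are kept as-is,
    an existing assignment of `name` is rewritten, duplicates are dropped.
    """
    wanted = f"{name}={value}"
    out = []
    found = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if not stripped.startswith("#") and key == name:
            if not found:
                out.append(wanted)  # rewrite the first assignment in place
                found = True
            continue  # skip any later duplicate assignment
        out.append(line)
    if not found:
        out.append(wanted)  # setting was absent, so append it
    return "\n".join(out) + "\n"
```

Applying the function twice gives the same text as applying it once, so it is safe to rerun. Once a patched kernel is installed, the line can be removed again.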

cy@ · Nov 29, 2023

BTW, git: 2276e53940c2 - main - zfs: merge openzfs/zfs@688514e47, committed to 15-CURRENT, includes the fix for the bug.

#15566 688514e47 dmu_buf_will_clone: fix race in transition back to NOFILL
#15571 30d581121 dnode_is_dirty: check dnode and its data for dirtiness

CyberCr33p · Nov 29, 2023

cy@ said:
BTW, git: 2276e53940c2 - main - zfs: merge openzfs/zfs@688514e47, committed to 15-CURRENT, includes the fix for the bug.

#15566 688514e47 dmu_buf_will_clone: fix race in transition back to NOFILL
#15571 30d581121 dnode_is_dirty: check dnode and its data for dirtiness

Any idea if these patches completely fix the reported data corruption issues?

Nasrudin · Nov 29, 2023

Commit 5858f93 on stable/13 seems to have a small fix committed for this issue. It was cherry picked, as Mr. Perrin has indicated.

PMc · Nov 29, 2023

CyberCr33p said:
Any idea if these patches completely fix the reported data corruption issues?

From what I read, the patches are tested by brute-force trying to reproduce the issue. Apparently they are not fixed by logically understanding the full proceedings and then devising a fix that would be guaranteed to completely solve the issue. And from my own glances into the code, I doubt the latter is possible at all anymore.
So, probably no, there is no assurance of completeness. Furthermore, from what I read, this thing did linger for quite a while, or even right from the beginning. And I would bet on there being more of such things lingering, which may or may not appear in some specific conditions. But that is not only true with zfs, it is true with most of the stack of "modern" stuff we're sitting upon.

OTOH, this thing does not normally trigger with casual data usage. It does trigger with high-performance automated process chains. And in such scenarios you will usually notice that your data is crap, one way or the other.

Anyway, I'm still on 13.2 and I decided to do nothing, for now. It didn't hit me during the last 15 years, so why panic now?

Cath O'Deray · Dec 1, 2023

JanBeh said:
… this thread and numerous links have confused me a bit.

More than a few people are (understandably) confused!

… an advisory on one of the FreeBSD mailing lists. … Why isn't anything showing up on https://www.freebsd.org/? Isn't this issue serious enough to demand a more visible notice/advisory for FreeBSD's users?

Now at the home page:

Prior to today's errata notice: an email to the freebsd-announce list was suggested but not demanded ;-)

This topic is now Solved, but remains open for questions and answers.

Cath O'Deray · Dec 2, 2023

<https://github.com/openzfs/zfs/releases/tag/zfs-2.2.2> describes this as a good overview:

terse 15526.md

<https://gist.github.com/rincebrain/e23b4a39aba3fadc04db18574d30dc73>

So you want to understand what's going on with all these reports of "OpenZFS might be eating my data".

Here's a simple explanation of what is and isn't true.
…

Cath O'Deray · Dec 3, 2023

Quoting, from Discord, the author of openzfs/zfs PRs 15566 and 15571 (with his permission):

… those two patches (<https://www.freebsd.org/security/patches/EN-23:16/openzfs.13.patch>, <https://security.freebsd.org/patches/EN-23:16/openzfs.14.patch>) referenced in the EN fix the issue with lseek that could lead to zeroes in copies

The 14/2.2 patch also fixes some bugs in block cloning (which didn't exist in 13/2.1, which is why they're not in the 13 patch). block cloning is still disabled by sysctl because we're not sure it's stable yet, but that's unrelated to lseek (always was, we just didn't understand that at first)

tl;dr:

if you can get a patch, take it, and go confidently into the world

if you can't, set the tuneable, and go confidently into the world

if you can't, take a deep breath and then go confidently into the world because your chances of unknowingly and undetectably hitting this are basically non-existent

…
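
To make "the issue with lseek that could lead to zeroes in copies" a bit more concrete: copy tools can ask the filesystem which byte ranges of a file hold data (`SEEK_DATA`) and which are holes (`SEEK_HOLE`), and then copy only the data ranges. As I understand the linked write-ups, the bug was that this query could briefly report a hole where freshly written data actually was, so a copy made at that moment got zeroes there. The Python sketch below only illustrates the hole-skipping technique, not how any particular `cp` implements it. The function names are made up, and it assumes a platform where `os.SEEK_DATA` and `os.SEEK_HOLE` are available (FreeBSD and Linux, for example).

```python
import errno
import os


def data_segments(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # nothing but a hole from here to end of file
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:
                break  # defensive: file changed underneath us
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments


def sparse_copy(src, dst):
    """Copy only the reported data ranges; skipped ranges read back as zeroes."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for start, end in data_segments(src):
            fin.seek(start)
            fout.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = fin.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                fout.write(chunk)
                remaining -= len(chunk)
        # Extend the copy to the source's full length (trailing hole, if any).
        fout.truncate(os.path.getsize(src))
```

The copy is only as good as the answers `lseek` gives: if a range that holds data is reported as a hole, `sparse_copy` never reads it and the destination has zeroes there, without any error being raised. That is why the symptom was silent zeroes in copies rather than damage to the original file.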

Cath O'Deray · Jan 4, 2024

An excellent point of reference:

A data corruption bug in OpenZFS?

The Thanksgiving long weekend (23-26 November) in 2023 was an interesting one for OpenZFS, that I managed to land myself in the middle of.

despairlabs.com
