ZFS disappointing ZFS(?) performance from 13.3

This morning I got to watch my 13.3 (ZFS zroot) desktop take 14 minutes to rsync about 21GB of data to a locally attached UFS disk with a steady 85%-88% CPU usage. Both figures are double those of last week's typical run. I have a BE to "fall back to", but this morning's boot gave the attached message prior to mounting the UFS backup disk. I answered "y" because this was the UFS disk and not the only one. To be honest, I've not yet read anything about the benefits of this mapping, but it has modified the UFS filesystem; hopefully only the UFS disk.

The performance was disappointing. The machine is ancient but previously performed better at this task than it did today. The data itself is on a server so nothing is lost, and I have a second disk which I won't touch until next week's backup, when there's actually new data to rsync. Perhaps this event was caused by the initial remapping of the UFS disk. General desktop performance appears to be fine. I rebooted into 13.2-p10 to post; so far that's fine too.

I don't know what changes came with the new OpenZFS in 13.3, but I assume they're hard-written to the system disk block structure. If time allows, I'll use another disk and try the backup without agreeing to update the UFS filesystem, but that would be a new dump of 800GB and would take more time than I'm inclined to devote. Still, nice to see a new release.
 

Attachments

  • zfs_4704.jpg
This morning I got to watch my 13.3 (ZFS zroot) desktop take 14 minutes to rsync about 21GB of data to a locally attached UFS disk with a steady 85%-88% CPU usage.
To say anything specific about this, one would need to know what kind of cpu usage is happening.

Anyway, ZFS in 13.3 has a problem and, under various circumstances, tends to bring the kernel into an endless loop. Or, from my own perception, on desktop and guest, I would rather say ZFS in 13.3 is broken and not really useable in the current shape.

Now we cannot know if your performance issue is caused by that same problem. If you see a kernel thread arc_prune consuming lots of compute, then it probably is. In that case there is PR 275594, which has a bunch of patches, and these patches solved the problem very nicely for me and other people.
 
PMc
Thank you for the information.

Edit:
Is this 'ZFS problem' in kernel code and, by using it, has the fs structure been changed? I didn't see specific behavioral changes, only some new abilities. If the data structure is unchanged a 13.2-p10 BE is a reboot away.
Again, thanks.
 
It's not a structural issue. It's about kernel memory housekeeping, which obviously can lead to performance issues.
13.3 has a newer version of OpenZFS, but I don't know about the structural changes (if there are any).
 
Reading this discussion and that one doesn't make me want to update from 13.2 to 13.3, I will wait a moment to see how it goes.
My plan was to wait patiently on 13.3 for 14.1 to be released but now I wonder if it is a good idea.
 
There will be lots of them. I upgraded at -BETA1, and right away my desktop fan ran wild, because it went into this endless looping. Then I tried to rebuild ports, and it reproducibly crashed when trying to build gcc12, also due to this issue. This one is not difficult to hit.

There is somewhere a stance that releases are not just crap carriers, but should have some quality and stability. So usually I upgrade early at some -BETA, in order to report issues early so they might be fixed before release. But apparently nobody gives a fucking damn. This one should never have gone out unfixed, and PR 276862 shouldn't either.

Too soon to draw any conclusion there.

Politics won't fix this. Every party nowadays is disregarding their own promises and the will of the public, while continuing their propaganda-babble (believe in us, we are the best, blablabla).

BTW, it would be helpful if some of the concerned parties of this rsync issue could check with top -HSz which threads are actually making the load, so one could see if this is indeed the same issue as PR 275594 or yet something else that also shouldn't be there.
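
For anyone who wants to capture that check as text to paste here, a minimal sketch follows. It assumes FreeBSD's top(1), where -b is batch mode and -d 1 prints a single display; the function names are made up for illustration, and batch mode may list only the busiest entries, so a spinning thread should still show up.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any tool.

# One batch-mode snapshot from FreeBSD top(1):
#   -H  show individual threads
#   -S  include system (kernel) processes
#   -z  hide the idle process
#   -b  batch mode, -d 1  print one display and exit
snapshot_threads() {
    top -b -HSz -d 1
}

# Read top output on stdin and keep only lines that mention arc_prune.
# Prints a short note instead when no such line is present.
arc_prune_lines() {
    grep -F 'arc_prune' || echo "no arc_prune threads in this snapshot"
}
```

Usage would be `snapshot_threads | arc_prune_lines`, run while the rsync (or whatever load triggers the slowdown) is in progress. If arc_prune is not near the top, the full `snapshot_threads` output is the more useful thing to post.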
 
I would rather say ZFS in 13.3 is broken and not really useable in the current shape.
Unfortunately I have to agree on this...

I always had issues with heavy I/O stalling the system since upgrading to 13.0 ... but the 13.3 kernel now escalated it to a state rendering the system completely unusable (no response in a minute) -- I just had to emergency-kill a poudriere bulk build.

The patches you linked seem to finally solve it, applied them and now the bulk build is running perfectly fine, at least so far.
 
I always had issues with heavy I/O stalling the system since upgrading to 13.0 ... but the 13.3 kernel now escalated it to a state rendering the system completely unusable (no response in a minute) -- I just had to emergency-kill a poudriere bulk build.
Have you had a chance to see if 14-RELEASE or 14-STABLE were better/different?
 
Have you had a chance to see if 14-RELEASE or 14-STABLE were better/different?
Not really. For my production systems, I've stuck with 13 so far, because there's nothing in 14 I absolutely want/need and I tried to sidestep such miserable failures ... well, not that successfully, it seems.

I did run 14-CURRENT all the time (and now 15-CURRENT of course) on two VMs for testing ports. Of course they run ZFS and I didn't notice any issues there, but I doubt that allows any firm conclusion.
 
I just checked, for some reason I skipped FreeBSD-13 altogether on an important ZFS server. 12 to 14. Must have been unconscious bias.
 
cracauer@ I applied all the patches from PR 275594, namely commits a33aedb344, ff1452a099, 092fcd8c9f and f4e31b6a5e from the linked github repo, on top of releng/13.3 here, and so far, this seems to solve the immediate issue and also completely eliminate the issue of occasional "stalls" under heavy I/O load I had ever since upgrading to 13.

If I understand that PR correctly, those patches do more than any upstream fix at that time? I'm currently running another large poudriere build with a well-filled ccache, something that always caused some problems so far ... if this is running fine now, I really hope all these patches will make it into main, stable and releng branches.
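
For readers wanting to try the same, here is a rough sketch of one possible way to put those four commits on top of releng/13.3 with git. It is not necessarily how it was done above. The remote name "prpatches", the helper names, and the repository URL argument are placeholders (the URL is the one linked from PR 275594 and is not reproduced here), and the cherry-pick order simply follows the order the commits are listed in. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints commands unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: FreeBSD source checkout (for example /usr/src)
# $2: URL of the repository linked from PR 275594 (placeholder, fill in yourself)
apply_arc_prune_patches() {
    src_dir=$1
    remote_url=$2
    run git -C "$src_dir" checkout releng/13.3
    # "prpatches" is an arbitrary remote name chosen for this sketch.
    run git -C "$src_dir" remote add prpatches "$remote_url"
    run git -C "$src_dir" fetch prpatches
    # Commit hashes as quoted in the post above.
    run git -C "$src_dir" cherry-pick a33aedb344 ff1452a099 092fcd8c9f f4e31b6a5e
}
```

Calling `apply_arc_prune_patches /usr/src <repo-url>` shows what would be run; the kernel then still has to be rebuilt and installed before the patches take effect.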
 

See also 275063 – kernel using 100% CPU in arc_prune
  • both reports are CLOSED FIXED
  • 275063 blocked 14.0-erratas (sic)
  • 275063 also requested a merge to stable/13 that was unanswered
  • given the spread across two closed reports – and the norm of not commenting/actioning whilst closed – I have suggested a new report.
(Do not close (resolve) an issue until all commits/changes have been made (including MFC's and MFH's).)
 
sysutils/openzfs should work around the problem …

Thanks, it does seem likely, however I hesitate before suggesting this port (currently ZFS 2.2.2) in RELEASE scenarios.

For RELEASE, I think, better either:
  • activate the ZFS boot environment that preceded the upgrade to 13.3-RELEASE and rename the default environment to indicate that it's no longer the default; or
  • if there are no ZFS boot environments, rollback with freebsd-update(8).
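
The two options above could look roughly like this with bectl(8) and freebsd-update(8). This is a sketch, not a tested procedure: the boot environment names are placeholders (check `bectl list` for the real ones), the helper names are invented, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints commands unless DRY_RUN=0 is set in the environment.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: go back to the boot environment that preceded the upgrade.
# $1: name of the pre-upgrade BE (placeholder)
# $2: name of the BE now holding 13.3, often "default"
# $3: new name for that BE, so it no longer looks like the default
fall_back_to_be() {
    run bectl activate "$1"
    run bectl rename "$2" "$3"
}

# Option 2: no boot environments, so undo the last freebsd-update install.
rollback_without_be() {
    run freebsd-update rollback
}
```

For example, `fall_back_to_be 13.2-p10 default 13.3-RELEASE-broken` prints the two bectl commands; the names here are made up. A reboot is needed afterwards for the activated environment to take over.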
 
FYI:
 