Fancy 13.4

Code:
  PID TT  STAT        TIME COMMAND
15507  -  R        1:01.71 expr 0 + 1
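For reference, a quick way to check what is eating CPU (a sketch using portable ps(1) keywords; on FreeBSD, adding -r additionally sorts by CPU usage):

```shell
# Show PID, state, accumulated CPU time and command for all processes.
# A trivial command such as `expr 0 + 1` should never accumulate minutes
# of CPU time; if it does, something is looping around it.
ps axo pid,stat,time,command | head -n 6
```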

Eh, WHAT??

Given the current security patches, I decided to try with 13.4. (Since I build all from source, I'm not forced to wait for the exact release.)
This is what I saw after compiling some 1100 ports in a bhyve guest. And the bhyve guest was extremely laggy while doing nothing but this, with 20 cores.

The process accumulated up to 2:40 of CPU time, and then things proceeded as normal. I'm not really amused.
 
Which commit?
RC2-p1 e893ec49afb2
Have you seen either of the loader-related warnings?

BOOT LOADER IS TOO OLD. PLEASE UPGRADE.

Loader needs to be updated
No, because I can't. It's an automated build, creating guests on the fly.

But since it clones a snapshot of the volume, boots from that, then mounts the actual volume into /mnt, deletes it and reinstalls via make install DESTDIR, I would suppose the loader cannot be older than the OS.
 
[screenshot] FreeBSD-13.4-RC3-amd64-bootonly.iso (release candidate 3):
 
For the download during installation, I tried a server in Sweden (because I knew of transient problems elsewhere earlier today); it failed. My bad: I didn't notice that the selection was IPv6 Sweden, and I don't have IPv6 connectivity.

Opting to restart the installer, a different failure (looping):

[screenshot]

[screenshot]




Opting to exit the installer then start the installer, a different failure (I thought that this resolver stuff was fixed long ago):

[screenshot]




After a reset of the computer (to work around the resolver bug), I allowed the default FTP site. Success:

[screenshot]




Some of what's above is premature, because RC3 has not yet been announced.
 
Spooky release builds began just after midnight UTC on Friday 13th.

The usual: please await the official announcement, expected on Tuesday, before treating what's built as an official release.

Errata

Hardware notes do not include iwlwifi(4). Instead, please see the DESCRIPTION section of the manual page. A future edition may include a HARDWARE section.

Plus, hopefully, there'll be something about the on-screen plea to upgrade a loader that is not outdated.

Release notes

Security advisories are not yet in the draft. I count nine.
 
Fancy new feature: pools can now be suspended multiple times:

Code:
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.

As a consequence, all tasks accessing the pool go into "D" (uninterruptible disk wait). Then all zpool commands go into "D" as well. And finally all disk access (mount, df, etc., be it UFS or ZFS) also goes into "D".
 
That sounds like a bug
It's not a bug. "D" uninterruptible wait is uninterruptible, as the name says, and at the point where a process gets into that state, it may already hold other locks of the disk subsystem. So anything that tries to obtain those locks also goes into "D" uninterruptible wait.
The real problem is that, unless there is working IPMI, you can neither power off nor reboot (because these will also end in uninterruptible wait), and so travel expenses for push-button service arise.
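Before the pile-up becomes total, the stuck processes and the kernel wait channel they are blocked on can be listed (a sketch; the wchan names themselves differ between FreeBSD and other systems):

```shell
# Print the header plus every process whose state starts with "D"
# (uninterruptible wait), together with the wait channel (wchan) it is
# blocked on -- useful for seeing what everything is piling up behind.
ps axo pid,stat,wchan,command | awk 'NR == 1 || $2 ~ /^D/'
```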

I have seen this often over the years; it usually appears when some device driver doesn't time out properly. But nowadays, if a disk device misbehaves, ZFS should just suspend it (and eventually suspend the affected pool), keeping the remainder of the system running normally.
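What happens on suspension is governed by the pool's failmode property: the default, wait, blocks all I/O until the device returns, while continue returns EIO to new I/O instead. A sketch using the pool name from the log above (whether continue would have avoided this particular pile-up depends on which locks were already held):

```shell
# Guarded so the ZFS commands are skipped on systems without zpool.
if command -v zpool >/dev/null 2>&1; then
    zpool get failmode dbintb           # default is "wait": block until recovery
    zpool set failmode=continue dbintb  # return EIO instead of blocking forever
else
    echo "zpool not available; skipping"
fi
```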

Did anyone report it?
Forget it. Not many people follow a design philosophy of "use as many ZFS pools as possible" (currently 16). Furthermore, the disk concerned has 23 years of service time, and imminent-failure prediction has been active for about 15 years now. (This is a museum, did I mention it?)
 
… if a disk device doesn't behave, zfs should just suspend it (and eventually suspend the concerned pool), keeping the remainder of the system running normally. …

That's my (limited) experience with multiple pools on 15.0-CURRENT.



Back to your build of 13.4:

… pools can now be suspended multiple times:

Code:
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.

Was there anything else in the midst of those three messages?
 
That's my (limited) experience with multiple pools on 15.0-CURRENT.
That was normally my experience also. So I was a bit surprised by these goings-on yesterday evening.



Back to your build of 13.4:
Was there anything else in the midst of those three messages?
Yes. Apparently the disk device is gone for good, but the system continues to do I/O to the suspended pool, which no longer contains any volumes:

Code:
Sep 13 21:32:11 <kern.crit> edge kernel: [432486] (da4:ahd0:0:0:0): Periph destroyed
Sep 13 21:32:11 <local7.notice> edge ZFS[89307]: vdev state changed, pool_guid=13467969670308412929 vdev_guid=16667129201166525073
Sep 13 21:32:11 <local7.notice> edge ZFS[89311]: vdev is removed, pool_guid=13467969670308412929 vdev_guid=16667129201166525073
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:32:16 <kern.crit> edge kernel: [432491]
Sep 13 21:32:16 <local7.warn> edge ZFS[89363]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89367]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89371]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89375]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.alert> edge ZFS[89379]: catastrophic pool I/O failure, zpool=dbintb
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621]
Sep 13 21:51:06 <local7.warn> edge ZFS[4987]: pool I/O failure, zpool=dbintb error=6
Sep 13 21:51:06 <local7.alert> edge ZFS[4991]: catastrophic pool I/O failure, zpool=dbintb
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935]
Sep 13 21:56:19 <local7.warn> edge ZFS[9545]: pool I/O failure, zpool=dbintb error=6
Sep 13 21:56:19 <local7.alert> edge ZFS[9549]: catastrophic pool I/O failure, zpool=dbintb

At 21:43 the daily backup tried to access the mountpoint of the concerned pool, and got stuck in "D".
At 22:04 a zfs send/recv of some other pools tried to access zfs, and got stuck in "D".

After (hard) reset, no errors:
Code:
  pool: dbintb
 state: ONLINE
  scan: scrub repaired 0B in 00:02:56 with 0 errors on Wed Sep 11 03:47:32 2024
config:

        NAME         STATE     READ WRITE CKSUM
        dbintb       ONLINE       0     0     0
          da4.elip3  ONLINE       0     0     0

errors: No known data errors
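With 16 pools, a quick post-reset health check that only reports pools with problems, rather than paging through all of them, might look like this (a sketch, guarded for systems without ZFS):

```shell
if command -v zpool >/dev/null 2>&1; then
    zpool status -x   # prints "all pools are healthy" when none has issues
else
    echo "zpool not available; skipping"
fi
```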
 
Thanks. I can't see anything relevant in draft release notes. They're very sparse, IMHO (no mention of encrypted ZFS home directories, and so on).

If there is a related change in 13.4, you might find it logged in cgit.
 

Pre-release:

hopefully, there'll be something about the on-screen plea to upgrade a loader that is not outdated.

Not documented,

Code:
% rg loader /usr/doc/website/content/en/releases/13.4R
%

The two warnings about an outdated loader may occur with up-to-date copies of the loader on an official FreeBSD Project-provided virtual disk in a virtual machine … and so on.

If you update an EFI system partition:
  • ensure that both copies of the loader are in place

The example in the loader.efi(8) manual page is the non-default copy. Updating this one alone is not sufficient to avoid warnings or potential future issues.
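A sketch of refreshing both common loader copies, assuming the ESP is mounted at /boot/efi (the exact paths vary per installation; check the output of efibootmgr -v before copying anything):

```shell
# Refresh both the FreeBSD-specific loader path and the default
# (removable-media) path on the EFI system partition. Guarded so it
# does nothing on systems where the ESP is not mounted at /boot/efi.
if [ -d /boot/efi/efi ]; then
    cp /boot/loader.efi /boot/efi/efi/freebsd/loader.efi  # FreeBSD-specific path
    cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi    # default boot path
else
    echo "ESP not mounted at /boot/efi; skipping"
fi
```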

Release notes

Additional notable items include:
  • adduser(8) support for creation of an encrypted or non-encrypted ZFS dataset for the home directory
 
Not documented,

Code:
% rg loader /usr/doc/website/content/en/releases/13.4R
%
Well, there is an "errata" page:


Reminds me of the times when I was working for a commercial UNIX vendor: you had to read several pages of release notes etc. to find out that there was nothing written in them. Back then it was great to have FreeBSD, where all this was different.
 