Fancy 13.4

Code:
  PID TT  STAT        TIME COMMAND
15507  -  R        1:01.71 expr 0 + 1
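For reference, a quick way to check what is eating CPU (a sketch using portable ps(1) keywords; on FreeBSD, adding -r additionally sorts by CPU usage):

```shell
# Show PID, state, accumulated CPU time and command for all processes.
# A trivial command such as `expr 0 + 1` should never accumulate minutes
# of CPU time; if it does, something is looping around it.
ps axo pid,stat,time,command | head -n 6
```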

Eh, WHAT??

Given the current security patches, I decided to try with 13.4. (Since I build all from source, I'm not forced to wait for the exact release.)
This is what I saw after compiling some 1100 ports in a bhyve guest. And the bhyve guest was extremely laggy while doing nothing but this, with 20 cores.

The process accumulated up to 2:40 of CPU time, and then things proceeded as normal. I'm not really amused.
 
Which commit?
RC2-p1 e893ec49afb2
Have you seen either of the loader-related warnings?

BOOT LOADER IS TOO OLD. PLEASE UPGRADE.

Loader needs to be updated
No, because I can't. It's an automated build, creating guests on the fly.

But since it clones a snapshot of the volume, boots from that, then mounts the actual volume into /mnt, deletes it and reinstalls via make install DESTDIR, I would suppose the loader cannot be older than the OS.
 
[screenshot] FreeBSD-13.4-RC3-amd64-bootonly.iso (release candidate 3):
 
For the download during installation, I tried a server in Sweden (because I knew of transient problems elsewhere earlier today); it failed. My bad: I didn't notice that the selection was IPv6 Sweden, and I don't have IPv6 connectivity.

Opting to restart the installer, a different failure (looping):

[screenshot]

[screenshot]




Opting to exit the installer then start the installer, a different failure (I thought that this resolver stuff was fixed long ago):

[screenshot]




After a reset of the computer (to work around the resolver bug), I allowed the default FTP site. Success:

[screenshot]




Some of what's above is premature, because RC3 has not yet been announced.
 
Spooky release builds began just after midnight UTC on Friday 13th.

The usual: please await the official announcement, expected on Tuesday, before treating what's built as an official release.

Errata

Hardware notes do not include iwlwifi(4). Instead, please see the DESCRIPTION section of the manual page. A future edition may include a HARDWARE section.

Plus, hopefully, there'll be something about the on-screen plea to upgrade a loader that is not outdated.

Release notes

Security advisories are not yet in the draft. I count nine.
 
Fancy new feature: pools can now be suspended multiple times:

Code:
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.

As a consequence, all tasks accessing the pool go into "D" (uninterruptible disk wait). Then all zpool commands go into "D" as well. And finally all disk access (mount, df, etc., be it UFS or ZFS) also goes into "D".
 
That sounds like a bug
It's not a bug. "D" uninterruptible wait is uninterruptible, as the name says, and at the point where a process gets into that state, it may already hold other locks of the disk subsystem. So anything that tries to obtain those locks also goes into "D" uninterruptible wait.
The real problem is that, unless there is working IPMI, you can neither power off nor reboot (because these will also end in uninterruptible wait), and so travel expenses for push-button service arise.
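Before the pile-up becomes total, the stuck processes and the kernel wait channel they are blocked on can be listed (a sketch; the wchan names themselves differ between FreeBSD and other systems):

```shell
# Print the header plus every process whose state starts with "D"
# (uninterruptible wait), together with the wait channel (wchan) it is
# blocked on -- useful for seeing what everything is piling up behind.
ps axo pid,stat,wchan,command | awk 'NR == 1 || $2 ~ /^D/'
```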

I have seen this often over the years; it usually appears when some device driver doesn't time out properly. But nowadays, if a disk device misbehaves, ZFS should just suspend it (and eventually suspend the affected pool), keeping the remainder of the system running normally.
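What happens on suspension is governed by the pool's failmode property: the default, wait, blocks all I/O until the device returns, while continue returns EIO to new I/O instead. A sketch using the pool name from the log above (whether continue would have avoided this particular pile-up depends on which locks were already held):

```shell
# Guarded so the ZFS commands are skipped on systems without zpool.
if command -v zpool >/dev/null 2>&1; then
    zpool get failmode dbintb           # default is "wait": block until recovery
    zpool set failmode=continue dbintb  # return EIO instead of blocking forever
else
    echo "zpool not available; skipping"
fi
```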

Did anyone report it?
Forget it. Not many people follow a design philosophy of "use as many ZFS pools as possible" (currently 16). Furthermore, the disk concerned has 23 years of service time, and imminent-failure prediction has been active for about 15 years now. (This is a museum, did I mention it?)
 
… if a disk device doesn't behave, zfs should just suspend it (and eventually suspend the concerned pool), keeping the remainder of the system running normally. …

That's my (limited) experience with multiple pools on 15.0-CURRENT.



Back to your build of 13.4:

… pools can now be suspended multiple times:

Code:
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.

Was there anything else in the midst of those three messages?
 
That's my (limited) experience with multiple pools on 15.0-CURRENT.
That was normally my experience also. So I was a bit surprised by these goings-on yesterday evening.



Back to your build of 13.4:
Was there anything else in the midst of those three messages?
Yes. Apparently the disk device is gone for good, but the system continues to do I/O to the suspended pool, which no longer contains any volumes:

Code:
Sep 13 21:32:11 <kern.crit> edge kernel: [432486] (da4:ahd0:0:0:0): Periph destroyed
Sep 13 21:32:11 <local7.notice> edge ZFS[89307]: vdev state changed, pool_guid=13467969670308412929 vdev_guid=16667129201166525073
Sep 13 21:32:11 <local7.notice> edge ZFS[89311]: vdev is removed, pool_guid=13467969670308412929 vdev_guid=16667129201166525073
Sep 13 21:32:16 <kern.crit> edge kernel: [432491] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:32:16 <kern.crit> edge kernel: [432491]
Sep 13 21:32:16 <local7.warn> edge ZFS[89363]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89367]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89371]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.warn> edge ZFS[89375]: pool I/O failure, zpool=dbintb error=28
Sep 13 21:32:16 <local7.alert> edge ZFS[89379]: catastrophic pool I/O failure, zpool=dbintb
Sep 13 21:51:06 <kern.crit> edge kernel: [433621] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:51:06 <kern.crit> edge kernel: [433621]
Sep 13 21:51:06 <local7.warn> edge ZFS[4987]: pool I/O failure, zpool=dbintb error=6
Sep 13 21:51:06 <local7.alert> edge ZFS[4991]: catastrophic pool I/O failure, zpool=dbintb
Sep 13 21:56:19 <kern.crit> edge kernel: [433935] Solaris: WARNING: Pool 'dbintb' has encountered an uncorrectable I/O failure and has been suspended.
Sep 13 21:56:19 <kern.crit> edge kernel: [433935]
Sep 13 21:56:19 <local7.warn> edge ZFS[9545]: pool I/O failure, zpool=dbintb error=6
Sep 13 21:56:19 <local7.alert> edge ZFS[9549]: catastrophic pool I/O failure, zpool=dbintb

At 21:43 the daily backup tried to access the mountpoint of the concerned pool, and got stuck in "D".
At 22:04 a zfs send/recv of some other pools tried to access zfs, and got stuck in "D".

After (hard) reset, no errors:
Code:
  pool: dbintb
 state: ONLINE
  scan: scrub repaired 0B in 00:02:56 with 0 errors on Wed Sep 11 03:47:32 2024
config:

        NAME         STATE     READ WRITE CKSUM
        dbintb       ONLINE       0     0     0
          da4.elip3  ONLINE       0     0     0

errors: No known data errors
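With 16 pools, a quick post-reset health check that only reports pools with problems, rather than paging through all of them, might look like this (a sketch, guarded for systems without ZFS):

```shell
if command -v zpool >/dev/null 2>&1; then
    zpool status -x   # prints "all pools are healthy" when none has issues
else
    echo "zpool not available; skipping"
fi
```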
 
Thanks. I can't see anything relevant in draft release notes. They're very sparse, IMHO (no mention of encrypted ZFS home directories, and so on).

If there is a related change in 13.4, you might find it logged in cgit.
 

Pre-release:

hopefully, there'll be something about the on-screen plea to upgrade a loader that is not outdated.

Not documented,

Code:
% rg loader /usr/doc/website/content/en/releases/13.4R
%

The two warnings about an outdated loader may occur with up-to-date copies of the loader on an official FreeBSD Project-provided virtual disk in a virtual machine … and so on.

If you update an EFI system partition:
  • ensure that both copies of the loader are in place

The example in the loader.efi(8) manual page is the non-default copy. Updating this one alone is not sufficient to avoid warnings or potential future issues.
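A sketch of refreshing both common loader copies, assuming the ESP is mounted at /boot/efi (the exact paths vary per installation; check the output of efibootmgr -v before copying anything):

```shell
# Refresh both the FreeBSD-specific loader path and the default
# (removable-media) path on the EFI system partition. Guarded so it
# does nothing on systems where the ESP is not mounted at /boot/efi.
if [ -d /boot/efi/efi ]; then
    cp /boot/loader.efi /boot/efi/efi/freebsd/loader.efi  # FreeBSD-specific path
    cp /boot/loader.efi /boot/efi/efi/boot/bootx64.efi    # default boot path
else
    echo "ESP not mounted at /boot/efi; skipping"
fi
```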

Release notes

Additional notable items include:
  • adduser(8) support for creation of an encrypted or non-encrypted ZFS dataset for the home directory
 
Not documented,

Code:
% rg loader /usr/doc/website/content/en/releases/13.4R
%
Well, there is an "errata" page:


Reminds me of the times when I was working for a commercial UNIX vendor: you had to read several pages of release notes etc. to find out that there was nothing written in them. Back then it was great to have FreeBSD, where all this was different.
 