Random "Bad file descriptor" errors building ports on a VPS from DigitalOcean

Hey everyone,

(Perhaps the underlying problem is related to some process using all available CPU resources.)
I manage a few similarly configured servers on different hosts, but this particular VPS, hosted with DigitalOcean in Frankfurt, throws a "Bad file descriptor" error while running make on a port. The issue also seems to occur when a background process is apparently using all free CPU resources. Rebooting the instance fixes the issue. I have reached out to DigitalOcean several times and have been unable to track the glitch down, and it has persisted across several upgrades from 12.x to 13.x.

A sample error
Code:
===>  Extracting for clamav-0.105.1_1,1
=> SHA256 Checksum OK for clamav-0.105.1.tar.gz.
chmod: /usr/ports/security/clamav/work/clamav-0.105.1/libclamav_rust/.cargo/vendor/tiff/tests/encode_images.rs: Bad file descriptor
It can occur on almost any port, though, and once a build gets stuck, usually only a reboot will get it to build.

Any suggestions would be appreciated, thanks in advance,
Oclair
 
I manage a few similarly configured servers on different hosts, but this particular VPS, hosted with DigitalOcean in Frankfurt, throws a "Bad file descriptor" error while running make on a port.

I think I'm seeing a similar error here. With FreeBSD 13.1 built from a changeset in the stable/13 branch, I've seen literally hundreds of similar bad FD errors in a single poudriere bulk run. The uname for the build I'm running at present:

Code:
FreeBSD xmin.cloud.thinkum.space 13.1-STABLE FreeBSD 13.1-STABLE #0 build/stable/13-n252824-84b4709f38f: Fri Oct 28 16:45:22 PDT 2022     gimbal@xmin.cloud.thinkum.space:/usr/obj/xmin_FreeBSD-13.1-STABLE_amd64/usr/src/amd64.amd64/sys/XMIN amd64
The build/stable/13 branch in the uname is a local convention. This is a build of changeset 84b4709f38f in the stable/13 branch.

It's been difficult to isolate here. The bad FD errors don't seem to show up until a couple hundred ports have been built with poudriere; then they persist until the poudriere build is restarted.

Locally, I've used up to 16 parallel builders in poudriere on a Minisforum HX90 machine (CPU: AMD Ryzen 9 5900HX). Reducing the number of builders didn't seem to help much. The bad FD error still showed up.

On this machine, the poudriere build runs in parallel with a Cinnamon desktop session under X.Org, with Firefox and a terminal emulator adding some load on the desktop.

After noticing something about the skein checksum algorithm in ZFS, in the stable/13 changelogs, I started trying different checksum algorithms on the ZFS filesystems that Poudriere uses for jail root filesystems. The bad file descriptor errors still occurred.

After looking at the ZFS Tuning Guide at the FreeBSD Wiki, I tried adjusting a couple of sysctl MIBs. I've not seen the bad FD errors today. Here's what I changed in my local sysctl.conf:
Code:
vfs.zfs.l2arc_write_max=16777216
vfs.zfs.l2arc_write_boost=16777216

This basically doubles the values from their default of 2^23 (8 MiB) to 2^24 (16 MiB).
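
For anyone who wants to double-check those byte counts, here is a small sh sketch. The pow2 function is only an illustrative helper name I'm assuming here, not a system utility.

```shell
# Hypothetical helper: print 2 raised to the given exponent
# using plain POSIX shell arithmetic.
pow2() {
    echo $((1 << $1))
}

pow2 23   # the default mentioned above
pow2 24   # the value written to sysctl.conf above
```

The second call should print the same number as the two sysctl.conf lines.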

I've also increased the machine's vfs.zfs.arc_max from approx. 1/16 to approx. 1/8 of total machine RAM. I don't believe this has affected the bad FD error.
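
As a rough sketch of how such a fraction could be worked out in sh: ram_fraction is a hypothetical helper, and the 32 GiB total below is only an assumed example figure, not necessarily this machine's RAM. On FreeBSD the real byte count can be read with `sysctl -n hw.physmem`.

```shell
# Hypothetical helper: integer fraction of a byte count.
#   $1 = total bytes, $2 = divisor
ram_fraction() {
    echo $(( $1 / $2 ))
}

# Assumed example total: 32 GiB expressed in bytes.
total=34359738368

ram_fraction "$total" 16   # roughly the earlier 1/16 setting
ram_fraction "$total" 8    # roughly the 1/8 setting described above
```

The second number is the kind of value one would then assign to vfs.zfs.arc_max.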

Although this could be another false positive, I've not seen the bad FD errors with poudriere since this change.

Presently, the machine is running a build of 260 ports; 109 have built so far without the bad FD error.

However, since this change, the poudriere build seems to consume more overall I/O and processor time relative to the Cinnamon desktop environment running in parallel. There are periods when desktop apps are slow to respond during the build, e.g. Firefox (actually firefox-esr). I wasn't seeing this before the change in sysctl.conf.

Despite the short latency periods at the desktop, however, the bad FD errors don't seem to be showing up now.

HTH! Health
 
After looking at the ZFS Tuning Guide at the FreeBSD Wiki, I tried adjusting a couple of sysctl MIBs. I've not seen the bad FD errors today. Here's what I changed in my local sysctl.conf:
Code:
vfs.zfs.l2arc_write_max=16777216
vfs.zfs.l2arc_write_boost=16777216
Hey, thanks for the suggestion. I did add these lines to sysctl.conf, but unfortunately I still get random bad file descriptor errors. I end up just going to the port directly and re-running make, and then it usually builds.

After a restart the issue temporarily goes away, and it appears to affect only building software.
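
In case it saves anyone some typing, the "just run it again" workaround could be wrapped in a small sh function. This is only a sketch: retry is a hypothetical helper name, and it does nothing about the underlying cause.

```shell
# Hypothetical helper: run a command up to $1 times,
# stopping at the first success.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0          # command succeeded, stop retrying
        fi
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
    done
    return 1                  # every attempt failed
}

# Example (not executed here), using the port from the sample error:
#   cd /usr/ports/security/clamav && retry 3 make
```

If all attempts fail, the function returns non-zero, at which point the reboot workaround is presumably still needed.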

Have a nice day!
 
In many cases (but not all), after a bad file descriptor error stops a portmaster run, I go directly to the port and type make reinstall clean, and magically it will install.

In other situations I might need to install via pkg.

Otherwise, in worst-case situations, I reboot and shut down all services, and then the build succeeds.

So far...
 