Strange networking behaviour in fetch vs curl

I'm experiencing a really strange set of behaviours. This is a FreeBSD VM (FreeBSD 14.1-RELEASE releng/14.1-n267679-10e31f0946d8 GENERIC amd64) that was installed manually in a Xen Hypervisor (XCP-NG 8.2.1). I installed from the DVD iso.

It's entirely likely that something is wacky in this environment. Having said that, I have a range of VMs on this same network and hypervisor running a bunch of different OSes and versions (FreeBSD 13.3-RELEASE, Alpine Linux 3.19, Debian 12, etc.) and only this one (newest) VM is displaying any of these symptoms.

To give the BLUF (bottom line up front): I'm looking for recommended commands to run to get more information and debug this. How do I diagnose it? The rest of the post is stuff I've already tried.

For reasons I don't think matter, I'm running make fetch-recursive on a bunch of ports to fetch their distfiles. I have set these two variables to control how make fetch runs:
Code:
export HTTP_TIMEOUT="5"
export FETCH_ARGS="-Fmv"

I run make fetch-recursive and I see this:

Code:
===> Fetching all distfiles required by p5-Test-Without-Module-0.23 for building
===>   p5-Class-Load-XS-0.10 depends on file: /usr/local/sbin/pkg - found
=> Class-Load-XS-0.10.tar.gz doesn't seem to exist in /usr/ports/distfiles/.
=> Attempting to fetch https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz
resolving server address: cpan.metacpan.org:443
SSL options: 82004850
TLSv1.2 connection established using ECDHE-RSA-CHACHA20-POLY1305
Certificate subject: /CN=*.metacpan.org
Certificate issuer: /C=BE/O=GlobalSign nv-sa/CN=GlobalSign Atlas R3 DV TLS CA 2024 Q1
requesting https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz
fetch: transfer timed out
fetch: Class-Load-XS-0.10.tar.gz appears to be truncated: 0/77930 bytes
=> Attempting to fetch https://cpan.metacpan.org/modules/by-module/Class-Load-XS-0.10.tar.gz
resolving server address: cpan.metacpan.org:443

It hangs. It never actually downloads the distfiles. It gradually iterates through the various mirrors and gives the same sort of "appears to be truncated" message each time.

I can also run make fetch FETCH_BINARY="/usr/local/bin/curl" FETCH_ARGS="-L -v" and I will see this:

Code:
$ cd /usr/ports/*/p5-Class-Load-XS
$ sudo make fetch FETCH_BINARY="/usr/local/bin/curl" FETCH_ARGS="-L -v"
Password:
===>  License ART20 accepted by the user
===>   p5-Class-Load-XS-0.10 depends on file: /usr/local/sbin/pkg - found
=> Class-Load-XS-0.10.tar.gz doesn't seem to exist in /usr/ports/distfiles/.
=> Attempting to fetch https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz
*   Trying 0.1.48.106:80...

Two things really stand out as strange. The URL I'm fetching is https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz, which is HTTPS. But look at the output from curl: a wacky IP address and port 80 (Trying 0.1.48.106:80). I don't know whether using curl instead of fetch is actually supported this way, so this might be a red herring.

Now what really bakes my noodle is that I can fetch the file directly with curl, but not with fetch.
Code:
$ cd /tmp
$ fetch https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz
fetch: transfer timed out
fetch: Class-Load-XS-0.10.tar.gz appears to be truncated: 0/77930 bytes

$ curl -v -O https://cpan.metacpan.org/modules/by-module/Class/Class-Load-XS-0.10.tar.gz
* Host cpan.metacpan.org:443 was resolved.
* IPv6: 2a04:4e42:400::729, 2a04:4e42:200::729, 2a04:4e42::729, 2a04:4e42:600::729
* IPv4: 151.101.2.217, 151.101.66.217, 151.101.130.217, 151.101.194.217
*   Trying [2a04:4e42:400::729]:443...
*   Trying 151.101.2.217:443...
* Connected to cpan.metacpan.org (151.101.2.217) port 443

... some stuff truncated ...

{ [5 bytes data]
100 77930  100 77930    0     0   202k      0 --:--:-- --:--:-- --:--:--  202k
* Connection #0 to host cpan.metacpan.org left intact

There are some other related symptoms. freebsd-update fetch fails:
Code:
src component not installed, skipped
Looking up update.FreeBSD.org mirrors... none found.
Fetching metadata signature for 14.1-RELEASE from update.FreeBSD.org... failed.
No mirrors remaining, giving up.

But other things work fine. I can use pkg to install binary packages:
Code:
sudo pkg install bind-tools
Updating FreeBSD repository catalogue...
Fetching data.pkg: 100%    7 MiB   3.5MB/s    00:02
Processing entries: 100%
FreeBSD repository update completed. 33387 packages processed.
All repositories are up to date.
The following 10 package(s) will be affected (of 0 checked):
... omitted ...

As with any good bug, I suspect DNS. :) But I don't know if it's DNS because DNS is screwed up, or if it's DNS because networking is screwed up.

I'm running a local unbound, and if I do something like dig 0.freebsd.pool.ntp.org I first see ;; communications error to 127.0.0.1#53: timed out and then the actual answer. Here's the summary from the bottom:
Code:
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1) (UDP)
;; WHEN: Tue Aug 13 13:48:23 EDT 2024
;; MSG SIZE  rcvd: 115

So first it seems to time out, then it seems to succeed.

I'm a little at my wits' end. Any ideas?
 
One small thing I figured out that explains the curl versus fetch behaviour. There's a script, /usr/ports/Mk/scripts/do-fetch.sh, that adds -S ${SIZE} to whatever ${FETCH_CMD} you use. So the curl command line that actually executes is something like /usr/local/bin/curl -S 13242 https://blah.blah.blah/. curl treats that integer as a host, it gets turned into an IP address, and curl tries to connect to it. So that much is explained. There are a bunch of variables you can use to alter how the ports build process fetches, like ${FETCH_CMD}, ${FETCH_BINARY}, and ${FETCH_ARGS}. As far as I can tell, none of them stops that -S ${SIZE} from being added, so it isn't clear what good it would be to use any program other than fetch.
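
To sanity-check the "integer becomes an IP address" explanation, here is a minimal sketch of the arithmetic, assuming the number is read as a plain 32-bit IPv4 address (which is how a bare decimal host is normally interpreted). The function name int_to_ipv4 is made up for illustration. The distfile size in the fetch output above is 77930 bytes, and that maps exactly to the 0.1.48.106 that curl tried. Port 80 would follow from curl defaulting to HTTP when no scheme is given, and curl's own -S is a flag that takes no argument, so the size is left over as an extra "URL".

```shell
#!/bin/sh
# Hypothetical helper: convert a non-negative 32-bit integer to dotted-quad
# IPv4 notation, assuming the integer is taken as the raw address value.
int_to_ipv4() {
    n=$1
    printf '%d.%d.%d.%d\n' \
        $(( (n >> 24) & 255 )) \
        $(( (n >> 16) & 255 )) \
        $(( (n >> 8) & 255 )) \
        $(( n & 255 ))
}

int_to_ipv4 77930   # size of Class-Load-XS-0.10.tar.gz -> 0.1.48.106
int_to_ipv4 13242   # the made-up size in the example above -> 0.0.51.186
```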

I'm still working on this, and I think one problem I'm bumping into is just a coincidence: I am downloading a LOT of perl modules, and the host cpan.metacpan.org seems to have some small problems. When I resolve that name, I end up with 3 IPv6 addresses: 2a04:4e42::729, 2a04:4e42:400::729, and 2a04:4e42:600::729. I can't ping the third one, so the intermittent hanging is almost certainly related to that. Switching to IPv4 only got things moving.

Code:
$  ping6 2a04:4e42:600::729
PING(56=40+8+8 bytes) 2001:xxx:xxxx:x::245 --> 2a04:4e42:600::729
^C
--- 2a04:4e42:600::729 ping statistics ---
8 packets transmitted, 0 packets received, 100.0% packet loss
$  ping6 2a04:4e42:400::729
PING(56=40+8+8 bytes) 2001:xxx:xxxx:x::245 --> 2a04:4e42:400::729
16 bytes from 2a04:4e42:400::729, icmp_seq=0 hlim=59 time=4.417 ms
16 bytes from 2a04:4e42:400::729, icmp_seq=1 hlim=59 time=4.381 ms
16 bytes from 2a04:4e42:400::729, icmp_seq=2 hlim=59 time=7.543 ms
^C
--- 2a04:4e42:400::729 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 4.381/5.447/7.543/1.482 ms
$  ping6 2a04:4e42::729
PING(56=40+8+8 bytes) 2001:xxx:xxxx:x::245 --> 2a04:4e42::729
16 bytes from 2a04:4e42::729, icmp_seq=0 hlim=59 time=10.319 ms
16 bytes from 2a04:4e42::729, icmp_seq=1 hlim=59 time=12.318 ms
16 bytes from 2a04:4e42::729, icmp_seq=2 hlim=59 time=11.918 ms
^C
--- 2a04:4e42::729 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
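
For anyone wanting to reproduce the IPv4-only workaround, here is a minimal sketch of one way to do it. It assumes fetch's -4 flag (IPv4 addresses only) is simply added to the flags set at the top of this post; it is not a record of the exact command line that was run.

```shell
# Assumption: same variables as earlier in the post, with -4 added so that
# fetch only uses IPv4 addresses.
export HTTP_TIMEOUT="5"
export FETCH_ARGS="-4Fmv"
```

With these exported, make fetch-recursive should pass the flags through to fetch the same way as before.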
 