[Solved] Sockets fail erratically when kqueue is active

This is my postgres client contacting the kerberos KDC to obtain a service ticket.
Earlier these TCP connections looked like this, and were successful:

Code:
13:57:53.788797 IP6 clientIPv6.54143 > serverIPv6.88: Flags [S], seq 4189109662, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 402503231 ecr 0], length 0
13:57:53.789053 IP6 serverIPv6.88 > clientIPv6.54143: Flags [S.], seq 1604590824, ack 4189109663, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1505702993 ecr 402503231], length 0
13:57:53.789122 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 0
13:57:53.789223 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 4
13:57:53.829575 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 5, win 1035, options [nop,nop,TS val 1505703037 ecr 402503231], length 0
13:57:53.829696 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 402503271 ecr 1505703037], length 1161
13:57:53.832411 IP6 serverIPv6.88 > clientIPv6.54143: Flags [P.], seq 1:5, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 4
13:57:53.832440 IP6 serverIPv6.88 > clientIPv6.54143: Flags [FP.], seq 5:907, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 902
13:57:53.832461 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 908, win 1021, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832620 IP6 clientIPv6.54143 > serverIPv6.88: Flags [F.], seq 1166, ack 908, win 1035, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832847 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 1167, win 1035, options [nop,nop,TS val 1505703037 ecr 402503276], length 0

After a change of the application they now look like this:

Code:
16:53:09.637708 IP6 clientIPv6.49124 > serverIPv6.88: Flags [S], seq 2860816940, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 2664510889 ecr 0], length 0
16:53:09.637990 IP6 serverIPv6.88 > clientIPv6.49124: Flags [S.], seq 2595272492, ack 2860816941, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638153 IP6 clientIPv6.49124 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 0
16:53:09.638158 IP6 clientIPv6.49124 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 4
16:53:09.638162 IP6 clientIPv6.49124 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 1161
16:53:09.638451 IP6 serverIPv6.88 > clientIPv6.49124: Flags [.], ack 1167, win 1017, options [nop,nop,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638508 IP clientIPv4.58934 > serverIPv4.88: Flags [S], seq 3474302110, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 726603825 ecr 0], length 0
16:53:09.638696 IP serverIPv4.88 > clientIPv4.58934: Flags [S.], seq 2424341350, ack 3474302111, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2662337458 ecr 726603825], length 0
16:53:09.638729 IP clientIPv4.58934 > serverIPv4.88: Flags [.], ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 0
16:53:09.638733 IP clientIPv4.58934 > serverIPv4.88: Flags [P.], seq 1:5, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 4
16:53:09.638751 IP clientIPv4.58934 > serverIPv4.88: Flags [FP.], seq 5:1166, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 1161
16:53:09.638856 IP6 clientIPv6.46516 > serverIPv6.88: Flags [S], seq 3686126664, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1901751531 ecr 0], length 0
16:53:09.639054 IP serverIPv4.88 > clientIPv4.58934: Flags [.], ack 1167, win 1009, options [nop,nop,TS val 2662337458 ecr 726603825], length 0
16:53:09.639071 IP6 serverIPv6.88 > clientIPv6.46516: Flags [S.], seq 3830196306, ack 3686126665, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 457425497 ecr 1901751531], length 0
16:53:09.639105 IP6 clientIPv6.46516 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 0
16:53:09.639130 IP6 clientIPv6.46516 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 4
16:53:09.639154 IP6 clientIPv6.46516 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 1161

This is repeated LOTS of times, and what remains is hundreds of sockets in FIN_WAIT_2 that do not go away quickly. (Note how in the failing trace the client sends its FIN together with the request data, half-closing the connection immediately; no KDC response ever appears, and the client is left in FIN_WAIT_2 waiting for the server's FIN.)
The error message is: "GSSAPI continuation error: Miscellaneous failure: unable to reach any KDC in realm"
(In fact it does reach the KDC, only the TCP doesn't work.)
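
A quick way to watch the symptom pile up (a trivial helper of my own, not part of the application):

Code:
# Count lingering half-closed sockets; run repeatedly while the client retries.
puts `netstat -an`.lines.count { |l| l.include?('FIN_WAIT_2') }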

A configuration problem on the machines can be ruled out, because both the old (working) and the new (failing) application are installed on the same machine at the same time, using the same network, the same hardware, the same OS, the same libgssapi and the same postgres client software.

What is going on there?
 
I got somewhat further and could isolate the culprit: kqueue(2)

The strangely failing connections shown above appeared once kqueue was monitoring ~4000 files. Reduce that to below 1000 files and the problem goes away.
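
To see why the file count matters: each watched file pins an open descriptor, so everything created afterwards gets a "high" number. A stand-in illustration (plain File opens instead of the real watcher; the count is just my assumption, and it may need a raised file limit):

Code:
require 'socket'

# Stand-in for a kqueue watcher that keeps one fd open per watched file:
watches = Array.new(4000) { File.open('/dev/null') }

s = Socket.new(Socket::AF_INET, Socket::SOCK_DGRAM, 0)
puts s.fileno   # => roughly 4003: every later descriptor is "high"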

So I wrote a test case.

Code:
#! /usr/local/bin/ruby33

require 'resolv'

# listen exists as FreeBSD port devel/rubygem-listen
# devel/rubygem-rb-kqueue must also be installed

Gem.paths = { 'GEM_HOME' => Gem.paths.home,
              'GEM_PATH' => Gem.paths.path.prepend("/ext/gems/3.3").join(':') }
require 'listen'

Listen.to("/var/spool/news/alt") do |modified, added, removed|
  puts "some file has changed"
end.start

puts "listen started"
sleep 10

puts Resolv.getaddresses("google.com").inspect
puts ::Socket.getaddrinfo("google.com",
                     nil, Socket::AF_UNSPEC, Socket::SOCK_STREAM).inspect

In my case the tools concerned, rubygem-listen and rubygem-rb-kqueue, live in a separate repo, but they are also in ports.

/var/spool/news/alt contains some 100'000 files, and consequently the socket no longer works at all:
Code:
$ ruby33 test.rb
listen started
["142.250.184.206", "2a00:1450:4001:830::200e"]
test.rb:20:in `getaddrinfo': getaddrinfo: Name does not resolve (Socket::ResolutionError)
        from test.rb:20:in `<main>'

Pointing listen at a directory with fewer files, things just work:
Code:
$ ruby33 test.rb
listen started
["142.250.184.206", "2a00:1450:4001:830::200e"]
[["AF_INET6", 0, "2a00:1450:4001:830::200e", "2a00:1450:4001:830::200e", 28, 1, 6], ["AF_INET", 0, "142.250.184.206", "142.250.184.206", 2, 1, 6]]
 
Hm. kqueue needs an open file descriptor per watched file to work (very unfortunately). Could this be an issue with "high" file descriptors?

What happens if you keep some dummy fds (e.g. just dups of the standard streams) and close one of them before creating a socket? Does it still fail? (Edit: or, to avoid races with the watcher grabbing the fd, first create the socket and then dup2() it onto one of your dummy fds?)
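
In Ruby the experiment could look roughly like this (a minimal sketch; all names here are mine):

Code:
require 'socket'

# Pin a handful of low-numbered fds before the watcher starts:
reserved = Array.new(10) { File.open('/dev/null', 'rb') }

# ... kqueue watcher starts here and consumes thousands of fds ...

reserved.pop.close                                  # free one low slot
sock = Socket.new(Socket::AF_INET, Socket::SOCK_DGRAM, 0)
puts sock.fileno   # low again -- unless another thread grabbed the slot first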
 
Hm. kqueue needs an open file descriptor per watched file to work (very unfortunately). Could this be an issue with "high" file descriptors?
Probably yes. And it seems to be the ruby interpreter going weird, not the OS itself.

I looked at it with truss; what happens is this:
Code:
10576 212524: 145.735397262 open("/etc/hosts",O_RDONLY|O_CLOEXEC,0666) = 130731 (0x1feab)
10576 212524: 145.735484131 close(130731) = 0 (0x0)
10576 212524: 145.735805290 open("/etc/resolv.conf",O_RDONLY|O_CLOEXEC,0666) = 130731 (0x1feab)
10576 212524: 145.735887481 close(130731) = 0 (0x0)
10576 212524: 145.736305918 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 130731 (0x1feab)
10576 212524: 145.736461194 connect(130731,{ AF_INET 0.0.0.0:53 },16) = 0 (0x0)
10576 212524: 145.736678421 sendto(130731,"0\M-Y\^A\0\0\^A\0\0\0\0\0\0\^Fgo"...,28,0,NULL,0) = 28 (0x1c)
10576 212524: 145.736823153 poll({ 130731/POLLRDNORM },1,5000) = 1 (0x1)
10576 212524: 145.736924730 recvfrom(130731,0xc341d26c0,65536,0,0xc32189f10,0xc32189afc) ERR#61 'Connection refused'
10576 212524: 145.737024272 close(130731) = 0 (0x0)
10576 212524: 145.737112685 socket(PF_INET6,SOCK_DGRAM|SOCK_CLOEXEC,0) = 130731 (0x1feab)
10576 212524: 145.737219941 connect(130731,{ AF_INET6 [::]:53 },28) = 0 (0x0)
10576 212524: 145.737407731 sendto(130731,"0\M-Y\^A\0\0\^A\0\0\0\0\0\0\^Fgo"...,28,0,NULL,0) = 28 (0x1c)
10576 212524: 145.737505083 poll({ 130731/POLLRDNORM },1,2000) = 1 (0x1)
10576 212524: 145.737589753 recvfrom(130731,0xc341d26c0,65536,0,0xc32189f10,0xc32189afc) ERR#61 'Connection refused'
10576 212524: 145.737680407 close(130731) = 0 (0x0)
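
Note that the kernel happily hands out fd 130731 in that trace (the open() calls succeed), so the per-process descriptor limit is not what is failing; that can be double-checked from Ruby:

Code:
# Soft and hard RLIMIT_NOFILE; both should be far above the fds seen in truss.
p Process.getrlimit(Process::RLIMIT_NOFILE)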

What happens if you keep some dummy fds (e.g. just dups of the standard streams) and close one of them before creating a socket?
Does it still fail? (Edit: or, to avoid races with the watcher grabbing the fd, first create the socket and then dup2() it onto one of your dummy fds?)

That's the solution.

Code:
/usr/ports/devel/rubygem-listen/work/listen-3.9.0 # diff -u lib/listen/adapter/bsd.rb.orig lib/listen/adapter/bsd.rb
--- lib/listen/adapter/bsd.rb.orig      2024-09-30 20:40:59.780597000 +0200
+++ lib/listen/adapter/bsd.rb   2024-09-30 20:33:42.029902000 +0200
@@ -41,11 +41,18 @@
       private

       def _configure(directory, &callback)
+        f = Array.new
+        500.times do
+          f << open('/dev/null', 'rb')
+        end
         @worker ||= KQueue::Queue.new
         @callback = callback
         # use Record to make a snapshot of dir, so we
         # can detect new files
         _find(directory.to_s) { |path| _watch_file(path, @worker) }
+        f.each do |file|
+          file.close
+        end
       end

       def _run

Seems to work. Thank you, I was already at the point of giving up and accepting the ever-humming fan that comes with fstat-monitoring. Enjoying the silence now. :)
 
I looked at it with truss; what happens is this:
But the sequence of syscalls looks somewhat sane for trying to query DNS. What am I missing here? Maybe the error from recvfrom(2) does indicate some bug? ECONNREFUSED isn't documented as a possible error...

Seems to work.
Not sure about the code structure (and I don't speak ruby...) -- but wouldn't this code somewhere pick up newly created files to watch? If so, this code would probably "wear", with your 500 reserved fds eventually consumed by the watcher.
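
If that wear ever turns out to matter, one could imagine refilling the reserve whenever a slot is used, something along these lines (my own sketch, not the posted patch):

Code:
# Hypothetical self-replenishing reserve of low fds.
class FdReserve
  def initialize(n = 500)
    @pool = Array.new(n) { File.open('/dev/null', 'rb') }
  end

  # Free one low fd, run the block (e.g. a resolver call), then refill.
  # Note the refill may land on a high fd if the watcher races us, so this
  # only slows the wear down rather than eliminating it.
  def with_low_fd
    @pool.pop.close
    yield
  ensure
    @pool << File.open('/dev/null', 'rb')
  end
end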
 
But the sequence of syscalls looks somewhat sane for trying to query DNS. What am I missing here? Maybe the error from recvfrom(2) does indicate some bug? ECONNREFUSED isn't documented as a possible error...

Sane, hm, yes, but ...
10576 212524: 145.735397262 open("/etc/hosts",O_RDONLY|O_CLOEXEC,0666) = 130731 (0x1feab)
10576 212524: 145.735484131 close(130731) = 0 (0x0)
10576 212524: 145.735805290 open("/etc/resolv.conf",O_RDONLY|O_CLOEXEC,0666) = 130731 (0x1feab)
10576 212524: 145.735887481 close(130731) = 0 (0x0)
... there is a read() missing between open() and close() ...
And then, lacking a nameserver address, the queries go to AF_INET 0.0.0.0:53 and AF_INET6 [::]:53.

Not sure about the code structure (and I don't speak ruby...) -- but wouldn't this code somewhere pick up newly created files to watch? If so, this code would probably "wear", with your 500 reserved fds eventually consumed by the watcher.
If it really manages to add newly created files into the monitoring, yes. But the whole idea of these watchers is to monitor an application in development mode and dynamically reload changed components into the running app. It is unlikely that some development step would add 500 files (and not want a full restart anyway).
And I do not strictly need that watcher, because I only do some porting and integration; I have no plans to become a web developer. It is just difficult to cleanly disable such features when they are buried somewhere in a 200MB codebase.
 
... there is a read() missing between open() and close() ...
And then, lacking a nameserver address, the queries go to AF_INET 0.0.0.0:53 and AF_INET6 [::]:53.
Ah thanks, that's the part I missed. So (leaving aside the question of why it does its own custom resolving at all), not reading these files is clearly a bug in ruby; it looks like, for some reason, it considers this high fd "invalid"?

There's still something fishy with what follows. Yes, sending to a 0 address is invalid, but THIS call doesn't fail; instead the receive call returns an error (one that might be appropriate for the sending) which isn't documented for receiving. Well, whatever, something's broken.

I have no plans to become a web developer.
Sure, better stay sane.
 
Ah thanks, that's the part I missed.
I had the advantage of comparing the working and the failing version ;)

So (leaving aside the question of why it does its own custom resolving at all),
What else should it do? There is AFAIK no system facility for resolving; that's why some people install a local caching/resolving nameserver.

But that reminds me... there are two ruby implementations for resolving. The old one was added to the Socket object; the new one uses a Resolv object that was designed to integrate better. I have both in the example above:

Code:
puts Resolv.getaddresses("google.com").inspect
puts ::Socket.getaddrinfo("google.com",
                     nil, Socket::AF_UNSPEC, Socket::SOCK_STREAM).inspect
The old Socket implementation uses libc's getaddrinfo() - and that is the one that fails. That it indeed uses libc can easily be verified with a breakpoint:
(lldb) b libc.so.7`getaddrinfo

The Resolv implementation does not hit that breakpoint. I don't know how it is implemented; discussions suggest it is built in a threading-aware fashion, while the Socket implementation blocks. And this is the one that is not affected and continues to work properly.

So the bogus behaviour here happens in libc.
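
If libc's getaddrinfo() is the broken piece, one possible Ruby-side mitigation (untested here; it is the stock stdlib mechanism) is resolv-replace, which monkey-patches the socket classes to resolve host names via the pure-Ruby Resolv. It would not help the libpq/libkrb path below, though, since that calls libc directly:

Code:
require 'resolv-replace'  # stdlib: TCPSocket & friends now resolve via Resolv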

not reading these files is clearly a bug in ruby; it looks like, for some reason, it considers this high fd "invalid"?
I do think that there is likely a bug in ruby, but it can't be that simple. Ruby can never know which fd the resolver lib obtained for reading /etc/resolv.conf.

Also there is the situation with the database access (my first posting). ruby connects to the database, but the kerberos service ticket has not yet been obtained and must be fetched. This is the job of libkrb - which is mapped via a mech into libgssapi; libgssapi in turn is linked into the postgres client lib libpq, and that one is called from ruby. (The postgres folks have already denied responsibility for whatever kerberos does ;) )

So what happens: ruby starts to open a connection to the database. libpq initiates the connection, and the DB server answers that it wants to use GSSAPI. Then libpq queries libgssapi, and somewhere deep in libkrb it is found that we only have a TGT but no postgres service ticket. So an activity is initiated to fetch that from the KDC server.
And during that communication we now have an effect very similar to the reading of /etc/resolv.conf: the socket is correctly opened, but nothing is read.

Similar picture, but totally different locations. What they have in common: it is a single process, and all the file handles share the same descriptor table.

There's still something fishy with what follows. Yes, sending to a 0 address is invalid, but THIS call doesn't fail; instead the receive call returns an error (one that might be appropriate for the sending) which isn't documented for receiving. Well, whatever, something's broken.
Isn't that asynchronous? When the write returns, we have an mbuf that starts its journey. I don't know for sure, but I think it can wander around in kernel structures for quite a while until it is determined that it cannot be sent.
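
That matches connected-UDP behaviour: the ICMP error comes back asynchronously and is delivered on a later syscall on the same socket, so recvfrom(2) returning ECONNREFUSED is plausible there. A small demonstration (assuming nothing listens on the chosen port):

Code:
require 'socket'

s = UDPSocket.new
s.connect('127.0.0.1', 59999)   # connected UDP: kernel will report ICMP errors
s.send('x', 0)                  # the sendto() itself succeeds
begin
  s.recvfrom(512)
rescue Errno::ECONNREFUSED => e
  puts "recvfrom: #{e.message}" # the asynchronous error surfaces here
end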
 