
BIND 9.11 rndc "host unreachable" UPDATE: caused by soreceive_stream

sko

Well-Known Member

Thanks: 158
Messages: 350

#1
Hello,

To clean up our local nameserver infrastructure, consolidating multiple mixed master/slave servers into a single master, I just configured several new nameservers within jails on different hosts. All hosts are running FreeBSD 11.0-RELEASE-p10 with bind911-9.11.1 from packages.

While rndc works on the master NS:
Code:
# rndc status
version: BIND 9.11.1 <id:e3dc2e7> ([hidden])
running on ns0: FreeBSD amd64 11.0-RELEASE-p10 FreeBSD 11.0-RELEASE-p10 #5 r309898M: Fri May  5 12:14:20 CEST 2017     root@stor1:/usr/obj/usr/src/sys/NETGRAPH_VIMAGE
boot time: Wed, 24 May 2017 11:42:05 GMT
last configured: Wed, 24 May 2017 11:42:05 GMT
configuration file: /usr/local/etc/namedb/named.conf
CPUs found: 8
worker threads: 8
UDP listeners per interface: 7
number of zones: 918 (882 automatic)
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is ON
recursive clients: 0/900/1000
tcp clients: 0/150
server is up and running
On the slaves, named is running but rndc does not work:
Code:
# netstat -na4 | grep 953
tcp4       0      0 10.60.50.2.953         *.*                    LISTEN
# nc -v 10.60.50.2 953
Connection to 10.60.50.2 953 port [tcp/rndc] succeeded!
^C
# rndc status
rndc: recv failed: host unreachable
# rndc -s 127.0.0.1 status
rndc: recv failed: host unreachable
# rndc -s 10.60.50.2 status                                                                                                                                
rndc: recv failed: host unreachable
tcpdump reveals there is a proper connection in both directions:
Code:
# tcpdump -nti lo0 port 953                                                                                                                                
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 262144 bytes
IP 10.60.50.2.15234 > 10.60.50.2.953: Flags [S], seq 4272549536, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 685264170 ecr 0], length 0
IP 10.60.50.2.953 > 10.60.50.2.15234: Flags [S.], seq 3987622948, ack 4272549537, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 685264170 ecr 685264170], length 0
IP 10.60.50.2.15234 > 10.60.50.2.953: Flags [.], ack 1, win 1276, options [nop,nop,TS val 685264170 ecr 685264170], length 0
IP 10.60.50.2.15234 > 10.60.50.2.953: Flags [P.], seq 1:148, ack 1, win 1276, options [nop,nop,TS val 685264170 ecr 685264170], length 147
IP 10.60.50.2.15234 > 10.60.50.2.953: Flags [F.], seq 148, ack 1, win 1276, options [nop,nop,TS val 685264170 ecr 685264170], length 0
IP 10.60.50.2.953 > 10.60.50.2.15234: Flags [.], ack 149, win 1274, options [nop,nop,TS val 685264170 ecr 685264170], length 0
IP 10.60.50.2.953 > 10.60.50.2.15234: Flags [F.], seq 1, ack 149, win 1276, options [nop,nop,TS val 685264170 ecr 685264170], length 0
IP 10.60.50.2.953 > 10.60.50.2.15234: Flags [F.], seq 1, ack 149, win 1276, options [nop,nop,TS val 685264401 ecr 685264170], length 0
IP 10.60.50.2.953 > 10.60.50.2.15234: Flags [F.], seq 1, ack 149, win 1276, options [nop,nop,TS val 685264664 ecr 685264170], length 0
[...]
So the host definitely *is* reachable, but either rndc isn't responding or the responses are being dropped.

The master-ns is running on a host within our mgmt-VLAN which has no PF configured. The other hosts are acting as gateways and have pf configured.

All jails have similar settings regarding raw sockets, sysvipc, etc. I even diffed the output of iocage get all <jail> across all hosts, and apart from host-specific options (UUIDs, names...) they are configured identically.
All hosts share the same jail- and networking-related settings in /boot/loader.conf and /etc/sysctl.conf.

rndc on all hosts is configured via rndc.conf (no separate rndc.key file); in named.conf the keys are included and 'controls' are set accordingly (currently allow { any; } keys { rndc-key; }; for troubleshooting).
Using localhost (127.0.0.1) or any external IP for rndc (in both named.conf and rndc.conf) makes no difference.
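For completeness, a setup along these lines looks roughly like the sketch below. This is illustrative only: the algorithm and secret are placeholders (not the real values), and the listen address is taken from the netstat output above.

```
/* named.conf -- sketch, placeholder secret */
key "rndc-key" {
    algorithm hmac-sha256;          /* placeholder algorithm */
    secret "PLACEHOLDERBASE64==";   /* placeholder secret */
};

controls {
    inet 10.60.50.2 port 953
        allow { any; } keys { "rndc-key"; };   /* open for troubleshooting */
};

/* rndc.conf -- must use the same key material */
key "rndc-key" {
    algorithm hmac-sha256;
    secret "PLACEHOLDERBASE64==";
};

options {
    default-key "rndc-key";
    default-server 10.60.50.2;
    default-port 953;
};
```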

The only difference between the host where rndc is working and the others is PF, so to ensure it doesn't (shouldn't?) block rndc/DNS traffic I temporarily added this rule on all hosts:
Code:
pass quick proto { tcp, udp } from any to any port { domain, rndc }
pflog0 doesn't report any blocked packets on port 953 when trying to connect to/via rndc, either within the jail or from another host.

I'm really out of ideas here. While every sysctl knob and jail/networking related configuration I've checked (and know/remember I've ever set or changed) is identical, I still don't want to rule out that I've missed something.
OTOH, the only thing that is different between the working host configuration and the others (and consistent between them) is PF. Although I've set a (very open) rule and it isn't reporting any blocked packets, my bet would be on PF as the culprit here...

I'd really appreciate any idea or hint on this.
 

sko


#2
I spent some more time troubleshooting this issue and it turned out not to be limited to jails: rndc was also failing on both host machines. tshark revealed lots of out-of-order segments, so the problem seemed to lie deeper down in the stack.
Today I set up a test machine with vanilla 11.0-RELEASE and started adding one configuration file/option/tunable at a time from the affected hosts - fortunately rndc broke with the very first file (/boot/loader.conf).
Long story short: It turns out the optimized soreceive() for streams breaks rndc:

Code:
root@test:~ # sysctl net.inet.tcp.soreceive_stream
net.inet.tcp.soreceive_stream: 0
root@test:~ # rndc status
version: BIND 9.11.1 <id:e3dc2e7>
running on test: FreeBSD amd64 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
boot time: Tue, 30 May 2017 08:51:30 GMT
last configured: Tue, 30 May 2017 08:51:30 GMT
configuration file: /usr/local/etc/namedb/named.conf
CPUs found: 4
worker threads: 4
UDP listeners per interface: 3
number of zones: 162 (1 automatic)
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/900/1000
tcp clients: 0/150
server is up and running
Code:
root@test:~ # sysctl net.inet.tcp.soreceive_stream
net.inet.tcp.soreceive_stream: 1
root@test:~ # rndc status
rndc: recv failed: host unreachable
I couldn't find any information on whether this is intended/expected behaviour or a known problem. The patch adding the 'new' soreceive_stream() function dates back to 2007 [1]. The current soreceive(9) manpage mentions no soreceive_stream()-related bugs.
Maybe someone with a bit more insight into the inner workings of the kernel socket interface can confirm this as a possible bug or enlighten me why soreceive_stream() breaks rndc?
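For anyone else hitting this, a quick way to spot the offending tunable is a sketch like the following (the script and its messages are mine, not from this thread; it assumes the stock /boot/loader.conf path, and since this is a boot-time tunable, removing the line only takes effect after a reboot):

```shell
#!/bin/sh
# check_soreceive_stream: warn if the soreceive_stream loader tunable
# is set to "1" in the given loader.conf. The tunable is boot-time
# only, so after removing the line a reboot is required.
check_soreceive_stream() {
    conf="${1:-/boot/loader.conf}"
    if grep -Eq '^[[:space:]]*net\.inet\.tcp\.soreceive_stream="?1"?' "$conf" 2>/dev/null; then
        echo "soreceive_stream enabled: rndc may fail with 'host unreachable'"
    else
        echo "soreceive_stream not enabled"
    fi
}

# Check the default location (or a file given as first argument)
check_soreceive_stream "${1:-/boot/loader.conf}"
```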


[1] https://docs.freebsd.org/cgi/getmsg...2007/freebsd-current/20070304.freebsd-current
 

SirDice

Administrator
Staff member
Administrator
Moderator

Thanks: 6,079
Messages: 26,940

#3
I would recommend not "tuning" anything unless you run into problems. A lot of the tuning guides you find on the internet are based on really old versions. FreeBSD nowadays does an excellent job of tuning itself and there's rarely a need for manual intervention. The tunable in question dates from a time when -CURRENT was what later became 7.0-RELEASE. That's a long time ago and a lot has changed in the meantime.
 

sko


#4
I remember adding this tunable to increase 10G performance (IIRC back on 10.2-RELEASE). But as these hosts are not connected via 10G Ethernet and the impact on single-gigabit connections is rather minimal, I just removed it and added a comment to loader.conf on the hosts that still use the tunable...

I'll try to set some time aside for benchmarks on hosts that actually use 10G Ethernet, to check whether this tunable still has an impact on FreeBSD 11.0.
 

SirDice


#5
I try to set some time aside for some benchmarks with hosts that actually use 10G Ethernet to check whether this tunable still has an impact on FreeBSD 11.0.
That's definitely a good idea. Start with a clean, untuned, version, then add one tuning parameter, measure, add another, measure again, etc.
 

sko


#6
That's definitely a good idea. Start with a clean, untuned, version, then add one tuning parameter, measure, add another, measure again, etc.
That's what I did when I set up the first hosts of our new infrastructure, and back then I had multiple servers available for testing over several weeks. The first tests were made with 10.2, then updated to 10.3 during testing/pre-configuration. Unfortunately some of my notes are rather sparse, so I really can't tell how much of an impact some of the tunables made.
Now that all of these hosts are in production, I have to wait until I have a window of 1-2 hours where I can bring down two of the hosts with 10G Ethernet. Or I'll have to wait until I get our next Xeon-D system (with dual igbx 10G) for one of our branches sometime in July...
 

Rob Burrowes

New Member


Messages: 2

#7
Ever solve this? I'm seeing exactly the same thing.

Both hosts are running FreeBSD 11.1-RELEASE-p3, one with PF ALTQ added to the GENERIC kernel and the other without. rndc runs fine on the host without ALTQ, but gives "rndc: recv failed: host unreachable" on the one with the ALTQ kernel. I disabled pf in rc.conf, then cleared out sysctl.conf, then tried compiling named/rndc from the latest bind-9.11.2 source just to make sure, but the same error occurs (and only on the ALTQ-based system).

P.S.
I also had to add "options TCP_RFC7413" to the GENERIC kernel with ALTQ (but not to the one without), and to add "net.inet.tcp.fastopen.enabled=1" to sysctl.conf on the ALTQ kernel's host only. I was getting an error logged in /var/log/messages from named: "setsockopt(25, TCP_FASTOPEN) failed with Protocol not available". I don't get that message on the system with the GENERIC kernel.
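In config-file form, the two additions mentioned above amount to the following (the option and sysctl are as quoted; their placement is assumed per standard FreeBSD conventions):

```
# custom kernel configuration file (the ALTQ kernel)
options TCP_RFC7413              # TCP Fast Open support

# /etc/sysctl.conf
net.inet.tcp.fastopen.enabled=1
```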
Code:
root# rndc -V status
create memory context
create socket manager
create task manager
create task
create logging context
setting log tag
creating log channel
enabling log channel
create parser
get default key
get config key list
decode base64 secret
allocate data buffer
status
post event
using server 127.0.0.1 (127.0.0.1#953)
create socket
bind socket
connect
create message
render message
schedule recv
send message
rndc: recv failed: host unreachable
The working kernel on host db continues with:
parse message
version: BIND 9.11.2 <0a2b929> (Nov 3 2017)
running on db: 11.1-RELEASE-p3 FreeBSD 11.1-RELEASE-p3 #6: Fri Nov 3 13:47:37 NZDT 2017
...
 

Rob Burrowes


#8
Got it. It didn't click when you said soreceive() was the problem. I had added:
Code:
net.inet.tcp.soreceive_stream="1" # (default 0)
to /boot/loader.conf on an earlier version of the OS and had blindly copied it over to the new kernel.
Removing this line from loader.conf fixed the problem with rndc.
 