netstat -L doesn't list some listen queues

Hi, we have a moderately loaded box (10.3-RELEASE-p11) running Ruby on Rails servers inside jails. They work fine, but after uptime measured in weeks or months we can no longer see the listen queues of some of them:

Code:
$ sudo netstat -Lan|fgrep 127.0.0.60
$ jls
...
   JID  IP Address      Hostname                      Path
     3  127.0.0.60      an.example.com                        /var/jails/an.example.com
...

Where did they go? Also, there are lots of messages of this kind:
Code:
May 30 13:37:16 myhost kernel: sonewconn: pcb 0xfffff808e1d00ab8: Listen queue overflow: 1 already in queue awaiting acceptance (2 occurrences)
May 30 13:39:51 myhost kernel: sonewconn: pcb 0xfffff8072919f188: Listen queue overflow: 1 already in queue awaiting acceptance (4 occurrences)
May 30 13:41:25 myhost kernel: sonewconn: pcb 0xfffff8072919f188: Listen queue overflow: 1 already in queue awaiting acceptance (5 occurrences)
May 30 13:43:00 myhost kernel: sonewconn: pcb 0xfffff808e1d00ab8: Listen queue overflow: 1 already in queue awaiting acceptance (4 occurrences)
May 30 13:44:20 myhost kernel: sonewconn: pcb 0xfffff80fdb43e310: Listen queue overflow: 1 already in queue awaiting acceptance (2 occurrences)
May 30 13:45:58 myhost kernel: sonewconn: pcb 0xfffff80fdb43e310: Listen queue overflow: 1 already in queue awaiting acceptance (5 occurrences)
May 30 13:47:50 myhost kernel: sonewconn: pcb 0xfffff8072919f188: Listen queue overflow: 1 already in queue awaiting acceptance (1 occurrences)
May 30 13:48:50 myhost kernel: sonewconn: pcb 0xfffff8072919f188: Listen queue overflow: 1 already in queue awaiting acceptance (2 occurrences)
May 30 13:50:08 myhost kernel: sonewconn: pcb 0xfffff8072919f188: Listen queue overflow: 1 already in queue awaiting acceptance (3 occurrences)
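These messages come from the kernel's sonewconn() when a socket's accept queue is already over the limit derived from its listen(2) backlog. For reference, a tiny script along these lines should provoke the same message on a test box (port 15000 is made up): it listens with a backlog of 0 and never accepts, so a couple of connection attempts overflow the queue.
Code:
require 'socket'

srv = Socket.new(:INET, :STREAM)
srv.bind(Addrinfo.tcp('127.0.0.1', 15000))
srv.listen(0)                  # tiny backlog, and nothing ever accept()s

3.times do
  c = Socket.new(:INET, :STREAM)
  begin
    c.connect_nonblock(Addrinfo.tcp('127.0.0.1', 15000))
  rescue IO::WaitWritable
    # connect still in progress; we only care about the kernel log
  end
end
sleep 2
# /var/log/messages (or dmesg) should now contain a sonewconn line like the above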

Code:
$ cat /boot/loader.conf
# This configuration file is used by loader.efi, which is executed when booting in *BIOS* mode (only).
# DO NOT EDIT this file unless you know what you are doing !
# See the /boot/loader.rc file within the EFI partition when booting in UEFI mode.
kern.geom.label.gptid.enable="0"
zfs_load="YES"
net.fibs=4
carp_load="YES"

ahci_load="YES"
accf_http_load="YES"
accf_data_load="YES"
aio_load="YES"

kern.hz=250
kern.maxdsiz="2048M"
kern.dfldsiz="2048M"
kern.maxssiz="2048M"
kern.ipc.maxpipekva="2048M"
kern.ipc.semmni=512
kern.ipc.semmns=1024
kern.ipc.semmnu=512

kern.maxproc="12328"
kern.maxprocperuid="11094"

kern.ipc.shm_use_phys=1
kern.ipc.nmbclusters=131072
kern.ipc.maxsockbuf=524288
kern.ipc.nsfbufs=10240

kern.sync_on_panic=1

net.inet.tcp.tcbhashsize=16384

vfs.zfs.arc_max="32G"

Our runtime sysctl settings:
Code:
net.link.ether.inet.log_arp_movements=0
net.inet6.ip6.accept_rtadv=0
net.inet6.ip6.auto_linklocal=0

net.inet.ip.fw.dyn_buckets=8096
net.inet.ip.fw.verbose_limit=1000
net.inet.ip.fw.verbose=1
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.finwait2_timeout=15000
net.inet.tcp.drop_synfin=1
net.inet.ip.fw.dyn_max=65536

kern.maxfiles=62020
kern.maxfilesperproc=22190
kern.ipc.somaxconn=4096

kern.ipc.somaxconn=8192
net.inet.ip.portrange.hifirst=10000


net.inet.ip.intr_queue_maxlen=5120

net.inet.tcp.ecn.enable=1

# Postgresql
kern.ipc.shmmax=4294967296
# kern.ipc.shmall = kern.ipc.shmmax / hw.pagesize
kern.ipc.shmall=1048576

# Misc
## To disable closed port RST responses
net.inet.tcp.blackhole=0
net.inet.udp.blackhole=0
net.inet.icmp.icmplim_output=0

##  For readproctitle
kern.ps_arg_cache_limit=512

# UFS read-ahead http://ivoras.sharanet.org/blog/tree/2010-11-19.ufs-read-ahead.html
vfs.read_max=256

Thanks for any tips.
 
Code:
$ netstat -m
17089/9671/26760 mbufs in use (current/cache/total)
16584/5120/21704/131072 mbuf clusters in use (current/cache/total/max)
16584/4693 mbuf+clusters out of packet secondary zone in use (current/cache)
261/2087/2348/2037612 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/603736 9k jumbo clusters in use (current/cache/total/max)
0/0/0/339602 16k jumbo clusters in use (current/cache/total/max)
38484K/21005K/59490K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
33986212 requests for I/O initiated by sendfile
$
 
Also, in the output of netstat -Lan there's a message after all tcp4 sockets:
Code:
...
tcp4  0/0/128        127.0.0.2.8080         
tcp4  0/0/1024       172.16.1.13.11211     
tcp4  0/0/8192       172.16.1.13.514       
tcp4  0/0/128        127.0.0.2.5433         
Some tcp sockets may have been deleted.
unix  0/0/5          /var/run/ntpd.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/4          /var/run/devd.seqpacket.pipe

What gives?
 
It turned out that the listen queues disappeared fairly quickly, but only for services whose listen(2) backlog was set to 64. With the backlog set to 128 or higher we could not reproduce the problem.
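For anyone checking their own services: the last number in the Listen-queue column of netstat -L is the backlog that was passed to listen(2) (capped by kern.ipc.somaxconn). A quick sketch, with a made-up port 18080:
Code:
require 'socket'

srv = TCPServer.new('127.0.0.1', 18080)
srv.listen(64)                                  # set an explicit backlog
system('netstat -Lan | fgrep 127.0.0.1.18080')  # shows roughly: tcp4  0/0/64   127.0.0.1.18080
srv.listen(128)                                 # re-listen to bump the backlog
system('netstat -Lan | fgrep 127.0.0.1.18080')  # now roughly:   tcp4  0/0/128  127.0.0.1.18080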
 
Sorry, guys, the problem wasn't in FreeBSD, but in our misunderstanding of Ruby. We used this code in Unicorn to set the listen backlog:
Code:
listen ... backlog: (ENV['LISTEN_BACKLOG'].to_i || 64)
When LISTEN_BACKLOG isn't set in the environment, ENV['LISTEN_BACKLOG'] is nil, nil.to_i returns zero, and zero is truthy in Ruby, so the || 64 fallback never had a chance. Passing 0 as the second argument to listen(2) tells it not to create any listen queue at all (FreeBSD still tolerates a single pending connection, which is exactly the "1 already in queue awaiting acceptance" in the messages above). I hate Ruby :)
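A sketch of the fix, with the listen address elided as above: apply the fallback before converting (or fetch with a string default), so an unset LISTEN_BACKLOG really does mean 64.
Code:
# fall back *before* to_i, so nil is replaced by 64 instead of becoming 0
listen ... backlog: (ENV['LISTEN_BACKLOG'] || 64).to_i
# or, equivalently:
listen ... backlog: ENV.fetch('LISTEN_BACKLOG', '64').to_i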
 