Solved Listen queue overflow

Hi,
I am using 11.1-RELEASE-p11, with standard email services (postfix, dovecot, nginx).
It seems that I am experiencing very poor performance (especially with IMAP).
These are the errors I get:

Code:
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (225 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (44 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (94 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (153 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (36 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (110 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (47 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (85 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (121 occurrences)

This is the netstat -Lan output:

Code:
# netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address         
tcp4  0/0/100                          *.25                   
tcp6  0/0/100                          *.993                 
tcp4  123/0/100                        *.993                 
tcp6  0/0/100                          *.143                 
tcp4  18/0/100                         *.143                 
tcp6  0/0/100                          *.995                 
tcp4  0/0/100                          *.995                 
tcp6  0/0/100                          *.110                 
tcp4  0/0/100                          *.110                 
tcp4  0/0/4096                         *.443                 
tcp6  0/0/4096                         *.80                   
tcp4  0/0/4096                         *.80                   
tcp4  0/0/128                          *.22                   
tcp6  0/0/128                          *.22                   
tcp4  0/0/4096                         127.0.0.1.8891         
tcp4  0/0/5                            *.873                 
tcp6  0/0/5                            *.873                 
tcp4  0/0/128                          *.199


So, it seems I have too many connections on port 993/TCP, am I wrong?

I added this line to /etc/sysctl.conf:

Code:
kern.ipc.somaxconn=4096

Why is netstat showing 100 and not 4096?

Thank you
 
Why is netstat showing 100 and not 4096?
Because netstat(1) is showing you what's in use, not your committed max. :)
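Put another way: the maxqlen column is per-socket. It's the backlog each daemon asked for in its own listen(2) call, clamped to the kern.ipc.somaxconn ceiling, so raising the sysctl only raises the ceiling; a daemon that asks for 100 (as your 0/0/100 rows suggest) still gets 100. A quick sketch to inspect that ceiling under both of its names:

Code:
# kern.ipc.soacceptqueue is the newer name; kern.ipc.somaxconn remains as an alias
sysctl -d kern.ipc.soacceptqueue
sysctl kern.ipc.somaxconn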

--Chris
 
By default, Dovecot's limit is set to 100k connections (1000 clients * 100 processes). When Dovecot starts to respond slowly to new connections and cannot accept any more, the OS starts to queue them, which is what you see in netstat. So instead of enlarging the queue via kern.ipc.soacceptqueue, you should investigate why Dovecot is processing new connections slowly.
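If you want to see how close Dovecot is to those limits, a quick sketch (assuming Dovecot 2.x, which ships the doveconf and doveadm tools):

Code:
# print the effective limits (the defaults are 1000 and 100)
doveconf default_client_limit
doveconf default_process_limit
# list current IMAP/POP3 sessions to compare against those limits
doveadm who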

Code:
sysctl -d kern.ipc.somaxconn
 
VladiBG is correct: kern.ipc.somaxconn is your target. But unless you are running a torrent server or are an ISP, there are more reasons not to run that number at 4096 than to.
For example, I bumped kern.ipc.somaxconn from 128 to 512 because I service anywhere from 60 to 160 domains and their associated services. Running today, at ~90 domains, with dovecot, the same netstat(1) output on that box reflects:
Code:
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        127.0.0.1.953
tcp4  0/0/10         127.0.0.1.53
tcp6  0/0/10         ::1.53
tcp4  0/0/10         24.113.41.81.53
tcp4  0/0/128        *.6000
tcp6  0/0/128        *.6000
tcp4  0/0/208        127.0.0.1.5432
tcp6  0/0/208        ::1.5432
tcp6  0/0/512        *.21
tcp4  0/0/512        *.21
tcp4  0/0/10         *.587
tcp6  0/0/10         *.25
tcp4  0/0/10         *.25
tcp4  0/0/511        *.443
tcp6  0/0/511        *.443
tcp4  0/0/511        *.80
tcp6  0/0/511        *.80
tcp6  0/0/100        *.993
tcp4  0/0/100        *.993
unix  0/0/5          /tmp/.esd-0/socket
unix  0/0/5          /tmp/gpg-pEb8bJ/S.gpg-agent.ssh
unix  0/0/5          /tmp/gpg-pggayY/S.gpg-agent
unix  0/0/128        /tmp/.ICE-unix/1707
unix  0/0/30         /tmp/dbus-KGbsa3P1Zq
unix  0/0/128        /tmp/.X11-unix/X0
unix  0/0/208        /tmp/.s.PGSQL.5432
unix  0/0/30         /var/run/hald/dbus-jQAOAjHjHe
unix  0/0/30         /var/run/hald/dbus-G0WqVLFesc
unix  0/0/50         /tmp/mysql.sock
unix  0/0/511        /var/run/dovecot/anvil-auth-penalty
unix  0/0/511        /var/run/dovecot/anvil
unix  0/0/100        /var/run/dovecot/auth-worker
unix  0/0/511        /var/run/dovecot/auth-master
unix  0/0/511        /var/run/dovecot/auth-userdb
unix  0/0/511        /var/run/dovecot/auth-client
unix  0/0/511        /var/run/dovecot/auth-login
unix  0/0/511        /var/run/dovecot/token-login/tokenlogin
unix  0/0/511        /var/run/dovecot/login/login
unix  0/0/511        /var/run/dovecot/config
unix  0/0/100        /var/run/dovecot/dict
unix  0/0/511        /var/run/dovecot/director-userdb
unix  0/0/511        /var/run/dovecot/director-admin
unix  0/0/100        /var/run/dovecot/dns-client
unix  0/0/100        /var/run/dovecot/doveadm-server
unix  0/0/100        /var/run/dovecot/imap-urlauth
unix  0/0/511        /var/run/dovecot/token-login/imap-urlauth
unix  0/0/511        /var/run/dovecot/imap-urlauth-worker
unix  0/0/511        /var/run/dovecot/login/imap
unix  0/0/511        /var/run/dovecot/indexer
unix  0/0/10         /var/run/dovecot/indexer-worker
unix  0/0/511        /var/run/dovecot/login/ipc-proxy
unix  0/0/511        /var/run/dovecot/ipc
unix  0/0/511        /var/run/dovecot/log-errors
unix  0/0/511        /var/run/dovecot/replication-notify
unix  0/0/511        /var/run/dovecot/replicator
unix  0/0/511        /var/run/dovecot/login/ssl-params
unix  0/0/511        /var/run/dovecot/ssl-params
unix  0/0/511        /var/run/dovecot/stats
unix  0/0/30         /var/run/dbus/system_bus_socket
unix  0/0/4          /var/run/devd.pipe
At 512, I never run into port/connection starvation, nor are miscreants permitted to abuse those ports/connections in ways that also allow abuse of others (DNS reflection). Unless the services you're running are blocking on wanted traffic, there is no reason to bump that number. Further, if those connections are blocking your wanted connections because of UNwanted traffic, it's time to deal with that unwanted traffic. :)
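If you do decide to raise it, here's a sketch of the change (512 is just the number that fits my load; find your own):

Code:
# /etc/sysctl.conf -- persists across reboots
kern.ipc.somaxconn=512
# or apply it immediately at runtime:
# sysctl kern.ipc.somaxconn=512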

--Chris
 
Hi,
I confirm that I have many users (> 10000) on this mail server.
I also confirm that, after blocking a specific public network range through ipfw, it seems stable now.
I would like to improve my knowledge about fine-tuning these parameters.
Can you suggest a good starting point?
thank you very much
 
I confirm that I have many users (> 10000) on this mail server.
It's probably a good idea to spread some of that load onto multiple servers. And you don't want to have 10000 users depending on a single point of failure either. Heck, I don't want to have 10 users depending on a single point of failure.
 
I would like to improve my knowledge about fine-tuning these parameters. Can you suggest a good starting point?
Hello, circus78!
Glad you were able to narrow it down. I would like to suggest, as I did above, that you'd do well to ease that kern.ipc.somaxconn number up gradually, if you change it at all, because it's reasonably low for a reason. Just bump it a small bit at a time until you find the "sweet spot".
As to additional resources: I myself use pf(4), because I find that I have more control, it's a bit more powerful, and the syntax is simpler. :) So I don't have a lot of experience with ipfw(8). I might suggest a couple of resources (if you haven't tried them already): 1) the ipfw(8) entry in the FreeBSD documentation: IPFW; 2) the FreeBSD IPFW mailing list: IPFW mailing list.
Others who use ipfw regularly might chime in with additional suggestions. :)
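For reference, a one-off ipfw rule to keep an abusive range off the mail ports might look like the sketch below (the range is the TEST-NET-3 documentation block and the rule number is arbitrary; adapt both to your ruleset):

Code:
# 203.0.113.0/24 is a placeholder, not a real offender
ipfw add 500 deny tcp from 203.0.113.0/24 to me dst-port 143,993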

All the best to you, circus78 !

--Chris
 
Blocking a specific public range is not good practice unless you know where your clients will access the specific service from. That's why it's better to use some log monitoring to dynamically block failed logins after N attempts; this will reduce the overall number of failed connections. Also, you may consider separating your services onto different machines. You can use Dovecot dsync for replication between two machines.
What type of storage do you use for your mail? Is there enough IOPS on it to serve such a number of users when they access their mailboxes?

Regarding the type of firewall: use the one you are familiar with. Both pf(4) and ipfw(8) are good and will get the job done.
 
Blocking a specific public range is not good practice unless you know where your clients will access the specific service from.

Hi, yes, I am aware. It is a foreign (= not in my country) /24, so I think I will not stop legitimate users in that case.


That's why it's better to use some log monitoring to dynamically block failed logins after N attempts; this will reduce the overall number of failed connections.

Yes, I am going to implement fail2ban, as suggested.
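As a starting point, a minimal jail sketch (assuming the fail2ban port's stock dovecot filter; the paths, values, and ban action are illustrative, so match them to your setup):

Code:
# /usr/local/etc/fail2ban/jail.local
[dovecot]
enabled   = true
maxretry  = 5
bantime   = 3600
banaction = bsd-ipfw   # the ipfw action fail2ban ships; pick the one matching your firewall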


Also, you may consider separating your services onto different machines. You can use Dovecot dsync for replication between two machines.

Will dsync "spread" load across different servers, or just replicate (for redundancy)?

What type of storage do you use for your mail? Is there enough IOPS on it to serve such a number of users when they access their mailboxes?

I use 15 SATA disks with ZFS, in this layout:

Code:
# zpool status
  pool: mail
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mail        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
        spares
          ada14     AVAIL


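For what it's worth, each raidz2 vdev delivers roughly the random-read IOPS of a single disk, so this pool behaves like about two disks for small random I/O. A quick way to watch whether it keeps up under load:

Code:
# per-vdev bandwidth and IOPS, refreshed every 5 seconds
zpool iostat -v mail 5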


Regarding the type of firewall: use the one you are familiar with. Both pf(4) and ipfw(8) are good and will get the job done.

Thank you very much.
 
Will dsync "spread" load across different servers, or just replicate (for redundancy)?

It's master/master application-level replication between the servers, for redundancy.
It's better to have storage with a clustered shared volume that can be accessed by several nodes, and to have redundancy for every service, so you can take a server down for an upgrade without disturbing the users.
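A minimal sketch of the Dovecot side (assuming Dovecot 2.2+; the peer hostname is a placeholder, and you will also need the replicator/doveadm service listeners described in the Dovecot replication docs):

Code:
# dovecot.conf (excerpt)
mail_plugins = $mail_plugins notify replication
plugin {
  # each node points at its peer
  mail_replica = tcp:mail2.example.com
}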
 
