Solved Listen queue overflow

Hi,
I am using 11.1-RELEASE-p11, with standard email services (postfix, dovecot, nginx).
It seems that I am experiencing very poor performance (especially with IMAP).
These are the errors I get:

Code:
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (225 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (3 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (44 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (94 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (153 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (36 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (110 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (47 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (85 occurrences)
sonewconn: pcb 0xfffff8024e42e570: Listen queue overflow: 151 already in queue awaiting acceptance (121 occurrences)

This is the netstat -Lan output:

Code:
# netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen                           Local Address         
tcp4  0/0/100                          *.25                   
tcp6  0/0/100                          *.993                 
tcp4  123/0/100                        *.993                 
tcp6  0/0/100                          *.143                 
tcp4  18/0/100                         *.143                 
tcp6  0/0/100                          *.995                 
tcp4  0/0/100                          *.995                 
tcp6  0/0/100                          *.110                 
tcp4  0/0/100                          *.110                 
tcp4  0/0/4096                         *.443                 
tcp6  0/0/4096                         *.80                   
tcp4  0/0/4096                         *.80                   
tcp4  0/0/128                          *.22                   
tcp6  0/0/128                          *.22                   
tcp4  0/0/4096                         127.0.0.1.8891         
tcp4  0/0/5                            *.873                 
tcp6  0/0/5                            *.873                 
tcp4  0/0/128                          *.199


So, it seems I have too many connections on port 993/TCP, am I wrong?

I added this line to /etc/sysctl.conf:

Code:
kern.ipc.somaxconn=4096

Why is netstat showing 100 and not 4096?

Thank you
 
Why is netstat showing 100 and not 4096?
Because netstat(1) is showing you what's in use, not your committed max. :)
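Put another way: the maxqlen column is per-socket. It's the backlog each daemon asked for in its own listen(2) call, clamped to the kern.ipc.somaxconn ceiling, so raising the sysctl only raises the ceiling; a daemon that asks for 100 (as your 0/0/100 rows suggest) still gets 100. A quick sketch to inspect that ceiling under both of its names:

Code:
# kern.ipc.soacceptqueue is the newer name; kern.ipc.somaxconn remains as an alias
sysctl -d kern.ipc.soacceptqueue
sysctl kern.ipc.somaxconn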

--Chris
 
By default, Dovecot's limit is set to 100k connections (1000 clients * 100 processes). When Dovecot starts to respond slowly to new connections and cannot accept any more, the OS starts to queue them, which is what you see in netstat. So instead of enlarging the queue via kern.ipc.soacceptqueue, you should investigate why Dovecot is processing new connections slowly.
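If you want to see how close Dovecot is to those limits, a quick sketch (assuming Dovecot 2.x, which ships the doveconf and doveadm tools):

Code:
# print the effective limits (the defaults are 1000 and 100)
doveconf default_client_limit
doveconf default_process_limit
# list current IMAP/POP3 sessions to compare against those limits
doveadm who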

Code:
sysctl -d kern.ipc.somaxconn
 
VladiBG is correct: kern.ipc.somaxconn is your target. But unless you are running a torrent server or are an ISP, there are more reasons not to run that number at 4096 than to.
For example, I bumped kern.ipc.somaxconn from 128 to 512 because I service anywhere from 60 to 160 domains and their associated services. Running today, at ~90 domains, with dovecot, the same netstat(1) output on that box reflects:
Code:
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address
tcp4  0/0/128        127.0.0.1.953
tcp4  0/0/10         127.0.0.1.53
tcp6  0/0/10         ::1.53
tcp4  0/0/10         24.113.41.81.53
tcp4  0/0/128        *.6000
tcp6  0/0/128        *.6000
tcp4  0/0/208        127.0.0.1.5432
tcp6  0/0/208        ::1.5432
tcp6  0/0/512        *.21
tcp4  0/0/512        *.21
tcp4  0/0/10         *.587
tcp6  0/0/10         *.25
tcp4  0/0/10         *.25
tcp4  0/0/511        *.443
tcp6  0/0/511        *.443
tcp4  0/0/511        *.80
tcp6  0/0/511        *.80
tcp6  0/0/100        *.993
tcp4  0/0/100        *.993
unix  0/0/5          /tmp/.esd-0/socket
unix  0/0/5          /tmp/gpg-pEb8bJ/S.gpg-agent.ssh
unix  0/0/5          /tmp/gpg-pggayY/S.gpg-agent
unix  0/0/128        /tmp/.ICE-unix/1707
unix  0/0/30         /tmp/dbus-KGbsa3P1Zq
unix  0/0/128        /tmp/.X11-unix/X0
unix  0/0/208        /tmp/.s.PGSQL.5432
unix  0/0/30         /var/run/hald/dbus-jQAOAjHjHe
unix  0/0/30         /var/run/hald/dbus-G0WqVLFesc
unix  0/0/50         /tmp/mysql.sock
unix  0/0/511        /var/run/dovecot/anvil-auth-penalty
unix  0/0/511        /var/run/dovecot/anvil
unix  0/0/100        /var/run/dovecot/auth-worker
unix  0/0/511        /var/run/dovecot/auth-master
unix  0/0/511        /var/run/dovecot/auth-userdb
unix  0/0/511        /var/run/dovecot/auth-client
unix  0/0/511        /var/run/dovecot/auth-login
unix  0/0/511        /var/run/dovecot/token-login/tokenlogin
unix  0/0/511        /var/run/dovecot/login/login
unix  0/0/511        /var/run/dovecot/config
unix  0/0/100        /var/run/dovecot/dict
unix  0/0/511        /var/run/dovecot/director-userdb
unix  0/0/511        /var/run/dovecot/director-admin
unix  0/0/100        /var/run/dovecot/dns-client
unix  0/0/100        /var/run/dovecot/doveadm-server
unix  0/0/100        /var/run/dovecot/imap-urlauth
unix  0/0/511        /var/run/dovecot/token-login/imap-urlauth
unix  0/0/511        /var/run/dovecot/imap-urlauth-worker
unix  0/0/511        /var/run/dovecot/login/imap
unix  0/0/511        /var/run/dovecot/indexer
unix  0/0/10         /var/run/dovecot/indexer-worker
unix  0/0/511        /var/run/dovecot/login/ipc-proxy
unix  0/0/511        /var/run/dovecot/ipc
unix  0/0/511        /var/run/dovecot/log-errors
unix  0/0/511        /var/run/dovecot/replication-notify
unix  0/0/511        /var/run/dovecot/replicator
unix  0/0/511        /var/run/dovecot/login/ssl-params
unix  0/0/511        /var/run/dovecot/ssl-params
unix  0/0/511        /var/run/dovecot/stats
unix  0/0/30         /var/run/dbus/system_bus_socket
unix  0/0/4          /var/run/devd.pipe
At 512, I never run into port/connection starvation, nor are miscreants permitted to abuse those ports/connections in ways that also allow abuse of others (DNS reflection). Unless the services you're running are blocking on wanted traffic, there is no reason to bump that number. Further, if those connections are blocking your wanted connections because of UNwanted traffic, it's time to deal with that unwanted traffic. :)
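If you do decide to raise it, here's a sketch of the change (512 is just the number that fits my load; find your own):

Code:
# /etc/sysctl.conf -- persists across reboots
kern.ipc.somaxconn=512
# or apply it immediately at runtime:
# sysctl kern.ipc.somaxconn=512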

--Chris
 
Hi,
I confirm that I have many users (> 10000) on this mail server.
I also confirm that, after blocking a specific public network range through ipfw, it seems stable now.
I would like to improve my knowledge about fine-tuning these parameters.
Can you suggest a good starting point?
thank you very much
 
I confirm that I have many users (> 10000) on this mail server.
It's probably a good idea to spread some of that load onto multiple servers. And you don't want to have 10000 users depending on a single point of failure either. Heck, I don't want to have 10 users depending on a single point of failure.
 
I would like to improve my knowledge about fine-tuning these parameters. Can you suggest a good starting point?
Hello, circus78!
Glad you were able to narrow it down. I would like to suggest, as I did above, that you'd do well to ease that kern.ipc.somaxconn number up gradually, if you change it at all, because it's reasonably low for a reason. Just bump it a small bit at a time until you find the "sweet spot".
As to additional resources: I myself use pf(4), because I find that I have more control, it's a bit more powerful, and the syntax is simpler. :) So I don't have a lot of experience with ipfw(8). I might suggest a couple of resources (if you haven't tried them already): 1) the ipfw(8) entry in the FreeBSD documentation: IPFW; 2) the FreeBSD IPFW mailing list: IPFW mailing list.
Others who use ipfw regularly might chime in with additional suggestions. :)
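For reference, a one-off ipfw rule to keep an abusive range off the mail ports might look like the sketch below (the range is the TEST-NET-3 documentation block and the rule number is arbitrary; adapt both to your ruleset):

Code:
# 203.0.113.0/24 is a placeholder, not a real offender
ipfw add 500 deny tcp from 203.0.113.0/24 to me dst-port 143,993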

All the best to you, circus78 !

--Chris
 
Blocking a specific public range is not good practice unless you know where your clients will access the specific service from. That's why it's better to use some log monitoring to dynamically block failed logins after N attempts; this will reduce the overall number of failed connections. Also, you may consider separating your services onto different machines. You can use Dovecot dsync for replication between two machines.
What type of storage do you use for your mail? Is there enough IOPS on it to serve such a number of users when they access their mailboxes?

Regarding the type of firewall: use the one you are familiar with. Both pf(4) and ipfw(8) are good and will get the job done.
 
Blocking a specific public range is not good practice unless you know where your clients will access the specific service from.

Hi, yes, I am aware. It is a foreign (= not in my country) /24, so I think I will not stop legitimate users in that case.


That's why it's better to use some log monitoring to dynamically block failed logins after N attempts; this will reduce the overall number of failed connections.

Yes, I am going to implement fail2ban, as suggested.
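As a starting point, a minimal jail sketch (assuming the fail2ban port's stock dovecot filter; the paths, values, and ban action are illustrative, so match them to your setup):

Code:
# /usr/local/etc/fail2ban/jail.local
[dovecot]
enabled   = true
maxretry  = 5
bantime   = 3600
banaction = bsd-ipfw   # the ipfw action fail2ban ships; pick the one matching your firewall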


Also, you may consider separating your services onto different machines. You can use Dovecot dsync for replication between two machines.

Will dsync "spread" load across different servers, or just replicate (for redundancy)?

What type of storage do you use for your mail? Is there enough IOPS on it to serve such a number of users when they access their mailboxes?

I use 15 SATA disks with ZFS, in this layout:

Code:
# zpool status
  pool: mail
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        mail        ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
          raidz2-1  ONLINE       0     0     0
            ada7    ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
        spares
          ada14     AVAIL


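For what it's worth, each raidz2 vdev delivers roughly the random-read IOPS of a single disk, so this pool behaves like about two disks for small random I/O. A quick way to watch whether it keeps up under load:

Code:
# per-vdev bandwidth and IOPS, refreshed every 5 seconds
zpool iostat -v mail 5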


Regarding the type of firewall: use the one you are familiar with. Both pf(4) and ipfw(8) are good and will get the job done.

Thank you very much.
 
Will dsync "spread" load across different servers, or just replicate (for redundancy)?

It's master/master application-level replication between the servers, for redundancy.
It's better to have storage with a clustered shared volume that can be accessed by several nodes, and to have redundancy for every service, so you can take a server down for an upgrade without disturbing the users.
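A minimal sketch of the Dovecot side (assuming Dovecot 2.2+; the peer hostname is a placeholder, and you will also need the replicator/doveadm service listeners described in the Dovecot replication docs):

Code:
# dovecot.conf (excerpt)
mail_plugins = $mail_plugins notify replication
plugin {
  # each node points at its peer
  mail_replica = tcp:mail2.example.com
}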
 
