Hello All,
I'm posting this after reading a number of posts both here and elsewhere. Unfortunately, I have not been able to catch the process in question as described in https://blog.tyk.nu/blog/fun-with-freebsd-listen-queue-overflow/, so I'm trying to figure out what happened based on the logs.
The server is FreeBSD 12.0-STABLE r345381, a virtual instance with 4 CPUs, 4 GB RAM, 4GB swap, single-disk ZFS, lightly/moderately trafficked mail server, Cyrus/Postfix/Amavisd/ClamAV/Spamassassin + Unbound, all ports recently updated.
These are the log entries in chronological order:
/var/log/amavisd.log:
/var/log/messages:
/var/log/local6.log (Cyrus):
/var/log/messages:
/var/log/maillog:
I also looked at message processing times by Amavisd/ClamAV/Spamassassin as reported in amavisd.log (TIMING in milliseconds, TIMINGs are usually under 1,000 ms, may be 3-4,000, but not 300,000 ms):
Am I correct to conclude that:
1:
2: Falls back on
3:
4: Once in a while a socket is freed and
5: FreeBSD tries to keep alive by killing
6: But starved of RAM and swap, sockets cannot be freed (?)
7: Cyrus gives up and writes ^@^@^@^@^@ in the log
8: Postfix cannot deliver to Cyrus for lack of available sockets
At this point, I can't stop/restart Cyrus, Postfix and other daemons; may be I could if I waited long enough, but unresponsive mail service irritates people... so I had to resort to hard-rebooting the server.
I have also increased
from the default 128 to 512, and it didn't help (what I describe here is the second occurrence of the same problem).
If this is a plausible explanation would increasing the number of
Hope to have your insightful comments...
I'm posting this after reading a number of posts both here and elsewhere. Unfortunately, I have not been able to catch the process in question as described in https://blog.tyk.nu/blog/fun-with-freebsd-listen-queue-overflow/, so I'm trying to figure out what happened based on the logs.
The server is FreeBSD 12.0-STABLE r345381, a virtual instance with 4 CPUs, 4 GB RAM, 4GB swap, single-disk ZFS, lightly/moderately trafficked mail server, Cyrus/Postfix/Amavisd/ClamAV/Spamassassin + Unbound, all ports recently updated.
These are the log entries in chronological order:
/var/log/amavisd.log:
Code:
Jun 3 09:04:06 [server] /usr/local/sbin/amavisd[41944]: (41944-05) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock
Jun 3 09:04:07 [server] /usr/local/sbin/amavisd[41944]: (41944-05) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #1
Jun 3 09:04:13 [server] /usr/local/sbin/amavisd[41944]: (41944-05) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #2
Jun 3 09:04:30 [server] /usr/local/sbin/amavisd[41829]: (41829-06) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock
Jun 3 09:04:31 [server] /usr/local/sbin/amavisd[41829]: (41829-06) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #1
Jun 3 09:04:37 [server] /usr/local/sbin/amavisd[41829]: (41829-06) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #2
[...]
Jun 3 10:13:25 [server] /usr/local/sbin/amavisd[41829]: (41829-16) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock
Jun 3 10:13:26 [server] /usr/local/sbin/amavisd[41829]: (41829-16) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #1
Jun 3 10:13:32 [server] /usr/local/sbin/amavisd[41829]: (41829-16) ClamAV-clamd: Connecting to socket /var/run/clamav/clamd.sock, retry #2
/var/log/messages:
Code:
Jun 3 09:08:05 [server] kernel: swap_pager_getswapspace(32): failed
Jun 3 09:08:05 [server] syslogd: last message repeated 103 times
Jun 3 09:08:05 [server] kernel: swap_pager_getswapspace(31): failed
Jun 3 09:08:05 [server] kernel: swap_pager_getswapspace(32): failed
Jun 3 09:08:05 [server] syslogd: last message repeated 3 times
Jun 3 09:08:05 [server] kernel: swap_pager_getswapspace(31): failed
Jun 3 09:08:05 [server] syslogd: last message repeated 1 times
[...]
Jun 3 09:09:03 [server] kernel: pid 42283 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:18:49 [server] kernel: pid 42392 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:20:10 [server] kernel: pid 42438 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:20:32 [server] kernel: pid 42443 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:33:43 [server] kernel: pid 42620 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:39:16 [server] kernel: pid 42707 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:40:02 [server] kernel: pid 42710 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:40:02 [server] kernel: pid 42711 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 09:57:49 [server] kernel: pid 42923 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 10:18:58 [server] kernel: pid 43117 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 10:20:24 [server] kernel: pid 43135 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 10:24:42 [server] kernel: pid 43158 (clamscan), jid 0, uid 110, was killed: out of swap space
Jun 3 10:36:02 [server] kernel: pid 43323 (clamscan), jid 0, uid 110, was killed: out of swap space
/var/log/local6.log (Cyrus):
Code:
Jun 3 10:14:44 [server] pop3[43092]: login: [client host] [ip.add.re.ss] [client] APOP User logged in SESSIONID=<[server]-43092-1559571284-1-7423282183343598449>
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
/var/log/messages:
Code:
Jun 3 10:26:18 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (1 occurrences)
Jun 3 10:27:20 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (22 occurrences)
Jun 3 10:28:45 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (8 occurrences)
Jun 3 10:30:03 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (12 occurrences)
Jun 3 10:31:03 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (28 occurrences)
Jun 3 10:32:03 [server] kernel: sonewconn: pcb 0xfffff800639f9988: Listen queue overflow: 49 already in queue awaiting acceptance (22 occurrences)
Jun 3 10:33:06 [server] kernel: sonewconn: pcb 0xfffff800639f95b8: Listen queue overflow: 49 already in queue awaiting acceptance (29 occurrences)
Jun 3 10:34:11 [server] kernel: sonewconn: pcb 0xfffff800639f95b8: Listen queue overflow: 49 already in queue awaiting acceptance (61 occurrences)
Jun 3 10:35:16 [server] kernel: sonewconn: pcb 0xfffff800639f95b8: Listen queue overflow: 49 already in queue awaiting acceptance (45 occurrences)
/var/log/maillog:
Code:
Jun 3 10:33:44 [server] postfix/pipe[43113]: warning: pipe_command_write: write time limit exceeded
Jun 3 10:35:59 [server] postfix/pipe[43142]: warning: pipe_command_write: write time limit exceeded
I also looked at message processing times by Amavisd/ClamAV/Spamassassin as reported in amavisd.log (TIMING in milliseconds, TIMINGs are usually under 1,000 ms, may be 3-4,000, but not 300,000 ms):
Code:
Jun 3 09:03:59 size: 16781, TIMING 290
Jun 3 09:04:02 size: 5544, TIMING 430
Jun 3 09:05:28 size: 38149, TIMING 3303
Jun 3 09:05:33 size: 22735, TIMING 4628
Jun 3 09:05:35 size: 22725, TIMING 1614
Jun 3 09:06:32 size: 73412, TIMING 2831
Jun 3 09:09:23 size: 40542, TIMING 317690
Jun 3 09:09:30 size: 8307, TIMING 299901
Jun 3 09:10:51 size: 17161, TIMING 329098
Jun 3 09:11:23 size: 18160, TIMING 302487
Jun 3 09:11:23 size: 75761, TIMING 308312
Jun 3 09:11:24 size: 22750, TIMING 709
Jun 3 09:11:42 size: 22356, TIMING 299189
Jun 3 09:11:42 size: 18059, TIMING 300968
Jun 3 09:11:56 size: 1405, TIMING 1230
Jun 3 09:12:16 size: 126818, TIMING 301309
Jun 3 09:13:13 size: 542117, TIMING 3533
Jun 3 09:14:24 size: 15877, TIMING 300708
Jun 3 09:14:28 size: 64391, TIMING 297870
Jun 3 09:15:59 size: 27882, TIMING 307546
Jun 3 09:16:34 size: 45929, TIMING 307134
Jun 3 09:16:35 size: 13796, TIMING 2308
Jun 3 09:17:27 size: 281934, TIMING 14667
Jun 3 09:19:12 size: 51672, TIMING 305754
Jun 3 09:19:29 size: 18476, TIMING 305202
Jun 3 09:19:48 size: 29764, TIMING 302856
Jun 3 09:19:49 size: 29767, TIMING 303809
Jun 3 09:19:52 size: 281896, TIMING 2427
Jun 3 09:19:53 size: 281929, TIMING 780
Jun 3 09:20:59 size: 30336, TIMING 340438
Jun 3 09:20:59 size: 10377, TIMING 296543
Jun 3 09:21:00 size: 281897, TIMING 945
Jun 3 09:21:01 size: 281940, TIMING 543
Jun 3 09:21:01 size: 281936, TIMING 253
Jun 3 09:22:11 size: 28265, TIMING 309174
Jun 3 09:22:48 size: 18476, TIMING 307903
Am I correct to conclude that:
1:
ClamAV-clamd
runs out of available Unix sockets2: Falls back on
clamscan
3:
clamscan
is much slower than ClamAV-clamd
but eats up all memory and swap4: Once in a while a socket is freed and
ClamAV-clamd
can connect, this would explain the wildly varying message processing TIMINGs5: FreeBSD tries to keep alive by killing
clamscan
6: But starved of RAM and swap, sockets cannot be freed (?)
7: Cyrus gives up and writes ^@^@^@^@^@ in the log
8: Postfix cannot deliver to Cyrus for lack of available sockets
At this point, I can't stop/restart Cyrus, Postfix and other daemons; may be I could if I waited long enough, but unresponsive mail service irritates people... so I had to resort to hard-rebooting the server.
I have also increased
Code:
kern.ipc.somaxconn
If this is a plausible explanation would increasing the number of
amavisd
servers? It is currently set to 8.Hope to have your insightful comments...