Listen queue overflow - finding culprit

spork · Aug 29, 2018

There are some interesting thoughts in this blog post:

https://blog.tyk.nu/blog/fun-with-freebsd-listen-queue-overflow/

The author walks you through tracking down which process is generating kernel messages like these:

Code:

sonewconn: pcb 0xfffff80036a761d0: Listen queue overflow: 76 already in queue awaiting acceptance (30 occurrences)
sonewconn: pcb 0xfffff80036a761d0: Listen queue overflow: 76 already in queue awaiting acceptance (10 occurrences)
sonewconn: pcb 0xfffff80339c7f910: Listen queue overflow: 151 already in queue awaiting acceptance (28 occurrences)
sonewconn: pcb 0xfffff80339c7f910: Listen queue overflow: 151 already in queue awaiting acceptance (172 occurrences)
sonewconn: pcb 0xfffff80036a761d0: Listen queue overflow: 76 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffff80339c7f910: Listen queue overflow: 151 already in queue awaiting acceptance (49 occurrences)
sonewconn: pcb 0xfffff80339c7f910: Listen queue overflow: 151 already in queue awaiting acceptance (77 occurrences)
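The queue lengths in these messages may themselves be a clue. Assuming the kernel's overflow test in `sonewconn()` reads `sol_qlen > 3 * sol_qlimit / 2` (that is how the FreeBSD 11/12 sources I know of read it; check your own kernel), a report of 76 points at a listener with a backlog limit of 50, and 151 at one with a limit of 100. Those limits can then be compared against the maxqlen field of `netstat -Lan`. A minimal Python sketch of that arithmetic (all helper names are hypothetical):

```python
import re

# Matches FreeBSD kernel messages such as:
# sonewconn: pcb 0xfffff80036a761d0: Listen queue overflow: 76 already in queue ...
OVERFLOW_RE = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: (\d+) already in queue"
)


def overflow_qlen(qlimit):
    """Queue length at which new connections start being refused.

    Assumes the check `sol_qlen > 3 * sol_qlimit / 2` with C integer division.
    """
    return 3 * qlimit // 2 + 1


def candidate_qlimits(reported_qlen, max_limit=65535):
    """Backlog limits (maxqlen values) that would yield the reported length."""
    return [q for q in range(1, max_limit + 1) if overflow_qlen(q) == reported_qlen]


def summarize(log_lines):
    """Map each pcb address in the log to its candidate backlog limits."""
    result = {}
    for line in log_lines:
        m = OVERFLOW_RE.search(line)
        if m:
            pcb, qlen = m.group(1), int(m.group(2))
            result[pcb] = candidate_qlimits(qlen)
    return result
```

Fed the log lines above, this maps `0xfffff80036a761d0` to `[50]` and `0xfffff80339c7f910` to `[100]`. A listener whose `listen()` backlog is one of those values is a good first suspect, though the inference breaks down if a process changes its backlog after the queue has filled.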

The "TL;DR" of the blog post is that if you only notice these messages after the process generating them has exited, you're basically out of luck:

This would have been a lot easier if FreeBSD logged the pid or socket info along with the error when it happens. I was fortunate that this was a long-running process so the pcb stayed the same. If this had been a short-lived process it would have been considerably more difficult to find it. Process accounting combined with logging the pid with the error would be preferable. Alternatively one could jimmy up something to keep an eye on /var/log/messages and run netstat to find the pcb immediately after the error happens.
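The "keep an eye on /var/log/messages and run netstat immediately" idea could look roughly like the Python sketch below. It is only a sketch under several assumptions: it tails the log without handling rotation, the output location under `/var/tmp` is hypothetical, and whether the address printed by `netstat -A` is the same pointer the kernel logs may depend on the FreeBSD version, so the full `netstat -Lan` and `sockstat -l` snapshots are kept as a fallback. Syslog delivery also takes time, so a very short-lived process can still slip through.

```python
import re
import subprocess
import time
from datetime import datetime

# Kernel message pattern as it appears in /var/log/messages.
PCB_RE = re.compile(r"sonewconn: pcb 0x([0-9a-f]+): Listen queue overflow")

# FreeBSD base-system commands to snapshot right after an overflow is logged.
SNAPSHOT_CMDS = [
    ["netstat", "-Aan"],  # sockets with PCB addresses
    ["netstat", "-Lan"],  # listen queue sizes (qlen/incqlen/maxqlen)
    ["sockstat", "-l"],   # listening sockets with owning command and PID
]


def pcb_from_line(line):
    """Return the pcb address (bare hex, no 0x prefix), or None."""
    m = PCB_RE.search(line)
    return m.group(1) if m else None


def lines_mentioning(text, pcb_hex):
    """Lines of command output containing the pcb address.

    An empty result is inconclusive: netstat -A may not print the same
    pointer that the kernel logs on every FreeBSD version.
    """
    return [l for l in text.splitlines() if pcb_hex in l]


def snapshot(pcb_hex):
    """Run the snapshot commands and save their output to a file."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = f"/var/tmp/lqoverflow-{stamp}-{pcb_hex}.txt"  # hypothetical location
    with open(path, "w") as out:
        for cmd in SNAPSHOT_CMDS:
            res = subprocess.run(cmd, capture_output=True, text=True)
            out.write(f"### {' '.join(cmd)}\n{res.stdout}\n")
            hits = lines_mentioning(res.stdout, pcb_hex)
            if hits:
                out.write("### lines mentioning pcb\n" + "\n".join(hits) + "\n")
    return path


def follow(path="/var/log/messages", poll=0.2):
    """Tail the log (no rotation handling) and snapshot on each overflow."""
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(poll)
                continue
            pcb = pcb_from_line(line)
            if pcb:
                print("overflow on pcb", pcb, "->", snapshot(pcb))
```

Calling `follow()` blocks and writes one snapshot file per overflow message. Because the kernel already rate-limits these messages (the "N occurrences" counter), that should not produce an unreasonable number of files.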

Any thoughts on a good way to deal with this situation? Both places where I run into this involve short-lived processes, so the culprit remains a mystery.

TitanHQ_jmatz · May 27, 2021

netstat -Lan
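A single `netstat -Lan` can miss a short-lived process, so one option is to run it in a loop and print any listener whose queue is close to full, then map the local address to a process with `sockstat -l`. The Python sketch below assumes FreeBSD's row format (e.g. `tcp4  0/0/128  *.80`, where the middle field is qlen/incqlen/maxqlen); the 80% threshold and the helper names are arbitrary choices for illustration.

```python
import re
import subprocess
import time

# A data row in `netstat -Lan` output, e.g. "tcp4  0/0/128  *.80".
ROW_RE = re.compile(r"^(\S+)\s+(\d+)/(\d+)/(\d+)\s+(\S+)")


def parse_listen_queues(text):
    """Return (proto, qlen, incqlen, maxqlen, local_address) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:  # header lines do not match and are skipped
            proto, q, inc, mx, addr = m.groups()
            rows.append((proto, int(q), int(inc), int(mx), addr))
    return rows


def nearly_full(rows, ratio=0.8):
    """Rows whose completed-connection queue is at least `ratio` of maxqlen."""
    return [r for r in rows if r[3] > 0 and r[1] >= ratio * r[3]]


def poll(interval=1.0):
    """Run netstat -Lan every `interval` seconds and print busy listeners."""
    while True:
        out = subprocess.run(["netstat", "-Lan"], capture_output=True, text=True).stdout
        for row in nearly_full(parse_listen_queues(out)):
            print(time.strftime("%H:%M:%S"), row)
        time.sleep(interval)
```

Calling `poll()` runs until interrupted. Flagging queues before they actually overflow gives a little more time to catch the owning process while it still exists.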

SirDice · May 27, 2021

This is a three-year-old post.

TitanHQ_jmatz said:
netstat -Lan

By the time you enter this command, the process that caused the queue overflow no longer exists, so it won't show up. That's exactly the problem.

I don't have an answer for it either. Most of the time I can find the faulty process through educated guesses and some poking around. With a website that uses a database backend, for example, the usual cause is the database not answering queries quickly enough; this stalls the web application long enough for its connection queue to fill up and overflow.
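A crude back-of-the-envelope model shows how short such a stall can be. Assuming constant rates and a queue that starts empty, it fills at the difference between the connection arrival rate and the rate at which the application calls `accept()`. The function name and the numbers below are purely illustrative:

```python
def seconds_to_overflow(limit, arrivals_per_s, accepts_per_s):
    """Seconds for an initially empty listen queue to reach `limit` entries.

    Deliberately simple model: constant arrival and accept rates.
    Returns None when the application keeps up and the queue never grows.
    """
    growth = arrivals_per_s - accepts_per_s
    if growth <= 0:
        return None
    return limit / growth
```

For example, a server receiving 200 connections per second that slows to 50 accepts per second while it waits on the database fills a 151-entry queue in `seconds_to_overflow(151, 200, 50)`, about one second. A stall that brief, in a process that exits soon afterwards, is easy to miss.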
