TCP buffering when select(2)ing on accepted TCP connections

When developing on Linux, select() behaved unexpectedly when there was a large number (tens) of accepted TCP sockets. This was worked around by implementing a buffer process that constantly read from the TCP connection and wrote to a UNIX-domain pipe, and then selecting on the pipes instead. The error surfaced as a read error from read().

Is this kind of behaviour found in FreeBSD? Does anyone know why the Linux implementation could do this?

Should this be included as an operating system service: should there be internal TCP buffering as a service?
 
FreeBSD prefers kqueue over select. Most event-loop abstractions let you switch to this backend. Writing direct event-loop code by hand is kinda bogus.
 
If it is the Linuxulator (OP, what's up?), it should still behave the same as the Linux kernel, especially for simple system calls like select and read.
 
If it is the Linuxulator ...
Unfortunately no, not the Linuxulator. A real Linux as the only OS on the hardware.

What would be the best approach to inspect kqueue(2) and select(2) functionality in the kernel and in userland, to see whether there is automatic in-kernel buffering of incoming network packets? Is there any design-philosophy document explaining why or why not?

(I posted this thread first in another forum).
 
See "sysctl net.inet.tcp.recvbuf_max" and "sysctl net.inet.tcp.recvbuf_auto". Anyway, if you are having buffering issues with only "tens of" accepted connections on Linux, that seems strange. You said "select() behaved unexpectedly"; spell out what exactly happens that is unexpected.
 
The case is actually an old one; I hope I remember it correctly. There were multiple client processes fork(2)ed after accept(2), each reading from a file descriptor returned by accept(2). The design called for more than 1000 processes. Each client process connected over TCP to a gateway process which, after a similar accept(2), select(2)ed among the client file descriptors, again TCP sockets. The select did not choose among the socket file descriptors correctly, and there were read errors.

The problem with select(2) was solved by removing the gateway process fed by the multiple TCP connections from the client processes and changing the TCP connections to local UNIX-domain pipes. What actually resolved the remaining read errors was adding a buffer process that just read into a buffer from the TCP socket file descriptor and wrote that buffer to the UNIX-domain pipe. This way something was continuously reading from the TCP socket, individually in each of the client processes, interrupted by the OS. An event-like behaviour.

At first it looked like select(2) itself created the read errors. On further debugging, a better result was obtained by buffering inside the program, and most of the premature read errors disappeared.

The failure mode with select(2) was that the program simply jammed: no messages went through, and select did not find anything to read on the TCP socket file descriptors. In other cases select(2) reported file descriptors as readable too often. If I remember correctly, the error appeared as an error from read(2). I have gone through my notes and could not find which error it was from read(2) (see its ERRORS section); EINVAL, for example, appears in multiple error explanations in the FreeBSD man page. The notes did not record the errno from read(2).
 
At least I am wondering: if there is a TCP receive buffer size (as there is on Linux as well), and each of these small connected programs has its own connection, is it one shared TCP buffer, or does each program (each TCP socket) get its own buffer of that size?
 
The select did not choose from the socket file descriptors correctly and there were read errors. ...
It looked like at first that actually select(2) created the read errors. ...
Sounds like the problem was with the user code, which was beaten into shape without really investigating the root cause. I was able to handle 1000+ concurrent TCP connections on a 1995-era 100 MHz Pentium machine with 16 MB of RAM without anything special, but I didn't start 1000+ processes: it was all handled in one process. To deal with some other limits, I later changed the structure so that a main process would hand off open TCP connection file descriptors to a small number of I/O processes via sendmsg(2)/recvmsg(2) over a UNIX socket connection.
 
Thank you for the extensive testing; you clearly went through some work to test this. I'm just evaluating whether my case was important enough, and I feel thankful. What can I say; is this important enough?
... It was all handled in one process. ...
To me it looks like you used a single process for reading and writing. In my case there were multiple processes reading and writing. Writes took a lock on the shared file descriptor so that complete messages were written in one piece. All of the processes read individually.

When using a single process, the performance is slower. A convenient property for the programmer of a single process is that the program becomes rude to the requests: it is not always reading when needed, because it is serving the other connections at the same time. It of course looks better because "it works" and the fault appears to be elsewhere, but the truth is that this is a bottleneck. I had to add tens of microseconds of delay between the reads and writes to manage to read anything. To me my program was faster, because it worked faster than the one being tested. It is not clear what was wrong. The Linux select() behaved in an undetermined way: the reported file descriptors did not always have anything to read, and there were read errors when using multiple processes.

dtrace was not usable in the original case. I'm not sure if strace was used; maybe the information on how it could be used was missing. (It looks to me like the new test case did not yet have the same problem, even if dtrace could be used.) I don't think the test setup reproduced the problem.
 
To me it looks like you used a single process for reading and writing. In my case there were multiple processes reading and writing. Writes took a lock on the shared file descriptor so that complete messages were written in one piece. All of the processes read individually.
Having multiple processes write to the same TCP connection using locks is not a good idea in general. But it is hard to recommend an architecture without knowing the details.
 
Would it help if the operating system buffered the incoming TCP sockets of the individual processes automatically? In this case the program did this itself, and that helped with the problem.
 