TCP buffering when selecting from accepted TCP connections

When developing on Linux, select() behaved unexpectedly when there was a large number (tens) of accepted TCP sockets. This was worked around by implementing a buffer process that constantly reads from the TCP connection and writes to a UNIX-domain pipe, and then selecting on the pipes instead. The error surfaced as a read error from read().

Is this kind of behaviour found in FreeBSD? Does anyone know why the Linux implementation could do this?

Should this be included as an operating system service - should there be internal TCP buffers as a service?
 
FreeBSD prefers kqueue(2) over select(2). Most event-loop abstractions let you switch to that backend. Writing direct event loop code is kinda bogus.
 
If it is the Linuxulator (OP, what's up?), it should still behave the same as the Linux kernel, especially for simple system calls like select and read.
 
If it is the Linuxulator ...
Unfortunately no, not the Linuxulator. It is genuine Linux running as the only OS on the hardware.

What would be the best approach to inspect the kqueue(2) and select(2) functionality in the kernel and in userland, to see whether there is automatic in-kernel buffering of incoming network packets? Is there any design-philosophy documentation explaining why or why not?

(I posted this thread first in another forum).
 
See "sysctl net.inet.tcp.recvbuf_max" and "sysctl net.inet.tcp.recvbuf_auto". Anyway, if you are having buffering issues with "tens of" accepted connections on Linux, that seems strange. You said "SELECT() behaved unexpectedly"; spell out what exactly happens that is unexpected.
 
The case is actually an old one; I hope I remember it correctly. Multiple client processes were fork(2)ed after accept(2), each reading from the file descriptor returned by accept(2). The design called for more than 1000 such processes. Each client process connected over TCP to a gateway process, which, after a similar accept, used select(2) across the client file descriptors of its TCP sockets. The select did not report readiness on the socket file descriptors correctly, and there were read errors.

The problem with select(2) was solved by removing the gateway process that multiplexed the many TCP connections from the client processes, and by changing the TCP connections to local UNIX-domain pipes. What actually fixed the remaining read errors was adding a buffer process whose only job was to read from the TCP socket file descriptor into a buffer and write that buffer out to the UNIX-domain pipe. This way something was continuously reading from the TCP socket in each client process, interrupted by the OS - an event-like behaviour.

At first it looked like select(2) itself caused the read errors. Debugging further, a better result was obtained by buffering inside the program, and most of the premature read errors disappeared.

The symptom around select(2) was that the program simply jammed: no messages were going through, and select did not find anything to read on the TCP socket file descriptors. In some other cases select(2) reported something to read too often. If I remember correctly, the error itself appeared as an error from read(2). I have gone through my notes and could not find which error it was (see the ERRORS section of the read(2) man page); EINVAL, for example, appears under multiple explanations in the FreeBSD man page. The notes did not record the error code from read(2).
 
What I am wondering is this: if there is a TCP buffer size (as there is on Linux as well), and each of these connected small programs has its own connection, is it one shared TCP buffer, or does every program (TCP socket) get its own buffer of that size?
 
The select did not choose from the socket file descriptors correctly and there were read errors. ...
Sounds like the problem was in the user code, which was beaten into shape without really investigating the root cause. I was able to handle 1000+ concurrent TCP connections on a 1995-era 100 MHz Pentium machine with 16 MB of RAM without anything special, but I didn't start 1000+ processes; it was all handled in one process. To deal with some other limits later on, I changed the structure so that a main process would hand off open TCP connection file descriptors to a small number of IO processes via sendmsg(2)/recvmsg(2) over a UNIX socket connection.
 