C Behavior of connect() with O_NONBLOCK on a Unix domain socket

zirias@ · Jun 28, 2020

For a service that can optionally listen on a local (Unix) socket, I want to implement detection of a "stale" socket, so that the service can start up without user intervention in this case by simply unlinking the stale socket. As my service is designed around an event loop using pselect(2), I put all sockets into O_NONBLOCK mode, so my main thread never blocks anywhere other than in the central pselect(2) call.

In my current design, listening sockets are set up before starting the event loop. Now, to "probe" whether a local socket is still active, I need to connect(2) to it. My observation on both Linux and FreeBSD is that, regardless of O_NONBLOCK mode, this connect(2) immediately returns an error (other than EINPROGRESS) if there is no listener on the local socket. This is perfect for my use case, as I don't need any special handling -- if the connect(2) succeeds (with my logic that doesn't treat EINPROGRESS as an error, but just sets a "connecting" flag), I know the socket is still active.
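A minimal sketch of such a probe, assuming a stream socket and a hypothetical helper named `probe_unix_socket()`. It follows the logic described above (immediate success or EINPROGRESS counts as "still active"; ECONNREFUSED or ENOENT counts as stale) and additionally treats EAGAIN as a live listener, since that is what Linux reports for a full listen queue later in this thread. Treating EINPROGRESS as proof of a listener is an assumption, which is exactly what the question below is about.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: probe whether something listens on the Unix
 * stream socket at `path`.
 * Returns 1 if it looks alive, 0 if it looks stale (or the path does not
 * exist), -1 on any other error (errno set). */
int probe_unix_socket(const char *path)
{
    struct sockaddr_un sa;
    if (strlen(path) >= sizeof sa.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Same O_NONBLOCK mode as every other socket in the event loop. */
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int e = errno;
        close(fd);
        errno = e;
        return -1;
    }

    int rc = connect(fd, (struct sockaddr *)&sa, sizeof sa);
    int err = (rc == 0) ? 0 : errno; /* save errno before close() */
    close(fd);

    /* Connected, or merely "connecting" (assumed alive). EAGAIN is what
     * Linux reports for a full listen queue, so a listener exists. */
    if (rc == 0 || err == EINPROGRESS || err == EAGAIN || err == EWOULDBLOCK)
        return 1;
    /* No listener behind an existing file, or no file at all. */
    if (err == ECONNREFUSED || err == ENOENT)
        return 0;
    errno = err;
    return -1;
}
```

A caller would unlink(path) only when this returns 0, and refuse to start when it returns 1.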

What I'm asking here is the following: Is my understanding correct that, if there is no listener on a local socket, POSIX would allow connect(2) to set EINPROGRESS and report the error later via SO_ERROR? And if so, are there systems behaving that way?

kpedersen · Jun 28, 2020

Your observations are in line with mine in that it does return -1 and errno is a connection error (ECONNREFUSED).

After reading the open(2) manpage regarding O_NONBLOCK (I know it is a little different to connect(2) but it was the closest I could find!):

Note that this flag has no effect for regular files and block
devices; that is, I/O operations will (briefly) block when
device activity is required, regardless of whether O_NONBLOCK
is set. Since O_NONBLOCK semantics might eventually be
implemented, applications should not depend upon blocking
behavior when specifying this flag for regular files and block
devices.

So I would infer from this that, since a domain socket is neither a regular file nor a block device (and the manpage even suggests not depending on blocking behaviour in future), it could return -1 and set EINPROGRESS even if there is no listener. However, if someone more experienced says otherwise, go with their input instead of mine.

zirias@ · Jun 28, 2020

Thanks kpedersen, that's a great find, because it helped me think of a scenario where EINPROGRESS would actually make sense when connecting to a local socket without a listener.

In a nutshell, two things have to be checked for connecting to a local socket: (a) is there a listener and (b) do we have access permission. Well, (b) is an I/O operation on the filesystem. So, if the implementation supports async semantics for filesystem I/O, and if (b) is checked before (a), getting EINPROGRESS makes sense.

Consequently, I guess the right thing to do for my probing for a stale local socket is to do this connect(2) in normal, blocking mode. This leaves me with the small doubt whether there could be a situation causing it to block "indefinitely". From what I understand so far, for connect(2) to succeed, it's enough that there is a listener (that still has room in its listen queue), the listener doesn't have to accept(2) the connection, it just won't be writable until it's accepted. Is this correct?
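The first part of that question can be checked empirically with something along these lines; `listen_unix()` and `connect_unix()` are hypothetical helpers. On Linux and FreeBSD I would expect both blocking connect() calls to return 0 although the listener never calls accept(), as long as the backlog has room. The sketch deliberately says nothing about the writability part.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for `path`; fails with ENAMETOOLONG if too long. */
static int make_unix_addr(struct sockaddr_un *sa, const char *path)
{
    if (strlen(path) >= sizeof sa->sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(sa, 0, sizeof *sa);
    sa->sun_family = AF_UNIX;
    strcpy(sa->sun_path, path);
    return 0;
}

/* Hypothetical helper: create, bind and listen; returns fd or -1. */
int listen_unix(const char *path, int backlog)
{
    struct sockaddr_un sa;
    if (make_unix_addr(&sa, path) < 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, backlog) < 0) {
        int e = errno;
        close(fd);
        errno = e;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: plain blocking connect; returns fd or -1. */
int connect_unix(const char *path)
{
    struct sockaddr_un sa;
    if (make_unix_addr(&sa, path) < 0)
        return -1;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        int e = errno;
        close(fd);
        errno = e;
        return -1;
    }
    return fd;
}
```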

edit: looks like I'm at a dead end now, thinking about the case of a full listen queue. I remember experimenting with that case and on FreeBSD, a connect() returned with an error, while on Linux, a connect() just blocked waiting for room in that queue.

So, the right thing to do probably is using O_NONBLOCK and waiting with a (sane) timeout. Dammit...
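A sketch of that non-blocking probe with a timeout, assuming select(2) and a hypothetical helper named `connect_unix_timeout()`. Immediate success or failure is handled directly; only EINPROGRESS leads to waiting for writability and reading SO_ERROR. For brevity, EINTR from select() is treated as a plain error here.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Close fd but keep the errno that caused the failure. */
static int fail_close(int fd)
{
    int e = errno;
    close(fd);
    errno = e;
    return -1;
}

/* Hypothetical helper: non-blocking connect to a Unix stream socket that
 * waits at most timeout_ms for a deferred (EINPROGRESS) result.
 * Returns a connected fd (still O_NONBLOCK) or -1 with errno set;
 * ETIMEDOUT means no result arrived in time. */
int connect_unix_timeout(const char *path, int timeout_ms)
{
    struct sockaddr_un sa;
    if (strlen(path) >= sizeof sa.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (fd >= FD_SETSIZE) {          /* select() cannot watch this fd */
        errno = EINVAL;
        return fail_close(fd);
    }
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return fail_close(fd);

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
        return fd;                   /* immediate success */
    if (errno != EINPROGRESS)
        return fail_close(fd);       /* immediate failure: ECONNREFUSED, ENOENT, EAGAIN, ... */

    /* Deferred result: wait until writable, then fetch it via SO_ERROR. */
    fd_set wfds;
    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };
    int n = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        return fail_close(fd);       /* timeout, or select() error */

    int soerr = 0;
    socklen_t len = sizeof soerr;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &len) < 0)
        return fail_close(fd);
    if (soerr != 0) {
        errno = soerr;
        return fail_close(fd);
    }
    return fd;
}
```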

edit2: this is what I came up with, which hopefully works as designed with any POSIX-compliant implementation

It sure looks "messy" and I can't really test without a system at hand, that may give EINPROGRESS for connecting to a local socket… Can't even test on Linux, because, although connect() will block there when the listen queue is full, it gives EAGAIN instead of EINPROGRESS in non-blocking mode, and the socket will be writable immediately without any error in SO_ERROR -- is such a strange behavior ok with POSIX?

kpedersen · Jun 28, 2020

I try to avoid blocking mode when at all possible, especially for things that I cannot use poll() or select() on prior to access to ensure that it won't block.

I even had a weird one where I used mkfifo(3) to create a special file, which I then opened with open() in normal blocking mode. It would block indefinitely. I tried the same thing in non-blocking mode (and with a 10-second timeout) and it would succeed immediately and allow me to write to it with no problem. Very strange.

So yes, for your usage I would probably use non-blocking and keep trying for access up to around 5 seconds.
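One way to "keep trying", sketched under the assumption that the probe runs once at startup before the event loop, so a short blocking sleep is acceptable: retry only while connect() fails with EAGAIN (the Linux full-queue case), using a fresh socket for each attempt. The helper name and the tries/delay parameters are hypothetical; 50 tries of 100 ms would roughly match the 5 seconds suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking connect() retried up to max_tries
 * times, sleeping delay_ms between attempts, but only while it fails with
 * EAGAIN. On success returns the fd; *pending is 1 if the result is still
 * outstanding (EINPROGRESS) and the caller must wait for writability.
 * Otherwise returns -1 with errno set (ETIMEDOUT if EAGAIN persisted). */
int connect_unix_retry(const char *path, int max_tries, long delay_ms, int *pending)
{
    struct sockaddr_un sa;
    if (strlen(path) >= sizeof sa.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    strcpy(sa.sun_path, path);
    *pending = 0;

    for (int i = 0; i < max_tries; ++i) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0); /* fresh socket per attempt */
        if (fd < 0)
            return -1;
        int flags = fcntl(fd, F_GETFL);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0)
            return fd;
        int e = errno;
        if (e == EINPROGRESS) {
            *pending = 1;
            return fd;
        }
        close(fd);
        if (e != EAGAIN) {           /* e.g. ECONNREFUSED or ENOENT: no retry */
            errno = e;
            return -1;
        }
        if (i + 1 < max_tries) {     /* queue full: back off, then retry */
            struct timespec ts = { .tv_sec = delay_ms / 1000,
                                   .tv_nsec = (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
        }
    }
    errno = ETIMEDOUT;
    return -1;
}
```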

Unless perhaps you can use flock(2) and check that file lock. If it is OK, assume the socket is still good? I believe Xorg does that (it has the socket in /tmp/.X11-unix/X0 but then the lock file at /tmp/.X0-lock)
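The lock-file idea could look roughly like this; `take_instance_lock()` and the lock path are hypothetical, and this is not meant as a description of what Xorg does internally. The running service keeps the returned descriptor open for its whole lifetime. Because flock(2) locks are released when the holding process exits, a lock file left behind after a crash does not prevent a restart, and a failed lock attempt suggests the socket is still in use.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive, non-blocking flock(2) on a
 * lock file that accompanies the socket.
 * Returns the lock fd (keep it open while the service runs),
 * -2 if another open file description holds the lock (socket presumably
 * alive), or -1 on any other error (errno set). */
int take_instance_lock(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        int e = errno;
        close(fd);
        errno = e;
        return (e == EWOULDBLOCK) ? -2 : -1;
    }
    return fd;
}
```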

zirias@ · Jun 28, 2020

kpedersen said:
I try to avoid blocking mode when at all possible, especially for things that I cannot use poll() or select() on prior to access to ensure that it won't block.

On an only slightly related note, I found this hidden bomb in Linux' select manpage, section BUGS:

On Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks.

So, if your software should run reliably on Linux, well, probably *never* use blocking mode
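In practice that means putting every socket into O_NONBLOCK mode and treating EAGAIN after a "ready" report as a spurious wakeup rather than an error. A minimal sketch with hypothetical helper names, assuming the buffer length is greater than zero:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to fd's flags; returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: read after select()/pselect() reported fd as
 * readable. Readiness may be spurious, so EAGAIN/EWOULDBLOCK returns 0
 * with *eof == 0 and the caller just goes back to its event loop.
 * Returns bytes read (> 0), 0 (nothing yet, or EOF if *eof == 1),
 * or -1 on a real error. Assumes len > 0. */
ssize_t read_ready(int fd, void *buf, size_t len, int *eof)
{
    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n > 0)
            return n;
        if (n == 0) {
            *eof = 1;                 /* peer closed the connection */
            return 0;
        }
        if (errno == EINTR)
            continue;                 /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                 /* spurious readiness: no data after all */
        return -1;
    }
}
```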

kpedersen · Jun 28, 2020

Zirias said:
On an only slightly related note, I found this hidden bomb in Linux' select manpage, section BUGS:

Haha, what the hell? That does indeed seem like quite a bug. The poll(2) manpage seems to suggest that poll is affected too.

When I get time I will now have to revisit some of my past projects and change them. Thanks for showing me this (I think).

mark_j · Jun 29, 2020

Zirias said:
On an only slightly related note, I found this hidden bomb in Linux' select manpage, section BUGS:

So, if your software should run reliably on Linux, well, probably *never* use blocking mode

Yes linux does some real stupid stuff when select() says here's data then retracts for whatever reason. Absolute madness. Fortunately that's not a bsd thing. As I understand it while listed as BUGS it should really be under a heading of SIDE EFFECTS....

zirias@ · Jun 29, 2020

mark_j said:
Fortunately that's not a bsd thing.

Hmmm. Given Linux dominance nowadays, if you want to write "portable" software, I guess you can't ignore a Linux bug, as outrageous and idiotic as it might be. Kind of reminds me of the situation with some other "dominant" OS...

But this is getting off-topic, sorry.

Any thoughts about my fixed code? Is it safe, as I think it should be?

mark_j · Jun 29, 2020

Well, I only came in at the end of this discussion.
If Linux is a target audience, then yes, you've got to work around it.

If I'm understanding you correctly (regarding non-blocking):
Once your connect(2) call triggers an error, you'll now need to use pselect(2) call to wait on the socket.
If you do this, what happens? Is that file descriptor triggered/set?
If it is, then recv(2) the socket. It will return 0, EOF.

zirias@ · Jun 29, 2020

For me personally, it must run on both FreeBSD and Linux (it's a tool for tunneling Unix socket connections through a TCP connection). As the use case probably never needs a huge number of concurrent clients, I just build the event loop around pselect(2). So, I try to keep the code fully portable.

Asynchronous connect(2) (on an O_NONBLOCK socket) works perfectly; my doubt was only at startup, when I try to check whether an existing Unix socket is still alive. Neither FreeBSD nor Linux ever gives EINPROGRESS when connecting to a Unix socket; the call either succeeds immediately or fails if there is no listener. When there is a listener but its listen queue is full, connect(2) on FreeBSD fails in both blocking and non-blocking mode. Linux does a strange thing in that case: the blocking connect(2) blocks indefinitely, and the non-blocking one gives EAGAIN without a chance to get a result later, as select(2) immediately succeeds -- so just doing my initial check in blocking mode isn't a reliable option.

As discussed above, EINPROGRESS is allowed by POSIX for Unix sockets as well, and it might make sense when filesystem I/O supports async semantics with O_NONBLOCK. Therefore I changed my code to deal with that possibility, but I can't test it, as I don't know of a system that would behave that way.

Maybe I could fake the scenario using TCP...

mark_j · Jun 30, 2020

I am confused by your code and your description.

(Which is not at all surprising, I might add.)

Why aren't you testing your file descriptor after the select(2)? That is, FD_ISSET().
Only if FD_ISSET is true for the file descriptor (read or write), should you test getsockopt(2).
That's what EINPROGRESS means.

You should also explicitly test for timeout on the select(2), which you're not doing.

Disclaimer: I didn't look at all your code. I am not certain of your socket listener's blocking status.

While it doesn't apply to you here, with the way your code is structured, note that POSIX allows timeout to be modified (Linux) or unmodified (*BSD) by select(2).
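A wait helper covering those three points might look like this (hypothetical name `wait_fd()`): it distinguishes timeout from error, checks FD_ISSET() even though only one descriptor is watched, and rebuilds the timeval from a CLOCK_MONOTONIC start time on each iteration, so it no longer matters whether select() updated it. Retrying on EINTR is an addition not discussed in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: wait until fd is readable (want_write == 0) or
 * writable (want_write != 0), for at most timeout_ms milliseconds.
 * Returns 1 if ready, 0 on timeout, -1 on error (errno set). */
int wait_fd(int fd, int want_write, long timeout_ms)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    struct timespec start, now;
    if (clock_gettime(CLOCK_MONOTONIC, &start) < 0)
        return -1;

    for (;;) {
        if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
            return -1;
        long long elapsed_ns = (long long)(now.tv_sec - start.tv_sec) * 1000000000LL
                             + (now.tv_nsec - start.tv_nsec);
        long left_ms = timeout_ms - (long)(elapsed_ns / 1000000LL);
        if (left_ms <= 0)
            return 0;                      /* deadline passed */

        /* Rebuild the set and the timeval every time: select() may have
         * modified them (Linux) or not (*BSD). */
        fd_set set;
        FD_ZERO(&set);
        FD_SET(fd, &set);
        struct timeval tv = { .tv_sec = left_ms / 1000,
                              .tv_usec = (left_ms % 1000) * 1000 };
        int n = select(fd + 1, want_write ? NULL : &set,
                       want_write ? &set : NULL, NULL, &tv);
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* signal: retry with remaining time */
            return -1;                     /* real error */
        }
        if (n == 0)
            return 0;                      /* explicit timeout */
        return FD_ISSET(fd, &set) ? 1 : 0; /* redundant for one fd, but explicit */
    }
}
```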

zirias@ · Jun 30, 2020

mark_j said:
Why aren't you testing your file descriptor after the select(2)? That is, FD_ISSET().

Because this select() only tests one single fd, and its return value is the number of fds that are ready, so FD_ISSET() would be redundant here.

mark_j said:
You should also explicitly test for timeout on the select(2), which you're not doing.

For my use case, I don't need to distinguish a timeout (returns 0) from an otherwise failed select() (-1).

mark_j said:
While it doesn't apply to you here, with the way your code is structured, note that POSIX allows timeout to be modified (Linux) or unmodified (*BSD) by select(2).

I'm aware of that. It's the only place in the code where I use the timeout argument at all, and it's only used once here.

Anyways, thanks for having a look and trying to find something! Gives me a bit more confidence in a piece of code I can't really test

kpedersen · Feb 13, 2021

I recently found this document:

https://cr.yp.to/docs/connect.html

It lists a number of alternative solutions to testing for success in a non-blocking connect and also lists their pros and cons. The most interesting one is the second to last (getpeername). I never thought of this but it seems like a side-effect free version of the re-call connect() approach.
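A sketch of the getpeername() variant as that page describes it: if getpeername() fails with ENOTCONN, the connection attempt failed, and a one-byte read() is then used to obtain the actual error in errno. The helper name is hypothetical, and it assumes select() has already reported the connecting socket as writable.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, to be called once select()/pselect() reported a
 * non-blocking, connecting socket as writable.
 * Returns 0 if connected, or -1 with errno describing the failure. */
int connect_result(int fd)
{
    struct sockaddr_storage peer;
    socklen_t len = sizeof peer;
    if (getpeername(fd, (struct sockaddr *)&peer, &len) == 0)
        return 0;                 /* we have a peer: connected */
    if (errno != ENOTCONN)
        return -1;                /* e.g. EBADF: report as-is */

    /* Not connected: a one-byte read() fails with the pending socket
     * error in errno (the "error slippage" described at
     * cr.yp.to/docs/connect.html). */
    char ch;
    if (read(fd, &ch, 1) < 0)
        return -1;
    errno = ENOTCONN;             /* read() did not fail: fall back to ENOTCONN */
    return -1;
}
```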

The problem with the re-call connect() approach for me is actually that I don't want to refactor my code to lug around the address info data required as parameters to the call. getpeername() solves that.

Currently I still use select() to check for writability, and then use SO_ERROR (the manpages suggest this approach). However, apparently this would have issues on older machines.

zirias@ · Feb 13, 2021

kpedersen said:
I recently found this document:

https://cr.yp.to/docs/connect.html

Uhh, yes, this IS interesting although it addresses a problem I wasn't even aware of. I just assumed getting the error with getsockopt() would work fine. Is this a non-standard approach or just "not portable" because some arcane systems don't get it right?

kpedersen · Feb 13, 2021

Zirias said:
I just assumed getting the error with getsockopt() would work fine. Is this a non-standard approach or just "not portable" because some arcane systems don't get it right?

So with an approach similar to yours I have not noticed a problem in the platforms I typically test on:

- FreeBSD 10+
- OpenBSD 6+
- Debian 8+
- Solaris 10+

Certainly no crashes (like the web page suggested!). So whilst I am not really testing on ancient machines, I am now very interested in seeing what the solutions were back then (was it even possible to reliably and standardly test a successful non-blocking connection?). I had a play with the `getpeername` trick earlier and whilst it can tell me success, it can't actually tell me failure. So I have to guess that if the socket is select() writable and yet getpeername() fails, the socket was invalid, drop it and try others.

Apparently recv and MSG_PEEK is also a bad idea (I suppose I did actually have my suspicions). Annoyingly after a select() it provides a nice way of reporting what kind of state the connection is in.

So I am at a bit of a loss as to an official answer. Lots of things work and lots of literature do different things. Beej's guide also conveniently misses off a non-blocking connect example and a definitive answer.

zirias@ · Feb 13, 2021

Well, I learned a lot implementing this little tool.

One thing is, you WANT async APIs for anything I/O, but you don't always get them, e.g. if you want to allow getnameinfo() to do actual name lookups. So I ended up implementing my own little thread pool, just to simulate async with blocking calls. Having gone THAT far, you could also just offload connect() to a worker thread, with a little more overhead. But then, who says that POSIX threads don't have similar portability issues on similar arcane systems?