Solved A tiny socket-to-TLS tunnel ...

zirias@

Developer
Edit 2023-05-02: Port is now created: security/tlsc
---

So, once again, I wrote a tiny piece of software for my own needs. It's a little daemon that can listen on (plain, unencrypted) sockets and "forward" clients to some TLS-enabled service.

Background is that I was fed up with security/stunnel. That's because I use libressl, and the stunnel author seems to really hate it (well, search the web for that). I've maintained patches locally for years now, making it work with libressl nevertheless. And every so often, some update breaks them. Therefore, I've finally had enough and implemented something myself for my own specific use case.

I called it tlsc, the "TLS connect daemon". See: https://github.com/Zirias/tlsc

It only implements a fraction of what stunnel can do. It doesn't offer any TLS-related options but just uses whatever the SSL library defaults to. All I added was the ability to use a client certificate (because I already had that code from another project where I needed it). It doesn't implement one key feature of stunnel: offering a TLS-enabled service itself, backed by a plain unencrypted service. And I have no plans to add that because I don't need it myself.

I did already create a little port including an rc-script, also for my own needs.

Now simple question: Does anyone have a use case for such a simple tool? IOW, should I add it to the official tree?
 
Thanks for the likes ... does that mean you think it could be useful?

Meanwhile, seems the very rapid development seduced me to also rush the v1.0 tag 🙈. So, v1.1 coming soon, so far mainly including this commit, but this time, I'll give it one or two days 😉
 
stunnel is fairly annoying in that it has a weird config system (I run it from inetd and it is even worse). Before I put in the investment for stunnel, your tlsc would have been a better choice.

My thoughts are that adding it to the official tree would be useful. Unfortunately, I feel that visibility will be an issue. People will google their issue and probably be redirected to stunnel anyway.

I almost wonder if it is worthwhile contacting individual projects that only implement plain sockets (small web servers, proxies, chat servers/clients) and let them know tlsc exists and see if they want to bundle your useful software as part of their solution?
 
> I almost wonder if it is worthwhile contacting individual projects that only implement plain sockets (small web servers, proxies, chat servers/clients) and let them know tlsc exists and see if they want to bundle your useful software as part of their solution?
Probably not, because as I said, I didn't implement the "server side". Stunnel can do both directions. As I only needed it to connect some very old client to a TLS-enabled service, I only implemented that direction.

Also, I guess to do server side, some inner redesign would be necessary. It currently uses select() for its event loop, which doesn't scale to a huge number of connections (but is perfectly fine for my use case, and, simple ...) 😉
 
Yep, I definitely rushed the v1.0 :(

Currently busy fixing bugs, some obvious, some more subtle ... and I ran into a situation where service tlsc stop didn't work as expected: The process indeed stopped working, but didn't terminate 🙈. Maybe that's already fixed, at least I couldn't reproduce it on my dev workstation no matter what I tried.

All in all, only very tiny code changes happening now, I guess that's the typical 80:20 rule. Took a day to actually make it work (with LOTS of code), and now already another day and a half hunting bugs with very little changes needed.

I will just let it do its job here for a while (2 or 3 days maybe?), see if it still terminates correctly, if all is fine now create a v1.1 tag and then maybe commit the port for those who could use such a super simple and small TLS "tunnel" 😉
 
Progress has been made. Definitely fixed the issue with SIGTERM, and any other issue I could find. Tried to break it with lots of simulated network errors, still running 😎. So yes, give it maybe 2 more days, then this will be v1.1.

Meanwhile, if anyone wants to test it, here's my current port: https://github.com/Zirias/zfbsd-ports/commit/11f95e2.patch(*) https://github.com/Zirias/zfbsd-ports/commit/ddcd06e.patch(**)
You can easily apply it to your ports tree (best use a local branch) like this:
fetch -o- https://github.com/Zirias/zfbsd-ports/commit/ddcd06e.patch | git am
Note this link will go dead once I replace it with the port for v1.1.

I currently use it with this in /etc/rc.conf:
Code:
tlsc_enable="YES"
tlsc_tunnels="localhost:8563:news.eternal-september.org:563"
tlsc_user="nobody"

(*) [edit] yet another stupid bug fixed, it failed to correctly fall back to IP addresses when name resolving failed or was disabled 🙈
(**) [edit2] all links gone by now, see below
 
Well then, v1.1 is tagged and released:
Code:
Fixes and improvements:
* Avoid possible crash when connecting to remote host fails
  synchronously
* Avoid losing data a client might send before the forwarded connection
  is up
* Fix a rare race condition causing the daemon to hang on shutdown
  because of inconsistent thread state
* Fix possible race condition resolving a remote name when connection is
  already closed, cancel the resolving job in that case
* Fix logging configuration, use async syslog when daemonizing, sync
  stderr output otherwise
* Several smaller code cleanups
* Improve logging, INFO level now logs established and closed tunnels
Ok, quite a lot of little things 😉 For me, it's now running rock stable for a few days, no matter what was happening.

Still not sure whether I should really add the port to the tree. For now, I placed it somewhere a bit more durable: https://people.freebsd.org/~zirias/patches/0001-security-tlsc-Add-new-port.patch (edit: this is now replaced by a preview of v1.2, see below)

So, to apply it to your ports branch, just use
fetch -o- https://people.freebsd.org/~zirias/patches/0001-security-tlsc-Add-new-port.patch | git am
 
I will hold back adding the port for a while :(

Today, I started working on some further improvements for a possible v1.2, first project was to provide completely asynchronous creation of "backend" connections, which includes name resolving. While doing that, I ran valgrind's "helgrind" tool on it, just to see whether there are any threading issues hiding, and got an unpleasant surprise: It complained about stuff getaddrinfo(3) and getnameinfo(3) are doing 😲

I could work around that by adding another lock around these functions, but I'm pretty sure, according to POSIX, they should be thread-safe.

So I created a minimal example and sure enough could reproduce the problem. Therefore, I now asked on freebsd-hackers@:

If anyone knows more about this issue, please share (preferably on the -hackers ML, but here in the forums would be fine as well), thanks!
 
For network servers I tend to do the first implementation without threads. Instead via non-blocking sockets.

Then I generally stick with non-blocking sockets for the main bulk but then offload established connections to, e.g., a thread pool. In particular getaddrinfo/getnameinfo should not need to be called in a thread. Even when threads are introduced, poll/select will be used rather than any blocking or thread-unsafe calls.

Yes, not the fastest (and I would prefer short-lived threads) but I dislike the idea of anything blocking in a thread because (as you have seen) there are a lot of complex rules around the thread safety of functions; especially on janky older platforms.

I have a recent project here which is potentially quite similar to yours (albeit websocket nonsense rather than pure passthrough). This is completely "unthreaded" but might give some hints on non-blocking sockets with SSL which is awkward in some areas. The non-blocking connect in Client.c might be most relevant? And maybe the non-blocking SSL send/recv in SslSocket.c.
 
kpedersen, this whole project works non-blocking and all networking stuff is single-threaded, with an event loop built around pselect(). Unfortunately, not every API offers async/non-blocking modes, like, for example, getaddrinfo(3). 😉

What you do in that case is "simulate" async operation via threads. Otherwise, it would stall everything when there's a need to resolve some name.

What remains is that these functions should be thread-safe, but now it looks like they aren't. Using some locks is an acceptable workaround, but not ideal.
 
> What you do in that case is "simulate" async operation via threads. Otherwise, it would stall everything when there's a need to resolve some name.
Right. I see.

Yes, getaddrinfo is... weak. I remember back in the day using libresolv as a replacement. Would that be an option? I can't even recall if that was a 3rd party library or something only on Solaris.

Closest in FreeBSD base I could find is resolver(3). I have not used it; unsure if it is even achieving the same goal.
 
> Yes, getaddrinfo is... weak. I remember back in the day using libresolv as a replacement. Would that be an option? I can't even recall if that was a 3rd party library or something only on Solaris.
No, I really want to stick to portable (POSIX) APIs (plus OpenSSL although I link with LibreSSL, which is one horrific beast of API but obviously unavoidable for the purpose 🙈)

Now reading again, this also caught my eye:
> In particular getaddrinfo/getnameinfo should not need to be called in a thread.
I agree with most of your response and actually use exactly this design (non-blocking/async wherever possible, pselect()-eventloop plus a threadpool that integrates in that loop via self-pipes and is used to offload stuff that would block), but this sentence, I have to say: Quite the opposite! If you can get away with resolving names only on startup, that's fine. But if you need these functions for resolving "mid-flight", they must go to a job executed on a pool thread to avoid stalling your main event loop ... 😳
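
To make the pattern concrete, here is a minimal sketch of offloading getaddrinfo(3) to a worker thread and waking a pselect()-based loop through a self-pipe. It is not taken from the tlsc sources: `ResolveJob`, `resolve_start`, `wait_readable` and `resolve_finish` are hypothetical names, and it spawns one thread per lookup where a real daemon would hand the job to its thread pool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical job record for illustration; not taken from the tlsc sources. */
typedef struct ResolveJob {
    const char *host;         /* name to resolve */
    const char *port;         /* service name or numeric port */
    struct addrinfo *result;  /* set by the worker on success */
    int error;                /* return value of getaddrinfo() */
    int notifyfd[2];          /* self-pipe: [0] is watched by the event loop */
    pthread_t thread;
} ResolveJob;

/* Runs on the worker thread: the blocking lookup happens here. */
static void *resolve_worker(void *arg)
{
    ResolveJob *job = arg;
    struct addrinfo hints;
    char done = 1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    job->error = getaddrinfo(job->host, job->port, &hints, &job->result);

    /* Wake the event loop by making the read end of the pipe readable. */
    if (write(job->notifyfd[1], &done, 1) != 1) {
        /* The loop would never wake up; a real daemon should log this. */
    }
    return NULL;
}

/* Start a lookup; returns the fd to watch for readability, or -1 on error. */
int resolve_start(ResolveJob *job, const char *host, const char *port)
{
    assert(job != NULL && host != NULL);
    job->host = host;
    job->port = port;
    job->result = NULL;
    job->error = 0;
    if (pipe(job->notifyfd) < 0) return -1;
    if (pthread_create(&job->thread, NULL, resolve_worker, job) != 0) {
        close(job->notifyfd[0]);
        close(job->notifyfd[1]);
        return -1;
    }
    return job->notifyfd[0];
}

/* Block in pselect() until fd is readable; a stand-in for the service loop. */
int wait_readable(int fd)
{
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return pselect(fd + 1, &rfds, NULL, NULL, NULL, NULL);
}

/* Call once the fd is readable; returns 0 or a getaddrinfo() error code. */
int resolve_finish(ResolveJob *job)
{
    char done;
    if (read(job->notifyfd[0], &done, 1) != 1) {
        /* Ignored in this sketch; pthread_join() below still synchronizes. */
    }
    pthread_join(job->thread, NULL);
    close(job->notifyfd[0]);
    close(job->notifyfd[1]);
    return job->error;
}
```

In a real loop, the fd returned by `resolve_start()` would simply be added to the read set next to the sockets, so established connections keep being served while the lookup runs. On success the caller owns `job->result` and releases it with freeaddrinfo(3).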
 
> So I created a minimal example and sure enough could reproduce the problem. Therefore, I now asked on freebsd-hackers@:
Replying here so that Paul Floyd could chime in.

It seems to be about h_errno, which is properly implemented in thread-local storage, and h_errno is actually accessed via a function (same as errno these days), so it looks like a false positive to me.
 
Thanks yuripv79! Well, I get lots of reports, many seem to be about h_errno, but there are others as well, e.g.

Code:
==64550== Possible data race during write of size 2 at 0x4A76478 by thread #5
==64550== Locks held: 1, at address 0x4A76500
==64550==    at 0x49A44D8: ??? (in /lib/libc.so.7)
==64550==    by 0x49A6D4B: ??? (in /lib/libc.so.7)
==64550==    by 0x49A02CC: fgets (in /lib/libc.so.7)
==64550==    by 0x4998810: __res_vinit (in /lib/libc.so.7)
==64550==    by 0x49683E7: ??? (in /lib/libc.so.7)
==64550==    by 0x4978E1C: nsdispatch (in /lib/libc.so.7)
==64550==    by 0x496694C: ??? (in /lib/libc.so.7)
==64550==    by 0x49664EA: getaddrinfo (in /lib/libc.so.7)
==64550==    by 0x201A2E: resolve (resolvtest.c:23)
==64550==    by 0x485E066: ??? (in /usr/local/libexec/valgrind/vgpreload_helgrind-amd64-freebsd.so)
==64550==    by 0x4871A79: ??? (in /lib/libthr.so.3)
==64550==
==64550== This conflicts with a previous write of size 1 by thread #4
==64550== Locks held: 1, at address 0x4A76500
==64550==    at 0x499F952: ??? (in /lib/libc.so.7)
==64550==    by 0x499F9EC: fclose (in /lib/libc.so.7)
==64550==    by 0x49991E9: __res_vinit (in /lib/libc.so.7)
==64550==    by 0x49683E7: ??? (in /lib/libc.so.7)
==64550==    by 0x4978E1C: nsdispatch (in /lib/libc.so.7)
==64550==    by 0x496694C: ??? (in /lib/libc.so.7)
==64550==    by 0x49664EA: getaddrinfo (in /lib/libc.so.7)
==64550==    by 0x201A2E: resolve (resolvtest.c:23)
==64550==  Address 0x4a76478 is in the BSS segment of /lib/libc.so.7

They're not always the same either, guess it depends on what the local name service cache currently has? 😳

Maybe I should add this info to the mailing list as well, but hey, I posted some small example code for people to reproduce 😉
 
> Thanks yuripv79! Well, I get lots of reports, many seem to be about h_errno, but there are others as well, e.g.

Ding!

I already replied on freebsd-hackers.

Helgrind (and DRD) need to suppress things like this since they do not understand the libc and libthr internal locks.

I've pushed a fix to the Valgrind repo. You can either build it from source or wait a week or so when Valgrind 3.21 gets released and the devel/valgrind port gets bumped to match.
 
> but this sentence, I have to say: Quite the opposite! If you can get away with resolving names only on startup, that's fine. But if you need these functions for resolving "mid-flight", they must go to a job executed on a pool thread to avoid stalling your main event loop ... 😳
Yeah, you are right; you may not have a choice other than to try to thread around that. That said, would the resolved name change during the duration of your daemon? Presumably the tunnel target is specified once at startup and doesn't change throughout. Obviously if the DNS changes outside of your control, that is an issue so a single thread could deal with that, store the resolved info and not have individual threads do it separately.


For a non-POSIX solution there are async versions here (e.g. libuv's uv_getaddrinfo, which reports its result through a uv_getaddrinfo_cb callback):

But I agree, POSIX is the better approach.
 
I will soon release a v1.2 which fixes/improves one more thing: It makes sure name lookups for "backend connections" also happen asynchronously, so no established connection would ever be stalled by that. Now that I know the POSIX functions for lookups are indeed thread-safe on FreeBSD (as POSIX requires), it seems all fine 😎 (and just in case some other OS still violates that, I don't really care ...)

When this is done, I'll consider adding the port indeed, after all, it's still a pretty small daemon doing just one thing in a simple way, so could be useful to others 😉

> That said, would the resolved name change during the duration of your daemon? Presumably the tunnel target is specified once at startup and doesn't change throughout. Obviously if the DNS changes outside of your control, that is an issue so a single thread could deal with that, store the resolved info and not have individual threads do it separately.
Actually, you guessed right. The daemon should be able to run for any time, so DNS changes might happen and this shouldn't require restarting it. I think just doing the lookup every single time is by far the simplest solution (the OS already caches names). Adding your own "cache" (IOW, a mechanism to periodically resolve again) would just add unnecessary complexity. Plus, some services use DNS to distribute clients to many instances, often delivering more than one response to a DNS query. Confronted with that, you want to make sure not to try the same instance immediately again if it failed. For that purpose, I added a little "blacklist" in my connection logic. Would probably be nicer if it would be configurable, I'll maybe add this in a future release.
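
For illustration only, such a "blacklist" could be as small as the sketch below. This is not the tlsc implementation: the `Blacklist` type, its fixed-size table, the string keys and the exact meaning of a "hit" (one skipped connection attempt) are all assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define BL_SLOTS 8      /* assumed table size */
#define BL_ADDRLEN 64   /* assumed maximum length of a printed address */

/* Hypothetical blacklist keyed by printed socket address ("ip:port"). */
typedef struct Blacklist {
    char addr[BL_SLOTS][BL_ADDRLEN];
    int hits[BL_SLOTS];  /* attempts left to skip; 0 marks a free slot */
} Blacklist;

void blacklist_init(Blacklist *bl)
{
    memset(bl, 0, sizeof *bl);
}

/* After a failed connect: skip `addr` for the next `hits` attempts. */
void blacklist_add(Blacklist *bl, const char *addr, int hits)
{
    int slot = 0;

    assert(hits > 0);
    for (int i = 0; i < BL_SLOTS; ++i) {
        if (bl->hits[i] > 0 && strcmp(bl->addr[i], addr) == 0) {
            slot = i;   /* already listed: refresh this entry */
            break;
        }
        /* Otherwise prefer a free slot, else the entry closest to expiry. */
        if (bl->hits[i] < bl->hits[slot]) slot = i;
    }
    strncpy(bl->addr[slot], addr, BL_ADDRLEN - 1);
    bl->addr[slot][BL_ADDRLEN - 1] = '\0';
    bl->hits[slot] = hits;
}

/* Before connecting: returns 1 if `addr` should be skipped this time. */
int blacklist_check(Blacklist *bl, const char *addr)
{
    for (int i = 0; i < BL_SLOTS; ++i) {
        if (bl->hits[i] > 0 && strcmp(bl->addr[i], addr) == 0) {
            --bl->hits[i];  /* one hit is used up by this attempt */
            return 1;
        }
    }
    return 0;
}
```

When a lookup returns several addresses, the connection logic would call `blacklist_check()` on each candidate and try the first one that is not skipped, so a failed instance is not retried immediately.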

Btw, kpedersen, as you obviously like the same simple style to implement a networking service, maybe you'd be interested in the abstractions I'm using for that, all available in my github repo 😉 They include "classes" for server, client, connection, events, all using a generic pselect()-based service loop, a threadpool, a simple logging abstraction and generic daemonizing code. All this stuff gets fixes and new features with any service I develop... but it already enabled me to implement the main functionality of tlsc in just one day. Of course, then finding and fixing all the little issues began which took a lot longer 🙈
 
> Btw, kpedersen, as you obviously like the same simple style to implement a networking service, maybe you'd be interested in the abstractions I'm using for that, all available in my github repo 😉
I did grab some time to look through it today. I do approve of the design; interestingly there are very many similarities in architecture between our servers (albeit mine is cluttered with libstent "cruft"). Even down to the very naming of the abstractions.

You do handle some of the events differently from mine; yours is more C-centric (e.g. the Event structure) and in many ways also a little more extensible. One difference is that I wrap the SSL stuff more into "classes". I simply can't deal with it directly. It's gross!

It is also good to see usage of goto where it is appropriate rather than fudging around with duplicating code just to blindly avoid the keyword. I see that too often, even in C which is bizarre.

In your code here:
https://github.com/Zirias/tlsc/blob/master/src/bin/tlsc/daemon.c#L183

I was surprised to see so many signals ignored. However, in particular I am surprised to *not* see SIGPIPE ignored. This was the problematic one where SSL doesn't allow you to pass MSG_NOSIGNAL into the send() (e.g. via SSL_write), so you either have to create a custom BIO (no thanks, I'm lazy) or you have to ignore it globally (affecting all threads in the process). You can see I took the latter approach (fairly flippantly) here.

Finally, for the threadpool, would a semaphore not simplify the code vs mutex lock/unlock/cond_wait? Is there an advantage to using the latter?
 
Thanks for your comments!

> It is also good to see usage of goto where it is appropriate rather than fudging around with duplicating code just to blindly avoid the keyword. I see that too often, even in C which is bizarre.
Quick remark on that, I think it's all about a huge misunderstanding of Dijkstra's famous paper. The "problem" with goto is that it doesn't have any intrinsic semantics as a control structure and is much "too flexible", so can be (ab)used to create a horrible mess of "spaghetti code". And that's what was criticized there. He didn't request to ban goto altogether, although the title was a bit provocative. If, in your language, goto is the cleanest and most readable way to solve a specific problem, then, by all means, use it. 😉
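
A typical case in C where goto reads cleanest is a single cleanup path for several resources. The function below is a made-up example (`read_first_line` and `LINE_MAX_LEN` are hypothetical, not from tlsc), meant only to show the shape:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define LINE_MAX_LEN 256

/*
 * Return a malloc()ed copy of the first line of `path` (the caller frees
 * it), or NULL on any error. Every exit goes through the one cleanup label.
 */
char *read_first_line(const char *path)
{
    char *line = NULL;
    char *result = NULL;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f) goto done;
    line = malloc(LINE_MAX_LEN);
    if (!line) goto done;
    if (!fgets(line, LINE_MAX_LEN, f)) goto done;
    result = line;   /* success: ownership passes to the caller */
    line = NULL;

done:
    free(line);      /* free(NULL) is a no-op */
    if (f) fclose(f);
    return result;
}
```

Without goto, each early return would have to repeat the matching subset of `free()` and `fclose()` calls, which is exactly the duplication mentioned above.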

> In your code here:
> https://github.com/Zirias/tlsc/blob/master/src/bin/tlsc/daemon.c#L183
>
> I was surprised to see so many signals ignored.
To explain that first: This is the generic daemonizing code, and the first three of these signals are those typically used by daemons (and SIGUSR1 is what my threadpool uses internally to signal a cancellation request). So, the idea here is, just ignore all these signals during daemonization so they won't mess anything when arriving during startup. Instead, expect the daemonized code to install proper handlers for the signals it will use during its startup code (which e.g. happens in my service.c).

> However, in particular I am surprised to *not* see SIGPIPE ignored. This was the problematic one where SSL doesn't allow you to pass MSG_NOSIGNAL into the send() (e.g. via SSL_write), so you either have to create a custom BIO (no thanks, I'm lazy) or you have to ignore it globally (affecting all threads in the process). You can see I took the latter approach (fairly flippantly) here.
This now confuses me. Well, first, thanks for the hint, I should probably add SIGPIPE to the list of signals ignored during daemonization, for daemons dealing with ("external") pipes!

What I don't understand here, how would you ever receive a SIGPIPE in some networking daemon just using non-blocking sockets (with optionally SSL_read_ex() and SSL_write_ex() operating on them)? I thought you'd get this signal on the writing end of a pipe when the reader is gone. And although I have some pipes, I always control both ends of them. So, what am I missing here? :-/

> Finally, for the threadpool, would a semaphore not simplify the code vs mutex lock/unlock/cond_wait? Is there an advantage to using the latter?
TBH, no idea. This code is already a bit older. Back then, synchronizing with condvars just worked 😉
 
Ah yes your daemon, actually daemonizes. I completely overlooked that ;)

> What I don't understand here, how would you ever receive a SIGPIPE in some networking daemon just using non-blocking sockets.
I believe here.

[EPIPE] An attempt is made to write to a pipe or FIFO that is not open for reading by any process, or that only has one end open. A SIGPIPE signal shall also be sent to the thread.

So if you pselect for writability and it says yes, but the pipe breaks between then and the call to send() (via SSL_write()), the call will emit this signal even if the socket is non-blocking. It won't just return -1 with EAGAIN/EWOULDBLOCK in errno.

Edit: Slightly difficult to find external sources. One here that pretty much says the same.

I can often reproduce this by sending a massive buffer of data to, e.g., netcat and then closing netcat during the send. The server does occasionally trigger that signal and gets killed unless it is ignored. I mostly find this with the websocket parts of the server rather than http because they deal with continual streams (e.g. of pixel data).
 
Yes, that's the part I knew about, it only talks about actual pipes (and FIFOs) as I expected :-/

> Edit: Slightly difficult to find external sources. One here that pretty much says the same.
But this is pretty insightful! So, in some other places, sockets are mentioned as well. Looks a bit inconsistent 🙈

I'm almost sure that with my current design, it's very unlikely I'd run into this situation: "selecting" all sockets for reading tells my daemon quickly when the remote end has closed the connection (and my "Connection" class reacts to that by emitting a "closed" event and then deleting itself). So, that's the reason I never noticed.

Anyways, I'll add SIGPIPE now to the signals ignored by default in my daemonizing code, it just makes sense: A daemon should either ignore or specifically handle it. Thanks for the hint!
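
As a small sketch of the effect (the helper names `ignore_sigpipe` and `write_to_closed_peer` are made up for this example, and a local socketpair(2) stands in for a TCP connection whose peer went away): once SIGPIPE is ignored, writing to a stream socket with a closed peer fails with EPIPE instead of killing the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process; returns 0 on success. */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/*
 * Write to a stream socket whose peer is already closed.
 * Returns the errno of the failed write, 0 if the write unexpectedly
 * succeeded, or -1 if the socket pair could not be created.
 */
int write_to_closed_peer(void)
{
    int sv[2];
    int err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    close(sv[1]);                        /* the "remote" end goes away */
    if (write(sv[0], "x", 1) < 0) err = errno;
    close(sv[0]);
    return err;
}
```

Calling `write_to_closed_peer()` without `ignore_sigpipe()` first terminates the process with SIGPIPE under the default signal disposition, which matches the netcat experiment described earlier in the thread.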
 
> Looks a bit inconsistent 🙈
Inconsistency is the only way I can know if I am still working with a computer ;)

It seems a fair few guys run into this searching for "SSL_write SIGPIPE". For example this.

But I wonder if some of this conflicting info arises because some network servers are written using e.g. inetd where the socket is really a pipe.

All I know is if the "pipe" closes during a blocking or non-blocking send, this signal might be emitted. I just want to avoid that!
 
> But I wonder if some of this conflicting info arises because some network servers are written using e.g. inetd where the socket is really a pipe.
Sounds kind of plausible. Or maybe those parts of the specs were written before sockets were added and were just never updated? :-/

> All I know is if the "pipe" closes during a blocking or non-blocking send, this signal might be emitted. I just want to avoid that!
It certainly makes sense, I now committed a fix. If you don't want to explicitly handle it, ignore it. Maybe (speculating...) the signal is there so you can learn about errors that might occur after the non-blocking write call already returned? But if so, my daemon doesn't need to know ... on a closed connection, I'm not interested in incomplete writes 😉

Meanwhile, v1.2 is getting close. I reworked the command-line, so everything that IMHO might make sense can be configured in a tunspec, looking like this now:
Code:
Usage: tlsc [-fnv] [-b hits] [-g group] [-p pidfile] [-u user]
       tunspec [tunspec ...]

        tunspec        description of a tunnel in the format
                       host:port:remotehost[:remoteport][:k=v[:...]]
                       using these values:

                host        hostname or IP address to bind to and listen
                port        port to listen on
                remotehost  remote host name to forward to with TLS
                remoteport  port of remote service, default: same as `port'
                k=v         key-value pair of additional tunnel options,
                            the following are available:
                  b=hits    a positive number enables blacklisting
                            specific socket addresses for `hits'
                            connection attempts after failure to connect
                  c=cert    `cert' is used as a client certificate file to
                            present to the remote
                  k=key     `key' is the key file for the certificate
                  p=[4|6]   only use IPv4 or IPv6
                  pc=[4|6]  only use IPv4 or IPv6 when connecting as client
                  ps=[4|6]  only use IPv4 or IPv6 when listening as server

                       Example:

                       "localhost:12345:foo.example:443:b=2:pc=6"

                       This will listen on localhost:12345 using any
                       IP version available, and connect clients to
                       foo.example:443 with TLS using only IPv6.
                       Specific socket addresses of foo.example:443
                       will be blacklisted for 2 hits after a
                       connection error.

        -f             run in foreground, do not detach
        -g group       group name/id to run as
                       (defaults to primary group of user, see -u)
        -n             use numeric hosts only, do not attempt
                       to resolve addresses
        -p pidfile     use `pidfile' instead of /var/run/tlsc.pid
        -u user        user name/id to run as
                       (defaults to current user)
        -v             debug mode - will log [DEBUG] messages
This kind of made my command-line parser code "explode" 🙈 but makes sense to expose all these possibilities to the user before actually adding a port.
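
To give an idea of what parsing such a tunspec involves, here is a deliberately simplified sketch. It is not the parser from tlsc: `TunSpec` and `tunspec_parse` are hypothetical, only the `b=hits` option is interpreted, and it assumes that host names contain no colons (so IPv6 literals are not handled).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; covers only part of the documented tunspec. */
typedef struct TunSpec {
    char host[256];
    char remotehost[256];
    int port;
    int remoteport;
    int blacklist_hits;  /* from b=hits, 0 if absent */
} TunSpec;

/* Returns the port number, or -1 if `s` is not a valid decimal port. */
static int parse_port(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (*s == '\0' || *end != '\0' || v < 1 || v > 65535) return -1;
    return (int)v;
}

/* Parse host:port:remotehost[:remoteport][:k=v[:...]]; 0 on success. */
int tunspec_parse(TunSpec *ts, const char *spec)
{
    char buf[1024];
    char *save = NULL;
    char *host, *port, *remote, *tok;

    assert(ts != NULL && spec != NULL);
    if (strlen(spec) >= sizeof buf) return -1;
    strcpy(buf, spec);
    memset(ts, 0, sizeof *ts);

    host = strtok_r(buf, ":", &save);
    port = strtok_r(NULL, ":", &save);
    remote = strtok_r(NULL, ":", &save);
    if (!host || !port || !remote) return -1;
    if (strlen(host) >= sizeof ts->host) return -1;
    if (strlen(remote) >= sizeof ts->remotehost) return -1;
    strcpy(ts->host, host);
    strcpy(ts->remotehost, remote);
    if ((ts->port = parse_port(port)) < 0) return -1;
    ts->remoteport = ts->port;  /* documented default: same as `port' */

    tok = strtok_r(NULL, ":", &save);
    if (tok && !strchr(tok, '=')) {
        /* A field without '=' right after remotehost is the remote port. */
        if ((ts->remoteport = parse_port(tok)) < 0) return -1;
        tok = strtok_r(NULL, ":", &save);
    }
    for (; tok; tok = strtok_r(NULL, ":", &save)) {
        if (strncmp(tok, "b=", 2) == 0) ts->blacklist_hits = atoi(tok + 2);
        /* Other keys (c, k, p, pc, ps) are accepted but ignored here. */
    }
    return 0;
}
```

Fed the example from the usage text, "localhost:12345:foo.example:443:b=2:pc=6", this yields port 12345, remote port 443 and 2 blacklist hits; the `pc=6` option is skipped by the sketch.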

I'll give that code another few days of testing here and then probably really add a port. If you want to help testing, I uploaded a new patch adding a "preview port" of this here: https://people.freebsd.org/~zirias/patches/0001-security-tlsc-Add-new-port.patch (yes, overwriting my previous patch...)
 