Solved A tiny socket-to-TLS tunnel ...

Paul Floyd · May 2, 2023

Paul Floyd said:
I've pushed a fix to the Valgrind repo. You can either build it from source or wait a week or so when Valgrind 3.21 gets released and the devel/valgrind port gets bumped to match.

Valgrind 3.21 has been released, and the devel/valgrind port has been updated to match. This fix is included.

zirias@ · May 9, 2023

Upcoming: tlsc v1.3

Really interesting what kind of bugs you still find after quite a while of using the software yourself.

Port update is already prepared, and will probably add a dedicated service account. For background, see also this discussion:

Unprivileged default user for "tiny" daemons?

Jose · May 9, 2023

zirias@ said:
Upcoming: tlsc v1.3
Really interesting what kind of bugs you still find after quite a while of using the software yourself.

Port update is already prepared, and will probably add a dedicated service account. For background, see also this discussion:

Unprivileged default user for "tiny" daemons?

Dog food! Ain't it yummy? (I been where you at.)

zirias@ · May 9, 2023

Jose said:
Dog food! Ain't it yummy? (I been where you at.)

Ah well, haha ... I mean, I wrote this tool exactly for a need I had myself, so it's not too surprising I'm using it now.

But there's still some truth in that, as I took a few extra steps to make it more "generally" useful, most importantly exposing everything configurable on the command line. And that's indeed where one of the bugs now fixed was hiding.

zirias@ · May 11, 2023

And it's done: https://cgit.freebsd.org/ports/commit/?id=5fa431d4ebd345765d2ce8ca3e6fbc118293c5f8

I guess this will be the final v1.x version, at least I really don't know where to look for bugs now.

I did sneak in yet another commit (guess as long as I'm the only one both working on the main branch and packaging that thing, I can get away with it).

This commit fixes a somewhat obscure problem; to trigger it, I had to hardcode a pretty small buffer for my connection "objects". Still, it could happen, although it's very unlikely as soon as the buffer is at least the maximum size of a TLS record. The issue here was: When using sockets directly, you don't have to do anything after reading a full buffer worth of data, because the next select() call will inform you if there's still more to read. When using OpenSSL or compatible replacements though, the library might have already read everything from the socket, but keep the rest of its decrypted stuff in its own buffer. You have to actively check whether there's more. Oh, wow, at least I'm learning more things.
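The buffering effect can be modelled without any TLS library. This is only a minimal sketch, assuming a hypothetical `FakeTls` type that stands in for the library's internal buffer (all names here are made up and not taken from tlsc). With OpenSSL, the corresponding calls would be `SSL_read()` and `SSL_pending()`.

```c
#include <string.h>

/* Hypothetical stand-in for a TLS library's internal state: a whole record
 * was already read from the socket and decrypted, so select() on the
 * socket has nothing more to report. */
typedef struct FakeTls {
    const char *decrypted;  /* decrypted bytes not yet handed out */
    size_t len;             /* how many of them are left */
} FakeTls;

/* Plays the role of SSL_read(): hands out at most cap bytes. */
static size_t fakeRead(FakeTls *tls, char *buf, size_t cap)
{
    size_t n = tls->len < cap ? tls->len : cap;
    memcpy(buf, tls->decrypted, n);
    tls->decrypted += n;
    tls->len -= n;
    return n;
}

/* Plays the role of SSL_pending(): bytes still buffered in the library. */
static size_t fakePending(const FakeTls *tls)
{
    return tls->len;
}

/* Read once, then keep reading for as long as the library still holds
 * buffered data. cap is the (small) per-read buffer size; out must have
 * room for everything. Returns the number of reads that were needed. */
static int readUntilDrained(FakeTls *tls, char *out, size_t cap)
{
    int reads = 0;
    do {
        size_t n = fakeRead(tls, out, cap);
        out += n;
        ++reads;
    } while (fakePending(tls) > 0);
    return reads;
}
```

With a 10-byte record and a 4-byte read size, a single read leaves 6 bytes behind that no select() call would ever announce; the loop picks them up with two more reads.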

kpedersen · May 11, 2023

zirias@ said:
When using OpenSSL or compatible replacements though, the library might have already read everything from the socket, but keep the rest of its decrypted stuff in its own buffer. You have to actively check whether there's more. Oh, wow, at least I'm learning more things.

Indeed. It's like this part, which I find fiddly with SSL:

https://github.com/Zirias/tlsc/blob/master/src/bin/tlsc/connection.c#L357

An incomplete write *may* be due to a read being required due to the encryption protocol stuff below. And an incomplete read can also be due to a write being required, so this also needs to be handled.

So, like you have done, you flag it to be registered for a read/write select/poll later on.

zirias@ · May 11, 2023

kpedersen said:
An incomplete write *may* be due to a read being required due to the encryption protocol stuff below. And an incomplete read can also be due to a write being required, so this also needs to be handled.

So, like you have done, you flag it to be registered for a read/write select/poll later on.

And as of now, I'm pretty sure this works reliably. If you find some problem with it, please let me know.

I have to admit, of course, it doesn't exactly lead to readable code.

kpedersen · May 11, 2023

zirias@ said:
And as of now, I'm pretty sure this works reliably. If you find some problem with it, please let me know.

I have to admit, of course, it doesn't exactly lead to readable code.

It looks pretty good, I don't see anything wrong. The nature of SSL code is just a little difficult to scan through and verify. My "rule of thumb" for robustness with SSL (sometimes at the expense of efficiency):

Always register all sockets for read polling
Register a socket for write polling if the buffer holds user data waiting to be sent
Register a socket for write polling if SSL_read last returned SSL_ERROR_WANT_WRITE
Register a socket for write polling if SSL_write last returned SSL_ERROR_WANT_WRITE
Perform SSL_read on every socket regardless every ~3 seconds to ensure no blockage. (This function may end up sending, e.g. for SSL renegotiation)

That last one is a hack, but one that is quite affordable and might just get us over the finish line.
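The first four rules boil down to a small function that decides what to ask poll() for. A minimal sketch, assuming a hypothetical `ConnState` struct whose flags are updated after each SSL_read/SSL_write call (the struct and its field names are made up for illustration); the periodic SSL_read from the last rule is not covered here:

```c
#include <poll.h>
#include <stddef.h>

/* Hypothetical per-connection state; field names are assumptions. */
typedef struct ConnState {
    size_t outQueued;       /* user data waiting in the send buffer */
    int readWantsWrite;     /* last SSL_read gave SSL_ERROR_WANT_WRITE */
    int writeWantsWrite;    /* last SSL_write gave SSL_ERROR_WANT_WRITE */
} ConnState;

/* Compute the events to request from poll() for one connection. */
static short wantedEvents(const ConnState *c)
{
    short events = POLLIN;                      /* rule 1: always read */
    if (c->outQueued > 0) events |= POLLOUT;    /* rule 2: data to send */
    if (c->readWantsWrite) events |= POLLOUT;   /* rule 3 */
    if (c->writeWantsWrite) events |= POLLOUT;  /* rule 4 */
    return events;
}
```

The result would go into the `events` field of the connection's `struct pollfd` before each poll() call.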

zirias@ · May 11, 2023

kpedersen said:
Always register all sockets for read polling

I don't do that, because it would make my code less flexible. Consumers of my "dataReceived" event can set the "handling" flag in the event args to signal "hey, I'm still working on that buffer, please don't touch it unless I tell you", so the connection must stop reading until told to resume with Connection_confirmDataReceived(). This feature actually comes in handy for tlsc: I can avoid making yet another copy of the buffer and just reuse it for sending on the other connection.

kpedersen said:
Perform SSL_read on every socket regardless every ~3 seconds to ensure no blockage. (This function may end up sending, e.g. for SSL negotiation)

That's indeed a "hack" as you say. I hope to have fixed exactly that with my latest commit.

But then, I fully agree with you: having to deal with reads requiring writes, writes requiring reads, handshakes requiring whatever ... is a major PITA.

Another reason I try to hide all that crap behind my own abstractions.

zirias@ · May 16, 2023

Observing behavior a bit, this is how it's looking now here after running for 4.5 days:

This is an instance with only a single configured "tunnel" (I just don't need more) which is used every 2 minutes (by fetchnews running from cron). I guess nowadays, less than 5.4 MiB of resident set can indeed still be considered "lightweight".

Maybe the next version should expose configuration of the thread pool to the user. It currently uses as many worker threads as there are CPUs, but I see here that only 2 of these were ever used (judging from their priority) and only 1 actually had something relevant to do (judging from the TIME column).
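Such a setting could be as simple as an optional override on top of the CPU-count default. A minimal sketch, assuming a hypothetical `workerThreads()` helper; how the CPU count is actually determined isn't stated in this thread, so the use of `sysconf(_SC_NPROCESSORS_ONLN)` is an assumption as well:

```c
#include <unistd.h>

/* Hypothetical helper: configured is the user's setting, with 0 or a
 * negative value meaning "not configured". */
static int workerThreads(int configured)
{
    if (configured > 0) return configured;      /* user override wins */
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);  /* online CPUs, -1 on error */
    return cpus > 0 ? (int)cpus : 1;            /* never less than 1 thread */
}
```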

zirias@ · Jun 1, 2023

Tlsc will see a new release pretty soon, after some huge refactoring, finally pulling out all the common/generic code into a separate library.

Today, I released poser v1.0, which currently only contains libposercore.so, a library containing the most basic building blocks (service loop, threadpool, daemon code, socket connections and servers, etc). In the future, I might add e.g. a "poserweb" library (with support for HTTP and HTML), or a "poserirc" library (supporting the IRC protocol), because I already have such code somewhere.

With poser, tlsc instantly gets a new feature (which should have been there from the very beginning to be honest): Verification of the server certificate! Sure, the functionality to connect to some TLS server doesn't require it, but you probably use TLS because you want to transmit some sensitive data (like, credentials), so you want to make sure you're talking to a legit server.

Also, poser already has support for TLS-enabled servers as well. Therefore, in the future, I might add support for that in tlsc as well, making it a more complete BSD-licensed replacement for stunnel (minus a lot of configurability, I still want to keep it simple).

So, stay tuned

zirias@ · Jun 1, 2023

Must be Murphy's law. No matter how much testing you do (here: roughly one week in "production" use), as soon as you finally create a release, you suddenly spot a bug.

Here, "TLS shutdown" was broken. It just wasn't done at all (which still "works", but isn't nice to the peer ...). This was caused by a simple typo. Fixing it brought up other issues ... the full TLS shutdown is 2-way (send a close notification, wait for the peer to confirm). This doesn't work well on shutdown of a server or the whole service. But "1-way" (close the connection immediately after sending the close notification) is acceptable according to the standards. So, I ended up with both options: a proper/clean 2-way TLS shutdown during normal operation, and a quick 1-way shutdown on server or service shutdown.
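With OpenSSL, the two variants differ only in how the result of SSL_shutdown() is handled. A minimal sketch of that decision, assuming a hypothetical `afterShutdown()` helper and enum (these names are made up and not taken from poser):

```c
/* What to do with the connection after a call to SSL_shutdown(). */
typedef enum ShutdownStep {
    SD_CLOSE,       /* close the socket now */
    SD_WAIT_PEER,   /* keep the socket, wait for the peer's close_notify */
    SD_CHECK_ERROR  /* negative result: inspect SSL_get_error(); on a
                       non-blocking socket this may just mean "retry" */
} ShutdownStep;

/* ret is the return value of SSL_shutdown(): 1 means both sides have
 * sent close_notify, 0 means ours was sent but the peer's has not
 * arrived yet. quick selects the 1-way variant. */
static ShutdownStep afterShutdown(int ret, int quick)
{
    if (ret == 1) return SD_CLOSE;
    if (ret == 0) return quick ? SD_CLOSE : SD_WAIT_PEER;
    return SD_CHECK_ERROR;
}
```

In the 2-way case, the connection stays registered for reading until the peer's notification arrives (or a timeout hits); in the 1-way case it is closed right after the first call.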

TLS is really a PITA.

Well, here comes poser v1.0.1 then.

It's fully API/ABI backwards compatible, so at least, this was a first reality check for sane API/ABI design.

zirias@ · Jun 2, 2023

So, this went quicker than I thought (but I guess that's the nice thing when you already have all the technical foo solved in a lib) ...

Here comes tlsc v2.0, and it already includes "server mode" (which allows connecting TLS-enabled clients to non-TLS services). So now, tlsc can replace stunnel for all its basic use cases, using a BSD license! (It's still a very simple design with little to configure.)

Updating the port will take a while, I want to verify it here "in production" first, but once again, my build server is busy building chromium and a few other huge ports.