C Threads and blocking syscalls with N:1 or N:M thread model

zirias@ · Apr 2, 2021

This "special" recent thread:

C - When pthreads will work properly on FreeBSD?

Last time when I have tested FreeBSD a few years ago, threads created with pthread was still not able to utilize multiple CPU cores. This made me to permanently halt all software development for BSD. Currently however I plan to restart some development for FreeBSD, and release binaries for BSD...

forums.freebsd.org

made me think about a potential portability issue.

Some time ago, I wrote an event-driven network service for my own use, built around pselect(2). It's single threaded by design and should never block (except, of course, on the pselect(2) call in the center of the event loop).

Then, I needed some APIs (namely getnameinfo(3) and syslog(3)) that lack an async version, so I built a little thread pool to delegate the potentially blocking stuff to different threads, thus "faking" async behavior.
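
A minimal sketch of that delegation pattern, under stated assumptions: the names `lookup_job`, `lookup_job_init`, `lookup_start` and `lookup_poll` are hypothetical, and for brevity it starts one thread per job instead of keeping a real pool. The worker performs the blocking getnameinfo(3) call and then writes one byte to a pipe, so the pselect(2) loop sees the completion as an ordinary readable descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical job record: the worker's input plus its result. */
struct lookup_job {
    struct sockaddr_storage addr;  /* address to resolve */
    socklen_t addrlen;
    int flags;                     /* getnameinfo() flags */
    char host[256];                /* result buffer */
    int rc;                        /* getnameinfo() return value */
    int notify_fd;                 /* write end of the completion pipe */
};

void lookup_job_init(struct lookup_job *job, const struct sockaddr *sa,
                     socklen_t len, int flags, int notify_fd)
{
    assert(job != NULL && sa != NULL && (size_t)len <= sizeof job->addr);
    memset(job, 0, sizeof *job);
    memcpy(&job->addr, sa, len);
    job->addrlen = len;
    job->flags = flags;
    job->notify_fd = notify_fd;
}

/* Runs in the worker thread: the only place that is allowed to block. */
static void *lookup_worker(void *arg)
{
    struct lookup_job *job = arg;
    char done = 1;

    job->rc = getnameinfo((const struct sockaddr *)&job->addr, job->addrlen,
                          job->host, sizeof job->host, NULL, 0, job->flags);
    /* Wake the event loop: one byte per finished job. */
    if (write(job->notify_fd, &done, 1) != 1)
        job->rc = EAI_SYSTEM;
    return NULL;
}

/* Called from the event loop. Returns 0 or an errno value. */
int lookup_start(struct lookup_job *job, pthread_t *tid)
{
    assert(job != NULL && tid != NULL);
    return pthread_create(tid, NULL, lookup_worker, job);
}

/* Stand-in for the pselect() call at the center of the loop.
 * Returns 1 if a completion byte was consumed, 0 on timeout, -1 on error. */
int lookup_poll(int read_fd, time_t timeout_sec)
{
    struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
    fd_set rfds;
    char byte;
    int n;

    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);
    n = pselect(read_fd + 1, &rfds, NULL, NULL, &ts, NULL);
    if (n <= 0)
        return n;
    return read(read_fd, &byte, 1) == 1 ? 1 : -1;
}
```

In a real loop the pipe's read end would simply be one more descriptor in the existing fd_set. The thread should be joined with pthread_join() before `rc` and `host` are read, since the pipe alone is not a documented memory synchronization point.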

This works perfectly fine on Linux and FreeBSD (didn't test any other POSIX systems), but both default to a 1:1 thread model: all threads are kernel-level threads. Now, what if an implementation used N:M or even N:1 (PULT) threads? Is my assumption correct that, in that case, a blocking syscall would block my whole process? IOW, should I add pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) for best portability, to at least get kernel-level threads on any system that optionally supports them?
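
For reference, a minimal sketch of what requesting system scope could look like. The helper name `create_system_scope_thread` and the placeholder `example_job` are hypothetical, and falling back to the implementation's default when the scope is rejected with ENOTSUP is an assumption about the desired behavior, not something POSIX prescribes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Placeholder for a job that would contain the blocking call. */
void *example_job(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Creates a thread, asking for system contention scope where supported.
 * Returns 0 or an errno value, like the pthread functions themselves. */
int create_system_scope_thread(pthread_t *tid, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    assert(tid != NULL && fn != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc != 0 && rc != ENOTSUP) {
        pthread_attr_destroy(&attr);
        return rc;
    }
    /* On ENOTSUP the attribute keeps the implementation's default scope. */

    rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```

On a system whose default is already PTHREAD_SCOPE_SYSTEM the setscope call changes nothing, so the extra lines only matter where process scope is the default.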

mark_j · Apr 3, 2021

Well, I'll answer a question with a question. Where is N:M threading implemented in FreeBSD? Didn't they abandon libkse(2) in favour of libthr(3) long ago?

In fact, you answered your own question in the topic you cited:
"Analyzing my own little tool using pthreads, I found it doesn't link libpthread.so but libthr.so, which seems to use different defaults. At least, I didn't change any attributes and ended up with kernel-level threads only."

Also, PTHREAD_SCOPE_SYSTEM is the default on FreeBSD; process scope was abandoned (I'm guessing around the time libkse was?).
Setting it explicitly in your code cannot hurt though (especially on non-FreeBSD systems; OpenBSD is a different beast).
See lib/libthr/thread/thr_init.c, particularly:

Code:

struct pthread_attr _pthread_attr_default = {
    .sched_policy = SCHED_OTHER,
    .sched_inherit = PTHREAD_INHERIT_SCHED,
    .prio = 0,
    .suspend = THR_CREATE_RUNNING,
    .flags = PTHREAD_SCOPE_SYSTEM,
    .stackaddr_attr = NULL,
    .stacksize_attr = THR_STACK_DEFAULT,
    .guardsize_attr = 0,
    .cpusetsize = 0,
    .cpuset = NULL
};
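
The same default can also be checked at run time, without reading the library source. A small sketch (the helper name `default_thread_scope` is hypothetical): it initializes a default attribute object and asks it for its contention scope.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Stores the default contention scope (PTHREAD_SCOPE_SYSTEM or
 * PTHREAD_SCOPE_PROCESS) in *scope_out. Returns 0 or an errno value. */
int default_thread_scope(int *scope_out)
{
    pthread_attr_t attr;
    int rc;

    assert(scope_out != NULL);
    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getscope(&attr, scope_out);
    pthread_attr_destroy(&attr);
    return rc;
}
```

With the libthr defaults shown above, the value stored should be PTHREAD_SCOPE_SYSTEM.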

PostgreSQL, for example, links against pthreads and runs fine on multiple cores.

I'm not sold on costs/benefits of N:M threading, but, hey I'm open to changing my mind.

mark_j · Apr 3, 2021

I found a reference for you:

The first threading models that were deployed in systems such as FreeBSD 5 and Solaris used an N:M threading model in which many user level threads (N) were supported by a smaller number of threads (M) that could run in the kernel [Simpleton, 2008]. The N:M threading model was light-weight but incurred extra overhead when a user-level thread needed to enter the kernel. The model assumed that application developers would write server applications in which potentially thousands of clients would each use a thread, most of which would be idle waiting for an I/O request.
While many of the early applications using threads, such as file servers, worked well with the N:M threading model, later applications tended to use pools of dozens to hundreds of worker threads, most of which would regularly enter the kernel. The application writers took this approach because they wanted to run on a wide range of platforms and key platforms like Windows and Linux could not support tens of thousands of threads. For better efficiency with these applications, the N:M threading model evolved over time to a 1:1 threading model in which every user thread is backed by a kernel thread.

Taken from the 2nd ed. of The Design and Implementation of the FreeBSD Operating System.

zirias@ · Apr 3, 2021

mark_j said:
Where is N:M threading implemented in FreeBSD?

So, it seems you didn't fully read my question. It's about other (POSIX-compliant) systems and a potential portability issue. You could of course also assume a very old version of FreeBSD; it doesn't really matter for the question.

The key question is: If an implementation uses user-level threads, is there some way to execute blocking syscalls asynchronously, or does it mean they will block the whole process? My assumption is the latter, which would mean that, if blocking is a problem, you should explicitly set PTHREAD_SCOPE_SYSTEM?

mark_j · Apr 3, 2021

I'm sorry, when you say "what if the implementation would use N:M", I assume you're actually talking about them in FreeBSD...

zirias@ · Apr 3, 2021

mark_j said:
I'm sorry, when you say "what if the implementation would use N:M", I assume you're actually talking about them in FreeBSD...

Uhhhm:

Zirias said:
This works perfectly fine on Linux and FreeBSD (didn't test any other POSIX systems), but both default to a 1:1 thread model: all threads are kernel-level threads.

No, the question is explicitly about other/unknown systems.

edit: This part of your quote

The N:M threading model was light-weight but incurred extra overhead when a user-level thread needed to enter the kernel.

makes me think that, at least with N:M (as opposed to N:1), blocking wouldn't be a problem? And then, a "thread pool" is exactly my scenario.

What I'm trying to find out is: does it make sense to explicitly set PTHREAD_SCOPE_SYSTEM for maximum portability?

ralphbsz · Apr 3, 2021

It's been a very long time since I used anything other than the 1:1 kernel level threads. But IF I REMEMBER RIGHT, in the PULT N:M model, if more than M threads enter into blocking kernel calls, then everything comes to a grinding halt. Here is an easy way to think about it: In the user-space threading model, the thread library starts up no more than M real (kernel) threads. It then uses those as a pool, and multiplexes the N (>M) user threads over them. Whenever one thread starts a blocking kernel call, it first goes through the thread library (which has wrappers around system calls), and that reduces the pool to M-1 working threads. Do that a few times, with long-running kernel calls (like disk or network IO), and pretty soon you're not multithreaded any more.
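
The pool-exhaustion effect described above can be put into numbers with a toy model. This is only a sketch under strong assumptions: the function `ticks_until_done` and the "tick" time unit are invented for illustration, every user thread makes exactly one blocking call of the same length, and the pool never grows beyond M kernel threads. It does not model any real thread library.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KERNEL_THREADS 64

/* Returns the number of ticks until n_user user threads have each finished
 * one blocking call of block_ticks ticks, when at most m_kernel kernel
 * threads can be blocked at once. Returns -1 on invalid input. */
long ticks_until_done(size_t n_user, size_t m_kernel, long block_ticks)
{
    long busy[MAX_KERNEL_THREADS] = {0}; /* remaining ticks per kernel thread */
    size_t waiting = n_user;             /* user threads not yet dispatched */
    size_t blocked = 0;                  /* kernel threads inside a syscall */
    long now = 0;

    if (m_kernel == 0 || m_kernel > MAX_KERNEL_THREADS || block_ticks <= 0)
        return -1;

    while (waiting > 0 || blocked > 0) {
        /* Dispatch: each idle kernel thread picks up one waiting user thread. */
        for (size_t i = 0; i < m_kernel && waiting > 0; i++) {
            if (busy[i] == 0) {
                busy[i] = block_ticks;
                waiting--;
                blocked++;
            }
        }
        assert(blocked <= m_kernel);

        /* Advance time by one tick; finished calls free their kernel thread. */
        now++;
        for (size_t i = 0; i < m_kernel; i++) {
            if (busy[i] > 0 && --busy[i] == 0)
                blocked--;
        }
    }
    return now;
}
```

In this model, `ticks_until_done(8, 2, 3)` yields 12 while `ticks_until_done(8, 8, 3)` yields 3: with only two kernel threads the eight blocking calls are served in four batches, with eight (the 1:1 case) they all overlap.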

Is there still any OS in practical use that doesn't support 1:1 kernel threads, and doesn't make that the default? Your portability problem may no longer exist in the wild.

I think there were some really dirty hacks in thread libraries to work around these problems. I vaguely remember early Red Hat versions (long before there was a split between RHEL and Fedora), where you could only issue 255 kernel async IOs (the aio_... POSIX calls), because then you'd run out of user-space threads, and the library didn't handle that gracefully. There was also lots of madness with people (mis)using select calls to code around these threading limitations, only to find that they just hid the problem, and now suddenly the kernel ran ridiculously slowly. And then there was the heavy use of the sendfile() call, so with a single blocking kernel call you could move a whole file into a socket.

mark_j · Apr 3, 2021

Zirias said:
Uhhhm:

Jeez louise, again, I thought you were pondering using something like libkse, which is indeed N:M but abandoned years ago.
Otherwise, why ask the question? I don't know of any OS that uses it. It's cumbersome, has high context-switching overhead (especially on OSs like OpenVMS, which are POSIX compliant) and is just plain old.

If you want to plan coding for "other/unknown systems", then I can't assist in any way.

Anyway, I am out. I obviously don't understand the context of what this esoteric question seems to be asking. I won't add to the noise.

Geri · Apr 3, 2021

Back then (about 15 years ago, when I was still a newbie with C programming and Linux) I discovered, researched, and used vfork under Linux, which is (was) a shady and vaguely defined way to create user-level threads (when they designed it, they didn't know what a thread was, so they defined it as a "virtual" (vicarious) process sharing memory with the parent process). It worked properly, and the whole process was locked to one core... in theory... because later on I saw cases when it was scheduled. I don't know how it works nowadays, but blocking API calls didn't block the rest of the "threads", even though it was treated in an N:M fashion. According to the manual, this function has been removed from BSD, and I am not yet sure whether a modern compiler will compile it.

zirias@ · Apr 3, 2021

This was about portability, so mentioning the misuse of vfork() for "threading" is… the exact opposite direction. POSIX basically says UB if you do anything other than terminating or calling exec(3) and friends.
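
To make that restriction concrete, here is a sketch of the one pattern vfork() was meant for: the child immediately calls an exec function or _exit() and touches nothing else. It assumes a system that still provides vfork() (it was dropped from POSIX.1-2008, but glibc and the BSDs keep it); the helper name `run_and_wait` is hypothetical.

```c
#define _DEFAULT_SOURCE /* makes glibc declare vfork(), which is no longer in POSIX */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program at path with argv and returns its exit status,
 * or -1 if it could not be started or did not exit normally. */
int run_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    assert(path != NULL && argv != NULL);
    pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only exec or _exit() are safe here. */
        execv(path, argv);
        _exit(127); /* exec failed */
    }
    /* Parent: resumes once the child has called exec or exited. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything beyond this, such as letting the child keep running as a "thread", is exactly the undefined territory mentioned above. For portable code, fork() or posix_spawn() avoid the question entirely.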

And mark_j no need to get snappish just because you misunderstood the question or the background in terms of portability on POSIX.

I already got two insights:

At least with N:M threading, some effort to avoid blocking the process is typically done. I still assume this is impossible with N:1 threading.
Setting a contention scope explicitly for portability probably isn't worth the effort any more, as everyone seems to agree you won't find a system any more that doesn't default to 1:1 threading. (Is this really a "safe assumption"?)

ralphbsz · Apr 3, 2021

Zirias said:
... as everyone seems to agree you won't find a system any more that doesn't default to 1:1 threading. (Is this really a "safe assumption"?)

Let's see. Server OSes: Linux, *BSD, and Windows. Desktop OSes: add macOS. We can leave Windows out, since there one uses non-POSIX calls anyway. The remaining three all have good 1:1 kernel threading. And for the most part they also have good asynchronous IO support for read() and write().

I don't know what the situation on iOS and Android is; those are the only other two operating systems that exist in numbers significant enough to worry about portability.

And, as you said, the only sane way to use threads (and async IO) today is the POSIX.4 calls. Ancient workarounds like vfork have become obsolete. I don't know what the situation with select() for highly multithreaded network servers (like web servers) is; I haven't looked at the code for those in ages.

mark_j · Apr 8, 2021

Zirias said:
This was about portability, so mentioning the misuse of vfork() for "threading" is… the exact opposite direction. POSIX basically says UB if you do

~~Portability implies something to port to. Name one.~~
No, forget it, I forgot this is too esoteric for me.

zirias@ · Apr 8, 2021

mark_j maybe learn what "portable software" means. If you write software for a few specific systems, this is not portable.

mark_j · Apr 8, 2021

Zirias said:
mark_j maybe learn what "portable software" means. If you write software for a few specific systems, this is not portable.

So define portable software in the context of this mythical N:M threading model.
Oh, and PS, I never mentioned vfork(), though perhaps you don't understand that context switching == thread switching; they're both a context change.

zirias@ · Apr 8, 2021

Did you read the POSIX threading documentation? Writing portable software involves writing against a standard, and not assuming a specific behavior. It's quite surprising to see you have no idea about these things.

mark_j · Apr 8, 2021

You answer a question with a question? Ok, here's a more specific one: What standard has m:n threading been specified in?
Regardless, I'm not sure what the ad hominems are attempting to achieve.

No, actually, forget it.

zirias@ · Apr 8, 2021

So, you're too stupid to understand the specification of contention scopes? Or, more likely, you just want to ridicule, cause you think you're smart and are still pissed you completely misunderstood the question and were told so. Too bad.