kqueue scalability across multiple CPU threads

RockaRolla · Nov 13, 2020

I'm trying to find some info on kqueue scalability across multiple CPU threads. With one kqueue per CPU thread, does it scale more or less linearly, given enough RSS queues to back it up?

Hopefully there is no kernel lock contention between multiple kqueues owned by separate threads in one process? (It seems that nginx deliberately uses one kqueue per process to avoid extra locking.)

Basically, I'm less interested in the peak throughput of an individual kqueue than in the maximum scalability of multiple kqueues on multiple threads. For example, could a reasonably powerful server running FreeBSD scale above 10 million packets per second with a few million active TCP/IP connections? (Such benchmarks already exist for Windows and Linux, but I'd prefer to write code for FreeBSD, if possible.)

mark_j · Nov 14, 2020

Have you thought this through? I ask because kqueue events are the least of your worries. The real problem is the kernel. You'll need to take the IP stack out of the kernel, bypassing it into userland and delivering packets directly to your application. You'll need to design your memory pools up front, because CPU cycles are your limiting resource and wasting them on memory allocation is a killer. Take note of numa(4). Locks in the kernel are also somewhat slow. That many transactions would be impossible for the kernel to handle, but you can prove me wrong.
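To put a rough number on the cycle budget mentioned above, here is a minimal sketch. The 3 GHz clock and the core counts are assumptions chosen for illustration, not figures from this thread, and the helper name `cycles_per_packet` is made up. It assumes the load spreads perfectly evenly across cores, which real RSS hashing will not guarantee.

```python
def cycles_per_packet(target_pps: float, cores: int, clock_hz: float) -> float:
    """Average CPU cycles available per packet if load spreads evenly over cores."""
    if target_pps <= 0 or cores <= 0 or clock_hz <= 0:
        raise ValueError("target_pps, cores and clock_hz must all be positive")
    per_core_pps = target_pps / cores   # packets each core must handle per second
    return clock_hz / per_core_pps      # cycles one core can spend on each packet


if __name__ == "__main__":
    # Assumed hardware: 3.0 GHz cores; target taken from the question (10 Mpps).
    for cores in (8, 16, 32, 64):
        budget = cycles_per_packet(10_000_000, cores, 3.0e9)
        print(f"{cores:>2} cores: {budget:,.0f} cycles per packet")
```

Under these assumptions, 32 cores leave about 9,600 cycles per packet for everything: driver, IP stack, event delivery, and the application itself. That is why per-packet allocation and lock acquisition matter so much at this rate.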

Kqueue is scalable by design, but so is select to a certain extent, and that's the key: everything has limits. Unless you're fully conversant with kqueue, I'm not sure how you will accomplish your goal. Sure, kqueue scales across threads and even cores, but you'll have to be a master of mutexes and of EV_ONESHOT.
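For readers unfamiliar with the layout being discussed, here is a minimal sketch of the one-kqueue-per-thread scheme with EV_ONESHOT registrations, using Python's `select.kqueue` wrapper. The names `worker` and `run_demo` and the socketpair setup are hypothetical, invented for illustration. It runs only where kqueue exists (FreeBSD, other BSDs, macOS), and because of Python's global interpreter lock it shows the ownership structure only. It says nothing about throughput or about kernel-side lock contention, which is the open question here.

```python
import os
import select
import socket
import threading


def worker(kq, expected, results, lock):
    """Drain `expected` read events from a kqueue owned by this thread alone."""
    handled = 0
    while handled < expected:
        # No lock around kq: no other thread ever touches this queue.
        events = kq.control(None, 64, 5.0)  # wait up to 5 s for up to 64 events
        if not events:
            break  # timed out; give up rather than spin forever
        for ev in events:
            data = os.read(ev.ident, 4096)
            with lock:  # only the shared results list needs protecting
                results.append((threading.current_thread().name, data))
            handled += 1
    kq.close()


def run_demo(n_workers=2):
    """Give each worker its own kqueue and one socket; return what each one read."""
    if not hasattr(select, "kqueue"):
        raise OSError("kqueue is unavailable here (needs a BSD or macOS)")
    results, lock = [], threading.Lock()
    socks, threads = [], []
    for i in range(n_workers):
        kq = select.kqueue()
        rd, wr = socket.socketpair()
        # EV_ONESHOT: the kernel drops the registration after one delivery,
        # so the fd must be re-added before it can fire again.
        change = select.kevent(rd.fileno(), select.KQ_FILTER_READ,
                               select.KQ_EV_ADD | select.KQ_EV_ONESHOT)
        kq.control([change], 0, 0)  # apply the change, collect no events
        wr.send(b"ping-%d" % i)
        t = threading.Thread(target=worker, args=(kq, 1, results, lock),
                             name="worker-%d" % i)
        t.start()
        socks += [rd, wr]
        threads.append(t)
    for t in threads:
        t.join()
    for s in socks:
        s.close()
    return sorted(results)
```

Calling `run_demo(2)` on a kqueue-capable system should return one `(thread name, payload)` pair per worker. The design point is that each descriptor belongs to exactly one queue and one thread, so user-space mutexes are needed only for data the threads genuinely share. Sharing a single kqueue among threads is the case where EV_ONESHOT and careful locking become essential.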

If you want a better example of core scaling than nginx, then look no further than PostgreSQL v9+.

So does this answer your question? No. You can only answer that by actually doing the hard yards.

RockaRolla · Nov 14, 2020

I'm really surprised that there are still no proper benchmarks comparing network stacks and their scalability. Below is the best one I was able to find :-D

View: https://www.youtube.com/watch?v=RGo4iyNwo_Y

I'm mainly interested in whether performance regresses because of kernel lock contention on kqueues when scaling them past dozens of CPUs.

mark_j · Nov 14, 2020

I have to say, if you want them so much, do them. I'm not trying to be offensive, but nobody has seen a cost-benefit case for doing it, at least not for your benefit. (We did something similar with NetBSD and FreeBSD a few years back, although not quite at the scale you're pondering here, but that's commercial stuff.)

If you'd like to pay, I'm sure someone can come to some arrangement.

Edit:
Follow-up. Perhaps it's worth joining a mailing list or two and asking this question. I'm sure both Netflix and Juniper have run these sorts of benchmarks. A developer may be willing to share.
