C Create a thread with a separate file descriptor table (set RFFDG flag)

Doubleble · Aug 13, 2018

I want to create a thread with a separate file descriptor table to get better performance with kevent. On Linux I used the unshare syscall to achieve this, but as far as I know FreeBSD has no equivalent or similar syscall. So I tried creating such a thread directly with rfork, via its wrapper rfork_thread. It does not work correctly: the thread usually terminates in the middle of the function, and waitpid collects a status of 0x8b (or some other code that does not match any number in bits/waitstatus.h).

While I debug the kernel to find the problem, I am posting my test code here in case anyone spots the issue, which would save me a lot of time. The test code runs well except that the status at the end is not zero. I implemented the same logic in my program, and there the status is 0x8b, which I guess is caused by an invalid page access.

Thanks for any help in advance!

C++:

#include <cstdio>
#include <iostream>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/mman.h>
#include <vector>

const size_t VEC_SIZE = 1000;

static int thread_routine(void* arg) {
  std::cout<<"init thread "<<arg<<std::endl;
  // the problem seems related to some memory allocation
  std::vector<int>** vectors = new std::vector<int>*[VEC_SIZE];
  for(size_t i = 0; i < VEC_SIZE; i++) {
    vectors[i] = new std::vector<int>(10000);
    std::cout<<"vec "<<i<<" initialized"<<std::endl;
  }
  // return explicitly: flowing off the end of a non-void function is undefined behavior
  return 0;
}

int main() {
  const size_t STACK_SIZE = 8000000;
  //void* stackaddr = malloc(STACK_SIZE); // should also work
  void* stackaddr = mmap(NULL, STACK_SIZE, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0);
  if (stackaddr == MAP_FAILED) {
    perror("mmap");
    return 1;
  }
  std::cout<<"stackaddr: "<<stackaddr<<std::endl; // a void* is already printed in hex with a 0x prefix
  void* stacktop  = (char*)stackaddr + STACK_SIZE; // assuming stack going downwards

  pid_t child = 0;
  child = rfork_thread(RFFDG|RFPROC|RFMEM|RFSIGSHARE,stacktop,&thread_routine, reinterpret_cast<void*>(2));
  if (child == -1) {
    perror("rfork_thread");
    return 1;
  }
  int status = 0;
  waitpid(child, &status, 0x0);
  std::cout<<"return status 0x"<<std::hex<<status<<std::endl; // should return 0? but usually not
}
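
For reading the status value printed above, here is a minimal sketch of decoding it with the standard macros from <sys/wait.h>. The helper name describe_wait_status is made up for illustration and is not part of the original test code. With the usual encoding (the low 7 bits hold the signal number and bit 0x80 is the core-dump flag), 0x8b decodes to signal 11 (SIGSEGV) with a core dump, which would be consistent with the invalid page access guessed above.

```cpp
#include <string>
#include <sys/wait.h>

// Hypothetical helper: turn a raw waitpid() status word into readable text.
std::string describe_wait_status(int status) {
  if (WIFEXITED(status)) {
    // Normal termination: the exit code is stored in the high byte.
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    // Killed by a signal: the signal number is stored in the low 7 bits.
    std::string text = "killed by signal " + std::to_string(WTERMSIG(status));
    if (WCOREDUMP(status)) {
      text += " (core dumped)";
    }
    return text;
  }
  if (WIFSTOPPED(status)) {
    return "stopped by signal " + std::to_string(WSTOPSIG(status));
  }
  return "unrecognized status";
}
```

Calling describe_wait_status(status) right after waitpid in main would give a readable description in place of the raw hex value.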

Bobi B. · Aug 13, 2018

Do you mind linking to a rationale for the benefits of threads with separate file descriptor tables?

My shot in the dark would be that the standard C/C++ library is not properly initialized for, or designed to run in, your new "thread", which is not really a POSIX thread. Perhaps you can try with custom allocators.

Also, from rfork(2), which might or might not be relevant to rfork_thread(3), regarding RFMEM:

Note that a lot of code will not run correctly in such an environment.

Doubleble · Aug 14, 2018

Bobi B. said:
Do you mind linking to a rationale for the benefits of threads with separate file descriptor tables?

My shot in the dark would be that the standard C/C++ library is not properly initialized for, or designed to run in, your new "thread", which is not really a POSIX thread. Perhaps you can try with custom allocators.

Also, from rfork(2), which might or might not be relevant to rfork_thread(3), regarding RFMEM:

Thank you Bobi!

It is my supervisor's idea to implement this. I don't completely understand the details, but I guess syscalls like kevent (FreeBSD) and epoll (Linux) may not run in constant time: as the number of file descriptors to monitor grows, it is reasonable to expect these calls to take longer, especially when the program is an I/O multiplexing application that handles thousands of long-lived TCP connections. My supervisor said he had done this before and saw a big performance increase on Linux. I am trying to test it myself on Linux, and I will post some data later.

I tried dropping the vector and simply allocating with new, but the problem remains. From my understanding, pthread_create allocates a pthread structure pd at the top of the stack and then calls clone to create the new thread; FreeBSD then has its own version of clone, which ultimately calls rfork (I didn't find this part of the code). So in theory, since we do not use any pthread functions afterwards (so the missing pthread struct should be fine?), it should be possible to use rfork_thread to create a new thread with the flags we want, and it should work properly.

Maelstorm · Aug 14, 2018

Taking a cursory glance at your code, you should not be using rfork_thread anyway. The man page for it says that rfork_thread has been deprecated in favor of pthread_create. Deprecated functions may be removed in future releases. In FreeBSD, the only way to separate the file descriptor table between threads is to fork each thread off into its own process. Threads, by design, share everything within the same process (except TIDs and stacks). This includes global variables, memory, and file descriptors.

Any time you traverse a list, you will not have O(1) complexity; it is O(N). In some implementations, the file descriptor is an index into the table, so the kernel can access an entry immediately. But polling the table requires the kernel to traverse the list. FreeBSD, to my knowledge, does not have the ability to keep separate file descriptor tables for each thread. The implementations of FreeBSD and Linux are very different. You might get better responses from the mailing lists. I think the hackers or threads mailing lists would be of help here.

https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/eresources-mail.html
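
To illustrate the point about fork above, here is a small sketch showing that a forked child gets its own copy of the descriptor table, so closing a descriptor in the child leaves the parent's descriptor open. The function name is invented for this example; it relies only on pipe, fork, fcntl and waitpid.

```cpp
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: returns true if a descriptor closed in a forked child
// is closed there but still open in the parent.
bool parent_keeps_fd_after_child_close() {
  int fds[2];
  if (pipe(fds) != 0) {
    return false;
  }
  pid_t child = fork();
  if (child < 0) {
    close(fds[0]);
    close(fds[1]);
    return false;
  }
  if (child == 0) {
    // Child: works on its own copy of the descriptor table.
    close(fds[0]);
    // Exit code 0 means the descriptor is really gone in the child.
    _exit(fcntl(fds[0], F_GETFD) == -1 ? 0 : 1);
  }
  int status = 0;
  waitpid(child, &status, 0);
  bool closed_in_child = WIFEXITED(status) && WEXITSTATUS(status) == 0;
  // Parent: the same descriptor number is still valid here.
  bool open_in_parent = fcntl(fds[0], F_GETFD) != -1;
  close(fds[0]);
  close(fds[1]);
  return closed_in_child && open_in_parent;
}
```

The price of this separation is that the child no longer shares memory with the parent, which is what the rfork flags in the original question were trying to avoid.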

Bobi B. · Aug 14, 2018

Well, frankly, I fail to see how using a thread with a separate file descriptor table would help in your particular case. If you split file descriptors across several tables, you can no longer multiplex all of them from within one thread. If you're going to use several threads anyway, why bother with separate file descriptor tables? This "performance increase" might be valid for select(2) or poll(2), but kqueue(2) and epoll were designed specifically to overcome the performance issues and O(N) cost of the older APIs.

Well, perhaps using several tables will lead to fewer lock waits, but I'm not sure about much else. How long ago did your tests show a performance increase from using separate file descriptor tables for threads, and which multiplexing API was in use?
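
For comparison, here is a rough sketch of the O(N) pattern attributed to poll(2) above: the full descriptor array is handed to the kernel on every call, and the caller then scans the array to find what is ready. The function name and its parameters are made up for illustration. kqueue(2) and epoll avoid this by keeping the interest set in the kernel and returning only the ready events.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cstddef>
#include <vector>

// Hypothetical demo: open `count` pipes, make one of them readable, and
// return the index that poll() reports as ready (or -1 on failure).
int find_ready_pipe_with_poll(int count, int ready_index) {
  std::vector<int> write_ends;
  std::vector<pollfd> watched;
  for (int i = 0; i < count; ++i) {
    int fds[2];
    if (pipe(fds) != 0) {
      break;
    }
    pollfd entry;
    entry.fd = fds[0];
    entry.events = POLLIN;
    entry.revents = 0;
    watched.push_back(entry);
    write_ends.push_back(fds[1]);
  }
  int found = -1;
  if (ready_index >= 0 && ready_index < static_cast<int>(write_ends.size())) {
    char byte = 'x';
    if (write(write_ends[ready_index], &byte, 1) == 1) {
      // The whole array crosses into the kernel on every call: O(N) work
      // even though only one descriptor has data.
      int ready = poll(watched.data(), watched.size(), 0);
      // The caller also has to walk the array to find the ready entry.
      for (std::size_t i = 0; ready > 0 && i < watched.size(); ++i) {
        if (watched[i].revents & POLLIN) {
          found = static_cast<int>(i);
          break;
        }
      }
    }
  }
  for (std::size_t i = 0; i < watched.size(); ++i) {
    close(watched[i].fd);
    close(write_ends[i]);
  }
  return found;
}
```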

Doubleble · Aug 15, 2018

Bobi B. said:
Well, frankly, I fail to see how using a thread with a separate file descriptor table would help in your particular case. If you split file descriptors across several tables, you can no longer multiplex all of them from within one thread. If you're going to use several threads anyway, why bother with separate file descriptor tables? This "performance increase" might be valid for select(2) or poll(2), but kqueue(2) and epoll were designed specifically to overcome the performance issues and O(N) cost of the older APIs.

Well, perhaps using several tables will lead to fewer lock waits, but I'm not sure about much else. How long ago did your tests show a performance increase from using separate file descriptor tables for threads, and which multiplexing API was in use?

Sorry, I was not given much detail about why this would increase performance, so I have to study more before I can answer your questions. I just ran a test with httperf using 1000 concurrent connections (each connection makes 500 requests, and the server returns the plain text "hello world" for each request); the time dropped by 10% on Linux with unshare(CLONE_FILES) called. Threads are grouped into clusters, and clusters do not share FD tables (threads within one cluster share one FD table). Currently I have trouble increasing the number of concurrent connections on my machine, but the system is designed to handle up to a million connections, so I would expect a larger performance increase with more concurrent connections.
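
For readers unfamiliar with the Linux call mentioned above, here is a minimal, Linux-only sketch of what unshare(CLONE_FILES) does inside a thread: the calling thread gets a private copy of the descriptor table, so closing a descriptor there no longer affects the other threads. This is not the server code from the test; the function name is hypothetical, and the sketch returns -1 where threads or the syscall are unavailable (for example in a sandbox that filters unshare).

```cpp
#include <fcntl.h>
#include <sched.h>
#include <system_error>
#include <thread>
#include <unistd.h>

// Hypothetical demo (Linux only). Returns 1 if a descriptor closed by a
// thread after unshare(CLONE_FILES) is still open in the main thread,
// 0 if it is not, and -1 if the experiment could not be run.
int fd_survives_close_in_unshared_thread() {
  int fds[2];
  if (pipe(fds) != 0) {
    return -1;
  }
  bool unshared = false;
  try {
    std::thread worker([&] {
      // Detach this thread from the process-wide descriptor table.
      if (unshare(CLONE_FILES) != 0) {
        return;
      }
      unshared = true;
      // Closes only this thread's private copy of the descriptor.
      close(fds[0]);
    });
    worker.join();
  } catch (const std::system_error&) {
    // Thread creation failed; fall through and report -1.
  }
  int result = -1;
  if (unshared) {
    // The main thread's table was not touched by the worker's close().
    result = (fcntl(fds[0], F_GETFD) != -1) ? 1 : 0;
  }
  close(fds[0]);
  close(fds[1]);
  return result;
}
```

In the clustered design described above, each cluster would presumably make this call once and then create its own kqueue or epoll descriptor inside its private table.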
