Reasonable number of FreeBSD jails per server?

I'm looking to deploy a server capable of accepting arbitrary programs from many users (possibly thousands). Their resource requirements would vary. The typical program would likely wake up several times a second, perform some fairly light computation (approximately 5 microseconds of work), and go back to sleep.

The main concern is stability and security -- users shouldn't be able to impact other users on the system. I'd like to be able to limit resource usage per user (memory, CPU time, etc.). Additionally, I'd like to keep wake-up latency (from receiving data from the network to waking up the program that needs it) as low as possible.

The FreeBSD systems will be server-class: 8+ cores with 64GB+ of RAM.

I'm wondering if placing each user in a jail is a reasonable option. What would a reasonable maximum number of jails per server be, assuming each jail contains 1 user with N programs? Or am I primarily limited by the total number of programs running, with jails adding fairly negligible overhead?
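A rough check of the workload described above: the raw computation involved is small even for thousands of programs. The inputs below are hypothetical stand-ins for "thousands" of programs and "several" wake-ups per second, and the sketch deliberately ignores context-switch and wake-up overhead, which is exactly the cost the question is about.

```python
def cpu_cores_needed(programs: int, wakeups_per_sec: float, work_us: float) -> float:
    """Core-seconds of pure computation needed per wall-clock second.

    Ignores scheduling, context-switch and wake-up overhead; the inputs
    are illustrative assumptions, not measurements.
    """
    busy_us_per_sec = programs * wakeups_per_sec * work_us
    return busy_us_per_sec / 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: 10 wake-ups per second, 5 us of work per wake-up.
    for programs in (1_000, 5_000):
        cores = cpu_cores_needed(programs, wakeups_per_sec=10, work_us=5)
        print(f"{programs} programs: {cores:.2f} cores of pure computation")
```

Under these assumptions even 5,000 programs need only about a quarter of one core for the actual work, which suggests the practical limit is more likely to come from per-process overhead (memory, scheduling, wake-up latency) than from the computation itself.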
 
Thanks for the link. It's good to see that it's at least possible to host 1000 jails on a single system. I'm going to try to put together some test cases and see how a system performs under load. I'll report back in a bit.
 
It certainly depends on the load of the jails and/or the host, but a jail in and of itself doesn't require many resources, so running a lot of them shouldn't be a problem.

The load of the machine will increase, but this has more to do with the number of processes than with CPU usage.
 
Reporting back with some results. I set up an experiment to mimic the server I'm looking to build. The setup was as follows:

  1. The server had 12GB of RAM and 2 Intel Xeon processors running at 2.6GHz with 8 cores.
  2. Jails created on the server were made from the default 'ezjail-admin' configuration, plus a special startup script to launch the "work simulator" program. (NOTE: kern.maxproc and kern.maxfiles needed to be increased)
  3. The work simulator was basic: it would open a device file and listen for an event on that device. When an event occurred, it would read a 'start' timestamp from the device, simulate 50 microseconds of work using a busy-wait for loop, and record a 'finish' timestamp.
  4. The delta between start and finish represented the time for an individual work simulator to completely process a signal.
  5. The device file was backed by a custom kernel module I wrote. A "writer" process ran outside the jails and wrote to the device node; the data written was simply a timestamp taken just before the write. The kernel module then used kevents to signal all the work simulator programs that data was ready. The work simulator programs 'read' the timestamp by looking it up in shared memory (no copy from kernel to userspace).
  6. I then spawned 1000 jails, each running 1 work simulator program. It took roughly 8 minutes to create and launch all 1000.
  7. Once all 1000 jails were running, I started the writer program. The writer produced a signal every 2 seconds. I let it run for 10 minutes or so.
  8. After stopping the writer process, I gathered all the measured time-deltas and computed the average.
  9. I then repeated the same test without using any jails at all.
  10. Finally, I repeated the test once more, this time arranging the workload into a "bucket" configuration: 8 work simulator programs (one per core), each simulating the workload 1000/8 times per signal.
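The timing logic in steps 3, 4, 8 and 10 can be sketched in user space roughly as follows. This is an illustration in Python, not the original program (which used a custom kernel module and kevents): the device read is replaced by a `start_ns` argument, the function names are hypothetical, and it assumes the writer and the workers read the same system-wide monotonic clock. In the bucket variant it records one delta per simulated workload; that is also an assumption, since the post does not say exactly how the bucket runs were timed.

```python
import statistics
import time


def busy_work(duration_us: int) -> None:
    """Spin on the clock (rather than sleeping) for at least duration_us microseconds."""
    deadline = time.monotonic_ns() + duration_us * 1_000
    while time.monotonic_ns() < deadline:
        pass


def handle_signal(start_ns: int, work_us: int = 50) -> int:
    """One work simulator: do the simulated work, return the start-to-finish delta in us."""
    busy_work(work_us)
    finish_ns = time.monotonic_ns()
    return (finish_ns - start_ns) // 1_000


def handle_signal_bucket(start_ns: int, units: int, work_us: int = 50) -> list[int]:
    """Bucket configuration: one process runs `units` workloads back to back,
    recording a delta after each one (an assumption, see above)."""
    return [handle_signal(start_ns, work_us) for _ in range(units)]


if __name__ == "__main__":
    # Stand-in for the timestamp the writer stores just before its write.
    start = time.monotonic_ns()
    deltas = handle_signal_bucket(start, units=1000 // 8)
    print(f"average delta: {statistics.mean(deltas):.0f} us")
```

Because every workload in a bucket shares the same start timestamp, later workloads in the bucket see larger deltas, so the average depends on how many workloads each process handles per signal.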

The results, I think, are pretty positive.

  • Running the experiment in a bucket configuration resulted in an average processing time of about 3300 us (microseconds).
  • Running the experiment with 1000 work simulator programs outside of jails resulted in an average time of about 3800 us.
  • Running the experiment with 1000 work simulator programs, each inside its own jail, resulted in an average time of about 4000 us.
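The relative overheads implied by these three averages can be computed directly. Nothing new is measured here; the inputs are just the reported figures.

```python
# Average processing times reported above, in microseconds.
results_us = {"bucket": 3300, "no_jails": 3800, "jails": 4000}


def overhead_pct(slower_us: float, faster_us: float) -> float:
    """Extra time of `slower_us` relative to `faster_us`, as a percentage."""
    return 100.0 * (slower_us - faster_us) / faster_us


if __name__ == "__main__":
    pairs = [("jails", "no_jails"), ("no_jails", "bucket"), ("jails", "bucket")]
    for slower, faster in pairs:
        pct = overhead_pct(results_us[slower], results_us[faster])
        print(f"{slower} vs {faster}: +{pct:.1f}%")
```

In this test, putting each of the 1000 processes in its own jail added roughly 5%, while using 1000 processes instead of 8 buckets added roughly 15%, so the per-process cost appears larger than the per-jail cost.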

This was a rather extreme scenario. When I ran the test so that only a few of the work simulator programs were signaled at a time (instead of all 1000), the numbers were much smaller (under 100 microseconds) and much closer together.

Hopefully someone else finds this information useful!
 