Response under very high load

I have a general question: is this expected behavior, or is something wrong with my system? I'm running FreeBSD 9.1.0-STABLE on a quad-core processor with 12 GB of RAM.

When I run very CPU-intensive operations for long periods of time, various parts of the system become unresponsive: completely unresponsive, but not crashed. For example, running multithreaded encryption programs that use GELI and push the load average to somewhere between 12 and 15 causes this behavior.

  • For 30-45 minutes at load 12-15, everything works fine (SSH, Apache, ping, console, etc.).
  • After a few hours at load 12-15, SSH quits responding, but I can still ping, and the local console works.
  • After a few more hours, even the local console stops responding. By not responding, I mean it will not react to any keyboard input for days at a time.
  • After a few days, pretty much everything is non-responsive, but I can still ping the system with very low latency (a few ms).

The interesting thing is that if the job completes on its own without my intervention and the load comes back down, everything becomes responsive again and works just fine, as if nothing had happened. Even keystrokes I typed on the local console days earlier suddenly get processed, and everything proceeds as normal.

So my question is: is this expected, or is there a problem I should try to fix? I might be able to work around it by nicing the encryption processes, or by just waiting for the job to complete, so it's not a dealbreaker; I'm just curious what's going on. Thanks!
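For anyone wanting to try the nice workaround, a minimal Python sketch of starting a job at reduced priority follows. It assumes a Unix system where `os.nice` is available; `run_niced` and the stand-in command are hypothetical, not part of the original setup. The priority is lowered in the child just before `exec`, so the launching shell or service keeps its normal priority.

```python
import os
import subprocess
import sys


def run_niced(cmd, increment=10):
    """Hypothetical helper: run cmd with its nice value raised by `increment`.

    preexec_fn runs in the child after fork() and before exec(), so only
    the launched job is deprioritized, not the calling process.
    """
    return subprocess.run(
        cmd,
        preexec_fn=lambda: os.nice(increment),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for the encryption job: a child that prints its own nice value.
    result = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
    print("parent nice:", os.nice(0), "child nice:", result.stdout.strip())
```

From a shell, `nice -n 10 <command>` starts a job the same way, and `renice` adjusts a process that is already running.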
 
I guess what you are seeing here is that the process starts consuming memory, and within a few hours the system starts paging so heavily that the I/O bandwidth is saturated, making it almost impossible to communicate with the rest of the world. In other words, the machine is so busy handling the program's load that it cannot do anything else in a reasonable time.
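One way to check the paging theory is to count the job's major page faults, i.e. faults that had to be satisfied from disk (including swap). The Python sketch below assumes a Unix system with the standard `resource` module; `major_faults_of` is a hypothetical helper. It takes the difference in `RUSAGE_CHILDREN` counters around a single run, which isolates that job as long as no other children are reaped at the same time.

```python
import resource
import subprocess
import sys


def major_faults_of(cmd):
    """Hypothetical helper: run cmd to completion, return its major page faults.

    RUSAGE_CHILDREN accumulates usage of all waited-for children, so the
    before/after difference attributes the faults to this one run.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_majflt
    subprocess.run(cmd, check=True)  # waits for (and reaps) the child
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_majflt
    return after - before


if __name__ == "__main__":
    # Trivial stand-in job; substitute the real workload.
    print("major page faults:", major_faults_of([sys.executable, "-c", "pass"]))
```

A trivial job should report few or no major faults; a count that grows large over the hours the machine degrades would support the paging explanation, while a low count would point elsewhere. On FreeBSD, `swapinfo` gives a quick view of swap usage as well.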

Renicing the process will not solve the issue, since it appears to me that the problem here is not how the process is scheduled, but the fact that it consumes more resources than the machine has.

My opinion.
 
Solved

I solved this. For future Google searches, the answer is basically: FreeBSD 9.1 on ESXi 5.0 doesn't work. Upgrading to the latest patched version of ESXi 5.1 fixed it.
 
I'd still run normal users at a higher nice value than root and the multiuser services. Also, do you have resource limits in place?
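As a sketch of what a per-job limit could look like, the Python below caps a child's virtual address space with `resource.setrlimit(resource.RLIMIT_AS, ...)` before it execs, so a runaway job fails to allocate instead of pushing the host into swap. `run_limited` and the 2 GiB default are illustrative assumptions. On FreeBSD, per-user limits are more commonly configured through login classes in `/etc/login.conf`.

```python
import resource
import subprocess
import sys


def run_limited(cmd, max_bytes=2 * 1024**3):
    """Hypothetical helper: run cmd with its address space capped at max_bytes."""

    def apply_limit():
        # Runs in the child between fork() and exec(); the parent is unaffected.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        # An unprivileged process cannot set its soft limit above the hard limit.
        limit = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    return subprocess.run(
        cmd, preexec_fn=apply_limit, capture_output=True, text=True, check=True
    )


if __name__ == "__main__":
    # Child reports the soft address-space limit it actually received.
    probe = "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"
    print("child soft limit:", run_limited([sys.executable, "-c", probe]).stdout.strip())
```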

Kevin Barry
 