Outage Report

brd@

Hi all,

I wanted to give you all an idea of what happened and what we are doing to mitigate similar outages.

An air conditioner failure caused the temperature in the server room to spike, and the machine gracefully powered itself down when it hit its temperature thresholds. Because the shutdown was graceful, all the data is safe.

Future Mitigation:
We are investigating adding a temperature sensor to this server room so that we can be proactive in monitoring the facility.

We are also considering migrating it to a different data center facility.


Thanks,
Brad Davis
FreeBSD Cluster Administrator
 
One thing on temperature probes - I'd be looking for one that has been tested properly and confirmed to work well.

I mention this because we added a generic temperature probe (I do not recall the brand) to a previous datacentre where I had gear, and then the air conditioning failed. The only reason I found out was that I was located next door to the room and heard a huge amount of fan noise from everything. The room's ambient temperature got up to 54 degrees C. The only device to fail was... you guessed it, the temperature probe, which died before generating an alert at the threshold we had set (something like 30 degrees). We did also have a drive failure in the array the following week.
 
In other words, not FreeBSD's fault. :) I thought some wacko was trying to DoS/DDoS the forum when it stopped responding.
 
Yeah, that is why I always do active probes for things like that. I am not a fan of SNMP traps, because something in the middle could fail and you would never know.

I have been using OpenGear for a few years now and theirs work pretty well.
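
To make the active-probe point concrete, here is a rough sketch in Python of what I mean. It is not tied to OpenGear or any real device: `check_temperature`, `ProbeResult`, and the `read_celsius` callable are made-up names, and the 30 degree threshold is just the figure mentioned above. The idea is that the monitoring side asks for a reading on its own schedule, so a probe that goes silent is itself treated as an alert instead of being mistaken for "all quiet".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    status: str  # "ok", "too_hot", or "no_data"
    detail: str


def check_temperature(read_celsius: Callable[[], Optional[float]],
                      threshold_c: float = 30.0) -> ProbeResult:
    """Poll the sensor once and classify the outcome.

    read_celsius is a hypothetical stand-in for however the sensor
    is actually queried (SNMP GET, HTTP, serial, ...).
    """
    try:
        value = read_celsius()
    except Exception as exc:
        # A probe that cannot be reached is an alert condition in itself.
        return ProbeResult("no_data", f"probe did not answer: {exc}")
    if value is None:
        return ProbeResult("no_data", "probe returned no reading")
    if value >= threshold_c:
        return ProbeResult("too_hot", f"{value:.1f} C >= {threshold_c:.1f} C")
    return ProbeResult("ok", f"{value:.1f} C")


if __name__ == "__main__":
    def dead_probe() -> Optional[float]:
        raise TimeoutError("no response")

    for reader in (lambda: 22.0, lambda: 54.0, dead_probe):
        print(check_temperature(reader))
```

With a trap-only setup, the dead probe case looks exactly like a healthy room. With polling, it shows up as `no_data` on the next check and can page someone.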
 
The traceroute listed San Jose as the origin, yet Google traced the physical location to Cancun. The server setup is quite impressive, but Debian can be added as an emulated layer. Why use Ubuntu instead of what Ubuntu is built upon, i.e., Debian? The server software is also impressive.
http://www.undeadly.org/cgi?action=article&sid=20131017073054
http://quigon.bsws.de/papers/2012/eurobsdcon/

Would that ever be used in conjunction with FreeBSD?

Shouldn't someone be starting a new topic about now?
 