
Outage Report

brd@

Administrator
Staff member
Moderator
Developer

Thanks: 85
Messages: 292

#1
Hi all,

I wanted to give you all an idea of what happened and what we are doing to mitigate similar outages.

An air conditioner failure caused the temperature to spike, and the machine gracefully powered itself down when it hit its temperature thresholds. Because the shutdown was graceful, all the data is safe.

Future Mitigation:
We are investigating adding a temperature sensor to this server room so that we can be proactive in monitoring the facility.

We are also considering migrating it to a different data center facility.


Thanks,
Brad Davis
FreeBSD Cluster Administrator
 

throAU

Aspiring Daemon

Thanks: 142
Messages: 910

#2
One thing on temperature probes - I'd be looking for one that has been tested properly and confirmed to work well.

I mention this because we added a generic temperature probe (I do not recall the brand) to a previous datacentre where I had gear, and then the air conditioning failed. I only found out because I was located next door to the room and heard a huge amount of fan noise from everything. The room's ambient temperature got up to 54 degrees C. The only device to fail was... you guessed it, the temperature probe, which died before generating an alert at the temperature we had set (something like 30 degrees). We did also have a drive failure in the array the following week.
 

zspider

Aspiring Daemon

Thanks: 111
Messages: 582

#3
brd@ said:
Hi all,

I wanted to give you all an idea of what happened and what we are doing to mitigate similar outages.

An air conditioner failure caused the temperature to spike, and the machine gracefully powered itself down when it hit its temperature thresholds. Because the shutdown was graceful, all the data is safe.

Future Mitigation:
We are investigating adding a temperature sensor to this server room so that we can be proactive in monitoring the facility.

We are also considering migrating it to a different data center facility.


Thanks,
Brad Davis
FreeBSD Cluster Administrator
In other words, not FreeBSD's fault. :) I thought some wacko was trying to DoS/DDoS the forum when it stopped responding.
 

brd@

Administrator
Staff member
Moderator
Developer

Thanks: 85
Messages: 292

#4
throAU said:
One thing on temperature probes - I'd be looking for one that has been tested properly and confirmed to work well.

I mention this because we added a generic temperature probe (I do not recall the brand) to a previous datacentre where I had gear, and then the air conditioning failed. I only found out because I was located next door to the room and heard a huge amount of fan noise from everything. The room's ambient temperature got up to 54 degrees C. The only device to fail was... you guessed it, the temperature probe, which died before generating an alert at the temperature we had set (something like 30 degrees). We did also have a drive failure in the array the following week.
Yeah, that is why I always do active probes for things like that. I am not a fan of SNMP traps, because something in the middle could fail and you would never know.

I have been using OpenGear for a few years now and theirs work pretty well.
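
To illustrate the active-probe point, here is a rough Python sketch of a poller that treats a silent probe as an alert in its own right, so a dead sensor cannot fail quietly the way a missed trap can. Everything in it is an assumption for illustration: `poll_once`, `ProbeState`, the 120-second silence limit, and the `read_temp` callable are hypothetical and not tied to OpenGear or any other product. The 30 degree threshold is just the figure mentioned in the quoted post.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ALERT_THRESHOLD_C = 30.0  # assumed: the alert level mentioned in the quoted post
MAX_SILENCE_S = 120.0     # hypothetical: how long a probe may go unanswered


@dataclass
class ProbeState:
    last_ok: float  # timestamp (seconds) of the last successful reading


def poll_once(read_temp: Callable[[], Optional[float]],
              state: ProbeState, now: float) -> str:
    """Actively ask the probe for a reading and classify the result."""
    try:
        temp = read_temp()
    except Exception:
        temp = None  # any read error counts as "no answer"

    if temp is None:
        # No reading: escalate once the probe has been silent too long.
        if now - state.last_ok > MAX_SILENCE_S:
            return "ALERT: probe silent"
        return "WARN: missed poll"

    state.last_ok = now
    if temp >= ALERT_THRESHOLD_C:
        return f"ALERT: {temp:.1f} C"
    return "OK"
```

In practice `read_temp` would wrap whatever query the probe supports, and the monitoring host would call `poll_once` on a timer. The point of the design is that the monitor initiates every check, so "no data" is itself a visible state.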
 

sossego

Retired from the forums

Thanks: 142
Messages: 1,557

#5
The traceroute listed San Jose as the origin, yet Google traced the physical location to Cancun. The server setup is quite impressive, but Debian can be added as an emulated layer. Why use Ubuntu instead of what Ubuntu is built upon, i.e., Debian? The server software is also impressive.
http://www.undeadly.org/cgi?action=article&sid=20131017073054
http://quigon.bsws.de/papers/2012/eurobsdcon/

Would that ever be used in conjunction with FreeBSD?

Shouldn't someone be starting a new topic about now?
 