Facebook global outage

sko

Aspiring Daemon

Reaction score: 394
Messages: 704

I'm betting that was a big "Oops" moment.
It was more like a "I know this stuff works, so how bad can this be?". I have a 2-hour maintenance window (or longer, but then my evening is ruined..) and this was right at the beginning, so plenty of time to watch an "edge-case" happening IRL.
 

astyle

Daemon

Reaction score: 467
Messages: 1,078

It was more like a "I know this stuff works, so how bad can this be?". I have a 2-hour maintenance window (or longer, but then my evening is ruined..) and this was right at the beginning, so plenty of time to watch an "edge-case" happening IRL.
yeah, this points to the need to do your homework, and have a way to go back if you realize you made a mistake. Something I adopted as my MO lately.
 

Jose

Daemon

Reaction score: 975
Messages: 1,179

The funny thing about this is that I didn't know that any of the big hyper-scalers have data centers in Silicon Valley. With the insanely high cost of real estate and electricity here...
And the likelihood of earthquakes!

I've heard stories that some of the most fundamental security keys (like the ultimate root password to all of Amazon AWS, just as a hypothetical example) are stored in a physical safe (a big steel box with thick walls) near the CEO's office, using a standalone security device. That safe uses traditional mechanical locks (the thing with a dial). I've also heard stories that some of those security devices rely on being unlocked by a pass phrase which is memorized by a small number of humans, but not recorded otherwise (not on a piece of hardware). Part of the long delay in getting Facebook back online might have been caused by the need for one of those humans to be brought to the correct location. If someone has some spare time, they could track what flights Facebook's corporate aircraft took yesterday; it might give us a clue.
There are software versions of this scheme too:

And everyone else in the industry will also have long meetings, to make sure that "this can't happen to us". Those meetings won't be quite as painful, but by no means amusing.
There was some gloating at $WORK, which made me fear the jinx. Hubris comes before the fall.

View: https://m.youtube.com/watch?v=hESunUuFrzk


Someone mentioned that the high rate of preppers among military/police/rescue personnel is due to them knowing the plans, and how slim the chances are of any government action being of any use should things go sideways. Who can make fire from nothing? Who has an EMP-resistant watch?

Don't panic.
 
OP
Zirias

Zirias

Son of Beastie

Reaction score: 1,517
Messages: 2,637

I have to state it: A thread about an outage of facebook(!) quickly turned into discussing the end of civilization. Dafuq? :what:
 

sko

Aspiring Daemon

Reaction score: 394
Messages: 704

yeah, this points to the need to do your homework, and have a way to go back if you realize you made a mistake. Something I adopted as my MO lately.
All the switch configs are in revision control and these switches are 30 seconds away from my desk; so yes - if this really had gone bad I had a pretty straightforward backup plan: just revert the config change and reboot them one by one. As said: I had plenty of time, knew this 'should' work and wanted to see how it would handle this edge case (e.g. in case of a long-term power outage that our UPS can't handle).
 

Jose

Daemon

Reaction score: 975
Messages: 1,179

Yep, but that wasn't what I asked. I had a look yesterday, all the facebook-related domains have 4 nameservers in subdomains, e.g. a.ns.whatsapp.net etc. for whatsapp.com. So I'd expect an A (and probably AAAA) record to be present in the net. zone.
The root servers return an NS record, which is a host name, not an IP. They will also return a glue record if the host name is under the domain that is being queried, which is very common. (Apparently this is called in-bailiwick.)
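
To make the "under the domain that is being queried" condition concrete, here is a minimal sketch in Python. The function name is made up for illustration, and it follows the usage in this post (name server host at or below the delegated domain); formal DNS terminology draws finer distinctions, and real servers may apply broader rules about when to send glue.

```python
def is_in_bailiwick(ns_host, zone):
    """Return True if ns_host is the zone itself or a name beneath it.

    Comparison is done label by label, so "notfacebook.com" is not
    mistaken for a name under "facebook.com".
    """
    host_labels = ns_host.rstrip(".").lower().split(".")
    zone_labels = zone.rstrip(".").lower().split(".")
    if len(host_labels) < len(zone_labels):
        return False
    return host_labels[-len(zone_labels):] == zone_labels


# Name server inside the delegated domain: glue is needed.
print(is_in_bailiwick("a.ns.facebook.com", "facebook.com"))  # True
# Name server in a different domain: by this rule, not in-bailiwick.
print(is_in_bailiwick("a.ns.whatsapp.net", "whatsapp.com"))  # False
```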
 

Jose

Daemon

Reaction score: 975
Messages: 1,179

Exactly. And my question still is: doesn't drill(1) show these when tracing?
drill ns does show them for me.
Code:
drill ns facebook.com                                      
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 61759
;; flags: qr rd ra ; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 8 
;; QUESTION SECTION:
;; facebook.com.    IN    NS

;; ANSWER SECTION:
facebook.com.    9510    IN    NS    c.ns.facebook.com.
facebook.com.    9510    IN    NS    b.ns.facebook.com.
facebook.com.    9510    IN    NS    a.ns.facebook.com.
facebook.com.    9510    IN    NS    d.ns.facebook.com.

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
a.ns.facebook.com.    9510    IN    A    129.134.30.12
b.ns.facebook.com.    9510    IN    A    129.134.31.12
c.ns.facebook.com.    9510    IN    A    185.89.218.12
d.ns.facebook.com.    9510    IN    A    185.89.219.12
a.ns.facebook.com.    9510    IN    AAAA    2a03:2880:f0fc:c:face:b00c:0:35
b.ns.facebook.com.    9510    IN    AAAA    2a03:2880:f0fd:c:face:b00c:0:35
c.ns.facebook.com.    9510    IN    AAAA    2a03:2880:f1fc:c:face:b00c:0:35
d.ns.facebook.com.    9510    IN    AAAA    2a03:2880:f1fd:c:face:b00c:0:35

;; Query time: 0 msec
;; SERVER: 172.16.1.4
;; WHEN: Wed Oct  6 08:12:26 2021
;; MSG SIZE  rcvd: 285
They're the A records in the additional section.
 
OP
Zirias

Zirias

Son of Beastie

Reaction score: 1,517
Messages: 2,637

Ah! ok, so I have a way to check for them. I guess drill -T a.ns.facebook.com doesn't show them, even if they are used to query the authoritative nameservers and that fails. That's somewhat confusing as I first thought facebook might have withdrawn their glue records.
 

Jose

Daemon

Reaction score: 975
Messages: 1,179

Ah! ok, so I have a way to check for them. I guess drill -T a.ns.facebook.com doesn't show them, even if they are used to query the authoritative nameservers and that fails. That's somewhat confusing as I first thought facebook might have withdrawn their glue records.
If you think about it, the glue records don't live on the Facebook DNS servers. They'd be useless there, since the address of those servers is precisely what you're trying to find. How would you query the Facebook DNS servers for a glue record if the address of the Facebook DNS servers is precisely what you're trying to find?

I've only set up small-time domains; many, many orders of magnitude smaller than Facebook, but FWIW, for those the glue records are hosted at my registrar.

Thought experiment. Suppose I think glue records are an Evil Hack, and I've come up with this scheme to work around them. I have two domains, example.com and example.net. The name server for example.com is ns.example.net, and the name server for example.net is ns.example.com. Why won't that work?
 
OP
Zirias

Zirias

Son of Beastie

Reaction score: 1,517
Messages: 2,637

If you think about it, the glue records don't live on the Facebook DNS servers. They'd be useless there, since the address of those servers is precisely what you're trying to find. How would you query the Facebook DNS servers for a glue record if the address of the Facebook DNS servers is precisely what you're trying to find?
Although you kind of answered my question (gotta remember how to "see" glue records with drill(1) :cool:), you still don't understand me. The trace option (-T) of drill is specifically for debugging purposes. So I just assumed it would show glue records it finds on the way. After all, these are needed to be able to ask facebook's nameservers in the first place. There's no doubt the whole operation will fail if none of these authoritative nameservers can be contacted, that was never my question.
 

astyle

Daemon

Reaction score: 467
Messages: 1,078

Thought experiment. Suppose I think glue records are an Evil Hack, and I've come up with this scheme to work around them. I have two domains, example.com and example.net. The name server for example.com is ns.example.net, and the name server for example.net is ns.example.com. Why won't that work?
Circular references.
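
The loop can be sketched with a toy model in Python. Everything here is illustrative: the tables, the helper names and the 192.0.2.x documentation addresses are assumptions, and this is not how a real resolver is implemented. It only shows that without glue each lookup depends on the other one finishing first.

```python
# Toy model of the thought experiment; all names and addresses are illustrative.
NS_OF = {"example.com": "ns.example.net", "example.net": "ns.example.com"}
# Address records as each zone's own server would hold them (documentation IPs).
HOSTED_A = {"ns.example.com": "192.0.2.1", "ns.example.net": "192.0.2.2"}


def zone_of(host):
    """Hypothetical helper: treat everything after the first label as the zone."""
    return host.split(".", 1)[1]


def ns_address(zone, glue, pending=()):
    """Find the IP of the name server for `zone`, given glue from the parent."""
    if zone in pending:
        chain = " -> ".join(pending + (zone,))
        raise RuntimeError("circular reference: " + chain)
    ns_host = NS_OF[zone]
    if ns_host in glue:
        return glue[ns_host]
    # No glue: the address of ns_host is only known to the server of
    # ns_host's own zone, so that server has to be located first.
    ns_address(zone_of(ns_host), glue, pending + (zone,))
    return HOSTED_A[ns_host]


def try_lookup(zone, glue):
    """Return the name server address, or the error text if resolution loops."""
    try:
        return ns_address(zone, glue)
    except RuntimeError as err:
        return str(err)


# Without glue the two delegations chase each other.
print(try_lookup("example.com", {}))
# A single glue record at the parent breaks the loop.
print(try_lookup("example.com", {"ns.example.net": "192.0.2.2"}))
```

In this model one glue record is enough for both zones: once ns.example.net is reachable, example.net can be served, and from there ns.example.com can be looked up normally.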
 
OP
Zirias

Zirias

Son of Beastie

Reaction score: 1,517
Messages: 2,637

Just to get that straight: You can mess up glue records. If you request none in a domain update, you get none. And yes, there are use cases where you don't need any (if your authoritative nameservers are in different domains that are "glued").

drill -T not showing anything surprised me and led me to the (false) assumption facebook might have messed up their glue records with erroneous domain updates. That's it…
 

Jose

Daemon

Reaction score: 975
Messages: 1,179

This appears to be an implementation detail of drill(1). dig(1) with +trace doesn't show them either, but it actually queries one of the servers it must've got through a glue record!
Code:
 dig facebook.com ns +trace

; <<>> DiG 9.16.4 <<>> facebook.com ns +trace
;; global options: +cmd
.            37065    IN    NS    k.root-servers.net.
.            37065    IN    NS    b.root-servers.net.
.            37065    IN    NS    i.root-servers.net.
.            37065    IN    NS    h.root-servers.net.
...
facebook.com.        172800    IN    NS    c.ns.facebook.com.
facebook.com.        172800    IN    NS    d.ns.facebook.com.
facebook.com.        172800    IN    NS    b.ns.facebook.com.
facebook.com.        172800    IN    NS    a.ns.facebook.com.
;; Received 284 bytes from 129.134.30.12#53(a.ns.facebook.com) in 0 ms

It's also much faster than drill, for some reason.
 
OP
Zirias

Zirias

Son of Beastie

Reaction score: 1,517
Messages: 2,637

Yep, gotta remember that: To see glue records with drill(1), do an explicit NS query.
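
For scripting that check, a rough Python sketch that pulls the address records out of the ADDITIONAL section of saved drill output. It only parses text in the layout shown earlier in this thread; the function name is made up, the sample is shortened from that output, and whether those records really are the parent's glue (rather than addresses the resolver added itself) is not something the parser can tell.

```python
def additional_addresses(output):
    """Collect A/AAAA records from the ADDITIONAL section of drill output."""
    found = {}
    in_additional = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith(";;"):
            # Every ";;" line is a header or footer; only one opens the section.
            in_additional = line.startswith(";; ADDITIONAL SECTION")
            continue
        if not in_additional or not line:
            continue
        fields = line.split()
        # Expected layout: name, TTL, class, type, address
        if len(fields) == 5 and fields[3] in ("A", "AAAA"):
            found.setdefault(fields[0], []).append(fields[4])
    return found


SAMPLE = """\
;; ANSWER SECTION:
facebook.com.    9510    IN    NS    a.ns.facebook.com.
facebook.com.    9510    IN    NS    b.ns.facebook.com.

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:
a.ns.facebook.com.    9510    IN    A    129.134.30.12
b.ns.facebook.com.    9510    IN    A    129.134.31.12
a.ns.facebook.com.    9510    IN    AAAA    2a03:2880:f0fc:c:face:b00c:0:35

;; Query time: 0 msec
"""

for name, addresses in additional_addresses(SAMPLE).items():
    print(name, addresses)
```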
 

Crivens

Moderator
Staff member

Reaction score: 1,649
Messages: 2,514

I have to state it: A thread about an outage of facebook(!) quickly turned into discussing the end of civilization. Dafuq? :what:
I don't know about the network stuff involved, and I could hardly care less about FB being gone, but I do worry about the increasing dependence on fragile infrastructure and how interwoven some fragile stuff is. Texas Snowstorm anyone? Pipeline shutdown?
 

Sevendogsbsd

Daemon

Reaction score: 673
Messages: 1,121

Crivens: exactly. I am in Texas and the idiots running the local government have a private corporation in control of the power grid. The state is not on the national grid for some stupid reason. In the interest of saving $, they did not winterize the equipment and we all know the outcome of that.
 

jbodenmann

Well-Known Member

Reaction score: 142
Messages: 306

Crivens: exactly. I am in Texas and the idiots running the local government have a private corporation in control of the power grid. The state is not on the national grid for some stupid reason. In the interest of saving $, they did not winterize the equipment and we all know the outcome of that.
Wait... I hope I misunderstood this. Are you saying that the state of Texas is not connected to the U.S. national power grid? i.e. Texas' power grid is decoupled/isolated/separated from that?
If so, are there physical links that are simply deactivated "when not needed" or is it really, truly an isolated grid?
 

Sevendogsbsd

Daemon

Reaction score: 673
Messages: 1,121

Wait... I hope I misunderstood this. Are you saying that the state of Texas is not connected to the U.S. national power grid? i.e. Texas' power grid is decoupled/isolated/separated from that?
If so, are there physical links that are simply deactivated "when not needed" or is it really, truly an isolated grid?
That is correct. It is quite idiotic actually. The state government wants independence from the federal government, but they will sure take federal money when it is offered…

As for the mechanics of the separation, I am not sure of the details.
 

Beastie7

Aspiring Daemon

Reaction score: 590
Messages: 709

I would love to go back to the early 00's when all this garbage never existed, and people actually went outside. I hope Facebook, etc. dies a good death.
 

Sevendogsbsd

Daemon

Reaction score: 673
Messages: 1,121

Agree. I had a fleeting interest in social media but then I saw what it was/is. I have had a Twitter account 2x and rage quit both times because frankly I get obsessed with it and there are so many trolls. At least I had the good sense to only make fake burner accounts.
 