Anyone who uses the Internet was probably aware of the big Facebook outage yesterday, October 4th, 2021. Of course, it didn’t just affect Facebook; WhatsApp and Instagram were down too. Here’s what happened, according to Facebook. The system-wide, multi-hour outage came on the heels of revelations from Facebook whistleblower Frances Haugen, who alleges, among other things, that Facebook knew the service was being used to facilitate human trafficking, knew that Instagram was contributing to poor body image among teenage girls, and was intentionally eliciting angry responses from Facebook users. The timing seems to be pure coincidence, at least if Facebook’s engineers are to be believed.
We covered yesterday’s Facebook, Instagram, and WhatsApp outages here. Now we want to talk about why it happened.
According to Facebook (see the full statement below, along with a link to the full statement on Facebook’s engineering section), configuration changes were made to the routers that connect Facebook and its properties to the Internet and to each other. Those changes caused issues, which in turn “had a cascading effect on the way our data centers communicate, bringing our services to a halt.”
Outside observers we know believe the problem had to do specifically with BGP. BGP stands for Border Gateway Protocol, and it is the protocol that networks on the Internet (known as autonomous systems) use to exchange routing information with one another.
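To make the idea of exchanging routing information a little more concrete, here is a toy sketch in Python. It is not a BGP implementation and says nothing about how Facebook’s routers are actually configured; the `RoutingTable` class, the prefix (taken from a range reserved for documentation), and the AS number are all hypothetical. It only illustrates the general principle that an address is reachable while some neighbor announces a route covering it, and stops being reachable once that announcement goes away.

```python
import ipaddress


class RoutingTable:
    """Toy table of announced prefixes (illustrative only, not real BGP)."""

    def __init__(self):
        # Maps each announced prefix to the neighbor that announced it.
        self.routes = {}

    def announce(self, prefix, neighbor):
        """Record that a neighbor says it can reach this prefix."""
        self.routes[ipaddress.ip_network(prefix)] = neighbor

    def withdraw(self, prefix):
        """Forget a prefix once its announcement is withdrawn."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the neighbor for the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()

# Hypothetical announcement: a documentation prefix from a documentation AS.
table.announce("192.0.2.0/24", "AS64500")
before = table.lookup("192.0.2.53")

# With the announcement gone, there is no longer any route to the address.
table.withdraw("192.0.2.0/24")
after = table.lookup("192.0.2.53")

print("before withdrawal:", before)
print("after withdrawal:", after)
```

In this sketch nothing is wrong with the server at the looked-up address; the rest of the network has simply lost its directions for getting there.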
Here’s the full explanation from the Facebook engineers:
Facebook Engineers’ Statement and Explanation of the 10/4/21 Incident
To all the people and businesses around the world who depend on us, we are sorry for the inconvenience caused by today’s outage across our platforms. We’ve been working as hard as we can to restore access, and our systems are now back up and running. The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem.
Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.
Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear that there was no malicious activity behind this outage — its root cause was a faulty configuration change on our end. We also have no evidence that user data was compromised as a result of this downtime. (Updated on Oct. 5, 2021 to reflect the latest information)
People and businesses around the world rely on us everyday to stay connected. We understand the impact outages like these have on people’s lives, and our responsibility to keep people informed about disruptions to our services. We apologize to all those affected, and we’re working to understand more about what happened today so we can continue to make our infrastructure more resilient.