Big social media is still in big trouble: Instagram, WhatsApp, Facebook and Facebook Messenger experienced widespread outages Monday morning, leaving many users around the world unable to use some of the most popular services on the web. While outages like this aren’t unheard of, the length of this one, spanning over six hours at the time of this writing, is very rare – and it’s not clear when the services will be fully back online across the globe.

Facebook.com seems to be coming back for some users, starting with the company’s Newsroom and seemingly extending to the full site and social network. But Instagram and other services remain down in some areas, leading us to believe services are resuming on a region-by-region basis.

DownDetector – a website that tracks outages of online services – had shown that all services were struggling in many territories. There had also been reports that services with accounts tied to Facebook logins, like Airbnb and Strava, were not working.

There’s currently no way to avoid the issues, so you’ll just have to wait until they’ve been solved to reconnect to WhatsApp, Instagram or Facebook. The company seems to be slowly implementing fixes for each service.

Facebook sites are starting to come back online, with official tweets from Instagram, WhatsApp, and other services claiming a return to service.

We’ve yet to hear an official reason for the outage, but we’ll update here when we have more definitive answers from Facebook itself. Analysts have speculated that the outage was caused by severe networking issues, which we’ve detailed below. 

Shortly after the outage began, Facebook’s communications executive, Andy Stone, was the first to share an update on Twitter, saying the company was aware of the issues and working on a fix. WhatsApp’s Twitter account followed. Soon thereafter, Facebook’s official account acknowledged that users were having trouble accessing the company’s apps and products. Facebook CTO Mike Schroepfer tweeted an apology four hours into the outage, and after six hours, the company sent another tweet apologizing as its services came back online.

The issue may have affected other Facebook products, too: some users have also reported issues with the company’s Oculus virtual reality gaming services. Noted Facebook and Twitter data miner Jane Manchun Wong warned users via tweet not to restart their Oculus devices during the outage lest they lose their games. VR game and software designer Julien Dorra tweeted a video of what it’s like to load up Oculus in the midst of the outage.

And the outage might have impacted Facebook’s real-world infrastructure as well: according to a tweet by New York Times reporter Sheera Frenkel, a Facebook employee reportedly can’t even enter company buildings due to malfunctioning badges. Another NYT report claims employees struggled to make calls from work-issued phones or receive emails from outside the company.

Facebook outages: what happened?

None of the Facebook, WhatsApp, or Instagram accounts have explained what originally caused the outage, leading to speculation and analysis. At this point, most agree that this isn’t a hack or directed attack on Facebook’s infrastructure, and sources have told the New York Times that it probably wasn’t a cyberattack because ‘one hack was unlikely to affect so many apps at once.’

Instead, evidence shows the company’s network paths to the outside web just disappeared without explanation this morning.

Brian Krebs of cybersecurity news site KrebsOnSecurity tweeted his conclusion that the domain name system (DNS) records routing traffic to Facebook sites and services were simply withdrawn – as in, gone from the web – this morning.

In a follow-up tweet, Krebs clarified that he believed the border gateway protocol (BGP) routes serving Facebook’s DNS were gone, making every site on a Facebook domain inaccessible. This presumably explains why its services and third-party login access, as well as Instagram, WhatsApp and Facebook Messenger, are completely down.

Other networking companies have noticed and theorized the issue is with BGP routes, including Cloudflare SVP Dane Knecht, who tweeted an observation that Facebook DNS and other services are down and ‘their BGP routes have been withdrawn from the internet.’ He noted that Cloudflare also saw its own failures, but a follow-up tweet suggested it was recovering. Separately, Cloudflare CTO John Graham-Cumming tweeted seeing Facebook’s BGP changes as they happened and suggested they were mostly BGP route withdrawals.

This BGP de-routing might be a very severe issue that is harder to fix than a DNS error

BGP is a big (global) problem

While DNS is the system that translates the ‘www.___.com’ address you type in your browser’s address bar into a website’s numerical address on the internet, BGP routes are the pathways that requests take across the internet’s networks to get to their destination. When Facebook’s BGP routes were reportedly withdrawn from the internet, networks that relied on those routes (like Cloudflare above) saw them collapse, and Facebook sites and services became inaccessible.
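
To make the relationship between the two systems concrete, here is a toy sketch in Python. It is not a description of Facebook’s actual setup: the hostname, the addresses (taken from reserved documentation ranges), the record table and the helper names has_route and fetch are all hypothetical. It only illustrates the general idea that a name lookup depends on a working route to the nameserver, so withdrawing routes can look like a DNS failure from the outside.

```python
import ipaddress

# Hypothetical data for illustration only (reserved documentation ranges).
DNS_RECORDS = {"www.social.example": "192.0.2.10"}  # name -> numerical address
NAMESERVER_IP = "198.51.100.53"                      # where name lookups are sent


def has_route(routes, ip):
    """Return True if any announced prefix covers the given address."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(prefix) for prefix in routes)


def fetch(hostname, routes):
    """Model a page load: reach the nameserver, resolve the name, reach the server."""
    if not has_route(routes, NAMESERVER_IP):
        return "DNS failure: nameserver unreachable"
    ip = DNS_RECORDS.get(hostname)
    if ip is None:
        return "DNS failure: no such record"
    if not has_route(routes, ip):
        return "connect failure: no route to " + ip
    return "ok: " + ip


# Routes announced to the rest of the internet before the change.
announced = {"192.0.2.0/24", "198.51.100.0/24"}
before = fetch("www.social.example", announced)

# Withdraw the prefix covering the nameserver: the DNS record still exists,
# but nothing on the outside can reach the server that would answer for it.
withdrawn = announced - {"198.51.100.0/24"}
after = fetch("www.social.example", withdrawn)

print(before)
print(after)
```

In this simplified model the record for the site never changes. The lookup fails only because the path to the nameserver is gone, which is in line with the analysts’ suggestion that the trouble was with BGP routes rather than with DNS itself.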

Internet theorizing on the r/sysadmin subreddit suggested that a configuration change happened this morning that caused the BGP routes to go down, and this cut Facebook off from making remote changes – from here on, only physical access could fix the damage (emphasized in a screenshot by Twitter user Andree Toonk). 

The aforementioned New York Times report supports this theory, citing an alleged internal Facebook memo stating that a small team of employees was dispatched to the company’s Santa Clara, CA data center to manually reset the company servers.

Just before Facebook services started coming back online, Krebs cited a source in stating that the outage was caused by a faulty BGP update that blocked remote users from reverting changes while locking out local access.

Outages: a continuing problem?

“Outages are increasing in volume and can often point towards a cyber-attack, but this can add to the confusion early on when we are diagnosing the causes,” said Jake Moore, expert at cybersecurity and antivirus company ESET, in an emailed comment to TechRadar. “As we saw with Fastly in the summer, web blackouts more often originate from an undiscovered software bug or even human error.”

March and April 2021 saw similarly major outages in which each of the Facebook services affected today – Facebook, Instagram, WhatsApp, and Facebook Messenger – was down for over half an hour each time. But given how much faster those issues were resolved, the latest outage seems to be a catastrophe of a much higher magnitude.

Those earlier outages were due to a Domain Name System (DNS) bug affecting these services, which was seemingly not as severe as a BGP issue.