Big social media is still in big trouble: Instagram, WhatsApp, Facebook and Facebook Messenger experienced widespread outages Monday morning, leaving many users around the world unable to use some of the most popular services on the web. While outages like this aren’t unheard of, the length of this one, spanning over four hours at the time of this writing, is very rare – and it’s not clear when the services will be fully back online.
Facebook.com seems to be coming back for some users, which started with the company’s Newsroom and seems to be extending to the full site and social network. But Instagram and other services remain down.
DownDetector – a website that tracks outages of online services – showed that all of these services were struggling in many territories. There have also been reports that services with accounts tied to Facebook logins, like Airbnb and Strava, were not working.
There’s currently no way to avoid the issues, so you’ll just have to wait until they’ve been solved to reconnect to WhatsApp, Instagram or Facebook. The company says it is working on a fix, though it’s unclear when the issues will be resolved or which service will be fixed first.
Facebook sites may be starting to come back online – users can access Facebook Newsroom, which is likely where we’ll hear about the outage from the company itself soon – but as of this writing, Instagram, WhatsApp, and other major services remain down.
We’ve yet to hear an official reason for the outage, but we’ll update here when we have more definitive answers from Facebook itself. Analysts have speculated that the outage was caused by severe networking issues, which we’ve detailed below.
Facebook’s communications executive, Andy Stone, was the first to share an update on Twitter to say the company is aware of the issues and it’s currently working on a fix. Shortly thereafter, Facebook’s official account acknowledged that users were having trouble accessing the company’s apps and products:
We’re aware that some people are having trouble accessing our apps and products. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience. October 4, 2021
WhatsApp’s Twitter account shared a similar statement, beating Facebook’s account to acknowledging the issue by minutes.
But as the outage headed into its fourth hour, Facebook CTO Mike Schroepfer tweeted an apology, saying that “teams are working as fast as possible to debug and restore as fast as possible.” It should be noted that Twitter is the only platform Facebook is able to use to reach out publicly, as the company’s official Newsroom page is down as well.
The issue may affect other Facebook products, too: some users have also reported issues with the company’s Oculus virtual reality gaming services. Noted Facebook and Twitter data miner Jane Manchun Wong warned users via tweet not to restart their Oculus devices during the outage lest they lose their games. VR game and software designer Julien Dorra tweeted a video of what it’s like to load up Oculus in the midst of the outage:
Facebook brought Oculus down with them 🙁 pic.twitter.com/rfapj1yaSU October 4, 2021
And the outage might have affected Facebook’s real-world infrastructure as well: according to a tweet by New York Times reporter Sheera Frenkel, a Facebook employee reportedly can’t even enter company buildings due to malfunctioning badges. Another NYT report claims employees struggled to make calls from work-issued phones or receive emails from outside the company.
Facebook outages: what’s going on?
None of the Facebook, WhatsApp, or Instagram accounts have explained what originally caused the outage, leading to speculation and analysis. At this point, most agree that this isn’t a hack or directed attack on Facebook’s infrastructure, and sources have told the New York Times that it probably wasn’t a cyberattack because ‘one hack was unlikely to affect so many apps at once.’
Instead, evidence shows the company’s network paths to the outside web just disappeared without explanation this morning.
Brian Krebs of the cybersecurity news site Krebs on Security tweeted his conclusion that the domain name system (DNS) records routing traffic to Facebook sites and services were simply withdrawn – as in, gone from the web – this morning:
Confirmed: The DNS records that tell systems how to find https://t.co/qHzVq2Mr4E or https://t.co/JoIPxXI9GI got withdrawn this morning from the global routing tables. Can you imagine working at FB right now, when your email no longer works & all your internal FB-based tools fail? October 4, 2021
In a follow-up tweet, Krebs clarified that he believed the border gateway protocol (BGP) routes serving Facebook’s DNS were gone, making every site on a Facebook domain inaccessible. This presumably explains why its services and third-party login access, as well as Instagram, WhatsApp and Facebook Messenger, are completely down.
Other networking companies have noticed and theorized the issue is with BGP routes, including Cloudflare SVP Dane Knecht, who tweeted an observation that Facebook DNS and other services are down and ‘their BGP routes have been withdrawn from the internet.’ He noted that Cloudflare also saw its own failures, but a follow-up tweet suggested it was recovering. Separately, Cloudflare CTO John Graham-Cumming tweeted seeing Facebook’s BGP changes as they happened and suggested they were mostly BGP route withdrawals.
This BGP de-routing might be a very severe issue that is harder to fix than a DNS error, and other analysts were quick to point out how this might have happened.
BGP is a big (global) problem
While DNS is the system that translates the ‘www.___.com’ name you type into your browser’s address bar into a website’s numerical address on the internet, BGP routes are the pathways that requests take across the networks that make up the internet to get to their destination. When Facebook’s BGP routes were reportedly withdrawn from the internet, networks connected to those routes (like Cloudflare above) saw them collapse, and Facebook sites and services became inaccessible.
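To make the relationship between the two systems more concrete, here is a minimal toy sketch in Python. It is not a model of Facebook’s actual network: the prefixes, addresses, names and helper functions (find_route, resolve, can_reach) are all hypothetical and use reserved documentation ranges. It only illustrates the general idea that a name lookup depends on a route to the nameserver existing, so withdrawing that route can make a site unreachable even though its DNS records are untouched.

```python
import ipaddress

# Hypothetical routing table: announced prefix -> next hop label.
# Both prefixes are reserved documentation ranges, not real Facebook routes.
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "peer-A",     # holds the toy nameserver
    ipaddress.ip_network("198.51.100.0/24"): "peer-B",  # holds the toy web server
}

# Hypothetical DNS data: the nameserver's own address and the records it serves.
NAMESERVER_IP = ipaddress.ip_address("192.0.2.53")
dns_records = {"www.example.com": ipaddress.ip_address("198.51.100.10")}


def find_route(table, addr):
    """Longest-prefix match; returns None if no announced prefix covers addr."""
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


def resolve(table, name):
    """A lookup only succeeds if the nameserver itself is reachable."""
    if find_route(table, NAMESERVER_IP) is None:
        return None  # the query never reaches the nameserver
    return dns_records.get(name)


def can_reach(table, name):
    """True if the name resolves and a route to the resulting address exists."""
    addr = resolve(table, name)
    return addr is not None and find_route(table, addr) is not None


# Simulate withdrawing the route that covers the nameserver.
withdrawn = {net: hop for net, hop in routes.items() if NAMESERVER_IP not in net}

print(can_reach(routes, "www.example.com"))     # True
print(can_reach(withdrawn, "www.example.com"))  # False
```

Note that in the second case the DNS record itself still exists in the toy data; the lookup fails only because nothing can find a path to the server holding it, which is the kind of failure the analysts above describe.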
Internet theorizing on the r/sysadmin subreddit suggested that a configuration change happened this morning that caused the BGP routes to go down, and this cut Facebook off from making remote changes – from here on, only physical access could fix the damage (emphasized in a screenshot by Twitter user Andree Toonk).
The aforementioned New York Times report supports this theory, citing an alleged internal Facebook memo stating that a small team of employees was dispatched to the company’s Santa Clara, CA data center to manually reset the company servers.
Outages: a continuing problem?
“Outages are increasing in volume and can often point towards a cyber-attack, but this can add to the confusion early on when we are diagnosing the causes,” said Jake Moore, expert at cybersecurity and antivirus company ESET, in an emailed comment to TechRadar. “As we saw with Fastly in the summer, web-blackouts more often originate from an undiscovered software bug or even human error.”
March and April 2021 saw similarly major outages in which each of Facebook’s services affected today – Facebook, Instagram, WhatsApp, and Facebook Messenger – was down for over half an hour each time. But given how much faster those issues were resolved, the latest outage seems to be a catastrophe of a much higher magnitude.
Those earlier outages were due to a bug in the Domain Name System (DNS) of these services, which was seemingly not as severe as a BGP issue.