The CrowdStrike fail and next global IT meltdown already in the making

When computer screens went blue worldwide on Friday, flights were grounded, hotel check-ins became impossible, and freight deliveries were brought to a standstill. Businesses resorted to pen and paper. Initial suspicions landed on some sort of cyberterrorist attack. The reality, however, was much more mundane: a botched software update from the cybersecurity company CrowdStrike.

“In this case, it was a content update,” said Nick Hyatt, director of threat intelligence at security firm Blackpoint Cyber.

And because CrowdStrike has such a broad base of customers, it was the content update felt around the world.

“One mistake has had catastrophic results. This is a great example of how closely tied to IT our modern society is — from coffee shops to hospitals to airports, a mistake like this has massive ramifications,” Hyatt said.

In this case, the content update was tied to the CrowdStrike Falcon monitoring software. Falcon, Hyatt says, has deep connections to monitor for malware and other malicious behavior on endpoints, in this case, laptops, desktops, and servers. Falcon updates itself automatically to account for new threats.

“Buggy code was rolled out via the auto-update feature, and, well, here we are,” Hyatt said. Auto-update capability is standard in many software applications, and isn’t unique to CrowdStrike. “It’s just that due to what CrowdStrike does, the fallout here is catastrophic,” Hyatt added.

Blue screen of death errors are seen on computer screens during the global outage caused by CrowdStrike, which provides cybersecurity services to U.S. technology company Microsoft, on July 19, 2024, in Ankara, Turkey.

Harun Ozalp | Anadolu | Getty Images

Even though CrowdStrike quickly identified the problem, and many systems were back up and running within hours, the global cascade of damage isn’t easily reversed for organizations with complex systems.

“We think three to five days before things are resolved,” said Eric O’Neill, a former FBI counterterrorism and counterintelligence operative and cybersecurity expert. “This is a bunch of downtime for organizations.”

It did not help, O’Neill said, that the outage happened on a summer Friday, with many offices empty and IT staff to help resolve the issue in short supply.

Software updates should be rolled out incrementally

One lesson from the global IT outage, O’Neill said, is that CrowdStrike’s update should have been rolled out incrementally.

“What CrowdStrike was doing was rolling out its updates to everyone at once. That is not the best idea. Send it to one group and test it. There are levels of quality control it should go through,” O’Neill said.

“It should have been tested in sandboxes, in many environments before it went out,” said Peter Avery, vice president of security and compliance at Visual Edge IT.
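The incremental approach O’Neill and Avery describe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of CrowdStrike’s actual release process: the ring names, the 1% failure budget, and the `deploy_to` callback are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical failure budget: halt if more than 1% of hosts in a ring fail.
MAX_FAILURE_RATE = 0.01


def staged_rollout(
    rings: Sequence[Tuple[str, List[str]]],
    deploy_to: Callable[[str], bool],
    max_failure_rate: float = MAX_FAILURE_RATE,
) -> List[str]:
    """Deploy ring by ring; stop at the first ring that exceeds the budget.

    `rings` is an ordered list of (ring_name, hosts), smallest group first.
    `deploy_to(host)` is a stand-in that returns True if the host is healthy
    after the update. Returns the names of the rings that completed.
    """
    completed: List[str] = []
    for name, hosts in rings:
        if not hosts:
            completed.append(name)  # nothing to deploy, nothing to fail
            continue
        failures = sum(1 for host in hosts if not deploy_to(host))
        if failures / len(hosts) > max_failure_rate:
            # Halt here so later, larger rings never receive the update.
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    rings = [
        ("internal-test", ["lab-1", "lab-2"]),
        ("early-adopters", ["cust-a", "cust-b", "cust-c"]),
        ("everyone", [f"host-{i}" for i in range(1000)]),
    ]

    # Simulated bad update: it crashes every host outside the lab.
    def bad_update(host: str) -> bool:
        return host.startswith("lab-")

    print(staged_rollout(rings, bad_update))  # ['internal-test']
```

In this simulation the faulty update is stopped at the second, three-host ring, and the 1,000 hosts in the final ring are never touched. A real pipeline would also need a waiting period between rings and a rollback step, both of which are omitted here.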

He said more safeguards are needed to prevent future incidents of this type.

“You need the right checks and balances in companies. It could have been a single person that decided to push this update, or somebody picked the wrong file to execute on,” Avery said.

The IT industry calls this a single-point failure — an error in one part of a system that creates a technical disaster across industries, functions, and interconnected communications networks; a massive domino effect.

Call to build redundancy into IT systems

Friday’s event could cause companies and individuals to heighten their level of cyber preparedness.

“The bigger picture is how fragile the world is; it’s not just a cyber or technical issue. There are a ton of different phenomena that can cause an outage, like solar flares that can take out our communications and electronics,” Avery said.

Ultimately, Friday’s meltdown wasn’t an indictment of CrowdStrike or Microsoft, but of how businesses view cybersecurity, said Javad Abed, an assistant professor of information systems at Johns Hopkins Carey Business School. “Business owners need to stop viewing cybersecurity services as merely a cost and instead as an essential investment in their company’s future,” Abed said.

Businesses should be doing this by building redundancy into their systems.

“A single point of failure shouldn’t be able to stop a business, and that is what happened,” Abed said. “You can’t rely on only one cybersecurity tool, cybersecurity 101.”

While building redundancy into enterprise systems is costly, what happened Friday is more expensive.

“I hope this is a wake-up call, and I hope it causes some changes in the mindsets of the business owners and organizations to revise their cybersecurity strategies,” Abed said.

What to do about ‘kernel-level’ code

On a macro level, it is fair to assign some systemic blame to an enterprise IT world that often views cybersecurity, data security, and the tech supply chain as “nice-to-have things” instead of essentials, and to a general lack of cybersecurity leadership within organizations, said Nicholas Reese, a former Department of Homeland Security official and an instructor at New York University’s SPS Center for Global Affairs.

On a micro level, Reese said the code that caused this disruption was kernel-level code, which affects every aspect of communication between a computer’s hardware and software. “Kernel-level code should get the highest level of scrutiny,” Reese said, with approval and implementation needing to be entirely separate processes with accountability.
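One way to picture the separation Reese describes is a release gate that refuses to ship a change when the person deploying it is also one of its approvers. The Python sketch below is illustrative only: the `Change` record, its field names, and the two-approver minimum are assumptions made for the example, not any vendor’s real process.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Change:
    """A hypothetical record of a pending kernel-level update."""
    author: str
    approvers: Set[str] = field(default_factory=set)


def may_deploy(change: Change, deployer: str, min_approvers: int = 2) -> bool:
    """Allow deployment only when approval and implementation are separate."""
    # The author's own sign-off does not count as independent approval.
    independent = change.approvers - {change.author}
    # The person deploying must not be one of the independent approvers.
    if deployer in independent:
        return False
    return len(independent) >= min_approvers


if __name__ == "__main__":
    change = Change(author="dev1", approvers={"dev1", "rev1"})
    print(may_deploy(change, deployer="dev1"))  # False: one independent approver
    change.approvers.add("rev2")
    print(may_deploy(change, deployer="dev1"))  # True
    print(may_deploy(change, deployer="rev2"))  # False: approver cannot deploy
```

The check addresses both failure modes Avery raises: a single person deciding to push an update, and nobody independently confirming which file is about to ship.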

That’s a problem that will continue for the entire ecosystem, awash in third-party vendor products, all with vulnerabilities.

“How do we look across the ecosystem of third-party vendors and see where the next vulnerability will be? It is almost impossible, but we have to try,” Reese said. “It is not a maybe, but a certainty until we grapple with the number of potential vulnerabilities. We need to focus on backup and redundancy and invest in it, but businesses say they can’t afford to pay for things that might never happen. It’s a hard case to make,” he said.
