Causes of IT outages explained
In this video, TechTarget editor Jamison Cush talks about the causes of IT outages.
The CrowdStrike outage ... it was a software fault, right?
An IT outage occurs when an organization's resources, services or applications become unavailable.
- Resources include storage, compute and network elements.
- Services include firewalls, logging tools and even operating systems.
- Applications can span a wide range, including web servers, databases, CRM/ERP, HR and other business platforms.
An outage in any of these elements results in downtime, which reduces (or even stops) the organization's ability to operate.
As experienced with the CrowdStrike outage, downtime can have dramatic impacts on the business, and catastrophic impacts on users that depend on the business. In some cases, downtime can have life-threatening consequences.
What causes IT outages?
- Utility disruptions. Power and network utilities can be disrupted by acts of nature, war, terrorism, human error and malfeasance.
- Hardware faults. Data centers rely on a ton of physical equipment, including servers, storage devices, networking gear and varied hardware devices (such as hardware firewalls). A failure here can have cascading effects.
- Configuration errors. Every element within an IT environment involves a configuration. Simply stated, a configuration is a collection of details and parameters (usually stored in an accessible file) that carefully specifies the setup of every IT resource, service and application -- as well as any relationships or dependencies that exist between those elements. Configurations can be very complex, and an error or oversight in any one of them can lead to an outage.
- Software faults. Even with extensive testing and validation, software services and applications can experience unexpected defects (bugs) that cause the software to crash or behave in undesirable ways. This -- and not a configuration error -- was at the heart of the CrowdStrike outage. The CrowdStrike software's ability to catch improper content or sensor signals was compromised, allowing the crash.
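To make the configuration point concrete, here is a minimal sketch of checking a configuration before it is applied. The layout (a `services` mapping with `port` and `depends_on` fields) and the function name `validate_config` are hypothetical, chosen only for illustration; real environments use their own schemas and tooling.

```python
def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    services = config.get("services", {})
    for name, spec in services.items():
        # Assumption: every service must declare a port in the valid TCP/UDP range.
        port = spec.get("port")
        if not isinstance(port, int) or not (1 <= port <= 65535):
            problems.append(f"{name}: invalid or missing port {port!r}")
        # Every declared dependency must itself be a defined service.
        for dep in spec.get("depends_on", []):
            if dep not in services:
                problems.append(f"{name}: depends on undefined service {dep!r}")
    return problems


# Hypothetical example: "web" depends on a service that is never defined.
example = {
    "services": {
        "web": {"port": 443, "depends_on": ["db", "cache"]},
        "db": {"port": 5432},
    }
}
for problem in validate_config(example):
    print(problem)
```

A check like this only catches the mistakes it was written to look for, so it complements testing and review rather than replacing them.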
How can IT outages be prevented?
IT outages have been happening since the first computers. But as we learned with the CrowdStrike outage, the consequences of outages and downtime have steadily become more severe as resources, services and applications become increasingly connected and interdependent.
This is why it's imperative for organizations to develop and deploy mitigation strategies, including the following:
- Redundancy and resilience.
- Data backups.
- Configuration management.
- Extended software testing.
- Software rollbacks.
- Disaster planning and training.
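As one illustration of the software rollback item above, the sketch below updates a set of hosts and restores the previous version wherever a health check fails. The function `update_with_rollback`, the host names and the `is_healthy` callback are all hypothetical stand-ins for whatever deployment and monitoring tools an organization actually uses.

```python
from typing import Callable, Dict, List


def update_with_rollback(
    hosts: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> List[str]:
    """Update each host to new_version and roll back any host that fails its check.

    hosts maps a host name to its currently running version and is updated in place.
    Returns the names of the hosts that were rolled back.
    """
    rolled_back = []
    for host, previous in list(hosts.items()):
        hosts[host] = new_version
        if not is_healthy(host, new_version):
            # Health check failed: restore the last known-good version.
            hosts[host] = previous
            rolled_back.append(host)
    return rolled_back


# Hypothetical usage: pretend the update fails its health check on host "b".
fleet = {"a": "1.0", "b": "1.0"}
failed = update_with_rollback(fleet, "1.1", lambda host, version: host != "b")
print(failed, fleet)
```

The sketch assumes a failed host can still run the rollback. When a faulty update prevents a machine from starting at all, recovery has to rely on the other items in the list, such as redundancy and disaster planning.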
Jamison Cush is a senior executive editor overseeing YouTube video production, definitions and feature content for WhatIs.com and TechTarget.