Initiative aims to improve data center incident reporting

The Data Center Incident Reporting Network hopes to pull back the smoke screen on software and hardware issues to improve operations and reduce downtime risks -- with admins' help.

As the data center industry matures and becomes the custodian of increasingly critical systems, organizations must adopt a more open and transparent view when it comes to incident reporting. Sharing data center incident reporting information can increase industry security, foster trust with consumers and help admins troubleshoot incidents more easily.

The Data Center Incident Reporting Network (DCIRN) is an independent, voluntary, confidential reporting program for data center admins and managers to improve data center services' safety and reliability.

Through a confidential and anonymous system, admins can report failures and significant incidents, with the goal of improving the reliability of data centers globally. The DCIRN aggregates these reported issues and shares them with industry professionals.

Why share incident information?

In sharing incident information with the DCIRN, admins help the data center industry understand what problems occur, when they occur and how to avoid them in the future. Traditionally, organizations kept incidents secret behind intellectual property and vendor agreements and often solved any problems one at a time when someone reported the issue.

Don Carless, a past member of the DCIRN executive and technical authority, told the DCIRN why incident reporting should be open and shared across the data center industry.

After reporting a problem to a manufacturer, Carless received a firmware patch to fix the issue. The patch solved the problem, but the process could have been avoided entirely if the manufacturer had sent the patch when it first discovered and solved the problem.

Carless asked the manufacturer why he hadn't automatically received the patch and if there were any other patches he should know about. The manufacturer stated that it only sent patches to customers who experienced the issue.

Instead of helping customers achieve positive outcomes, it seemed the manufacturer was only concerned with its reputation and market perception, Carless wrote.

This approach to data center incident reporting is "a short-term and irresponsible attitude and not necessarily in the best interest of our industry," he stated.

Get smarter about downtime with data center incident reporting

Data centers can house healthcare information and smart city infrastructures that manage traffic lights, power consumption and emergency communications, as well as high-density data such as photo and music backups and video game graphics.

With data center infrastructure supporting all of these use cases and cloud-based connectivity, and with outages and downtime quickly becoming public knowledge, there's no sense in hiding them within individual workflows and making it harder for admins to fix downtime issues.

Any outage is costly, from both a financial and reputational perspective; Gartner cites an average downtime cost of $5,600 per minute. From a customer viewpoint, increased downtime can lead to dissatisfaction, poor reputations and broken contracts.
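To put that per-minute figure in perspective, here is a minimal sketch of the arithmetic. It assumes the cited $5,600 average applies uniformly across the outage, which is a simplification; the function name and the sample durations are illustrative only, and real costs vary widely by organization.

```python
# Assumption: a flat per-minute cost, using the Gartner average cited above.
# Real outage costs depend on the organization, workload and time of day.
COST_PER_MINUTE_USD = 5_600


def estimated_downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Return a rough outage cost estimate in US dollars (hypothetical helper)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Illustrative outage durations, not reported incidents.
    for minutes in (1, 15, 60):
        cost = estimated_downtime_cost(minutes)
        print(f"{minutes:>3} min outage: about ${cost:,.0f}")
```

At that average, a one-hour outage works out to roughly $336,000, before counting any reputational damage.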

How digital diversity affects data centers

The notion of digital diversity also affects the way organizations run and manage infrastructure. Admins must manage everything from outsourcing and SaaS vendor relationships to cloud setups, architectures and integrations, and tie it all together to support business outcomes.

As Carless explained, a missing firmware patch can have a significant effect on a data center and the businesses it supports, especially if the patch could have prevented an incident.

Given the growing trend toward as-a-service offerings, this awareness gap in data center incident reporting can cause future problems. Admins need access to all the necessary information to improve their management regimes and practices. As organizations increase private sector use cases, keeping the underlying infrastructure updated and running smoothly is non-negotiable.

Use DCIRN to address downtime

Making incident reports and downtime information public can help data center operators and staff improve system maintenance. There are three main benefits to participating in a program such as DCIRN.

DCIRN helps admins stay up to date on what's going on globally and identify potential hot spots and issues before they become a problem in the data center.

Organizations can use DCIRN's incident timelines as a baseline for their own health checks. These timelines can help admins ensure that their data center management protocols actively use and encourage tech best practices that increase service reliability and availability.

Open and public data center incident reporting can demonstrate to customers that the organization operates in accordance with the latest standards and maintains its systems with best practices beyond the status quo. For businesses that look for a marketplace differentiator, regular data center reporting practices might just be a positive step.
