Conduct a blameless postmortem and focus on the problem

Postmortems don't need to be about who's at fault for a failure. Rather than point fingers, focus on the bigger picture. Nobody wants ridicule, but everyone wants a solution.

The incident postmortem is a vital tool through which IT operations, development and other teams understand the root causes of issues, correct them and -- ideally -- learn from them. But we're all human, and the fear of punishment for mistakes can often prevent people from sharing information or from adopting and implementing corrections.

The blameless postmortem arose to address the discomfort people feel when their mistakes are analyzed and corrected. When conducted constructively, a blameless postmortem lets a team find and fix root issues, regardless of the cause, without fear of individual repercussions.

Blameless postmortems bring benefits to the team and the business, but they can be challenging to conduct, and effectiveness isn't guaranteed. Let's look closer at the blameless postmortem, consider the benefits and weigh tactics that can improve its outcomes.

The incident postmortem process

Modern IT ecosystems are large and complex: This is true for systems and infrastructures, as well as enterprise software development. Faults and failures are inevitable, no matter how much planning, preparation, skill and expertise are invested in a technology's construction and management.

But every incident also yields a chance to learn. The incident postmortem -- sometimes called a post-incident review -- is a formalized process organizations use to review incidents, find vulnerabilities, identify root causes and mitigate future occurrences. When performed objectively, a postmortem brings a team together and boosts the quality of an organization's products or services.

But postmortems can also have a dark side.

Eliminate blame games

The postmortem is a detailed analysis that usually requires investigation of human decisions and actions. For example, a postmortem might find that the software crashed because a junior developer used the wrong constant in a particular algorithm, or that the company's database was accessed improperly because an admin didn't configure an application's security.

Benefits of a blameless postmortem

Humans are wired to place blame, and nobody likes to be its target for fear of ridicule or of damage to their job security or future advancement. As a result, postmortems are sometimes seen as stinging personal attacks, which leaves individuals reluctant to share details or accept recommendations for correction. This makes postmortems less effective and more personal, and that damages team morale.

The goal of a blameless postmortem is to achieve all the benefits of an incident postmortem without the personal or professional stress that usually accompanies blame games. The underlying premise is that placing blame, and the collateral damage it can cause to the team members involved, is never in the organization's best interest.

The single overriding assumption in a blameless postmortem is that all team members did the best job they could with the information and skill sets they had at that time. Eliminating the fear and consequences of blame fosters more honest and detailed discussions that improve business efficacy and value. This requires team members -- especially team and business leaders -- to set blame aside and focus on problem resolution and improving outcomes in the future.

Of course, a blameless postmortem does not overlook human fault or responsibility. Instead, it focuses on understanding why the fault occurred and how to prevent it from recurring.

Root cause discussions lead to workflow improvements. This can include better reviews and collaborations with stakeholders for more complete software requirements, or better repository management, where algorithm libraries are reviewed and updated more frequently. These outcomes are far more beneficial to the business -- and subsequent software projects -- than castigating a programmer.

Blameless postmortem process example

A programmer uses an improper algorithm to make a calculation, which causes a software fault. Rather than spend the postmortem discussion on why the programmer used the wrong algorithm, a blameless postmortem focuses the discussion on a review of software requirements and communication with the software's stakeholders.

Without the focus on blame, the IT professionals involved might ultimately determine that the programmer used the wrong algorithm because the software's requirements were incorrect or incomplete, or because the programmer unintentionally used an older or obsolete algorithm library.

Steps to move beyond the blame

Blameless postmortems are easier said than done. If they're not conducted properly, the attitude of blame remains while participants talk awkwardly around it. This situation is just as stressful as name-calling, and rarely productive.

Organizations build the foundation for blameless postmortems through conscious and continual effort. Team and business leaders must cultivate a genuine culture of open communication and collaboration that acknowledges responsibility without dwelling on it. Rather than focus on assigning responsibility, the postmortem process looks past it to creative and actionable improvements.

There are some strategies and steps business and team leaders can take to help build and enhance a blameless culture.

Don't delay postmortems. The postmortem process is usually reserved for serious or critical issues, so timeliness is important. Hold postmortems within two to three days after an incident, while the details and observations are fresh in everyone's minds. Expect the team to be present and participate actively. The value of a postmortem decreases as the time between the incident and the review grows.

Use a consistent format. Hold each postmortem in a consistent format, with similar data sets, timelines and schedules, such as the same time of day. This removes some uncertainty from the meeting and keeps the focus on open discussion and information-sharing. The resulting "muscle memory" for the meeting mechanics also frees attention to recognize and curb the tendency to blame.

Highlight actions, not blame. Blame not only impairs morale and communication, but also blocks productive solutions and organizational improvements. Even team members who blame themselves for an issue could overlook the deeper context and contributing factors that led to it.

Perhaps there was incomplete or inaccurate documentation, flawed instructions or a prior issue that only came to light by their actions. Recognize the human tendency to blame, and take deliberate measures to eliminate blame. Get teams to look past blame and focus on actionable solutions to systems, software and processes.

Don't fear failure at the organizational level. Blame usually starts from the top, and senior managers can foster a competitive culture that drives blame -- even inadvertently. Eliminate abrasive management styles and blame-based management criteria, or managers might ask team leaders to assign blame anyway.

Give people the space to be honest and complete in their discussions about errors -- it's the best path to understand and remediate errors. Improving organizational performance and outcomes is more important -- and intrinsically more beneficial -- to the business than assigning blame.

Approvals and next steps

Any postmortem should conclude with recommendations for fixes and improvements. Prepare and discuss recommendations objectively with decision-makers, including C-suite leaders, engineers, architects and other senior staff. Those leaders should also understand the issues and be able to discuss and sign off on recommendations that arise from the postmortem.