What to include in a post-DR-test after-action review
What should go into your organization's after-action review following a disaster recovery test? #CIOChat participants suggest what to include in the report and why.
After completing a disaster recovery test, organizations must examine their DR processes from start to finish -- the good, the bad and even the ugly. Doing so requires proper documentation throughout, and after, the DR testing process.
Participants in our recent SearchCIO DR-themed tweet jam said that disaster recovery testing can be fraught with error. So once an organization finally gets through the actual testing phase, what comes next? SearchCIO asked #CIOChat-ters: "What should be in a post-test after-action report?"
One initial response was succinct:
A3: what didn't work! need to examine failures to avoid them again and successes to repeat them #CIOchat
— Denise Dubie (@DDubie)
June 25, 2014
One participant suggested that an after-action review include stats on recovery point objective (RPO) and recovery time objective (RTO):
@searchCIO A3 Full breakdown of RTO by critical app, married to RPO goals. Gaps in resources, training & process improvements #CIOChat
— Mark Thiele (@mthiele10)
June 25, 2014
In disaster recovery (DR) and business continuity (BC) planning, RPO defines the maximum age of files that must be recovered from backup in order for normal business operations to resume following an incident. Once the RPO is defined, it determines how frequently an organization's systems must be backed up. RTO refers to the maximum tolerable length of downtime for compromised systems.
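To make those objectives concrete in a report, test results can be compared against them app by app, as Thiele's tweet suggests. The Python sketch below is a minimal, hypothetical illustration -- the application names, objectives and measured results are invented for the example, not drawn from the chat:

```python
from datetime import timedelta

# Hypothetical per-app DR test results: objectives vs. what the test measured.
# All names and numbers are illustrative placeholders.
apps = {
    "order-entry": {
        "rpo": timedelta(minutes=15),            # max tolerable data loss window
        "rto": timedelta(hours=1),               # max tolerable downtime
        "last_backup_age": timedelta(minutes=10),  # data age at recovery
        "measured_downtime": timedelta(minutes=50),
    },
    "reporting": {
        "rpo": timedelta(hours=4),
        "rto": timedelta(hours=8),
        "last_backup_age": timedelta(hours=6),
        "measured_downtime": timedelta(hours=5),
    },
}

for name, result in apps.items():
    rpo_ok = result["last_backup_age"] <= result["rpo"]    # data loss within objective?
    rto_ok = result["measured_downtime"] <= result["rto"]  # restored in time?
    print(f"{name}: RPO {'met' if rpo_ok else 'MISSED'}, "
          f"RTO {'met' if rto_ok else 'MISSED'}")
```

A per-app pass/fail breakdown like this gives report readers an at-a-glance view of exactly where recovery objectives were missed.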
A careful after-action review will show whether those RPO and RTO targets held up, but that's not all. According to our tweet jammers, there is a long list of documentation to include in a DR test evaluation, and it's important to capture it as quickly as possible:
@searchCIO A3 Identifying problems in the test, including process, resourcing & technical issues. Use that info for improvement #CIOchat
— Ant Stanley (@IamStan)
June 25, 2014
@AndiMann #ciochat and get that report written ASAP, during any BC/DR event take notes for inclusion immediately after. Best time.
— Texiwill (@Texiwill)
June 25, 2014
.@Texiwill Yes, ideally nominate scribe(s) to take copious notes during, and write up after, No other role, so not distracted. #CIOChat
— Andi Mann (@AndiMann)
June 25, 2014
One #CIOChat-ter suggested DR testers split their reports into more focused sections tailored to each audience, from business stakeholders down to the teams running each tested system:
#ciochat A3: Multiple reports/sections w/ diff. audiences from a straight pass/fail down to every timing and glitch. Lots of stakeholders.
— Forvalaka41 (@Forvalaka41)
June 25, 2014
Echoing earlier comments about potential sources of DR failure, another tweet jammer advised organizations to be honest about where they might be drawing unsubstantiated conclusions:
@searchCIO Q3 1:2 It's also critical to truthfully analyze whether assumptions were made about recovery success. #CIOchat
— Mark Thiele (@mthiele10)
June 25, 2014
With all that said, businesses tend to be less concerned with the specifics and more concerned with the bottom line. Hard numbers in an after-action report can bolster the case to higher-ups that DR testing is worth the investment:
@searchCIO A3: Potential cost of downtime in specific $$ terms. Best way to get sign off is to show mgmnt what they stand to lose #CIOChat
— Robert Payne (@phillybobpayne)
June 25, 2014
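Payne's point lends itself to simple arithmetic: multiply outage duration by what an hour of downtime costs in lost revenue and idled staff. The sketch below is a hypothetical back-of-the-envelope model; the dollar rates and hours are placeholders, not figures from the chat:

```python
# Hypothetical downtime cost model -- all figures are illustrative.
HOURLY_REVENUE = 25_000.0      # revenue at risk per hour of outage ($)
HOURLY_PRODUCTIVITY = 8_000.0  # loaded cost of idled staff per hour ($)

def downtime_cost(hours: float) -> float:
    """Estimate the direct cost of an outage of the given length."""
    return hours * (HOURLY_REVENUE + HOURLY_PRODUCTIVITY)

# Compare the recovery time measured in the DR test against the RTO goal.
measured_hours, rto_hours = 3.5, 1.0
print(f"Cost at measured recovery time: ${downtime_cost(measured_hours):,.0f}")
print(f"Cost if the RTO goal were met:  ${downtime_cost(rto_hours):,.0f}")
print(f"Exposure the gap represents:    ${downtime_cost(measured_hours - rto_hours):,.0f}")
```

Framing the gap between measured recovery time and the RTO goal in dollar terms is what turns a technical test report into a budget argument.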
Demonstrating what there is to lose in system downtime -- and what can be gained from disaster recovery testing -- is an important aspect of after-action reporting. Is your organization focused on post-test reporting to the business? Is your after-action review and report as thorough as our tweet jammers suggest it should be? Sound off in the comments section below.