Users note testing, analysis as disaster recovery best practices

Administrators at VeeamON 2024 emphasize testing, having a business impact analysis and gaining support from management as key disaster recovery guidance.

HOLLYWOOD, Fla. -- On April 12, 2023, the city of Fort Lauderdale, Fla., was supposed to get about three inches of rain. In six hours that day, it received 26 inches.

In the aftermath of the 1,000-year flood, Fort Lauderdale CIO Tamecka McKay has tried to make the best of a crisis, developing and implementing several disaster recovery best practices.

"We use that opportunity to help the business units understand the role that they need to play in disaster recovery and business continuity," McKay said in an interview with TechTarget Editorial at the VeeamON user conference. "It is now critical that the business and IT are aligned and develop disaster recovery and business continuity plans."

Continuous testing, for example, was one of those disaster recovery best practices echoed by several IT admins at VeeamON.

Recovering from the storm

No one died in the storm, but Fort Lauderdale fielded 800 rescue calls and made 600 rescues that day.

City Hall sustained irreparable damage. Not only did it lose power, but its generator also exploded. When officials came back to the building the next day, they found 8 feet of water in the basement, which contained the HVAC, elevator and electrical systems. The city plans to complete demolition of the building this year.

"We never expected that we would lose City Hall," McKay said.

The servers were on the sixth floor, but authorities had to physically move them to another data center where the city had its emergency operations center. Within about 72 hours, the city had to rewire its network and reconfigure the systems that moved to the new data center.

"Within about a week or so, we had probably about 70% of the services back online," McKay said. "I'm very thankful for the team that I had and our ability to recover, and literally migrate a data center in a matter of days, which normally takes months -- months of planning and execution."

Fort Lauderdale, home to about 200,000 residents, had been using both Veeam and Veritas NetBackup for its backup. The city has since switched to using just Veeam, McKay said.

"One of my goals was to centralize and modernize the backup and disaster recovery strategy, which was scheduled to happen in [2023]," McKay said. "Of course, after the flood, we were able to leverage that momentum and leverage the position of folks in the organization who had just recently experienced the pain of an outage and not having access to their data, to expedite that plan."

McKay said her team learned the need for better documentation and organization. To help move forward, a third party performed a business impact analysis. As a result, the city got a clearer picture of recovery time objectives and recovery point objectives.

"Obviously, you can never be 100% prepared for an emergency," McKay said. "But having that information on hand and using that as part of your playbook and prioritizing recovery, I think that's going to make a world of difference in the event that we have another one-in-1,000-year flood."

McKay said she sees a business impact analysis as one of the most important disaster recovery best practices.

"Having that third party come in, it can ask the right questions without any bias or any prejudice," McKay said. "Get that information and use that to help the business units understand, 'This is what this means, and this is the impact in the event of a disaster. Is that acceptable to you?'"

In addition, every year the city performs a full-scale exercise that simulates what would happen in a disastrous hurricane. McKay said she wants the city to do a similar exercise involving a ransomware attack.

"It's not a matter of if, it's when," McKay said. "And what is your playbook to recover?"

Testing, testing

Paul Sylvester, head of IT at Ultra Energy, said his company runs disaster recovery scenarios every quarter. Organizations need to make sure their equipment works, especially as infrastructure continually changes, he said.

"Test, test and test again," Sylvester said in a breakout session at VeeamON. "Don't be afraid to push the product."

Ultra Energy, a global engineering firm with a focus on providing safety systems for the nuclear industry, uses Veeam for backup and recovery.

"Even though Veeam is a fantastic product, the results are not always what you expect," Sylvester said.

For example, a failback test estimated to take two hours actually took nine. Sylvester noted that Veeam doesn't need to be as fast with failback, but the test taught him that his team should do that kind of testing on the weekend.

Tracey Bruner and Jerrick Spencer, system administrators for the Navajo Tribal Utility Authority, said in an interview with TechTarget Editorial that they also do quarterly testing. Bruner said they restore to a certain point to make sure everything runs as it should for the provider of utility services to the Navajo Nation, which spans 27,000 square miles across New Mexico, Arizona and Utah.

Further disaster recovery best practices include documenting tests and ensuring there is a failback plan, Spencer said.
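
As a rough illustration of what documenting tests might look like in practice, the minimal Python sketch below records each test's estimated and actual duration and flags large overruns. The record layout, the `flag_overruns` helper and the 1.5x tolerance are assumptions made for this example, not anything the admins described; the two-hour and nine-hour figures come from Sylvester's failback test, and the second entry is invented for contrast.

```python
from dataclasses import dataclass


@dataclass
class DrTestResult:
    """One documented disaster recovery test run (hypothetical record layout)."""
    name: str
    estimated_hours: float
    actual_hours: float

    @property
    def overrun_ratio(self) -> float:
        # How many times longer the test took than estimated.
        return self.actual_hours / self.estimated_hours


def flag_overruns(results, tolerance=1.5):
    """Return tests whose actual duration exceeded the estimate by more than `tolerance` times."""
    return [r for r in results if r.overrun_ratio > tolerance]


results = [
    # Figures from Sylvester's failback test: estimated two hours, took nine.
    DrTestResult("failback", estimated_hours=2, actual_hours=9),
    # Invented entry, included only to show a test that ran on schedule.
    DrTestResult("file-level restore", estimated_hours=1, actual_hours=1),
]

flagged = flag_overruns(results)
for r in flagged:
    print(f"{r.name}: estimated {r.estimated_hours}h, took {r.actual_hours}h "
          f"({r.overrun_ratio:.1f}x)")
```

A log like this, kept from quarter to quarter, could give a team evidence for scheduling decisions such as moving long-running failback tests to the weekend.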

Importance of company support for backup, DR

Sylvester also recommended getting buy-in from management regarding IT environment upgrades.

"Don't just meet with the IT guys, meet with the stakeholders," Sylvester said. "They need to understand the impact."

Sylvester oversaw a major overhaul of the company's storage, backup and recovery systems, which included shifting from an old Veeam system to a new one and adding HPE Nimble Storage and an HPE StoreOnce appliance for the backup data.

In speaking with management, mentioning ransomware helps, said Jochem de Zeeuw, a governance and advice specialist at marine contractor Van Oord in the Netherlands.

"Investing in backup is not typically [what] people want to do," de Zeeuw said at a VeeamON breakout session. "With ransomware, that became a bit different, because people started to see value in the product."

Paul Crocetti is an executive editor at TechTarget Editorial. Since 2015, he has worked on TechTarget's Storage, Data Backup and Disaster Recovery sites.
