How to handle service downtime in the cloud age
You know what happens when you assume. Don't set and forget those critical IT infrastructure services in the cloud without putting together a contingency plan.
As we all know, the cloud is not a cure-all. What happens when you have a service outage because your cloud went offline or your internet provider experiences issues?
If your organization depends on SaaS applications, some planning will help you weather the storm. Backup IT infrastructure servers can let you interact with customers, answer questions and maintain some level of business functionality. Losing the connection to the cloud or a service shouldn't bring your business to a halt. You can set up a safety net in the form of backup virtual machines for the essential services that have moved to the cloud.
With so much focus today on the cloud, it's almost impossible to think about what would happen if it weren't there. Azure, AWS and all the major cloud providers have measures in place to prevent them from ever fully going offline. While that is ideal for the cloud and cloud vendors, it doesn't mean you can always connect to that cloud. A DDoS attack against your location or internet provider could prevent you from getting access to the cloud. Something as simple as a backhoe that cuts through cables near your facility can remove that cloud connectivity in the most non-technical way possible. So, while the cloud might not go down, your connection to it might. How do you cope in that sort of situation?
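When something does break, it helps to know which link in the chain failed. The following is a minimal Python sketch, not a monitoring tool, that probes outward from your own network: first the local gateway, then a general internet host, then the cloud service. The three-way classification and its wording are assumptions made for illustration.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def classify_outage(gateway_ok: bool, internet_ok: bool, cloud_ok: bool) -> str:
    """Name the nearest failing hop, checking from the inside out."""
    if not gateway_ok:
        return "local network problem"
    if not internet_ok:
        return "internet connection down"
    if not cloud_ok:
        return "cloud service unreachable"
    return "all reachable"
```

In practice, you would feed `classify_outage` the results of three `tcp_probe` calls against your own gateway address, a well-known public host and your cloud provider's endpoint. Those targets are specific to your environment and are deliberately left out here.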
Take stock of your application inventory
One of the fundamental questions you should ask is: where are your applications? If you run SaaS-based applications for Office, CRM, sales and just about everything else, then you have to come up with a backup for these cloud services. A working IT infrastructure without working applications does not help the business. One of the challenges with many SaaS-based applications is that they don't support an offline mode.
Some applications, such as Office 365 -- depending on your licensing -- allow local installations, which is ideal so long as your files remain local as well. That is where we get into the challenges: each time you address one piece on-site or in the cloud, something else comes up. We rarely map out what it takes to do a specific task, because we assume the interconnected pieces will always work. That lack of foresight puts your business in a dangerous situation.
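Mapping out what each task needs does not require special tooling. As a rough sketch, assuming you can list the pieces each task depends on, a few lines of Python can show which tasks would be blocked and why. The task names and dependencies below are hypothetical placeholders, not a recommended inventory.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Task:
    """A business task and the pieces it needs in order to work."""
    name: str
    depends_on: FrozenSet[str]


def missing_pieces(tasks: List[Task], available: Set[str]) -> Dict[str, List[str]]:
    """Map each blocked task to the dependencies that are currently unavailable."""
    blocked = {}
    for task in tasks:
        missing = task.depends_on - available
        if missing:
            blocked[task.name] = sorted(missing)
    return blocked


# Hypothetical inventory; replace with your own tasks and dependencies.
TASKS = [
    Task("edit a document", frozenset({"local Office install", "local file copy"})),
    Task("send email", frozenset({"internet link", "cloud mail service"})),
    Task("look up a customer", frozenset({"internet link", "cloud CRM"})),
]
```

Calling `missing_pieces(TASKS, {"local Office install", "local file copy"})` simulates an internet outage: editing a document still works, while the email and CRM tasks are reported along with the pieces they lack.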
There are limits to what an emergency backup system can do
This raises the question: how functional do you want your staff to be in the event of an outage? Technically, anything is possible with unlimited resources, but it's not realistic to expect full functionality in a service downtime situation. You could pay for duplicate infrastructure and SaaS services, but that would most likely be inefficient from a cost and coordination standpoint.
Instead, start with simple aspects such as email, documents and the desktop. The first hurdle to surmount is the ability to log in. The Windows OS all but expects to be connected to the internet. Try disconnecting your desktop and powering it up; even on a home machine, the system will crawl as it times out while trying to log in and find all the connected services. Unlike home machines with local accounts, a domain-joined machine needs locally stored login credentials or you're not getting in. Local accounts and cached credentials are not exactly security best practices, but you need to balance security with the need to get employees working. Most laptop users won't suffer through this, since laptops are usually set up to work offline, but that setup takes additional steps that are not traditionally done for desktops.
If you can log in, what about the applications? If you enabled offline access in Outlook, then you will have access to email that was pulled down before the outage. The same can be said if you use OneDrive and it synced before you lost the connection. Simple offline email access, file shares and printing might be all you have, depending on the infrastructure, but that is better than having your staff staring at their desktops because they can't access their data.
Look into ways to back up IT infrastructure services
You need to evaluate how much of the networking and infrastructure services -- specifically DNS, IP addressing, and file and print services -- you want to keep in the data center. While it's possible to put most of these infrastructure services in the cloud, you should consider keeping backups of these services in reserve as virtual machines.
Unless you have moved everything offsite and have no on-premises resources, you could use a few Hyper-V hosts for those workloads that can't move to Azure. What prevents you from having backup infrastructure servers as powered-down virtual machines on your Hyper-V hosts? They don't need to be on shared storage; local storage would work just fine for these emergency servers.
The cloud comes with so many offerings that it's next to impossible to ignore everything it can do. But part of the challenge with this migration is to set up backup servers for the services that have moved into the cloud. For some services, such as Active Directory, it can be a bit trickier to keep them in cold storage, but it is possible.
It's important to maintain these backup infrastructure servers. It can't be an "install once and forget it" situation. Update them several times a year to ensure that, should you need them, your backups can fill the void when a connectivity issue arises.
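A simple record of when each backup server was last updated makes it easy to spot the ones that have been forgotten. The sketch below assumes you keep such a record as a mapping of server name to date. The 120-day threshold is an arbitrary stand-in for "several times a year," and the server names in the usage note are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_backups(
    last_updated: Dict[str, date],
    today: date,
    max_age_days: int = 120,  # assumed threshold, roughly three updates a year
) -> List[str]:
    """Return the names of backup servers not updated within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(name for name, updated in last_updated.items() if updated < cutoff)
```

For instance, with `{"dns-backup": date(2023, 11, 15), "dhcp-backup": date(2024, 5, 1)}` and a `today` of `date(2024, 6, 1)`, only `dns-backup` is flagged as overdue.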
It helps to note when changes occur with key systems, such as DNS, DHCP and other network servers. How often do you create additional DHCP scopes? During this review process, you might find much of the infrastructure is more static than you realized, which helps when you're formulating a plan to keep backup infrastructure up to date. These backups aren't meant to be a straight swap, but rather something to give you internal networking and some additional functionality even when you can't get traffic outside your building or that traffic is limited in some fashion. They're not meant to be perfect, just enough to keep the business going while repairs are under way.
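One lightweight way to run that review is to compare what the live service has against what the backup knows about. The following sketch assumes you can export the DHCP scopes from both as plain lists of strings; how you export them depends on your environment and is not shown. The function name and the keys of the result are illustrative choices.

```python
from typing import Dict, Iterable, List


def scope_drift(live: Iterable[str], backup: Iterable[str]) -> Dict[str, List[str]]:
    """Compare live and backup DHCP scopes and report the differences."""
    live_set, backup_set = set(live), set(backup)
    return {
        # Scopes created since the backup was last refreshed.
        "missing_from_backup": sorted(live_set - backup_set),
        # Scopes the backup still holds that no longer exist in production.
        "stale_in_backup": sorted(backup_set - live_set),
    }
```

If both lists in the result are empty, the backup matches production for that service and can wait until the next scheduled review.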