
How to develop resilient IT operations for the COVID-19 era

To meet pandemic-based business demands, enterprises can take these five steps to transform their existing technology platforms. Wipro exec Murthy Malapaka explains the process.

The significant impact of COVID-19 has driven enterprises to embark on an unprecedented transformation journey. The pandemic has also shifted a substantial share of employees to working from home, a model that was previously unorthodox for most companies.

During this transition, organizations have had to reconfigure their operational processes, technology foundation and employee training -- and roll it all out immediately, leaving much to be desired in terms of quality, efficacy and reliability. One key reason for the continued inefficiency is that the underlying technology platforms, while reconfigured for the pandemic's demands, remain fundamentally the same.

To embrace the pandemic's business environment and meet the demands of IT budget cuts, CIOs and chief experience officers must consider transforming their existing technology platforms.

The following overview offers insight into how organizations can undertake such a transformation to achieve resilient IT operations.

Step 1: Make infrastructure 'invisible'

Applications enable business. Consequently, businesses must make a conscious effort to render IT infrastructure "invisible."

Invisible infrastructure refers to the practice of designing IT operation architecture based on application and business services, rather than focusing on tower-based models like server, network and storage.

Traditionally, infrastructure is architected in towers such as a data center, end-user computing, service desk and so forth. Each of these is wrapped in numerous metrics -- such as mean time to repair, mean time between failures, first-call resolution and average handle time -- that are often myopic at best. These metrics alone do not serve any useful purpose unless they are viewed within the context of business metrics or KPIs -- for example, number of orders, invoices or fulfillment failures. Organizations can connect these business metrics to underlying technology or tower service metrics to build new operational models, which a digital operations dashboard can show in real time. The dashboard can also identify anomalies.

These infrastructure metrics need to be made invisible and aggregated to serve a higher cause, so they are more understandable by the business organization.
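
As a rough illustration of viewing tower metrics in a business context, the following Python sketch flags time windows in which business KPIs degrade and attaches a tower metric to each finding as supporting context. All names here (`Window`, `business_view`, `db_latency_ms`) and the thresholds are hypothetical, chosen only for the example; a real dashboard would draw on whatever KPIs and telemetry the organization already collects.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """One minute of observations (all fields are illustrative)."""
    minute: int
    orders: int                 # business KPI
    fulfillment_failures: int   # business KPI
    db_latency_ms: float        # tower metric, kept only as context


def business_view(windows, baseline_orders, max_failure_rate=0.05):
    """Return windows where the business KPIs degrade.

    A window is degraded if order volume falls below 80% of the
    baseline or the fulfillment failure rate exceeds the limit.
    The tower metric never triggers a finding on its own.
    """
    findings = []
    for w in windows:
        failure_rate = w.fulfillment_failures / w.orders if w.orders else 1.0
        low_volume = w.orders < 0.8 * baseline_orders
        if low_volume or failure_rate > max_failure_rate:
            findings.append({
                "minute": w.minute,
                "orders": w.orders,
                "failure_rate": round(failure_rate, 3),
                "db_latency_ms": w.db_latency_ms,
            })
    return findings


if __name__ == "__main__":
    sample = [
        Window(0, 100, 1, 20.0),
        Window(1, 60, 2, 250.0),   # order volume drops while latency spikes
        Window(2, 100, 10, 30.0),  # failures rise with normal latency
    ]
    for finding in business_view(sample, baseline_orders=100):
        print(finding)
```

In this sketch the infrastructure number is reported only when a business metric moves, which is one simple way to keep the tower metric "invisible" until it matters to the business.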

Step 2: Consolidate operations into a single platform

The next step is to consolidate operations into one platform that drives the service lifecycle, enabling composability of services.

[Figure: Global IT spending forecast at a glance]

Whether it is new software development, bug fixes or enhancements provided by various application support teams or vendors, these offerings are typically set up as standalone services on a separate set of platforms. Instead, they need to be tightly integrated with the existing IT service management platforms, and all collaboration platforms used for operational support should be brought into one unified service management platform.

Additionally, there needs to be a "single operating model" or a run platform that enables a digital command center, which should show the enterprise health across any service at any given point in time. This is not a single pane of glass or unified console that is built on aggregation, but instead a true representation of integrated service state.
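
One way to picture the difference between aggregation and integrated service state is a dependency map: a service's state is derived from everything it depends on, rather than from a count of alerts per tower. The sketch below is a minimal, hypothetical model of that idea; the service names, the three state labels and the `service_state` function are assumptions made for illustration, not a description of any particular product.

```python
# Ordering of states from best to worst (illustrative labels).
SEVERITY = {"healthy": 0, "degraded": 1, "down": 2}


def service_state(service, depends_on, component_state, _seen=None):
    """Return the worst state among a service and its transitive dependencies.

    depends_on: dict mapping a name to the names it depends on.
    component_state: dict of known states; missing names count as healthy.
    """
    seen = set() if _seen is None else _seen
    if service in seen:
        # Already evaluated on this walk (shared dependency or cycle).
        return "healthy"
    seen.add(service)

    worst = component_state.get(service, "healthy")
    for dep in depends_on.get(service, []):
        dep_state = service_state(dep, depends_on, component_state, seen)
        if SEVERITY[dep_state] > SEVERITY[worst]:
            worst = dep_state
    return worst


if __name__ == "__main__":
    depends_on = {
        "order-entry": ["app-server", "orders-db"],
        "orders-db": ["storage-array"],
        "invoicing": ["app-server"],
    }
    component_state = {"storage-array": "degraded"}
    for svc in ("order-entry", "invoicing"):
        print(svc, "->", service_state(svc, depends_on, component_state))
```

Here a degraded storage array surfaces as a degraded order-entry service, while invoicing, which does not depend on it, stays healthy.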

Step 3: Extend and integrate the single operating model

The next step is to prepare that single operating model through extensions and integrations, while also enhancing capabilities of the existing technology platform.

To determine the current state of service integration, start with a simple evaluation of the following:

  • Application workload impact on infrastructure
  • Infrastructure workload impact on application
  • Standalone application workload
  • Standalone infrastructure workload

This simple evaluation helps determine how far along an organization is with respect to true integration. Once the evaluation is completed, look at which platforms in the present environment already offer integration capabilities, and consider extending those platforms to close the remaining gaps.
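
The four-way evaluation above can be sketched as a simple tally. The example below assumes a hypothetical inventory in which each workload records its layer and any known links to the other layer; the field names (`layer`, `linked_to`) and the "integration ratio" are illustrative assumptions rather than an established measure.

```python
from collections import Counter


def classify(workload):
    """Place a workload in one of the four evaluation categories."""
    linked = bool(workload["linked_to"])  # known dependencies in the other layer
    if workload["layer"] == "application":
        if linked:
            return "application impact on infrastructure"
        return "standalone application"
    if linked:
        return "infrastructure impact on application"
    return "standalone infrastructure"


def integration_summary(workloads):
    """Count workloads per category and report the share that is linked."""
    counts = Counter(classify(w) for w in workloads)
    linked = (counts["application impact on infrastructure"]
              + counts["infrastructure impact on application"])
    ratio = linked / len(workloads) if workloads else 0.0
    return counts, ratio


if __name__ == "__main__":
    inventory = [
        {"name": "order-entry", "layer": "application", "linked_to": ["orders-db"]},
        {"name": "reporting", "layer": "application", "linked_to": []},
        {"name": "orders-db", "layer": "infrastructure", "linked_to": ["order-entry"]},
        {"name": "backup-job", "layer": "infrastructure", "linked_to": []},
    ]
    counts, ratio = integration_summary(inventory)
    print(dict(counts))
    print(f"linked share: {ratio:.0%}")
```

A low linked share would suggest that most workloads are still managed in isolation, which is the gap the extensions and integrations in this step are meant to close.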


Recent developments in technology platforms, many sparked by acquisitions, demonstrate how vendors are driving tight integration of application services with invisible infrastructure services. They do so through infrastructure as code, in which scripting languages provide self-service mechanisms to define and provision data center infrastructure, and through a PaaS-based approach accompanied by software provisioning, configuration management and application-deployment tools.

One notable example of this was VMware's acquisition of Pivotal, which closed at the end of December 2019. Pivotal offers PaaS that can show platform metrics in operations rather than tower-based metrics. Supported by tools such as Terraform and Ansible, the technology platform can make infrastructure invisible through scripts and automation.
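
The core idea behind infrastructure as code is that the desired environment is declared as data, and tooling works out the changes needed to reach it. The Python sketch below illustrates only that declare-and-reconcile idea; it is not Terraform's or Ansible's actual interface, and the `plan` and `apply` functions and the resource fields are hypothetical names used for the example.

```python
def plan(desired, actual):
    """Compute the actions needed to move `actual` to `desired`.

    Both arguments map a resource name to its specification (a dict).
    Returns a list of (verb, name, spec) tuples.
    """
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply(actions, actual):
    """Return a new state with the planned actions applied (simulation only)."""
    result = dict(actual)
    for verb, name, spec in actions:
        if verb == "delete":
            result.pop(name)
        else:
            result[name] = spec
    return result


if __name__ == "__main__":
    desired = {
        "web": {"size": "small", "count": 3},
        "db": {"size": "large", "count": 1},
    }
    actual = {
        "web": {"size": "small", "count": 2},
        "cache": {"size": "small", "count": 1},
    }
    steps = plan(desired, actual)
    for step in steps:
        print(step)
    print(apply(steps, actual) == desired)
```

Because the declaration is the source of truth, running the plan again after applying it yields no further actions, which is what lets application teams request infrastructure without dealing with the individual towers.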

Step 4: Reskill engineering talent to benefit from platforms

This kind of tightly integrated technology platform will provide opportunities to deploy personnel with full-stack engineering skills. Such technicians need to have end-to-end software development skills and expertise in infrastructure as code. In addition, these emerging technology platforms will let organizations take advantage of site-reliability engineering design principles.

Organizations can build this technology platform as a pure private cloud, as a pure public cloud environment using the provider's native tools, or as a hybrid of the two.

Step 5: Move operations management from fault to anomaly

When it comes to application operations, nearly every tool and platform has announced support for AI and machine learning to enable real-time streaming analytics. This allows the platform to learn application behaviors and apply built-in, real-time predictive analytics, which in turn enables predictive management.

This is a critical time for IT operations to make a fundamental shift in mindset from fault to anomaly. If we start moving from incidents to situations enabled by anomaly detection, a new-age digital operation command center will evolve to support the following:

  • Lean, purpose-oriented support teams with on-demand escalation
  • An intelligent and integrated service desk
  • The encouragement of reliability engineering with less toil and more engineering
  • Higher level of fault avoidance
  • Real-time visibility of enterprise health
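
To make the fault-versus-anomaly distinction concrete, the sketch below compares a fixed fault threshold with a check against recently observed behavior. It uses a plain rolling z-score as a deliberately simple stand-in for the machine learning models the tools advertise; the function names, window size and limits are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def fixed_threshold_faults(values, limit):
    """Classic fault rule: report only readings above a static limit."""
    return [i for i, v in enumerate(values) if v > limit]


def rolling_anomalies(values, window=5, z_limit=3.0):
    """Report readings that deviate sharply from the preceding window.

    Each reading is compared with the mean and standard deviation of
    the `window` readings before it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all is unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_limit:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Response times in milliseconds (made-up sample data).
    latency = [100, 102, 99, 101, 100, 98, 101, 160, 100, 99]
    print("faults:", fixed_threshold_faults(latency, limit=500))
    print("anomalies:", rolling_anomalies(latency))
```

With this sample data the static rule reports nothing because no reading crosses 500 ms, while the rolling check singles out the jump to 160 ms. That is the kind of early signal a command center can act on before it becomes an incident.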

In conclusion, there is a lot of possibility and potential inherent in the platforms and technology investments that organizations have already made. They can be realized through a set of transformations. Doing so will allow organizations to build and utilize a resilient IT operations platform now, which can enable true consolidation and cater to unprecedented demands.

About the author
Murthy Malapaka is CTO of cloud and infrastructure services, North America, at Wipro Ltd. Malapaka has nearly three decades of experience in the technology space and has held various technology leadership positions across application and infrastructure architecture domains, specializing in service availability and reliability. In his current role, Malapaka helps enterprises digitally rearchitect their operations to run on multi-cloud IT environments.
