recovery time objective (RTO)
What is recovery time objective?
The recovery time objective (RTO) is the maximum tolerable length of time that a computer, system, network or application can be down after a failure or disaster occurs.
The RTO is a function of the extent to which the interruption disrupts normal operations and the amount of revenue lost per unit time because of the disaster. These factors, in turn, depend on the affected equipment and application(s).
An RTO is measured in seconds, minutes, hours or days. It is an important consideration in a disaster recovery plan (DRP).
Once an organization has defined the RTO for an application, administrators can decide which disaster recovery (DR) technologies are best suited to the situation. For example, if the RTO for a given application is one hour, redundant data backup on external drives may be the best solution. If the RTO is five days, then tape or off-site cloud storage may be more practical.
Numerous studies have been conducted in an attempt to determine the cost of downtime for various applications in enterprise operations. These studies indicate the cost depends on long-term and intangible effects, as well as immediate, short-term or tangible factors.
How do you calculate RTO?
Calculating recovery time objective is a multistep process that needs to be considered from several viewpoints, including business impact analysis (BIA), DR strategy and business continuity planning. The key goal of an RTO is to establish how much time a recovery process may take after a major incident before normal business operations must resume.
The first step in the RTO process is to completely inventory all systems, business-critical applications, virtual environments and data. Without an accurate inventory, there is no way to accurately determine an RTO.
After completing the inventory, the next step is to evaluate the value of each service and business-critical application in terms of how much it contributes to how a company operates and conducts business. That value should be expressed per unit of time and at as granular a level as possible. The value of the application can also be linked to any existing service-level agreements, which define how available a service needs to be and may include penalties if those service levels are not met.
By understanding what is running and what the value is of all the running systems and applications, it becomes possible to calculate RTO. Keep in mind, however, there can be different RTO requirements based on application priority as determined by the value the application brings to the organization.
Calculating RTO requires determining how quickly the recovery process for a given application, service, system or data set needs to happen after a major incident. That determination is based on the loss tolerance the organization has for that asset, as established in its BIA. Defining the loss tolerance means deciding how much operational time an organization can afford (or is willing) to lose after an incident before normal business operations must resume.
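The relationship between loss tolerance and RTO can be sketched with simple arithmetic. The example below is a minimal illustration, not a standard formula: it assumes downtime cost accrues at a constant hourly rate, and the function name and the dollar figures are hypothetical.

```python
from datetime import timedelta


def rto_from_loss_tolerance(max_tolerable_loss: float, loss_per_hour: float) -> timedelta:
    """Return the longest outage whose cost stays within the tolerated loss.

    Assumes (for illustration only) that losses grow linearly with downtime.
    """
    if loss_per_hour <= 0:
        raise ValueError("loss_per_hour must be positive")
    if max_tolerable_loss < 0:
        raise ValueError("max_tolerable_loss must not be negative")
    return timedelta(hours=max_tolerable_loss / loss_per_hour)


# Hypothetical BIA figures: the business tolerates a loss of 20,000
# and the application loses 5,000 for every hour it is down.
rto = rto_from_loss_tolerance(max_tolerable_loss=20_000, loss_per_hour=5_000)
print(rto)  # 4:00:00
```

In practice, the BIA would also weigh intangible and long-term effects that a linear cost model like this one does not capture.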
Examples of RTO
Based on the BIA for an application or service outage, the recovery time objective can vary. For example, mission-critical applications will have a lower RTO, while less critical services will often have a higher RTO, as the tolerable duration of an outage -- and the associated loss tolerance -- will be higher.
Here are some RTO examples:
- Transaction/financial services. These are among the most critical services, where the RTO is as close to zero as possible (while RTOs in other areas can reach as high as several hours).
- Email. While email is a critical service for many, it can have an RTO of up to four hours for recovery. Email outages don't always directly correlate to lost revenue, as is the case when financial services go down.
- Printer services. Having a printer go offline or be unavailable is an inconvenience that might result in financial losses. These losses will likely be significantly less than those incurred during a financial services outage or even when email is disrupted. In some cases, RTO for print servers can be as high as 24 hours.
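The examples above can be recorded as a simple lookup table and compared against measured recovery times. This is only a sketch: the service keys and helper names are hypothetical, and the one-minute value for financial services is an assumed stand-in for "as close to zero as possible."

```python
from datetime import timedelta

# RTO targets based on the examples above. The one-minute figure is an
# assumption standing in for "as close to zero as possible".
RTO_TARGETS = {
    "financial_services": timedelta(minutes=1),
    "email": timedelta(hours=4),
    "print_services": timedelta(hours=24),
}


def meets_rto(service: str, actual_recovery: timedelta) -> bool:
    """Return True if the measured recovery time is within the service's RTO."""
    return actual_recovery <= RTO_TARGETS[service]


def recovery_order() -> list[str]:
    """List services from the tightest RTO to the most relaxed."""
    return sorted(RTO_TARGETS, key=RTO_TARGETS.get)


print(recovery_order())
print(meets_rto("email", timedelta(hours=3)))           # True
print(meets_rto("print_services", timedelta(hours=30)))  # False
```

Sorting by RTO gives a rough recovery priority, since the services with the least tolerance for downtime need to be restored first.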
RTO in DR planning
Defining RTO is a critical component of a DRP, as the goal of disaster recovery is to have a strategy in place that helps the business recover and restore normal business operations. With an RTO in place as a top-level goal, an organization can align its data backup and failover policies and have the required level of additional services available for deployment to ensure the desired speed of recovery can, in fact, be achieved.
Without an RTO, a company won't know what speed of recovery it needs to achieve after a major incident or data loss event. Disaster recovery planning is about being prepared for unexpected outages, and being prepared requires having some idea -- or a plan to know -- how long it will take to recover.
As part of the DR planning process, organizations should have a clear business continuity plan in place where the business has a defined set of objectives. These objectives should include the RTO and what is called the recovery point objective (RPO) to help ensure an expected rate of recovery.
RTO vs. RPO: What's the difference?
While recovery time objective and recovery point objective are both core components of DR and business continuity planning, each serves a distinct purpose.
- Recovery time objective is about having policies and technologies in place that enable an organization to recover within a certain duration of time.
- Recovery point objective, by contrast, is about making sure, ahead of time, that the data recovery and backup capabilities are in place to minimize the amount of data that could be lost during an incident.
RTO and RPO work together to return an organization to normal business operations. Together, they define the acceptable business impact of an incident: RTO sets the maximum duration of time it can take to restore services, and RPO sets the maximum amount of lost data that is acceptable.
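The difference between the two objectives can be illustrated by measuring a single incident against both. The sketch below is a simplified model with hypothetical names and timestamps: it treats downtime as the time from failure to restoration and the data loss window as the time from the last usable backup to the failure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    last_backup: datetime  # most recent usable backup before the failure
    failure: datetime      # moment the service went down
    restored: datetime     # moment normal operations resumed


def evaluate(incident: Incident, rto: timedelta, rpo: timedelta) -> dict:
    """Compare one incident against its RTO (downtime) and RPO (data loss)."""
    downtime = incident.restored - incident.failure
    data_loss_window = incident.failure - incident.last_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical incident: backup at 08:00, failure at 09:30, restored at 12:00.
incident = Incident(
    last_backup=datetime(2024, 1, 1, 8, 0),
    failure=datetime(2024, 1, 1, 9, 30),
    restored=datetime(2024, 1, 1, 12, 0),
)
result = evaluate(incident, rto=timedelta(hours=4), rpo=timedelta(hours=1))
print(result["rto_met"], result["rpo_met"])  # True False
```

Here the service came back within its four-hour RTO, but 90 minutes of data fell outside the one-hour RPO, which shows that meeting one objective does not imply meeting the other.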