Planning for Resilience: Best Practices for Developing Reliable Disaster Recovery Plans
Adverse incidents cannot be stopped from happening. However, their effects can be mitigated with an efficient disaster recovery plan in place. The best practices for developing a reliable disaster recovery plan focus on factors like technology, infrastructure, physical workplace, and human resources, as part of the emergency management and business continuity plan.
Back in the 1970s, some companies already had detailed disaster recovery plans in place. But these plans failed to produce the required results, because most organizations based them on past events and considered factors like IT infrastructure, physical workplace, and human resources in isolation. As a result, the plans lacked interfacing with the users.
Adopting a systematic approach to risk tracking to enhance the effectiveness of the disaster recovery plan:
A systematic approach to risk tracking begins with an efficient risk planning process. The first step is identifying risks and maintaining the risk register. Next, a qualitative and quantitative analysis of the risks is performed and the risk register is updated. Finally, the organization's risk response is planned based on the risk register updates.
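The identify, analyze, and plan-a-response sequence above can be pictured as a small data structure. The following is a minimal sketch, not a standard: the 1 to 5 rating scale, the field names, the dollar estimates, and the response thresholds are all illustrative assumptions.

```python
from dataclasses import dataclass

# Hypothetical thresholds on the likelihood x impact score (range 1..25).
RESPONSE_THRESHOLDS = [(15, "mitigate now"), (8, "plan a response"), (0, "accept and monitor")]


@dataclass
class Risk:
    name: str
    likelihood: int          # qualitative rating, 1 (rare) to 5 (almost certain)
    impact: int              # qualitative rating, 1 (minor) to 5 (severe)
    estimated_loss: float    # quantitative estimate of one occurrence, in dollars
    response: str = "unplanned"

    @property
    def score(self) -> int:
        # Qualitative score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def plan_responses(register: list[Risk]) -> list[Risk]:
    """Set each entry's response from its score and return the register by priority."""
    for risk in register:
        for threshold, response in RESPONSE_THRESHOLDS:
            if risk.score >= threshold:
                risk.response = response
                break
    # Highest score first; estimated loss breaks ties.
    return sorted(register, key=lambda r: (r.score, r.estimated_loss), reverse=True)


register = [
    Risk("Data center flood", likelihood=2, impact=5, estimated_loss=2_000_000),
    Risk("Ransomware outbreak", likelihood=4, impact=4, estimated_loss=750_000),
    Risk("Key supplier outage", likelihood=3, impact=2, estimated_loss=50_000),
]

for risk in plan_responses(register):
    print(f"{risk.name}: score={risk.score}, response={risk.response}")
```

Re-running `plan_responses` after each change to the register keeps the planned responses in step with the latest analysis, which is the point of updating the register before planning.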
The next step is to monitor and control risks based on the risk register updates, change requests, and project management plan updates. Remember that when a project management plan is updated, the timeline for disaster recovery may be affected. It is equally important to keep in mind that data recovery in different business setups requires different timeframes. For example, in a trading operation the window for data recovery may be as short as a couple of hours, whereas in a manufacturing business it could perhaps wait a few days without affecting customers.
The common threads in risk management across disaster recovery, emergency management, and business continuity include assessing the risks and impacts. The recovery team needs to architect a solution for disaster recovery and emergency management, which should be approved by management. Then, ahead of time, the team needs to mitigate the likelihood of the disasters that might happen. Finally, recovery solutions need to be implemented.
A major bank in New York is an apt example. As part of its risk management and recovery solutions plan, the bank had contracts in place to use tugboats to transport its employees from Manhattan to New Jersey if all the bridges in New York were ever shut. On 9/11, its emergency management team was able to move people to New Jersey while other companies were struggling to get people out of the city. Implementing a recovery solution ahead of time does not necessarily mean activating it, just putting it in place.
Outlining the critical actions to take if an event affects the company or its partners:
While outlining the critical actions for disaster recovery, the emergency team needs to consider not just its own organization but its partner organizations as well. Designing the action plan should involve IT, which can provide support to the various business areas. Human resources is another important function here, as it can ensure that the right people are in the right place. The legal department also needs to be involved throughout, because whenever the organization engages in any type of contracting, legal needs to understand the whole process. The public relations department also plays an active role in disaster recovery.
Outlining the critical actions to be taken during an adverse event involves several elements. Compliance can be used as a guide for the business. Preparedness matters because the entire company needs to be ready and any gaps need to be filled. Training is essential, since without it immediate recovery is difficult. Exercises or testing are vital for facing an actual disaster and recovering on time. Updates follow from them, because every exercise exposes new mistakes in the planning and the plan then requires revision. Communication is the final element: everyone involved needs to know about the training, exercises, and updates in the compliance areas.
At this stage, risk evaluation and control is essential. The risks that can adversely affect the organization and its resources, leading to business interruption, need to be determined. Controls need to be implemented to avoid or mitigate the effects of those risks. Here’s an example. In New Orleans, a number of companies kept back-up tapes of their company records at offsite storage locations. One such company had arranged for underground offsite storage, and when the storage area flooded, all of its back-up tapes were lost. No other back-ups were available, which created a major problem.
Understanding an organization’s susceptibility to disasters:
A cost-benefit analysis is essential to justify the investment in controls: if it costs a million dollars to save a hundred, the control is not worth it.
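The comparison above can be made concrete with a little arithmetic. The sketch below uses annualized loss expectancy (cost of one incident times expected incidents per year), a common risk-analysis convention that this article does not itself prescribe; the function names and all dollar figures are hypothetical.

```python
def annual_loss_expectancy(single_loss: float, occurrences_per_year: float) -> float:
    """Expected yearly loss: cost of one incident times how often it is expected per year."""
    return single_loss * occurrences_per_year


def control_net_benefit(ale_before: float, ale_after: float, annual_control_cost: float) -> float:
    """Yearly loss avoided by the control minus what the control costs per year."""
    return (ale_before - ale_after) - annual_control_cost


# Hypothetical figures: a flood costing $500,000, expected once every 10 years.
ale_before = annual_loss_expectancy(500_000, 0.1)
# Assume relocating the backups cuts the loss per incident to $100,000.
ale_after = annual_loss_expectancy(100_000, 0.1)

net = control_net_benefit(ale_before, ale_after, annual_control_cost=15_000)
print(f"Net yearly benefit: ${net:,.0f}")

# The extreme case from the text: spending $1,000,000 to avoid a $100 loss.
worth_it = control_net_benefit(100, 0, annual_control_cost=1_000_000) > 0
print(f"Million-dollar control worth it: {worth_it}")
```

A positive net benefit supports the investment; a negative one, as in the million-dollar case, argues against it.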
Certain areas within an organization are likely to be more prone to damage than others in a disaster. Thus, while planning for disaster recovery, one needs to consider the entire enterprise and the risks and impacts to its people, buildings, equipment, processes, reputation, and finances. When proposing cost-effective controls to upper management, it is advisable to base them on the cost-benefit analysis.
Now to mitigate or recover from a disaster, organizations need to assess risks and impacts, architect solutions and implement mitigation. Moreover, it is absolutely critical to implement recovery solutions ahead of time.
Conducting a Business Impact Analysis (BIA) to address all gaps in the recovery plan:
It is essential to conduct a business impact analysis to address all gaps in the recovery plan. The analysis helps fill gaps in every aspect of the plan, including IT, and not just those related to the business units.
BIA helps in identifying the various impacts that can affect the organization and the techniques that can be used to quantify and qualify such impacts. It also helps in identifying time-critical functions, their recovery priorities, and their interdependencies. BIA also assists in establishing recovery time objectives, which is vital because the faster an organization needs to recover, the more expensive the recovery process is, and vice versa. However, the longer an organization takes to get back to business, the more susceptible it is to losing business and customers, and the higher its operating costs become.
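The tension described above, where faster recovery costs more but longer outages lose more business, can be framed as picking the recovery time objective with the lowest combined cost. This is a rough sketch under invented numbers: the candidate recovery times, solution costs, hourly loss, and outage frequency are all assumptions for illustration.

```python
# Hypothetical recovery options: target recovery time in hours -> yearly cost of a
# solution able to meet it. Faster recovery is assumed to cost more.
SOLUTION_COST = {2: 400_000, 8: 150_000, 24: 60_000, 72: 20_000}

LOSS_PER_HOUR = 5_000      # assumed business loss per hour of outage, in dollars
OUTAGES_PER_YEAR = 1       # assumed number of disaster-level outages per year


def total_yearly_cost(rto_hours: int) -> int:
    """Solution cost plus the expected downtime loss if recovery takes rto_hours."""
    downtime_loss = rto_hours * LOSS_PER_HOUR * OUTAGES_PER_YEAR
    return SOLUTION_COST[rto_hours] + downtime_loss


# Choose the candidate recovery time with the lowest combined cost.
best_rto = min(SOLUTION_COST, key=total_yearly_cost)

for rto in SOLUTION_COST:
    print(f"RTO {rto:>2} h: total ${total_yearly_cost(rto):,}")
print(f"Cheapest overall: RTO of {best_rto} hours")
```

With these figures neither the fastest nor the slowest option is cheapest overall, which is why the recovery time objective is worth settling during the BIA rather than by default.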
- BIA establishes a definition of criticality and negotiates with management over single or multiple levels of criticality
- It identifies, analyzes and documents critical functions including business functions, support functions, and interdependencies
- It identifies, analyzes and documents vital records to support business continuity and business restoration by prioritizing critical business functions
- It identifies, analyzes and documents recovery timeframes and minimum resource requirements with recovery windows for critical business functions being based on level of criticality
- It identifies, analyzes and documents the order of recovery for critical business functions, support functions, and systems, based on parallel and interdependent activities with minimum resource requirements. These resources can be internal or external, and owned by the company or not. It also helps in identifying, analyzing and documenting business processes and the interrelationships between them.
- It focuses on process dependencies, which may be intradepartmental, interdepartmental, or dependencies on technology
- It also identifies, analyzes and documents replacement times, equipment, key personnel, raw materials and sub-assemblies
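The order of recovery mentioned in the list above, based on parallel and interdependent activities, maps naturally onto a dependency graph. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+) to group functions into waves that can be recovered in parallel. The function names and their dependencies are hypothetical, and this is one possible way to model the ordering rather than a method the BIA prescribes.

```python
from graphlib import TopologicalSorter

# Hypothetical business and support functions, each mapped to what it depends on.
DEPENDS_ON = {
    "network": [],
    "database": ["network"],
    "payroll": ["database"],
    "order processing": ["database", "network"],
    "customer portal": ["order processing"],
}


def recovery_waves(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group functions into waves; everything in one wave can be recovered in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        # Functions whose dependencies have all been recovered already.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(recovery_waves(DEPENDS_ON), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A circular dependency surfaces as an error when the graph is prepared, which is itself a useful finding: two functions that each require the other to be restored first point to a gap in the plan.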
Role of Technology:
The need now is to transform business continuity planning and integrate emergency preparedness, crisis management, incident response, and other related disciplines into it. Planning for continuity of business operations is becoming more challenging as the world becomes more interconnected. Today, issues such as external hacking and network information leakage affect business operations and continuity, and it is critical to involve IT in mitigating them.
According to the ‘Survey for Global State of Information Security’,[i] published in 2012, business continuity and disaster recovery was identified as the number one priority for executives around the world.
The focus of business continuity management (BCM) should be on four interrelated disciplines: business continuity, information security, emergency management, and traditional risk management. Although these disciplines come from different perspectives, they share the overlapping goal of ensuring continuity of business operations. BCM is concerned with processes that evaluate the potential risks to an organization and ensure that the resources needed to meet critical objectives are available in any destructive event. Organizations are now realizing that, to make those resources available, these disciplines need to get on the same page as early as possible.
Benefits of converging the disciplines of business continuity and risk management:
Risk management provides BCM with a broader view of risk, access to the board, systems for monitoring and managing risks, and a better view and understanding of the evolving threats and risks in support of the business process. Similarly, BCM provides risk management with a better understanding of the important activities and the resources that support them. It also provides an existing risk mitigation framework and a pragmatic approach to understanding on-the-ground challenges.
The BCM Solution Capabilities:
Manage business continuity requirements-
- Map organizational hierarchy
- Define processes with MTD, RPO and RTO
- Conduct Business Impact Analysis
- Establish risk register
- Identify preventive controls
- Develop incident response structure
- Define business resumption and DR plan
- Define communication plan
Respond to business interruptions-
- Conduct damage assessment
- Invoke disaster recovery plan
- Initiate recovery activities
- Terminate alternate site and close DR process
Test, maintain, and review plan-
- Develop test objectives from BCM plan
- Embed learning process in BCM plan and train personnel
- Enforce document management and control
- Enable certification: ISO 22301 and ISO 27001
- Enable internal audit and governance review process
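The capability "Define processes with MTD, RPO and RTO" listed above lends itself to a simple consistency check. In the sketch below, MTD is read as maximum tolerable downtime, RTO as recovery time objective, and RPO as recovery point objective. The class, its field names (including the backup interval), and the sample values are illustrative assumptions, loosely based on the trading and manufacturing contrast drawn earlier in the article.

```python
from dataclasses import dataclass


@dataclass
class ProcessObjectives:
    name: str
    mtd_hours: float              # maximum tolerable downtime
    rto_hours: float              # recovery time objective
    rpo_hours: float              # recovery point objective (tolerable data-loss window)
    backup_interval_hours: float  # how often data is actually backed up (assumed field)

    def problems(self) -> list[str]:
        """Return the inconsistencies found in this process's objectives."""
        found = []
        if self.rto_hours > self.mtd_hours:
            found.append("RTO exceeds MTD: planned recovery is slower than the business can tolerate")
        if self.backup_interval_hours > self.rpo_hours:
            found.append("backup interval exceeds RPO: more data could be lost than allowed")
        return found


processes = [
    ProcessObjectives("Trading desk", mtd_hours=4, rto_hours=2,
                      rpo_hours=0.25, backup_interval_hours=1),
    ProcessObjectives("Plant scheduling", mtd_hours=72, rto_hours=48,
                      rpo_hours=24, backup_interval_hours=24),
]

for process in processes:
    issues = process.problems() or ["objectives are consistent"]
    print(f"{process.name}: {'; '.join(issues)}")
```

Checks like these are cheap to run whenever the plan is reviewed, so a change to a backup schedule or a recovery target does not silently invalidate the objectives agreed during the BIA.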
As far as best practices for the convergence of business continuity, information security, and risk management are concerned, the approach involves a common GRC platform that gives a 360-degree view of governance, risk, and compliance management with ‘a single version of the truth’. Such a platform helps develop common terminology within threat reports and implement a common framework for policy, risk, control, and issue management. It also implements common processes for incident response and crisis management. On the business continuity side, convergence means considering the end-to-end ecosystem, including third parties and suppliers. On the risk management and information security side, it means collecting and developing information and evidence about attack vectors and the impact achieved by threat agents, and shifting security controls to accommodate emerging threat trends.
An efficient disaster recovery solution should respond with speed and agility, while empowering businesses to maintain continuous operations during a disaster. A common GRC platform with a 360 degree view of governance, risk, and compliance management can ensure an efficient disaster recovery plan focusing on emergency planning, business continuity planning and crisis management.