Business Continuity and Disaster Recovery (BCDR) Strategy for IT Professionals

We already discussed BCDR scenarios. Although the starting positions differ and each situation requires a tailored approach, these scenarios share several common components.

A logical sequence to discuss these components is location, data replication, functionality replication, event anticipation, failover event, and return to normal.

As always in risk management, it is important to take the business requirements into account when developing and evaluating alternatives.

These alternatives should strike an acceptable balance between mitigation and cost.

It may be necessary to iterate a few times. Consider the main components of a sample failover architecture.

Keep this in mind as you explore the components of a BCDR strategy in the following sections.

BCDR Strategy Location

As each BCDR strategy addresses the loss of important assets, replication of those assets across multiple locations is more or less assumed.

The relevant locations to be considered depend on the geographic scale of the anticipated calamity.

Power or network failure may be mitigated in a different zone in the same data center.

Flooding, fire, and earthquakes likely require more remote locations.
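
The relationship between the anticipated calamity and the required separation can be sketched as a simple lookup. This is only an illustration: the `Scope` levels, the hazard names, and the `required_scope` helper are hypothetical, and the mapping just encodes the examples given in the text.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Hypothetical separation levels, ordered from nearest to farthest."""
    ZONE = 1    # a different zone in the same data center
    REMOTE = 2  # a geographically remote location


# Illustrative mapping based on the examples in the text (an assumption,
# not an exhaustive risk catalog).
MIN_SCOPE = {
    "power_failure": Scope.ZONE,
    "network_failure": Scope.ZONE,
    "flood": Scope.REMOTE,
    "fire": Scope.REMOTE,
    "earthquake": Scope.REMOTE,
}


def required_scope(hazards):
    """Return the farthest separation demanded by any anticipated hazard."""
    return max(MIN_SCOPE[h] for h in hazards)


print(required_scope(["power_failure", "flood"]).name)
```

The most demanding hazard in the list decides the location, which is why the business requirements need to name the calamities that are actually in scope.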

Switching to a different cloud service provider (CSP) will also likely affect the sites of operations.

This is unique to the cloud model because traditional IT solutions do not readily lend themselves to contemplating a switch to a different provider.

Unless some sort of outsourcing scenario were to be contemplated and executed, a switch in IT providers would not be possible.

IT professionals need to understand this difference because they have to account for the possibility of a switch in CSPs as part of their due diligence planning to address risk.

A memorandum of understanding (MOU), along with SLAs to regulate and guide a switch if one becomes necessary, should be thought out ahead of time and put in place before the switch takes place.

BCDR Strategy Data Replication

Data replication is about maintaining an up-to-date copy of the required data in a different location.

It can be done on several technical levels and with different granularity.

For example, data can be replicated at the block level, the file level, and the database level.

Replication can be in bulk, on the byte level, by file synchronization, database mirroring, daily copies, and so on.

These alternatives can differ in their recovery point objectives (RPOs), recovery options, bandwidth requirements, and failover strategies.

Each of these levels allows the mitigation of certain risks, but not all risks.

For example, block-level data replication protects against physical data loss but not against database corruption.

Also, it does not necessarily permit recovery to a different software solution that requires different data formats.

Furthermore, backup and archive are traditionally used for snapshot functionality, which can mitigate risks related to accidental file deletion and database corruption.
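
The difference between replication and snapshots can be illustrated with a toy model. The sketch below is not a real storage system: `ReplicatedStore` and its methods are hypothetical names, and the "replica" is just a second in-memory dictionary. It shows that a replica faithfully copies a corrupt write, while a snapshot taken earlier still holds the good value.

```python
import copy


class ReplicatedStore:
    """Toy model: every write is mirrored to a replica immediately."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = []

    def write(self, key, value):
        self.primary[key] = value
        self.replica[key] = value  # replication copies good and bad writes alike

    def snapshot(self):
        # Point-in-time copy, kept separately from the live data.
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore_latest_snapshot(self):
        self.primary = copy.deepcopy(self.snapshots[-1])
        self.replica = copy.deepcopy(self.snapshots[-1])


store = ReplicatedStore()
store.write("invoice-1", "amount=100")
store.snapshot()

# A faulty application write corrupts the record; the replica follows.
store.write("invoice-1", "amount=###corrupt###")
replica_after_corruption = store.replica["invoice-1"]

# Only the snapshot allows a return to the last known-good state.
store.restore_latest_snapshot()
value_after_restore = store.primary["invoice-1"]
```

In this model the replica would still help if the primary copy were physically lost, which is the risk that replication does address.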

Beyond replication, there may exist an opportunity to re-architect the application so that relevant data sets are moved to a different provider.

This modularizes the application and makes the data more resilient in the face of a power failure.

Examples of components to split off include database as a service (DBaaS) and remote storage of log files.

In contrast with IaaS services, PaaS and SaaS service models often have data replication implicit in their services.

However, that does not protect against the failure of the service provider, and exports of the important data to external locations may still be necessary.

In all cases, selecting the proper data replication strategy requires consideration of storage and bandwidth requirements.
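
A rough sizing calculation can make the storage and bandwidth trade-off concrete. The sketch below is a back-of-the-envelope estimate under stated assumptions: the function names and the input figures are hypothetical, gigabytes are decimal, and the storage model (one full copy plus one incremental copy per retained day) is a simplification that ignores compression, deduplication, and protocol overhead.

```python
def required_bandwidth_mbps(changed_gb: float, window_hours: float) -> float:
    """Sustained throughput (megabits/s) needed to ship `changed_gb` within the window."""
    bits = changed_gb * 1e9 * 8  # decimal gigabytes to bits
    seconds = window_hours * 3600
    return bits / seconds / 1e6


def replica_storage_gb(full_copy_gb: float, daily_change_gb: float, retained_days: int) -> float:
    """One full copy plus one incremental copy per retained day (simplified)."""
    return full_copy_gb + daily_change_gb * retained_days


# Illustrative figures only: 50 GB of daily change shipped in a 4-hour window,
# with a 2 TB data set and 14 days of retained increments.
bandwidth = required_bandwidth_mbps(changed_gb=50, window_hours=4)
storage = replica_storage_gb(full_copy_gb=2000, daily_change_gb=50, retained_days=14)
print(f"{bandwidth:.1f} Mbit/s, {storage:.0f} GB")  # prints "27.8 Mbit/s, 2700 GB"
```

Shortening the transfer window to tighten the RPO raises the required bandwidth proportionally, which is one reason a few iterations between requirements and cost may be needed.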

Functionality Replication

Functionality replication is about re-creating the processing capacity in a different location.

Depending on the risk to be mitigated and the scenario chosen, this could be as simple as selecting an additional deployment zone or as involved as performing an extensive rearchitecting.

In the SaaS case, this replication of functionality might even involve selecting a new provider with a different offering, implying a substantial impact on the users of the service.

A simple case is a business that already has a heavily virtualized workload.

The relevant VM images can then simply be copied to the CSP, where they would be ready for service restoration on demand.

A modern infrastructure cloud service consumer is likely to have the application architecture described and managed in an orchestration tool or other cloud infrastructure management system.

With these, replicating the functionality can be a simple activity.

Functionality replication timing can be across a wide spectrum.

The worst recovery elapsed time is probably when functionality is replicated only when disaster strikes.

A little better is the active-passive form, where resources are held on standby. In active mode, the replicated resources participate in production.

The latter approach is likely to demonstrate the most resilience.

Rearchitecting a monolithic application in anticipation of a BCDR event may be necessary to enable the type of data replication and functionality replication required for the desired BCDR strategy.

Finally, many applications have extensive connections to other providers and consumers acting as data feeds.

These should be included in any BCDR planning.

Planning, Preparing, and Provisioning

Planning, preparing, and provisioning are about the tooling, functionality, and processes that lead up to the actual DR failover response.

The most important component here is adequate monitoring, which often buys more time ahead of the required failover event. In any case, the sooner anomalies are detected, the easier it is to attain the recovery time objective (RTO).
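
One simple way to detect anomalies early is to watch a sliding window of health-check results and raise a flag once too many of them fail. The sketch below is a minimal illustration, not a monitoring product: the `AnomalyDetector` class, the window size, and the failure threshold are all assumptions chosen for the example.

```python
from collections import deque


class AnomalyDetector:
    """Flags a service once too many recent health checks have failed."""

    def __init__(self, window=5, max_failures=3):
        self.results = deque(maxlen=window)  # keeps only the most recent checks
        self.max_failures = max_failures

    def record(self, healthy: bool) -> bool:
        """Record one health-check result; return True if failover should be considered."""
        self.results.append(healthy)
        failures = sum(1 for ok in self.results if not ok)
        return failures >= self.max_failures


detector = AnomalyDetector(window=5, max_failures=3)
checks = [True, False, False, True, False]
alerts = [detector.record(ok) for ok in checks]
```

A smaller window or threshold detects problems sooner but raises the chance of a false alarm, so both values should follow from the RTO and the cost of an unnecessary failover.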

BCDR Strategy Failover Capability

The failover capability itself requires some form of load balancing to redirect user service requests to the appropriate services.

This capability can take the technical form of cluster managers, load balancer devices, or domain name system (DNS) manipulation.

It is important to consider the risks that these components introduce because they might become a new single point of failure.
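
The redirection logic itself is usually simple, whatever technical form it takes. The sketch below shows a priority-ordered selection of a healthy site, assuming hypothetical `Site` records and a `select_target` helper; the addresses come from the ranges reserved for documentation, and the health flags would in practice be supplied by monitoring.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    address: str
    healthy: bool


def select_target(sites):
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if site.healthy:
            return site
    return None


# Priority order: primary first, DR site second (illustrative values).
sites = [
    Site("primary", "192.0.2.10", healthy=False),
    Site("dr", "198.51.100.10", healthy=True),
]
target = select_target(sites)
print(target.name if target else "no healthy site")
```

Whatever runs this decision (a cluster manager, a load balancer, or a DNS update job) must itself survive the outage, otherwise it becomes the single point of failure mentioned above.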

Returning to Normal Operation

Return to normal is where DR ends. In the case of a temporary failover, the return to normal would be back to the original provider (or in-house infrastructure, as the case may be).

Alternatively, the original provider may no longer be a viable option, in which case the DR provider becomes the “new normal.” In all cases, it is wise to adequately document any lessons learned and clean up any resources that are no longer needed, including sensitive data.

The whole BCDR process, and in particular the failover event, represents a risk mitigation strategy.

Practicing it in whole or in part strengthens confidence in this strategy.

At the same time, such a trial run can result in a risk to production.

These opposing outcomes should be carefully balanced when developing the BCDR strategy.