SecurityArena

Guide to Practical Info Security!

Written by Administrator   
Saturday, 20 June 2009 07:32

CBK BCP and DRP

This domain examines the preservation of business activities when faced with disruptions or disasters. This involves the identification of real risks, proper risk assessment, and countermeasure implementation.

Disaster Recovery (Short-term, IT focused)

The goal of disaster recovery is to minimize the effects of a disaster or adverse event and take the necessary steps to ensure that the resources, personnel, and business processes are able to resume operation in a timely manner. This is different from continuity planning, which deals with providing methods and procedures for dealing with longer-term outages and disasters. The goal of a disaster recovery plan is to deal with the disaster and its ramifications right after the disaster hits; the disaster recovery plan is usually very IT focused.

Business Continuity Planning (Longer term, broader approach)

BCP deals with providing methods and procedures for dealing with longer-term outages and disasters. A BCP takes a broader approach to the problem. This includes getting critical systems to another environment while repair of the original facilities is taking place, getting the right people to the right places, and performing business in a different mode until regular conditions are back in place. It also involves dealing with customers, partners, and shareholders through different channels until everything returns to normal.


NIST Continuity Planning Guide for Information Technology Systems

  • Develop the continuity planning policy statement. Write a policy that provides the guidance necessary to develop a BCP and assigns authority to the necessary roles to carry out these tasks.
  • Conduct the business impact analysis (BIA). Identify critical functions and systems and allow the organization to prioritize them based on necessity. Identify vulnerabilities and threats, and calculate risks.
  • Identify preventive controls. Once threats are identified, identify and implement controls and countermeasures to reduce the organization’s risk level in an economical manner.
  • Develop recovery strategies. Formulate methods to ensure that systems and critical functions can be brought online quickly.
  • Develop the contingency plan. Write procedures and guidelines for how the organization can still stay functional in a crippled state.
  • Test the plan and conduct training and exercises. Test the plan to identify deficiencies in the BCP and conduct training to properly prepare individuals on their expected tasks.
  • Maintain plan. Put in place steps to ensure that the BCP is a living document that is updated regularly.

 


Zachman Model

To rebuild something, you need to understand what it is made of.

The Zachman model is one of the most comprehensive approaches to understanding a company’s architecture and all the pieces and parts that make it up. This model breaks down the core portions of a corporate enterprise to illustrate the various requirements of every business process. It looks at the data, function, network, people, time, and motivation components of the enterprise’s infrastructure and how they tie to the roles within the company.

It would be very beneficial for a BCP team to use this type of model to understand the core components of an organization, because the team’s responsibility is to make sure that the organization can be rebuilt if needed.


BCP Requirements

The company’s business plan usually defines the company’s critical mission and business function. The functions must have priorities set upon them to indicate which is most crucial to a company’s survival. For many companies, financial operations are most critical.

The most critical part of establishing and maintaining a current continuity plan is management support. Management must be convinced of the necessity for such a plan, so a business case must be made to obtain this support. The business case may include current vulnerabilities, regulatory and legal obligations, the current status of recovery plans, and recommendations. Management is mostly concerned with cost/benefit issues, so preliminary numbers need to be gathered and potential losses estimated. The decision of how a company should plan to recover is purely a business decision and should always be treated as such.

Executives may be held responsible and liable under various laws and regulations. They could be sued by stockholders and customers if they do not practice due diligence and due care and fulfill all of their responsibilities when it comes to disaster recovery and business continuity items.

 Disaster recovery, continuity development, and planning work best in a top-down approach, not a bottom-up approach.

 


Scope and Project Initiation

Once management’s support is solidified, a business continuity coordinator needs to be identified. This person needs to have direct access to management and have the credibility and authority to carry out leadership tasks.

Representatives from each department must be involved with not only the planning stages but also the testing and implementation stages.

The committee should be made up of representatives from at least the following departments:
  • Business units
  • Senior management
  • IT department
  • Security department
  • Communications department
  • Legal department

The people who develop the BCP should also be the ones who execute it. If you knew that in a time of crisis you would be expected to carry out some critical tasks, you might pay more attention in the planning and testing phases.

Management needs to help direct the team on the scope of the project and the specific objectives.

Most companies outline the scope of their BCP to encompass only the larger threats; the smaller threats are then covered by independent departmental contingency plans.

Once the project plan is completed, it should be presented to management for written approval before any further steps are taken. It is important that there are no assumptions in the plan and the coordinator obtains permission to use the necessary resources to move forward.


Business Impact Analysis

A business impact analysis (BIA) is considered a functional analysis, in which a team collects data through interviews and documentary sources; documents business functions, activities, and transactions; develops a hierarchy of business functions; and finally applies a classification scheme to indicate each individual function’s criticality level.

The BCP committee must identify the threats to the company and map them to the following characteristics to determine their criticality levels:

  • Maximum tolerable downtime
  • Operational disruption and productivity
  • Financial considerations
  • Regulatory responsibilities
  • Reputation

The committee must gather this information from the people who know it best: department managers and specific employees throughout the organization.

BIA Steps

The more detailed and granular steps of a BIA are outlined here:

  • Select individuals to interview for data gathering.
  • Create data-gathering techniques (surveys, questionnaires, qualitative and quantitative approaches).
  • Identify the company’s critical business functions.
  • Identify the resources that these functions depend upon.
  • Calculate how long these functions can survive without these resources.
  • Identify vulnerabilities and threats to these functions.
  • Calculate risk for each different business function.
  • Document findings and report them to management.

The committee needs to step through scenarios that could produce the following results:

  • Equipment malfunction or unavailable equipment
  • Unavailable utilities (HVAC, power, communications lines)
  • Facility becomes unavailable
  • Critical personnel become unavailable
  • Vendor and service providers become unavailable
  • Software and/or data corruption

Qualitative and quantitative impact information should be gathered and then properly analyzed and interpreted.

Loss criteria must be applied to the individual threats that were identified. The criteria may include the following:

  • Loss in reputation and public confidence
  • Loss of competitive advantages
  • Increase in operational expenses
  • Violations of contract agreements
  • Violations of legal and regulatory requirements
  • Delayed income costs
  • Loss in revenue
  • Loss in productivity

These costs can be direct or indirect and must be properly accounted for.

Interruptions

Being properly prepared specifically for a flood, earthquake, terrorist attack, or lightning strike is not as important as being properly prepared to respond if one of the following results becomes reality:

  • Equipment malfunction or unavailable equipment
  • Unavailable utilities (HVAC, power, communications lines)
  • Facility becomes unavailable
  • Critical personnel become unavailable
  • Vendor and service providers become unavailable
  • Software and/or data corruption

All of the previously mentioned disasters could cause these results, but so could a meteor strike, a tornado, or a wing falling off a plane passing overhead. The moral of the story is to be prepared for the loss of any or all business resources, instead of focusing on the events that could cause the loss.

Maximum tolerable downtime (MTD) estimates that may be used within an organization:

  • Nonessential: 30 days
  • Normal: 7 days
  • Important: 72 hours
  • Urgent: 24 hours
  • Critical: Minutes to hours
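
As a minimal sketch, the tiers above can be expressed as a lookup in Python. The tier boundaries are an assumption: each tier is treated as starting at its listed duration, and anything under 24 hours is treated as Critical. The names `MTD_TIERS` and `classify_mtd` are hypothetical and not part of any standard.

```python
from datetime import timedelta

# (tier name, smallest MTD that still falls into this tier).
# Boundaries are assumed for illustration; an organization sets its own.
MTD_TIERS = [
    ("Nonessential", timedelta(days=30)),
    ("Normal", timedelta(days=7)),
    ("Important", timedelta(hours=72)),
    ("Urgent", timedelta(hours=24)),
]


def classify_mtd(mtd: timedelta) -> str:
    """Return the criticality tier for a function's maximum tolerable downtime."""
    if mtd < timedelta(0):
        raise ValueError("MTD cannot be negative")
    for name, floor in MTD_TIERS:
        if mtd >= floor:
            return name
    return "Critical"  # anything that cannot tolerate a full day of downtime


print(classify_mtd(timedelta(hours=4)))   # Critical
print(classify_mtd(timedelta(days=10)))   # Normal
```

The shorter the tolerable downtime, the more critical the function, so the list is checked from the most tolerant tier down.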

Categories of disruptions

A nondisaster is a disruption in service as a result of a device malfunction or failure. The solution could include hardware, software, or file restoration.

A disaster is an event that causes the entire facility to be unusable for a day or longer. This usually requires the use of an alternate processing facility and restoration of software and data from offsite copies.

A catastrophe is a major disruption that destroys the facility altogether. This requires both a short-term solution, which would be an offsite facility, and a long-term solution, which may require rebuilding the original facility.
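
The three categories can be sketched as a small decision function. This is only an illustration of the definitions above; the function name and its two inputs are hypothetical.

```python
def classify_disruption(facility_destroyed: bool, facility_unusable_days: float) -> str:
    """Map an event to one of the three disruption categories described above."""
    if facility_destroyed:
        return "catastrophe"   # needs an offsite facility and possibly a rebuild
    if facility_unusable_days >= 1:
        return "disaster"      # entire facility unusable for a day or longer
    return "nondisaster"       # device malfunction or failure, fixed in place


print(classify_disruption(False, 0))    # nondisaster
print(classify_disruption(False, 3))    # disaster
print(classify_disruption(True, 0))     # catastrophe
```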


Interdependencies

The following interrelation and interdependency tasks should be carried out by the BCP team and addressed in the resulting plan:

  • Define essential business functions and supporting departments.
  • Identify interdependencies between these functions and departments.
  • Discover all possible disruptions that could affect the mechanisms necessary to allow these departments to function together.
  • Identify and document potential threats that could disrupt interdepartmental communication.
  • Gather quantitative and qualitative information pertaining to those threats.
  • Provide alternative methods of restoring functionality and communication.
  • Provide a brief statement of rationale for each threat and corresponding information.

In larger organizations, it can be helpful for each department to have its own specific contingency plan that will address its specific needs during recovery. These individual plans need to be compatible with the enterprise-wide BCP.
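
One way to make interdependencies concrete is to record them as a dependency map and ask which functions are affected when a single resource is lost. The sketch below is illustrative only: the function names, resources, and the `affected_by` helper are hypothetical examples, not taken from any particular BCP.

```python
from collections import deque

# Each business function maps to the functions or resources it depends on.
# All entries are made-up examples.
DEPENDS_ON = {
    "order processing": {"payment gateway", "inventory database"},
    "customer support": {"phone system", "order processing"},
    "payroll": {"hr database"},
}


def affected_by(lost_item, depends_on):
    """Return every function that directly or indirectly depends on lost_item."""
    # Invert the map: item -> things that depend on it.
    dependents = {}
    for function, needs in depends_on.items():
        for need in needs:
            dependents.setdefault(need, set()).add(function)

    affected = set()
    queue = deque([lost_item])
    while queue:
        current = queue.popleft()
        for function in dependents.get(current, ()):
            if function not in affected:
                affected.add(function)
                queue.append(function)
    return affected


print(sorted(affected_by("inventory database", DEPENDS_ON)))
# ['customer support', 'order processing']
```

Customer support appears in the result even though it never touches the inventory database directly, which is the kind of indirect dependency the BCP team needs to document.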

Preventative Measures vs Recovery Strategies

Preventative mechanisms are put into place to try to reduce the possibility of the company experiencing a disaster and, if a disaster does hit, to lessen the amount of damage that will take place. Although the company cannot stop a tornado from coming, it could choose to move its facility out of Tornado Alley in Kansas. The company cannot stop a car from plowing into and taking out a transformer, but it can have a separate feed from a different transformer in case this happens.

Recovery strategies are a set of predefined activities that will be implemented and carried out in response to a disaster. Recovery strategies are processes on how to rescue the company after a disaster takes place. These processes will integrate mechanisms such as establishing alternate sites for facilities, implementing emergency response procedures, and possibly activating the preventative mechanisms that have already been implemented.

Types of recovery strategies:

  • Business process recovery
  • Facility recovery
  • Supply and technology recovery
  • User environment recovery
  • Data recovery


Offsite Facilities

A backup facility should be far enough away from the original site that one disaster does not take out both locations. There are three main types of leased or rented offsite facilities:

Hot site: A facility that is leased or rented and is fully configured and ready to operate within a few hours. It contains all the necessary computer equipment; the company moving in only needs to bring its retrieved data and its workforce. The equipment and system software must be compatible with the data being restored from the main site and must not cause any interoperability issues. It is the most expensive option. A hot site can support a short- or long-term outage. Most hot-site facilities support annual tests that the company can run to ensure the site is functioning in the necessary state. Backup tapes or other media should be tested periodically on the equipment kept at the hot site to make sure the media is readable by those systems.

Warm site: A leased or rented facility that is usually partially configured with some basic IT equipment, but not with the actual company-specific computers. This is the most widely used model. It is less expensive than a hot site and can be up and running within a reasonably acceptable time period. It may be a better choice for companies that depend upon proprietary and unusual hardware and software, because they will bring their own hardware and software with them to the site after the disaster hits.

Cold site: A leased or rented facility that supplies the basic environment, such as HVAC, but none of the equipment or additional services. It may take weeks to get the site activated and ready for work. The cold site is the least expensive option but takes the most time and effort to get up and functioning after a disaster.

Redundant site: A site owned and maintained by the company itself, unlike the three subscription services above, meaning the company does not pay anyone else for the site. A redundant site is usually hot in nature as well.

Tertiary site: During the BIA phase, the team may recognize the danger of the primary backup facility not being available when needed, which could require a tertiary site. It is also referred to as the ‘backup to the backup’.

Reciprocal agreement: This arrangement with another company is also referred to as mutual aid. Company A agrees to allow company B to use its facilities if company B is hit by a disaster, and vice versa. These agreements have issues of their own, but in certain businesses involving specialized technologies and systems, such as a newspaper printing press, this may be the viable option.

Rolling hot site: Also called a mobile hot site, this is the back of a large truck or a trailer turned into a data processing or working area. The trailer has all of the necessary power, telecommunications, and systems to allow processing to take place right away.

Multiple processing centers: An organization with multiple facilities around the world may use products and technologies that move all data processing from one facility to another in a matter of seconds when an interruption is detected. This technology can be implemented within the organization or from one facility to a third-party facility or service provider.

Time brokers: Time brokers promise to deliver processing time on other systems. They charge a fee but cannot guarantee that processing will always be available, especially in areas that have experienced multiple disasters.


Documentation

Procedures need to be well documented because when they are actually needed, the atmosphere will most likely be chaotic and frantic, with a demanding time schedule. The documentation should include:

  • How to install images, configure operating systems and servers
  • How to properly install utilities and proprietary software
  • A calling tree describing who should be contacted, in what order, and who is responsible for doing the calling.
  • Information for specific vendors, emergency agencies, offsite facilities.
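
A calling tree can be captured as a simple mapping from each caller to the people that caller is responsible for contacting. The sketch below is illustrative: the roles and the `call_sequence` helper are hypothetical, and a real tree would also carry phone numbers and alternates.

```python
from collections import deque

# Caller -> people that caller must contact, in order. Roles are made up.
CALL_TREE = {
    "BCP coordinator": ["IT lead", "Facilities lead"],
    "IT lead": ["Network admin", "Backup operator"],
    "Facilities lead": ["Security desk"],
}


def call_sequence(tree, root):
    """Return (caller, callee) pairs in the order the calls should be made."""
    sequence = []
    seen = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in seen:
                continue  # never call the same person twice
            seen.add(callee)
            sequence.append((caller, callee))
            queue.append(callee)
    return sequence


for caller, callee in call_sequence(CALL_TREE, "BCP coordinator"):
    print(f"{caller} calls {callee}")
```

Walking the tree level by level answers all three questions the documentation has to cover: who is contacted, in what order, and who does the calling.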

BCP Plan Documentation

There should be two or three copies of these plans. One copy may be at the primary location, but the other copies should be at other locations in case the primary facility is destroyed. Typically, a copy is stored at the BCP coordinator’s home and a copy is stored at the offsite facility. These plans should not be stored in a file cabinet, but rather in a fire-resistant safe.

Documentation is an essential piece of business, and therefore an essential piece in disaster recovery and business continuity.

Software escrow

Software escrow means that a third party holds the source code, backups of the compiled code, manuals, and other supporting materials. A contract between the software vendor, customer, and third party outlines who can do what and when with the source code. This contract usually states that the customer can have access to the source code only if and when the vendor goes out of business, is unable to carry out stated responsibilities, or is in breach of the original contract. If any of these events takes place, the customer is protected because it can still gain access to the source code and other materials through the third-party escrow agent.


Human Resources

Human resources is a critical component to any recovery and continuity process, and it needs to be fully thought out and integrated into the plan. Organizations should already have executive succession planning in place. Some large organizations also have a policy indicating that two or more of the senior staff cannot be exposed to a particular risk at the same time.


Data Backup Alternatives

The BCP team should not be responsible for setting up and maintaining the company’s data classification procedures, but the team may recognize that the company is at risk because it does not have these procedures in place. This should be seen as a vulnerability that is reported to management. Management would need to establish another group of individuals who would identify the company’s data, define a loss criterion, and establish the classification structure and processes.

The BCP team’s responsibility is to provide solutions to protect this data and identify ways to restore it after a disaster. Data usually changes more often than hardware and software, so these backup procedures must happen on a continual basis. The data backup process must make sense and be reasonable and effective.

The operations team is responsible for defining what data gets backed up and how often. These backups can be full, differential, or incremental backups and are usually used in some type of combination with each other. Operating systems’ file systems keep track of what files have been modified by setting an archive bit.

Whatever the organization chooses, it is important to not mix differential and incremental backups. This overlap could cause files to be missed, since the incremental backup changes the archive bit and the differential backup does not.
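
The archive bit behavior can be simulated in a few lines of Python. This is a simplified model, not how any particular backup product works: `run_backup` is a hypothetical helper, and the dictionary stands in for the file system's archive bits.

```python
def run_backup(archive_bits, kind):
    """Back up files and return the sorted list of file names copied.

    archive_bits maps file name -> True if the file changed since its bit was cleared.
    """
    if kind == "full":
        copied = sorted(archive_bits)
    elif kind in ("incremental", "differential"):
        copied = sorted(name for name, bit in archive_bits.items() if bit)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    if kind in ("full", "incremental"):
        for name in copied:
            archive_bits[name] = False  # full and incremental clear the bit
    return copied  # a differential leaves the bits untouched


bits = {"a.txt": True, "b.txt": True, "c.txt": True}
sunday = run_backup(bits, "full")            # copies everything, clears all bits
bits["a.txt"] = True                         # a.txt is modified on Monday
monday = run_backup(bits, "incremental")     # copies a.txt and clears its bit
bits["b.txt"] = True                         # b.txt is modified on Tuesday
tuesday = run_backup(bits, "differential")   # copies only b.txt

print(monday, tuesday)  # ['a.txt'] ['b.txt']
```

Restoring from Sunday's full backup plus Tuesday's differential would lose Monday's change to a.txt, because a differential is expected to contain every change since the last full backup.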

Critical data should be backed up to both an onsite area and an offsite area.

  • A disk-shadowing or disk-mirroring process uses two physical disks, and the data is written to both at the same time for redundancy purposes. If one disk fails, the other is readily available.
  • Disk duplexing means that there is more than one disk controller. If one disk controller fails, the other is ready and available.
  • Electronic vaulting makes copies of files as they are modified and periodically transmits them to an offsite backup site. The transmission does not happen in real time, but is carried out in batches. A company can choose a suitable batch interval, regulated either by time or by events.
  • Remote journaling takes place in real time and transmits only the file deltas.
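
The difference between batched vaulting and real-time journaling can be sketched as follows. This is a toy model under stated assumptions: the class, the function, and the `send_batch` and `send_delta` callables are hypothetical stand-ins for whatever transport a real product uses, and the batch trigger shown is event regulated (a count of modified files).

```python
class ElectronicVault:
    """Queue modified files and ship them offsite in batches."""

    def __init__(self, send_batch, batch_size=3):
        self._send_batch = send_batch  # callable that transmits a list of file names
        self._batch_size = batch_size  # assumed event-regulated trigger
        self._pending = []

    def file_modified(self, name):
        self._pending.append(name)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Transmit whatever is queued; a scheduler could call this for time-regulated batches."""
        if self._pending:
            self._send_batch(list(self._pending))
            self._pending.clear()


def journal_change(send_delta, name, delta):
    """Remote journaling: forward only the change itself, immediately."""
    send_delta((name, delta))


batches = []
vault = ElectronicVault(batches.append, batch_size=2)
vault.file_modified("a.doc")   # queued, nothing sent yet
vault.file_modified("b.doc")   # batch size reached, both files sent together
print(batches)                 # [['a.doc', 'b.doc']]
```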

Recovery and Restoration

The following teams are required for the recovery and restoration phase:

  • Damage assessment team
  • Legal team
  • Media relations team
  • Network recovery team
  • Relocation team
  • Restoration team
  • Salvage team
  • Security team
  • Telecommunications team

Damage assessment: A role, or a team, needs to be created to carry out a damage assessment once a disaster has taken place. The assessment procedures should be properly documented and include the following steps:

  • Determine the cause of the disaster.
  • Determine the potential for further damage.
  • Identify the affected business functions and areas.
  • Identify the level of functionality for the critical resources.
  • Identify the resources that must be replaced immediately.
  • Estimate how long it will take to bring critical functions back online.

After the damage assessment, if one or more of the situations outlined in the criteria have taken place, then the team is moved into recovery mode.

Restoration team: The restoration team should be responsible for getting the alternate site into a working and functioning environment.

Salvage team: The salvage team should be responsible for starting the recovery of the original site.

Reconstitution phase: When it is time for the company to move back into its original site or a new site, the company is ready to enter the reconstitution phase. A company is not out of an emergency state until it is back in operation at the original primary site or a new site that was constructed to replace the primary site, because the company is always vulnerable while operating in a backup facility.

In the reconstitution phase, the least critical functions should be moved back first, so that if there are issues in network configurations or connectivity, or important steps were not carried out, the critical operations of the company are not negatively affected.
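
The move-back order can be sketched as a sort on criticality. The example reuses the MTD tier names from the BIA section; the function names and the `move_back_order` helper are hypothetical.

```python
# Lower rank = less critical = moved back to the primary site earlier.
CRITICALITY_RANK = {
    "Nonessential": 0,
    "Normal": 1,
    "Important": 2,
    "Urgent": 3,
    "Critical": 4,
}


def move_back_order(functions):
    """Given {function name: criticality tier}, return names least critical first."""
    return sorted(functions, key=lambda name: CRITICALITY_RANK[functions[name]])


functions = {
    "payment processing": "Critical",
    "internal newsletter": "Nonessential",
    "payroll": "Important",
}
print(move_back_order(functions))
# ['internal newsletter', 'payroll', 'payment processing']
```

This is the reverse of the recovery order, where the most critical functions are brought up first at the alternate site.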

How to Set BCP Goals

To be useful, a goal must contain certain key information, such as the following:

  • Responsibility: Each individual involved with recovery and continuity should have their responsibilities spelled out in writing to ensure a clear understanding in a chaotic situation. These individuals must know what is expected of them, which is done through training, drills, communication, and documentation.
  • Authority: In times of crisis, it is important to know who is in charge. Teamwork is important in these situations, and almost every team does much better with an established and trusted leader.
  • Priorities: It is extremely important to know what is critical versus what is merely nice to have.
  • Implementation and testing: Once a continuity plan is developed, it actually has to be put into action. It needs to be documented and put in places that are easily accessible in times of crisis. Drills should take place at least once a year, and the entire program should be continually updated and improved.

Types of Recovery Plans

  • Business resumption plan: Focuses on how to re-create the necessary business processes that need to be reestablished instead of focusing on IT components (i.e., process oriented instead of procedure oriented).
  • Continuity of operations plan (COOP): Establishes senior management and a headquarters after a disaster. Outlines roles and authorities, orders of succession, and individual role tasks.
  • IT contingency plan: A plan for recovering systems, networks, and major applications after disruptions. A contingency plan should be developed for each major system and application.
  • Crisis communications plan: Includes the internal and external communications structure and roles. Identifies specific individuals who will communicate with external entities. Contains predeveloped statements that are to be released.
  • Cyber incident response plan: Focuses on malware, hackers, intrusions, attacks, and other security issues. Outlines procedures for incident response.
  • Disaster recovery plan: Focuses on how to recover various IT mechanisms after a disaster. Whereas a contingency plan is usually for nondisasters, a disaster recovery plan is for disasters that require IT processing to take place at another facility.
  • Occupant emergency plan: Establishes personnel safety and evacuation procedures.

Emergency Response

Often, the initial response to an emergency affects the ultimate outcome. Emergency response procedures are the prepared actions that are developed to help people in a crisis situation better cope with the disruption. These procedures are the first line of defense when dealing with a crisis situation.


Need for BCP Maintenance

The main reasons plans become outdated include the following:

  • The business continuity process is not integrated into the change management process.
  • Infrastructure and environment changes occur.
  • Reorganization of the company, layoffs, or mergers occur.
  • Changes in hardware, software, and applications occur.
  • After the plan is constructed, people feel that their job is done.
  • Personnel turns over.
  • Large plans take a lot of work to maintain.
  • Plans do not have a direct line to profitability.

Actions to maintain BCP

  • Make business continuity a part of every business decision.
  • Insert the maintenance responsibilities into job descriptions.
  • Include maintenance in personnel evaluations.
  • Perform internal audits that include disaster recovery and continuity documentation and procedures.
  • Perform regular drills that use the plan.
  • Integrate the BCP into the current change management process.

Testing the Disaster Recovery Plan (DRP)

Types of tests:

  • Checklist: Copies of the plan are sent to different department managers and business unit managers for review. This is a simple test and should be used in conjunction with other tests.
  • Structured walk-through: Team members and other individuals responsible for recovery meet and walk through the plan step-by-step to identify errors or assumptions.
  • Simulation: This is a simulation of an actual emergency. Members of the response team act in the same way as if there were a real emergency.
  • Parallel: This is similar to simulation testing, but the primary site is uninterrupted and critical systems are run in parallel at the alternative and primary sites.
  • Full interruption: This test involves all facets of the company in a response to an emergency. It mimics a real disaster where all steps are performed to test the plan. Systems are shut down at the primary site and all individuals who would be involved in a real emergency, including internal and external organizations, participate in the test. This test is the most detailed, time-consuming, and expensive of all of these.

Insurance

Replacement cost: Property replacement cost insurance promises to replace old with new. Generally, replacement of a building must be done on the same premises and used for the same purpose, using materials comparable to the quality of the materials in the damaged or destroyed property.

Actual cash value (ACV): The ACV is the default valuation clause for commercial property insurance. It is also known as depreciated value, but this is not the same as accounting depreciated value. The actual cash value is determined by first calculating the replacement value of the property. The next step involves estimating the amount to be subtracted, which reflects the building’s age, wear, and tear. This amount deducted from the replacement value is known as depreciation. The amount of depreciation is reduced by inflation (increased cost of replacing the property); regular maintenance; and repair (new roofs, new electrical systems, etc.) because these factors reduce the effective age of the buildings.
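
The two-step calculation described above amounts to replacement value minus depreciation. The sketch below works one example; the dollar figures are invented purely for illustration, and `actual_cash_value` is a hypothetical helper, not an insurer's formula.

```python
def actual_cash_value(replacement_value: float, depreciation: float) -> float:
    """ACV = replacement value of the property minus estimated depreciation."""
    if depreciation < 0 or depreciation > replacement_value:
        raise ValueError("depreciation must be between 0 and the replacement value")
    return replacement_value - depreciation


# Hypothetical building: costs 1,000,000 to replace, with 250,000 deducted
# for age, wear, and tear (already adjusted for maintenance and repairs).
print(actual_cash_value(1_000_000, 250_000))  # 750000
```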

Functional replacement cost: This method provides for the replacement of a building with similar property that performs the same function, using less costly material. The endorsement includes coverage for building codes automatically. In the event of a loss, the insurance company pays the smallest of four payment options.

  • In the event of a total loss, the insurer could pay the limit of insurance on the building or the cost to replace the building on the same (or a different) site with one that is “functionally equivalent.”
  • In the event of a partial loss, the insurance company could pay the cost to repair or replace the damaged portion in the same architectural style with less costly material (if available).
  • The insurance company could also pay the amount actually spent to demolish the undamaged portion of the building and clear the site if necessary.
  • The fourth payment option is to pay the amount actually spent to repair, or replace the building using less costly materials, if available.

Agreed value or agreed amount: Agreed value or agreed amount is not a valuation method. Instead, this term refers to a waiver of the coinsurance clause in the property insurance policy. Availability of this coverage feature varies among insurers, but it is usually available only when the underwriter has proof (an independent appraisal, or compliance with an insurance company valuation model) of the value of your property.
Last Updated on Friday, 28 August 2009 05:06
 