Wednesday, April 30, 2014

ARTICLE: Pearson Grounds Flights During Ice Storm

In any BC Plan, it is critical to define when and how a disaster should be declared.

“Was the GTAA correct in making the decision to impose ground stop during frigid temperature of -25 to -45 Celsius?”
Pearson right to ground flights during ice storm

These are the facts from those of us who were working on the ground when this decision was made:

Simply put, there is a very good chance that the GTAA's decision saved people's lives. In the 30 hours preceding the ground stop, there were two airplane crashes in similar conditions, at New York's JFK Airport and in Aspen, Colorado. Two days later, an aircraft slid off the runway shortly after landing in Saskatoon.

Years of two-tiered wages and contracting out have forced thousands of our co-workers into precarious, near-minimum-wage jobs. This is creating a high turnover rate and a lost opportunity to retain the experience needed to work in irregular operations. Many airports around the world, particularly in the U.S., are implementing Living Wage Ordinances in recognition that skilled, properly paid people on the ground are necessary for your safety.

Most importantly we need to remember that we are all people first. None of us can control bad weather in an industry with zero room for error. Nothing is achieved when we are abusive to each other — worker or passenger. After all, these decisions are made for both of our groups' safety. 

Sheri Cameron, Martyn Smith and Sean Smith are airline workers and representatives on the Toronto Airport Council of Unions encompassing over 20,000 airport ground handlers and flight attendants in both Terminals 1 and 3 at Pearson Airport.

Source Article (Toronto Star News)



GTAA criticized for “Ground Hold” at Pearson International Airport
The Greater Toronto Airports Authority is being harshly criticized for its decision to stop all arriving North American flights for more than eight hours at Pearson International Airport, which stranded thousands of frustrated passengers and has caused serious delays since that day.

As a result, roughly half of the 774 arriving flights (381) had been cancelled as of Tuesday evening. Consequently, hundreds of weary travelers slept on seats or trudged forward in hours-long lines to rebook their cancelled or missed flights.

Toby Lennox, vice president of strategy development for the GTAA, revealed that the decision to impose a ‘Ground Stop’ at the airport was the CEO’s first in his 15-year career. He said that stops are usually imposed only for snowstorms or lightning and last only a few hours. He also admitted that “it’s just never been this extreme,” and that “no matter how much you prepare, you’re not going to be able to make the event go away. I can’t prepare to make the weather go away.”

Source Article (Oye! Times):

http://www.oyetimes.com/news/canada/57358-gtaa-criticized-for-ground-hold-at-pearson-international-airport

For more information about Business Continuity, IT Disaster Recovery and Audit Training and Certification, visit www.sentryx.com or contact info@sentryx.com or call 1-800-869-8460.

ARTICLE: Alternate Communications During Times Of Disaster

by Dr. Jim Kennedy, NCE, MRP, MBCI, CBRM

We have witnessed many disasters over the last three to five years, both in the United States and abroad. Based on what we are hearing from NOAA and the National Weather Service, the US is likely to see the same number of tropical storms this year, if not more, including storms of the size and ferocity that were so devastating to the southern portion of the US in 2005. So, whether it is tropical storms in the US, earthquakes in South America and Asia, or volcanoes anywhere else on the globe, we, humanity, face another year of potential emergencies that will need to be responded to.

One thing that all of these natural disasters have in common, besides the tremendous loss of life and disruption to the everyday lives of the populace, is that they are immediately followed by an almost total loss of the ability to communicate with the outside world. Power is lost, telephone services are discontinued, and cell phone service is either non-existent or so congested that it takes hours to get a call through.

So, every year, companies and emergency planners face the problem of providing continued communication before, during, and after a disaster strikes their areas. This year, more than at any other time, business continuity planners at small, medium and large companies in the southern part of America are looking for alternatives to standard communications so that they can keep their business and critical operations running in the aftermath of a devastating event.

I thought that I would present some alternatives for the spectrum of business types so that those business continuity planners would have choices from which to make informed decisions about backup communications. Before we discuss backup communications solutions, let’s first discuss the failure mechanisms for the communications used during normal times.

Failure modes
Most companies continue to rely upon the standard telephone system for their communications needs. In order to provide this service the telecommunications carrier, regardless of where you are located in the world, relies upon either copper wire or fiber optic cables from its central offices to its customers' premises. This ‘last mile’ can be either above ground, which is the case in the majority of installations, or underground. We have all seen those graphic pictures of poles and trees uprooted and thrown to the ground after a hurricane or tornado has devastated an area. When this happens, that last mile of connectivity between the business and its telephone provider, Internet provider, or application service provider is abruptly disconnected and utility power is lost. Underground cables are not entirely safe from disruption of service either. Many times, due to flooding and/or power loss, these underground services are disrupted as well.
In the case of cell phone providers, the cell towers receive your cell phone’s call and then route it to a local central office. These towers, or the equipment inside of them, can also be damaged or destroyed, as can the last mile circuits which connect those cell towers to the local telephone network. So cell phone service is as tenuous as the regular telephone service when a disaster strikes.
I should also mention that the southeast US is not the only area where loss of communications services takes place, and hurricanes and tornadoes are not the only natural disasters that disrupt communications and power. In the northeast US over the last several years, for example, ice storms and blizzards have also taken their toll on communications and power utilities.
Usually, following an event like a tornado, hurricane, blizzard or the like, the communications and power service providers work very hard to restore service. However, in most cases we are talking about several days, if not a week, for the restoration of power and phone service. This restoration time varies depending on the size and intensity of the disaster. If it is localized, as it could be for a tornado, then service could be restored more quickly.
These copper and fiber optic cables also interconnect the local telephone company’s central offices to other central offices in the region and to long distance providers, cell phone carriers, and Internet and data communications service providers anywhere in the world. These inter-exchange or ‘long haul’ circuits provide interconnectivity and communication beyond the local area. So if your business communicates between offices in Baton Rouge, LA and St. Louis, MO, there are probably several service providers and miles of cables involved in carrying the information from one point to the other. These cables travel above and below ground and suffer the same fate as the local last mile circuits do. However, because of the number of calls and subscribers and the importance of these circuits, the carriers or the businesses that use them generally employ circuit ‘diversity’. What this means is that there are multiple paths for the voice or data to travel. If one path fails there is another which can be used to take the call to its intended destination. This works well for such things as car vs. pole accidents and isolated incidents like localized fires and floods, but with mass devastation like we experienced with Hurricane Katrina or the tornadoes in the midwest US, even the diverse routes are consumed in the overall damage toll.
Power is another failure mode. The central offices and cell phone sites have their own power sources in the form of batteries and emergency generators. If the event is limited to a few hours or a few days they will be fully operational. However, it was found that in the case of the hurricanes and earthquakes of the last few years power has been interrupted for several days even up to several weeks and the power plants, central offices, or cell towers in the areas of devastation were inaccessible for most of that time. This meant that the fuel trucks needed to refuel the generators were unable to get to their destinations and subsequently the central offices and cell sites went off-line.
So now we understand that the power and communications utilities have planned for adverse events, but that the intensity and massive area of devastation often make these plans fail. It is left to the individual business owner or operator to determine the criticality of their services and to properly plan for potential communication and power failures that might impact them.
In the next part of this article, I will endeavor to present the alternatives that exist in case you experience a disastrous event with a communication failure.

Alternatives
Before I discuss the alternatives I feel that it is important to note that power is a main component of any recovery or mitigation strategy. That is, without power to run these technologies they will not operate. So, it is important to have reliable and sustainable power for the duration of the resumption and/or recovery effort. If you cannot verify that this is the case then alternate site recovery is the only viable alternative.

Infrared
One such alternative to commercial communication systems is infrared. This alternative is used if a company needs to interconnect two buildings. Infrared provides an optical data, voice and video transmission system. Like fiber optic cable, infrared communications systems use laser light to transmit a digital signal between two transceivers. However, unlike fiber, the laser light is transmitted through the air. In order for the digital signal to be transmitted and received, there must be a clear line of sight between the units. In other words, there should be no obstructions such as trees or buildings between the transceiver units. So, if your wire-line or wireless communications fail, you can still provide communications between two points. The only drawbacks are the distance and the line-of-sight requirements.
This solution provides low-cost, high-speed wireless connectivity for a variety of last-mile applications. It provides narrow-band voice and broadband data connectivity, and the various products provide scalable, wireless alternatives to leased lines. These infrared systems operate at data rates from 1 megabit per second to multiple gigabits per second and are deployable in one day, without requiring right-of-way or government permits for installation. They can provide an alternative communication link in hours instead of weeks or months. This is probably not an option for a small business, but for a medium or large business owner the cost is affordable. Cost can range from $10K to $25K per installation capable of distances of up to 1000 meters.

Microwave
Another alternative to commercial communication systems is microwave (wireless). This alternative is used if a company needs to interconnect two buildings that are spaced farther apart than conventional infrared can operate (i.e., in excess of 1000 m). Microwave also provides a data, voice and video transmission system. Unlike infrared communications systems, which use laser light to transmit a digital signal between two transceivers, microwave uses ultra-high-frequency radio (wireless) transmission. In order for the digital signal to be transmitted and received, there again must be a clear line of sight between the units. However, the distance that this alternative can span is up to 60 miles, as long as no obstructions such as trees or buildings are located between the two locations. If wire-line or wireless communications fail, communications between two points can still take place. There are several drawbacks to this solution:

  • Distance limited to up to 60 miles
  • Requires an FCC license to operate
  • Right of Way Permits may be required
  • Needs highly trained technicians to install equipment
  • Cost can be prohibitive for small businesses
The cost of a microwave system can be between $50K and $100K, with installation and license preparation charges in the area of another $15K. It still provides a viable alternative for medium and large businesses.
Small businesses also have an alternative of smaller wireless systems which utilize non-licensed frequencies and which can be installed by an IT person in the business operation. Cost is about $1000 to $2000, but I must warn you that this is not as reliable a solution as the microwave wireless option and reliable speeds may be slower.
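To make the distance and line-of-sight limits above easier to apply, here is a minimal sketch in Python. It only encodes the figures quoted in this article (infrared up to about 1000 meters, microwave up to about 60 miles, both needing a clear line of sight). The function name and the idea of returning a simple list are my own illustrative assumptions, not part of any vendor's tooling, and real products will have their own range limits.

```python
# Illustrative sketch only: thresholds are the approximate figures quoted
# in the article, not vendor specifications.
METERS_PER_MILE = 1609.344
INFRARED_MAX_M = 1000                      # "up to 1000 meters"
MICROWAVE_MAX_M = 60 * METERS_PER_MILE     # "up to 60 miles"


def point_to_point_options(distance_m, clear_line_of_sight):
    """Return the point-to-point link types worth investigating.

    Hypothetical helper: both technologies need an unobstructed path,
    so without line of sight neither is a candidate.
    """
    if not clear_line_of_sight:
        return []
    options = []
    if distance_m <= INFRARED_MAX_M:
        options.append("infrared")
    if distance_m <= MICROWAVE_MAX_M:
        options.append("microwave")
    return options
```

An empty result suggests that neither building-to-building option fits, in which case the satellite alternatives discussed next are the ones to look at.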

Satellite
So far I have provided solutions that have been better suited for the medium and large business operations. Satellite provides alternatives for small, medium and large enterprises and there are various speed and pricing options, which make it a very attractive alternative or mitigation strategy.

Satellite phones
There are several types of satellite alternatives. If a company is only interested in providing a short-term telephone backup alternative, then satellite phone services like INMARSAT, AT&T, Iridium, Satcom, Skytel, Worldcell, or Globalstar, to name only a few, offer basic voice, fax and basic e-mail services. They offer mobile phone services and are not usually capable of providing sustained data communication or Internet types of services. However, this communications strategy is good for keeping your senior executives and critical operations personnel in contact during disasters. You can rent phones for about $40/week and then pay about $1.00/minute for basic service, or you can buy the phones for $700 to $2000 each and negotiate rates in the area of $0.85/minute. So as you can see this is not an inexpensive option, but it is usable depending on the need for communications.
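The rent-versus-buy trade-off can be worked through with simple arithmetic. The sketch below, in Python, uses only the approximate prices quoted above ($40/week plus about $1.00/minute to rent; $700 to $2000 per handset plus about $0.85/minute to buy). The function names and default values are illustrative assumptions, and actual carrier pricing will differ.

```python
# Illustrative sketch only: default prices are the approximate figures
# quoted in the article, per phone.

def rental_cost(weeks, minutes, weekly_fee=40.0, per_minute=1.00):
    """Estimated cost of renting one satellite phone."""
    return weeks * weekly_fee + minutes * per_minute


def purchase_cost(minutes, handset_price, per_minute=0.85):
    """Estimated cost of buying one satellite phone and paying for airtime."""
    return handset_price + minutes * per_minute


def cheaper_option(weeks, minutes, handset_price):
    """Return 'rent' or 'buy', whichever is cheaper for the given usage."""
    rent = rental_cost(weeks, minutes)
    buy = purchase_cost(minutes, handset_price)
    return "rent" if rent <= buy else "buy"
```

For a short outage, say two weeks and 600 minutes of airtime, renting comes out well below buying even the least expensive handset. Purchase only starts to pay off when the phones are kept on hand for many weeks or across several events.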

VSAT
VSAT is an acronym for Very Small Aperture Terminal, an earthbound station used in satellite communications of data, voice and video signals. A VSAT consists of two parts: a transceiver that is placed outdoors in direct line of sight to the satellite, and a device that is placed indoors to interface the transceiver with the end user's communications device, such as a PC. It is very much like a satellite TV setup. VSAT service can be placed into two categories: those that provide basic Internet access services and those that are enterprise grade. For the small and medium sized business the Internet access type service is often what is selected. Offerings such as DirectWay, WildBlue, and Connexstar all provide low-cost, small business types of backup solutions which use equipment much like the in-home satellite television services. The data rates are in the area of 200 kbps uplink and 1.5 Mbps downlink, which is very much like residential DSL service. The cost is about $300 for the equipment and around $100 or less each month. This would provide a small business the ability to utilize VoIP and VPN and to connect to the Internet. For medium and large size businesses there are more sophisticated satellite services. They require satellite antennas, which are 3 to 5 meters in diameter, and much more sophisticated and expensive equipment. Installation of these more sophisticated satellite services can cost in the range of $100K to $250K, with monthly operational service charges from $1000 to $5000/month. They provide quality of service and committed information rates as part of the service. They can provide up to 150 toll-quality phone lines, broadband Internet, and high speed data communications, and can also provide secure (encrypted) communication if required. Satellite services can also be rented as part of a contract or call-up service. But rental services are on a first-come, first-served basis.
As we witnessed during the tropical storms of last year these portable rental satellite service providers were inundated with requests and try as they would there were only so many units to go around. Those who did not plan or contract ahead were left without service.

Last Thoughts
I hope that I have given business continuity planners some food for thought in developing alternative communication mitigation strategies. Each strategy has its benefits and drawbacks. You need to look at each possibility and determine what is right for you. If you are overwhelmed, there are many consulting organizations, and even your own telecommunications services provider, who can help you to identify and select the best options. However, you need to get started today for the next hurricane, tornado, flood, or catastrophe season in your geographic region. It will be too late to plan after an event occurs.
Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years experience in the business continuity and disaster recovery fields and holds numerous Master level certifications in network engineering, information security and business continuity.
He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a contributing author for two books, the "Blackbook of Corporate Security" and "Disaster Recovery Planning: An Introduction."
jtkennedy@lucent.com

ARTICLE: Critical Infrastructure Protection Is All About Operational Resilience And Continuity

By Dr. Jim Kennedy, MRP, MBCI, CBRM
It has always been the policy of the United States to ensure the continuity and security of the critical infrastructures that are essential to the minimum operations of our economy and government. This critical infrastructure includes essential government services, public health, law enforcement, emergency services, information and communications, banking and finance, energy, transportation, and water supply.
So even before the events of 9/11, the Executive Branch of our government, the President through Presidential Decision Directive 63 (PDD 63) issued May 22, 1998, ordered the strengthening of the nation's defenses against emerging unconventional threats to the United States, including those involving terrorist acts, weapons of mass destruction, assaults on critical infrastructures, and cyber-based attacks.
But how many of us really understand what an immense undertaking that was? What is the critical infrastructure in the United States?
  • More than 3,000 government facilities
  • 7,569 Hospitals
  • Telecommunications: 2 billion miles of cable; 1000s of telephone switching central offices
  • Energy: 2800 Electric power plants; 300,000 oil and natural gas producing sites; 104 nuclear power plants
  • Transportation
    • 2 million miles of pipelines
    • 300 coastal ports
    • 500 major urban public transit operators
    • 500,000 highway bridges
    • 5000 public airports
  • 4,893 banks or savings institutions have more than $100 billion in assets
  • 66,000 chemical and hazardous material producing plants
  • 75,000 dams
  • 51,450 fire stations responding to 22,616,500 calls for assistance each year.
US business and every individual rely in some manner on the above every day. We depend on their operational resiliency and continuity of operations.
Initially, critical infrastructure assurance was essentially a state and local concern. With the massive use of information technologies and their significant interdependencies it has become a national concern, with major implications for the defense of our homeland and the economic security of the United States.
However, given all of the focus on critical infrastructure still one in three critical infrastructure operations goes without a business continuity or continuity of operations plan and three out of five of those operations with plans have never tested their plans as ‘fit for purpose.’
Up until this year the electrical energy sector had no single body setting security and availability standards and practices for its operation. In 2006 the Federal Energy Regulatory Commission (FERC) selected the North American Electric Reliability Council (NERC) as the Electric Reliability Organization (ERO) and standard setting body in the US for electric utilities. Contingency and continuity of operations planning in this segment of the critical infrastructure is minimal at best, as is typical across the entire energy sector (e.g., transmission, generation, oil and gas distribution, etc.).
In the financial sector many institutions, despite regular audits and increased governmental regulations, still do not have adequate continuity plans in place and information security is marginal.
Although the deadline for HIPAA compliance has officially passed, a significant percentage of covered health care organizations still have not achieved basic HIPAA compliance, according to a recent industry survey. They lack emergency operations plans and even in some cases proper disaster recovery plans for patient care systems, which contain critical patient healthcare information.
So even though there are laws and regulations and a very clear focus on the protection and resilience of critical infrastructure operations it has not seemed to translate into practice for the actual critical infrastructure operations across the US.
Critical infrastructure protection is all about operational resilience. In the GAO’s ‘Critical Infrastructure Protection – Significant Challenges in Safeguarding Government and Privately Controlled Systems from Computer-Based Attacks’ the report refers to service continuity controls as: “controls that ensure that when unexpected events occur, critical operations will continue without undue interruption and that crucial, sensitive data are protected.” It (the report) goes on to say that: “Service continuity controls should address the entire range of potential disruptions including relatively minor interruptions, such as temporary power failures or accidental loss or erasure of files, as well as major disasters, such as fires or natural disasters, that would require reestablishing operations at a remote location.”
So how is this to be accomplished? The most effective way is to develop a thorough and comprehensive business continuity or business resiliency management program. That program can be based on the NIPP Risk Management Framework, which consists of:
  • Set Security Goals
  • Identify Assets, Systems, Networks, and Functions
  • Assess Risks
  • Prioritize Mitigation Efforts
  • Implement Mitigation Strategies and Protective Programs
  • Measure Effectiveness
  • Start back at the beginning
I have attempted to outline below a process to aid critical infrastructure operations, utilizing the above NIPP Risk Management Framework coupled with an effective governance model, in addressing business continuity and resiliency needs.
First, a certified business continuity planner needs to be selected and must obtain senior management agreement and sponsorship for the program to be developed. With this sponsorship, budgets and manpower can be allocated for the project.
Second, the planner must solicit aid from multiple areas of the operation or business. This can be accomplished by establishing a Business Continuity or Business Resiliency Steering Committee. This committee will be composed of middle management from across the operation (e.g., technical, operational, financial, HR, etc.). The function of this committee is to establish the direction and approve the program, identify tools to be used, establish metrics, and report to senior management on progress.
Next, if the amount of work to be done is substantial, or if the business continuity or resiliency program is starting from scratch, a Business Continuity or Resiliency Program Office should be developed. This may comprise one or more individuals who are responsible (using project management disciplines) for ensuring that the planning and mitigation tasks are implemented consistently throughout the organization. They must also track and report on progress.
With the governance in place, work can begin to implement the NIPP framework within the organization. The steering committee will work with senior management to establish the direction and communicate the goals within the organization.
Identifying the critical assets is the next step. In everyday business continuity planning this equates to performing a business impact analysis. Here business continuity planners will work to develop a clear picture of what components (people, process, and/or technology) of the operation are critical to carrying out its mission, and to identify how long it can do without or work around those components if they become unavailable.
The next step in the NIPP Risk Management Framework is the assessment of risk. This equates to the business continuity planner’s risk assessment. The risk assessment is the process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce the organization's exposure, and evaluating the cost of such controls. Risk analysis often involves an evaluation of the probabilities of a particular event.
Once the risk assessment is complete it will be necessary to move to the next step in the NIPP Framework, that of prioritizing the risks and developing mitigation strategies based on the operation's risk appetite. Here is where the organization determines how to address each risk: mitigate it, pass it on to another entity (insurance), or simply ignore it.
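One common way to put the risks in priority order is to multiply the estimated probability of each event by its estimated impact and rank the results. The Python sketch below illustrates that idea only. The expected-annual-loss scoring, the class and function names, and any figures fed into it are my own illustrative assumptions, not something prescribed by the framework.

```python
# Illustrative sketch only: expected annual loss (probability x impact)
# is one simple scoring method among many.
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float   # estimated chance of occurring in a year, 0..1
    impact_usd: float           # estimated loss if the event occurs

    @property
    def expected_annual_loss(self):
        return self.annual_probability * self.impact_usd


def prioritize(risks):
    """Return risks ordered from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)
```

The ranked list is only a starting point for discussion. The decision to mitigate, transfer, or ignore a given risk still depends on the operation's risk appetite and on what the mitigation would cost.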
Whatever makes the best business sense is then translated into a protective plan, which is then implemented under the direction of the program office. At this point, when the mitigation strategies are identified and are being implemented, the business continuity or resiliency plan can be developed. Again, business continuity subject matter experts are best utilized to accomplish this task as they have developed plans for similar business operations. Once the mitigation efforts are in place and the plans completed, awareness training and exercising of the plan are appropriate.
Lastly, before starting the whole effort over again, comes measuring effectiveness. Are the plan and the mitigation strategies “fit for purpose?” Do they adequately protect the operation from adverse events? If not, then the plan and mitigation efforts will have to be reviewed and modified as appropriate.
What has been accomplished is the beginning of a continuing effort to maintain the operation of the critical infrastructure. It has no end. It needs to be reviewed for every change to the operation.
I have been fortunate to help many critical infrastructure organizations build business continuity and resiliency into their operations. It is not easy but, as Presidents past and present indicate, it is of the utmost importance to make sure that the United States’ critical infrastructure is adequately protected, as its citizens rely upon it every day for their safety, protection, and well-being. It is difficult, but as has been said: any important journey starts with a single step.
Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years experience in the business continuity and disaster recovery fields and holds numerous Master level certifications in network engineering, information security and business continuity. He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a co-author for two books, the ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-Book entitled: ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic.’ 



ARTICLE: Developing Seamless Business Continuity And Disaster Recovery Plans

by Jim Kennedy
Introduction
The recovery times for both the business organization’s business continuity plan and the IT department’s disaster recovery plan need to be developed through the collaboration of both parties for either plan to provide the proper protection. However, in my thirty-five years in the business continuity and resiliency field, I have found that in many situations they are not.
The reasons for this can be timing or a lack of knowledge of the overall business continuity and/or disaster recovery planning process coupled with a lack of understanding of each other’s real recovery timing needs.
The purpose of this article is to provide a framework in which the recovery time objectives (RTOs) for the business continuity and the disaster recovery plan can be developed together.
Reason for inconsistencies and failures
Generally the drivers for business continuity and disaster recovery planning are considered to be one and the same, but this is not always the case. Many times the very design process for IT infrastructure requires that the IT organization develop disaster recovery planning thoughts and plans early in the application and/or systems development process. So, early in the development of a new application or system, IT must have some understanding of what kind of recovery timing and recovery point timing will be needed to support the technology to be deployed. IT will try to obtain the RTO and RPO (recovery point objective) numbers, but the business is most often focused on ensuring that the new business process or function is rolled out on time and within budget. The business organization is not thinking about business continuity planning at this time. So, IT will take it on itself to develop a best guess of the required recovery times, either based on conversations with the business organization or, if the business cannot or will not commit to a number, on its own.
In other cases that I have seen, there is a clear lack of knowledge about business continuity and disaster recovery planning. Each organization knows that they need either a business continuity or a disaster recovery plan but they are not trained in the overall steps in developing such plans. As such the business organization does not understand the risks, trade-offs, and costs involved in developing a proper business continuity plan. The business organization also often does not understand that it needs to properly analyze the operation to better understand the recovery requirements during the process/systems/application development phase of the systems/process development life cycle or, as ITIL defines it, the application life cycle (ALC). The business organization needs to quantify the impacts of loss of that process or system; and may not be sure of the right questions to ask - not only in terms of loss of productivity, but in terms of costs to process manually in case of a system loss or failure. Can the organization develop and use manual processes at all if the system or IT infrastructure fails? Does the organization have the human resources to perform the necessary manual processes or will they need to bring in contingent workers and for how long and for what cost? Every business organization needs to clearly understand and to articulate their operation’s maximum tolerable period of disruption (MTPD).
MTPD is the maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization. This applies to both customer-facing and internal activities. Note that the recovery time objective specifies the time by which an organization intends to recover an activity or resource: the maximum tolerable period of disruption is the upper bound on this time.
The business needs to use the MTPD to develop its processes and contingency processes, and the IT organization needs to understand the MTPD to properly develop its technology and its RTO, which, in turn, will enable the business to achieve its recovery objectives.
At the same time, IT needs to utilize the recovery time numbers developed by the business organization as a basis for its system and infrastructure RTO values.
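The relationship between these numbers can be written down as a simple consistency check. The following is a minimal sketch only: the function name, the argument names and the hour figures are illustrative assumptions, not values taken from any standard.

```python
def check_recovery_targets(mtpd_hours, business_rto_hours, it_rto_hours):
    """Return a list of problems; an empty list means the targets are consistent."""
    problems = []
    # The MTPD is the upper bound on the business recovery time objective.
    if business_rto_hours > mtpd_hours:
        problems.append("business RTO exceeds MTPD")
    # IT must restore its systems within the time the business has planned for.
    if it_rto_hours > business_rto_hours:
        problems.append("IT RTO exceeds business RTO")
    return problems


# Hypothetical figures: MTPD of 48 hours, business RTO of 24 hours.
print(check_recovery_targets(48, 24, 12))  # []
print(check_recovery_targets(48, 24, 36))  # ['IT RTO exceeds business RTO']
```

An empty result only says that the stated numbers agree with each other; it says nothing about whether they can actually be achieved.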
Standards and planning process
There are so many business continuity and disaster recovery standards to choose from, as well as other related standards of practice, that this might be the reason for all of the confusion. The fact that none of these standards really talks about integrating the business recovery and IT recovery plans into the overall process or application development life cycle complicates the matter even further.
There is also the issue that business continuity and/or disaster recovery planning classes are usually only electives in business administration or computer technology/information systems curricula. So we are not exactly preparing our next batch of business or technology leaders to properly understand the methods, or importance, of contingency planning.
All that being said, most of the standards that exist do have a fairly consistent set of predefined steps for being reasonably successful. So if we take all of the contingency planning steps and align them with the ITIL ALC phases, the planning cycle will integrate system development with continuity planning at the best possible time in the development process.
I will outline the steps below in developing business continuity and disaster recovery plans with their corresponding points within the ITIL application development life cycle:
STEPS IN BUSINESS CONTINUITY AND DISASTER RECOVERY PLANNING / ITIL APPLICATION LIFE CYCLE PHASES
1) Understand the Organization
   a. Risk Assessment
   b. Business Impact Assessment
      i. Determine MTPD for operation
      ii. Develop RTO for Critical Systems
      iii. Develop RPO for Critical Systems
   ITIL phase: Requirements – requirements gathered based on business needs of the organization
2) Evaluate and Determine Strategy
   a. BC strategy to meet RTO/RPO
   b. DR strategy to meet RTO/RPO
   ITIL phase: Design – requirements translated into specifications
3) Develop Plans
   a. BCP – Business Organization
   b. DRP – IT Organization
   ITIL phase: Build – application and the operational model are made ready for deployment
4) Exercise Plan
   ITIL phase: Operate – IT operates the application as part of the business service
5) Audit and Maintain Plan
   ITIL phase: Optimize

Using the standards and good practices, the business owner should also have conducted the risk assessment and business impact analysis (BIA) during the requirements gathering phase of the ITIL ALC. The results of these two activities allow the business owner to see clearly the impact on the business of a failure or discontinuation of operations in the business, in IT, or in both. The business owner can then translate that knowledge into quantifiable RTO and RPO numbers to be used in the next phase of business continuity and disaster recovery planning (Evaluate and Determine Strategy) and in the Design phase of the ITIL ALC.
The RTO and RPO numbers are used to develop alternative strategies that meet the recovery time and recovery point needs. A cost is developed for each alternative design. That cost is the total of the IT cost to design, implement, build and operate the solution; the business cost of any workarounds or special handling during the outage period; and the cost of loading any transactions processed during the outage into the systems (processing re-synchronization) once they are back on-line and processing as they were before the incident.
The alternative strategies are then compared using a cost and benefit analysis (time, reduced workaround complexity, etc.). The best option will return the operation to service in a reasonable time at an acceptable cost to the business and IT. However, selecting the alternative requires input from both IT and the business to properly address the risk of outage. The business will need to ensure that it can perform the workarounds and still meet all of the business, regulatory and audit needs of the operation for the period that, under the selected alternative, the IT organization needs to restore the IT systems required to restart the application and its associated services.
For the plans to be effective and ‘fit for purpose’ it is very important that the business and IT are on the ‘same sheet of music’ as to recovery times and points. It is no good if the business has planned its resources and workarounds expecting a system recovery time of 24 hours only to find that the system will be down for 48 hours. On the other side of the coin it is not fiscally responsible to pay the cost to expedite the recovery time of an IT system to less than four hours if the business can tolerate an outage period of 24 hours or more at much less cost for the final IT solution.
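To make the trade-off concrete, the comparison of alternatives can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: the `Strategy` class, the `cheapest_acceptable` function, the strategy names and every figure are hypothetical. The total cost simply adds the IT cost, the business workaround cost and the re-synchronization cost for each alternative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    name: str
    recovery_hours: float    # time IT needs to restore the service
    it_cost: float           # design, implement, build and operate
    workaround_cost: float   # business workarounds during the outage
    resync_cost: float       # loading outage-period transactions afterwards

    @property
    def total_cost(self) -> float:
        return self.it_cost + self.workaround_cost + self.resync_cost


def cheapest_acceptable(strategies: List[Strategy],
                        tolerable_outage_hours: float) -> Optional[Strategy]:
    """Pick the lowest-cost strategy that recovers within the tolerable outage."""
    acceptable = [s for s in strategies
                  if s.recovery_hours <= tolerable_outage_hours]
    return min(acceptable, key=lambda s: s.total_cost, default=None)


# Hypothetical alternatives and costs, for illustration only.
options = [
    Strategy("hot standby", 4, 500_000, 10_000, 5_000),
    Strategy("warm standby", 24, 150_000, 40_000, 20_000),
    Strategy("cold site", 48, 50_000, 90_000, 40_000),
]
best = cheapest_acceptable(options, tolerable_outage_hours=24)
print(best.name, best.total_cost)  # warm standby 210000
```

With a tolerable outage of 24 hours the four-hour option is affordable only at more than twice the cost, and the 48-hour option is cheaper but leaves the business without its system for longer than it planned for.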
Once it has been concluded that both plans are consistent with each other, the actual plans can be developed. While the business prepares for implementation of the new application and/or service, IT will make ready the systems and infrastructure needed to also meet the business schedule for implementation.
Exercising the plans
There is one caveat, however. Even if both sides have planned together and developed their plans based on a single, consistent recovery time, the two planning activities still need to verify, by exercising the plans together, that the IT recovery (the disaster recovery plan, which includes hardware restoration, software restoration, synchronization of databases, etc.) actually comes in on time to meet the business’s needs as provided for in the business continuity plan.
Only in testing and timing the two recovery processes to ensure that they are coincident can an organization truly be confident that the overall plans will be successful.
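The timing comparison made during an exercise can be recorded very simply. The sketch below assumes the recovery steps run one after another, so that their durations add up; the function name, step names and hours are hypothetical.

```python
def exercise_meets_rto(step_hours, rto_hours):
    """Sum the measured step durations and compare the total with the RTO."""
    total = sum(step_hours.values())
    return total, total <= rto_hours


# Hypothetical timings measured during a joint exercise.
measured = {
    "hardware restoration": 6.0,
    "software restoration": 5.0,
    "database synchronization": 9.5,
}
total, on_time = exercise_meets_rto(measured, rto_hours=24)
print(f"{total} hours, within RTO: {on_time}")  # 20.5 hours, within RTO: True
```

If steps can run in parallel, the simple sum overstates the elapsed time and the measured end-to-end duration should be used instead.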
The Author
Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV, CRISC has a PhD in Technology and Operations Management and is the chief consulting officer for Recovery-Solutions. Dr. Kennedy has over 30 years' experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of three books, ‘Security in a Web 2.0 World – a standards based approach,’ ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and is author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Dr. Kennedy can be reached at Recovery-Solutions@xcellnt.com

For more information about Business Continuity, IT Disaster Recovery and Audit Training and Certification, visit www.sentryx.com or contact info@sentryx.com or call 1-800-869-8460.

ARTICLE: Disaster Recovery Planning And Cloud Computing

    by Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV  
  
   January 2011
If you asked a group of IT practitioners or business people what cloud computing is, they would probably answer in a manner consistent with blind men trying to describe an elephant with only the sense of touch. Each would have an answer consistent with their own specific perceptions.
In fact, Public Cloud Computing is a relatively new term that has been around for only a few years. It refers to the use of information technology services, infrastructure, and resources that are provided on a subscription basis. Public Cloud Computing is a Web- or Internet-accessed business solution in which most or all of the computing infrastructure (computers, network, storage, etc.) is located remotely from the actual business site and is managed by a third party.
Many companies rely on Public Cloud Computing, in part or in whole, for their business operations, critical and otherwise. So as we look at disaster recovery and Public Cloud Computing we are looking at a relatively new set of risks that need to be addressed to properly protect a business against unforeseen events.
Before I address the areas of concern to DR planning for public cloud computing, let me discuss the popular forms of public cloud computing available to the business.
There are three basic types:

  • Software as a Service (SaaS)
  • Platform as a Service (PaaS)
  • Infrastructure as a Service (IaaS)
Software as a Service (SaaS) is defined as a service based on the concept of renting software from the service provider rather than buying it individually for your business. The software is hosted on network servers and made available over the web or an intranet. This service provides software on demand and is currently the most popular type of public cloud computing because of its flexibility, its ability to be scaled, and because maintenance is provided by the service provider as part of the cost of the service. There are many CRM, ERM, and unique applications that are all provided as SaaS services. With web-based services, all that employees need to do is register and log in to the cloud-provided instance. The service provider hosts both the application and the data, so the business user can potentially use the service from anywhere across the globe. With SaaS the service provider is responsible for all issues dealing with capacity, upgrades, security and service availability.
Platform as a Service (PaaS) is defined as a service that offers a platform for developers. The business users develop their own code and the service provider uploads that code and allows access to it on the web. The PaaS provider provides services to develop, test, deploy, host and maintain applications on their development environment. The service providers also provide various levels of support for the creation of applications. Thus PaaS offers a quicker and cheaper model for application development and delivery. The PaaS provider will manage upgrades, patches and system maintenance.
Infrastructure as a Service (IaaS) is defined as a service where the service provider delivers the computing infrastructure as a fully outsourced service. Users can purchase various components of the infrastructure according to their requirements when they need them. IaaS operates on a “Pay as you go” model, ensuring that users pay for only what they have contracted for, such as network, computing platforms, rack space, and environmental services (HVAC and power). Virtualization has enabled IaaS vendors to offer high volumes of servers to customers. IaaS users purchase access to enterprise-grade IT infrastructure and to the resources and personnel needed to keep that infrastructure running. No application support or monitoring of databases or data is provided by the hosting vendor above the OS level unless contracted at an additional cost.

Basic Flaw in the "... as a Service" Offerings
In the cloud computing definitions that are evolving, the services in the cloud are being provided by third-party providers and accessed by businesses via the internet. The resources are accessed as a service on a subscription basis. The users of the services being offered most often have very little knowledge of the technology being used, the security being deployed, the availability of the service being offered, or the operating best practices (monitoring, patching, maintenance, etc.) utilized by the service provider. The business subscribers also have little or no control over the infrastructure that supports the technology or service they are using.

How to Take Control
Under the standard of “Due Care”, and charged with the ultimate responsibility for meeting business information technology objectives or mission requirements, senior management must ensure that the services they contract, including these “. . . as a Service” solutions, are appropriate to meet all of the necessary business requirements in the legal, technical, financial, and operational areas.
This business continuity due diligence comes only through a thorough vetting of the “. . . as a Service” provider in several areas. I have listed some of the more important ones below.

Legal & Regulatory

  • Will the service provider meet any of your data breach notification requirements (remember, even though they are hosting, you are responsible for the data under your protection, e.g. PHI, PII, etc.)?
  • Will the provider meet data retention requirements of the business?
  • Will the provider meet the standards for data encryption and protection you require?
  • Are “Safe Harbor” needs met?
  • Is data destruction or return at the end of the contract well defined to meet your business requirements?
  • What is their incident management program?
  • Are they prepared to react in a timely fashion in case of any eDiscovery needs of data they store for you?

Service Availability

  • Are the facilities housing the service provider adequately secured (video surveillance, access control, etc.)?
  • Are the RPOs and RTOs consistent with the business’ requirements?
  • How often are backups taken, are they maintained off-site, and have backups and restores been tested to your satisfaction?
  • Are standard backup methods and media used, just in case the business needs to bring data back in house?
  • Are maintenance and maintenance windows satisfactory for your operational needs?
  • What types of technical security do they employ (e.g., firewalls, virus protection, Intrusion Detection Devices, etc.)?
  • Are their hours of operation coincident with yours?
  • If you are a global company, do they provide multilingual support?
  • Are there clear escalation procedures in case of an incident?
  • Does the vendor provide global diversity, so that if one site goes down another can be used in its place?
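The RPO, RTO and backup questions in this list lend themselves to a simple numeric check once the vendor has supplied figures. The sketch below is illustrative only: the dictionary keys, the function name and the hours are assumptions. It also treats the interval between backups as a rough lower bound on the data that could be lost, which is a simplification.

```python
def recovery_findings(vendor, required_rto_hours, required_rpo_hours):
    """Compare a vendor's stated figures with the business requirements."""
    findings = []
    if vendor["rto_hours"] > required_rto_hours:
        findings.append("vendor RTO is longer than the business requires")
    if vendor["rpo_hours"] > required_rpo_hours:
        findings.append("vendor RPO is longer than the business requires")
    # Data written since the last backup is at risk, so infrequent backups
    # cannot support a short recovery point objective.
    if vendor["backup_interval_hours"] > required_rpo_hours:
        findings.append("backups are too infrequent for the required RPO")
    return findings


# Hypothetical vendor answers.
vendor = {"rto_hours": 8, "rpo_hours": 4, "backup_interval_hours": 24}
for finding in recovery_findings(vendor, required_rto_hours=12, required_rpo_hours=4):
    print(finding)  # backups are too infrequent for the required RPO
```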

Operational

  • Do they have a current SAS 70 Type II audit findings report?
  • Have they corrected any areas of concern to your business?
  • What capacity planning do they have in place to meet the growing needs of your business?
  • What standards of practice do they adhere to (e.g., ISO 27001, BS 25999, etc.)?
  • Do they have a patch management program in place and what is it? Does it meet your requirements?
  • Do their SLAs meet your business and operational requirements?
I have developed a hosting questionnaire that each “. . . as a Service” vendor is required to answer to the satisfaction of my client, and I would recommend that you do the same. Sometimes it takes a few iterations to complete the form to the satisfaction of the client, but when completed it provides documentation of due diligence and a clearer picture of what can be expected from the service provider. If the vendor will not complete the questionnaire, then it would be best to move on to another vendor, regardless of cost. If you can’t come to terms before a contract or Statement of Work is signed, it will be ten times more difficult to come to an agreement after signature.
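One way to track the iterations on such a questionnaire is to record each answer and list the items that are still open. This is a minimal sketch and not the questionnaire itself: the `Answer` class, the `open_items` function and the sample responses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Answer:
    question: str
    response: Optional[str] = None   # None means the vendor has not answered yet
    satisfactory: bool = False       # set by the client after reviewing the response


def open_items(answers: List[Answer]) -> List[str]:
    """Return the questions that are unanswered or not yet satisfactory."""
    return [a.question for a in answers
            if a.response is None or not a.satisfactory]


# Hypothetical state of a questionnaire after the first iteration.
answers = [
    Answer("Do their SLAs meet your requirements?", "Yes, see attached SLA", True),
    Answer("What is their incident management program?", "To be provided", False),
    Answer("Are backups maintained off-site?"),
]
print(open_items(answers))
```

The questionnaire is complete, for due diligence purposes, when the list of open items is empty.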

In Summary
This article has only scratched the surface, providing the basic questions that should be asked and answered to protect businesses that use “. . . as a Service” providers. Its intent was to inform the reader that there are many types of “. . . as a Service” offerings, and that there are ways to reduce or eliminate the problems I have experienced with them over the last few years. The issue the article wants to impress upon the reader is one of due diligence. We, as corporate or governmental IT security or business continuity experts, need to make sure that our organizational leaders have the necessary information to make informed choices for the protection of critical and sensitive information. That information allows them to decide between implementing adequate controls and safeguards now to protect against risks, or potentially paying later in reparations and in the lost confidence of those whose data they (senior management) were entrusted to protect but have lost or allowed to be taken.

The Author
Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV has a PhD in Technology and Operations Management and is the Chief Consulting Officer for Recovery-Solutions. Dr. Kennedy has over 30 years' experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of two books, ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Author can be reached at Recovery-Solutions@xcellnt.com
