Continuity Insights

Managing Operational Resiliency - A Benchmarking Case Study
Fri, 02/29/2008 - 7:00pm
Charles Wallen and Dave White

The Resiliency Engineering Framework (REF) is a maturity model that embodies a series of proven techniques developed by subject matter professionals from the private and public sectors in collaboration with the Financial Services Technology Consortium (FSTC) and Carnegie Mellon. This approach enables organizations to achieve significant risk and cost management improvements by refining their business continuity, security, and IT operations practices. The framework consists of 27 capability areas, and enterprises have the flexibility to implement as few or as many as their needs and strategies require. By describing in detail the capabilities an organization must cultivate to achieve resiliency, the framework establishes a roadmap to building a resilient organization.

Ten leading financial services companies recently conducted benchmarking against the framework. The purpose of the benchmarking project was to provide comparable baseline peer data and validate the preliminary assessment method developed in support of resiliency management.

Background on Resiliency

The quest for better ways to manage risk and achieve resiliency has become one of the top priorities in both the public and private sectors. Organizations throughout the world are turning for guidance to an ever-increasing set of international standards and models, such as NFPA 1600, the ISO 27000 series, BS 25999, ITIL, and Six Sigma. Governments are expanding their oversight and regulatory activities to help ensure stability and security. Determining how best to manage risk and comply with regulations has become an increasingly complex problem, not to mention a major cost management challenge. It is this dilemma of balancing risk and cost that drove FSTC and CERT to undertake the work we are doing on resiliency management.

REF was developed to provide a framework in which to structure an organization's existing practices and establish processes that help ensure compliance. The framework doesn't replace an organization's best practices; it provides a process structure into which these practices can be inserted and managed to address questions such as:

o Are you actively managing operational resiliency, or do you typically react to disruptive events as they occur?
o Are the security and business continuity practices you've implemented effective? Do they support the achievement of the organization's strategic objectives and mission?
o Are the processes used to manage compliance with regulatory guidance optimized for efficiency?
o Are you confident that you can sustain the activities that are protecting your operations?
o Is resiliency part of your organizational culture, or are your day-to-day practices completely dependent on the presence of a small number of experts?
o What can you do to continuously improve your resiliency activities?

History of the FSTC-CERT Resiliency Initiative

Early efforts by FSTC to improve business continuity practices focused on resiliency, compliance, better business recovery methods, and more efficient technologies to ensure systems availability. At the same time, Carnegie Mellon's CERT group was looking to refine security practices and achieve higher levels of resiliency by leveraging the process improvement techniques developed to support its Capability Maturity Model Integration (CMMI)®. As FSTC researched the latest efforts to improve continuity practices and achieve higher levels of maturity, the security and resiliency work at CERT stood out as powerful and innovative. A series of discussions made it clear that collaborating on a definition of resiliency had great potential, particularly because business continuity and security appeared to be complementary activities in the pursuit of resiliency.

Over the last three years, FSTC and Carnegie Mellon have worked closely to develop the framework, with an eye toward the value that a common approach to managing resiliency could offer all industries. Measuring and improving resiliency-related activities has remained the key objective. A number of interim frameworks, models, and assessment tools have been refined to bring us to where we are today. The full outline of REF was developed and released for public comment in early summer 2007.

In conjunction with the ongoing FSTC-CERT Resiliency Project, ten of the participating financial services organizations convened a benchmarking project in August 2007. The participants set out to assess themselves against selected capabilities from the framework. This was to be the first attempt to systematically gather benchmarking data against REF.

Benchmarking Objectives

The following objectives were established by the participants:

o Facilitate a clearer understanding by participant organizations of how their business continuity activities compare to those of others
o Define a set of implied short- and long-term strategies for improving BC programs based on the data gathered in the benchmarking study
o Validate that the REF capabilities, goals, and practices are complete and defined sufficiently to enable benchmarking
o Test the early assessment method to provide input to a more comprehensive appraisal methodology to be developed in 2008
o Produce a sanitized, non-attributional case study describing the benchmarking results
o Collect baseline data to facilitate the identification of trends and/or changes over time

The results and lessons learned from this early benchmarking will provide the baseline for the longer term objective of developing a full appraisal methodology which includes measurements of resiliency capabilities and process maturity.

Benchmarking Methodology

Many of the concepts used in REF derive from proven techniques developed to support CMMI, a process improvement approach that provides organizations with the essential elements for improving software and systems development. The "verification" measurement approach used for the benchmarking was a generalization of techniques drawn from the Standard CMMI Appraisal Method for Process Improvement (SCAMPI)SM. This approach yields reliable, verifiable evidence that processes are aligned with business objectives and points to weaknesses that can help identify opportunities for improvement.

Self-assessments were conducted to determine the extent to which the participant organizations were performing the practices outlined in the REF capabilities. Each assessment used the verification approach to determine practice performance objectively, which required identifying artifacts as evidence that the organization is performing a specific resiliency practice: for example, an inventory of completed BC plans to verify that certain service continuity practices are being performed.

Detailed criteria and procedures for assessment were developed up-front by the benchmarking team and were further refined as the benchmarking effort progressed. Templates were created for each capability area that allowed for the collection of the data in a consistent manner. The benchmarking team met regularly to share information on the artifacts collected and provide insights gained from the assessment process. Example results were compiled from the information exchanges to help participants establish a clear and common understanding of what data was required to provide objective evidence of practice performance.

Establishing the Measurement Criteria

By leveraging the SCAMPI concepts developed to support other Carnegie Mellon process improvement models, the project was able to define an effective assessment process rapidly. The assessment approach measured the extent to which an organization was or was not performing a particular practice; the implementation categories are described in more detail below. The following criteria were used to classify the strength of the objective evidence to be collected:

Direct artifact: A tangible output resulting directly from the implementation of a specific practice

Indirect artifact: An artifact that is a consequence of performing a specific practice

Affirmation: An oral or written statement confirming or supporting implementation (or lack of implementation) of a specific practice

For example, if you paint a house, the painted house is the direct artifact. Receipts for the paint or empty paint cans are indirect artifacts. A thank-you note to the painter is an affirmation.

In conjunction with the artifact definitions, the following rules were used to characterize the degree to which the organization being assessed has implemented a specific practice:

Fully implemented: One or more direct artifacts are present and judged to be adequate. At least one indirect artifact and/or affirmation exists to confirm the implementation. No weaknesses are noted.

Largely implemented: The direct artifact is present and judged to be appropriate. At least one indirect artifact and/or affirmation exists to confirm the implementation. One or more weaknesses are noted.

Partially implemented: The direct artifact is absent or judged to be inadequate. Artifacts or affirmations suggest that some aspects of the practice are implemented. Weaknesses have been documented.

Not yet implemented: The organization or group has not yet reached the stage in the lifecycle to have implemented the practice.

Not implemented: Direct artifacts are absent or judged inadequate. No evidence supports the practice implementation. One or more weaknesses are noted.
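The characterization rules above are mechanical enough to express as a small decision function. The Python sketch below is an illustration only, not part of REF or SCAMPI: the names `Kind`, `Evidence`, `PracticeFinding`, and `characterize` are hypothetical, and the `adequate` and `lifecycle_reached` flags stand in for assessor judgment. Because the rules do not address an adequate direct artifact with no corroborating indirect artifact or affirmation, the sketch assumes that case rates as partially implemented. It also assumes weaknesses are recorded alongside partial and non-implementation ratings rather than driving them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    DIRECT = "direct artifact"
    INDIRECT = "indirect artifact"
    AFFIRMATION = "affirmation"


class Rating(Enum):
    FULLY = "Fully implemented"
    LARGELY = "Largely implemented"
    PARTIALLY = "Partially implemented"
    NOT_YET = "Not yet implemented"
    NOT_IMPLEMENTED = "Not implemented"


@dataclass
class Evidence:
    kind: Kind
    adequate: bool = True   # assessor's judgment of this item
    supports: bool = True   # False for an affirmation of non-implementation


@dataclass
class PracticeFinding:
    evidence: list[Evidence] = field(default_factory=list)
    weaknesses: list[str] = field(default_factory=list)
    lifecycle_reached: bool = True  # False: practice is not yet due


def characterize(f: PracticeFinding) -> Rating:
    """Apply the implementation-scale rules to one practice (hypothetical sketch)."""
    if not f.lifecycle_reached:
        return Rating.NOT_YET
    supporting = [e for e in f.evidence if e.supports]
    # At least one adequate direct artifact must be present.
    direct_ok = any(e.kind is Kind.DIRECT and e.adequate for e in supporting)
    # Corroboration comes from an indirect artifact or an affirmation.
    corroborated = any(e.kind is not Kind.DIRECT for e in supporting)
    if direct_ok and corroborated:
        return Rating.LARGELY if f.weaknesses else Rating.FULLY
    if supporting:
        # Assumption: uncorroborated direct artifacts also land here.
        return Rating.PARTIALLY
    return Rating.NOT_IMPLEMENTED


# An inventory of completed BC plans (direct) plus a confirming statement.
plans = PracticeFinding(evidence=[Evidence(Kind.DIRECT), Evidence(Kind.AFFIRMATION)])
print(characterize(plans).value)  # Fully implemented
```

Keeping the evidence items separate from the rating makes it easy to record why a practice received its rating, which mirrors the benchmarking team's practice of sharing the artifacts behind each result.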

Benchmarking Scope

Scoping is an important first step in a quality assessment process. A narrow scope was judged essential to achieving usable results within the fairly short timelines set for this effort. Two aspects of scope had to be defined: organizational scope and model scope.

Organizational Scope establishes the parts of the organization that are being assessed. Examples include the entire enterprise, a single business unit, or a single department. The resource impact of this aspect of scope was potentially significant for some of the participants because of their size and complexity. To keep the assessment achievable within the project's time frame, the participants agreed to allow one another the flexibility to declare a workable organizational scope. Many of the participating firms declared an enterprise scope; some defined narrower organizational scopes.

Model Scope considers which capability areas are addressed and how the concept of process maturity is treated. The team chose to focus on the areas of the framework most closely related to the traditional domain of business continuity. Maturity measurement was moved out of scope at the outset due to the considerable complexity it would introduce.

The assessment focused on determining the extent to which the organizations were performing the practices within the selected capability areas.

REF comprises 27 interrelated resiliency capability areas, spanning a broad set of activities that support operational resiliency, in particular security and business continuity. A number of the capabilities included in the framework come from disciplines not typically associated with resiliency, such as asset management, financial management, or communications. Managing resiliency effectively requires a holistic perspective across many functional areas.

The benchmarking participants agreed to focus on the following REF capability areas:
o Incident Management and Control
o Organizational Training and Awareness
o People Management
o Risk Management
o Service Continuity

Data Analysis*

Results of this assessment suggest that participants do have active and sophisticated business continuity programs. There is clear variability among the reporting organizations that may be attributable to differences in the resources or emphasis the participants place on the various practices evaluated. Generally, the data suggest that core activities, such as establishing and executing capabilities, exhibited higher levels of implementation than activities associated with follow-up, review, and improvement. Outlier results were noted in a few areas, such as plan execution and event detection; these may be attributable to the fact that both areas are affected by the occurrence of real disruptions.

One of the more interesting outcomes was that organizational training and awareness (OTA) showed the most implementation variation among the five capability areas. One or more participants had not implemented some OTA practices at all, and all of the OTA practices showed lower levels of implementation than practices in the other capability areas. Given the importance of ensuring that people in an organization know what to do and how to respond to business disruptions and other operational risks, these results suggest that OTA may be ripe for improvement.
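One way to quantify "implementation variation" of this kind is to convert each rating to a number and compute the spread of participant scores within each capability area. The sketch below is a hypothetical illustration, not the study's method: it assumes a 0 to 3 mapping (not implemented = 0 through fully implemented = 3) that REF does not prescribe, ignores "not yet implemented" ratings, and runs on made-up ratings for three hypothetical participants rather than on benchmarking data.

```python
from statistics import mean, pstdev
from typing import Optional

# Assumed scale for illustration only; REF does not prescribe numeric scores.
# "Not yet implemented" has no entry, so those ratings are ignored.
SCORE = {"fully": 3, "largely": 2, "partially": 1, "not": 0}


def participant_score(ratings: list[str]) -> Optional[float]:
    """Mean score of one participant's practice ratings in one area, or None."""
    scored = [SCORE[r] for r in ratings if r in SCORE]
    return mean(scored) if scored else None


def area_summary(data: dict[str, dict[str, list[str]]]) -> dict[str, tuple[float, float]]:
    """Map each area to (peer mean, population std dev) across participants."""
    summary = {}
    for area, by_participant in data.items():
        scores = [s for s in (participant_score(r) for r in by_participant.values())
                  if s is not None]
        if scores:  # skip areas with nothing scorable
            summary[area] = (mean(scores), pstdev(scores))
    return summary


# Made-up ratings for three hypothetical participants (not study data).
sample = {
    "Service Continuity": {
        "A": ["fully", "fully", "largely"],
        "B": ["fully", "largely", "largely"],
        "C": ["largely", "fully", "fully"],
    },
    "Organizational Training and Awareness": {
        "A": ["fully", "largely", "partially"],
        "B": ["partially", "not", "not"],
        "C": ["largely", "partially", "not"],
    },
}

summary = area_summary(sample)
most_varied = max(summary, key=lambda area: summary[area][1])
print(most_varied)  # Organizational Training and Awareness
```

Averaging within each participant first keeps organizations with more rated practices from dominating an area's spread; with a sample this small, any such statistic should be read with the caution noted in the footnote.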

These early benchmarking results are interesting and point to the benefits of continued benchmarking efforts. Larger, more stratified benchmarking that incorporates maturity concepts will be a key aspect of the resiliency project activities planned for the coming year, along with a number of activities aimed at expanded implementations of the resiliency management concepts described here. Over the next year, we will work with organizations to implement REF, benchmark, and build a suite of resiliency management solutions.

Conclusions

The overall conclusion reached by the participants was that appraisal concepts tested in the benchmarking project met the established objectives. Based on feedback from participants, the following additional observations can be made about the benchmarking results:
o The assessment approach and the requirement for artifacts promoted scrutiny and objectivity
o The implementation scale (fully, largely, partially, and not implemented) provided insights unavailable from binary checklist approaches
o Gap analysis against the framework identified improvement opportunities
o Comparisons to summary peer data were helpful for prioritizing and justifying improvements, in particular where scores were well below the peer average
o The addition of more comprehensive maturity concepts is necessary to provide adequate measurement and improvement criteria
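Comparing an organization's scores with summary peer data, and flagging areas that fall well below the peer average, amounts to a simple gap analysis. The sketch below is a hypothetical illustration: the function name `improvement_priorities`, the 0.5-point threshold for "well below," and all scores are assumptions, and the scores are assumed to sit on a numeric scale such as 0 (not implemented) to 3 (fully implemented), which REF itself does not define.

```python
def improvement_priorities(
    own: dict[str, float],
    peer_avg: dict[str, float],
    threshold: float = 0.5,  # hypothetical cutoff for "well below" the peers
) -> list[tuple[str, float]]:
    """Return (area, gap) pairs where the organization's score trails the
    peer average by at least `threshold`, largest gap first."""
    gaps = [(area, peer_avg[area] - score)
            for area, score in own.items() if area in peer_avg]
    flagged = [(area, round(gap, 2)) for area, gap in gaps if gap >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical scores for one organization and its peer group (not study data).
own = {"Risk Management": 2.4, "People Management": 1.2, "Service Continuity": 2.0}
peers = {"Risk Management": 2.2, "People Management": 2.1, "Service Continuity": 2.6}
print(improvement_priorities(own, peers))
# [('People Management', 0.9), ('Service Continuity', 0.6)]
```

Sorting by gap size gives a first-cut priority list; in practice, an organization would weigh that list against its own risk profile and strategy rather than treat it as a ranking to follow mechanically.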

The results achieved by the resiliency benchmarking initiative conducted by FSTC participants and facilitated by Carnegie Mellon CERT have confirmed the value of the model and validated its effectiveness in providing meaningful objective benchmarks. Additionally, it is clear that refinements are merited to enable objective measurements of process maturity. Over the next several months, additional details will be established around the concepts of resiliency and process maturity as FSTC and Carnegie Mellon CERT continue to collaborate on the REF approach through expanded efforts in the area of operational resiliency management.

Operational risk management is a growing concern for enterprises. Increased marketplace demands, regulatory pressures, and process complexity are driving businesses to explore new paradigms, particularly in operational areas such as business continuity and security. Resiliency is a concept increasingly associated with the effective management of risk, but there is a lack of clarity around the term. REF was developed with an eye toward providing that clarity by establishing a model for cost-effectively managing and improving operational resiliency. It is a unified model for guiding operational resiliency management activities, supported by objective appraisals.

To learn more about this initiative, contact Charles Wallen or Dave White. Additional information is also available at www.fstc.org and www.cert.org/resiliency_engineering/.

*  Variations in organizational scope among the organizations participating, a small sample size, and the limitations associated with the "pilot" nature of this work require caution when interpreting and generalizing this data.
® Capability Maturity Model, Capability Maturity Modeling, Carnegie Mellon, CERT, CERT Coordination Center, CMM, and CMMI are registered in the U.S. Patent and Trademark Office by Carnegie Mellon University.
SM SCAMPI is a service mark of Carnegie Mellon University.
