Use Of The Incident Command System To Manage An IT Outage At Arizona State University
Arizona State University (ASU) is a public research university located in the Phoenix Metropolitan Area of Arizona. With a 2012 enrollment of more than 73,000 students, ASU is one of the largest public universities in the United States by enrollment. The university is a large, complex organization with four residential campuses catering to traditional and online students. Though many students opt to travel to a campus for classes, others elect to take all or a portion of their classes online. The campuses house over 10,000 students, including undergraduate, graduate, community college students, and married affiliates. Approximately 2,800 full-time faculty and 5,000 staff support ASU’s mission on all four campuses. ASU is considered “One University in Many Places.”
As one might imagine, information technology (IT) system maintenance, monitoring, and support are a vital function and amenity at what many have referred to as a “city within a city.” During the early morning hours on January 18, 2012, a hacker gained access to the ASU computer systems and removed a list of passwords. The breach was detected quickly by using protocols already established. Within hours, an IT team was working to determine the extent of the compromise. Ultimately, the team determined that there was a strong possibility that more than 100,000 email accounts had been compromised, and those accounts were subsequently taken offline.
Figure 1: Demographics of those affected at ASU
Mitigation & Preparation
Homeland Security Presidential Directive 5 (HSPD-5) was issued following the events of September 11, 2001, and the National Incident Management System (NIMS) was established. The governor of Arizona declared that the state would follow this mandate, and NIMS became the management tool for emergency response within the state.
Emergency management personnel at ASU trained a large number of faculty and staff in the use of the Incident Command System (ICS) and NIMS. Steps were taken to prepare, test, evaluate, and correct response and recovery components of trained staff during a variety of situations.
ASU staff is experienced with ICS and in creating incident action plans for football games, bowl games, planned protests and other events that potentially could have a considerable impact on ASU operations. ICS was established in response to a fire in 2007 that impacted the Memorial Union and caused extensive damage. In addition, ICS was established to manage Commencement 2009 when President Barack Obama delivered the keynote address.
ASU had previously recognized the need for a dedicated emergency “bridge” or “conference-line” that could be used to manage an incident regardless of where it might happen. Bridge-line calls are an effective way of managing communication when dealing with a large number of people who are spread out geographically.
As a cautionary note, bridge or conference lines can be rather difficult to control, especially when a large number of people join to discuss issues that affect the areas they control. Because of this, two key roles were established for bridge-line calls. A proctor activates the line, records who is present, and performs other administrative tasks as needed. A second person is tasked with running the call. Without a leader, the conversation can get out of control, with individuals speaking over one another, no direction for competing interests, and a host of other problems.
These dedicated bridge lines proved very useful throughout the response to this incident.
The Incident Response
The incident response began with the early detection of the breach and the activation of an information security incident response team by the Chief Information Security Officer. By early afternoon on the same day, the IT incident response team recommended that the incident be classified as high-level and that the email systems, which were used by all ASU faculty, staff, employees, students, and affiliates (vendors, foundation members, etc.), be shut down. This recommendation was not taken lightly.
One of the roles of the Chief Financial Officer (CFO) is to function as the “emergency policy executive” and liaison between the incident commander and the policy group. Within minutes of establishing the first bridge-line call, a member of ASU’s media team recalled their training in ICS and suggested the use of ICS to manage this incident. Subsequently, the Chief Information Officer assumed the role of Incident Commander (IC). ASU emergency management personnel had established a basic ICS matrix, as shown in Figure 2.
Figure 2: Basic ICS Structure
The basic concept of ICS is that it is flexible and can be adapted for any incident. The need for additional roles was recognized, while other roles were not needed. Thus, an ICS structure was implemented to meet the specific incident needs.
Figure 3: ICS Structure Used for the IT Outage
Communication To Those Affected
Once it was decided to shut off portions of the email system, a method for notifying those affected had to be chosen. ASU has a robust communications system used to relay messages to students, faculty, staff, and constituents on a regular basis. Like many such systems, it relies heavily on email as one of its primary distribution methods. The concern, however, was that with an ongoing security breach, any additional time the system remained live posed a potential risk of data loss. Based on information that had been gathered throughout the incident, it was decided to use the email system to push an initial notification to the community.
The message sent to stakeholders was quickly picked up by media affiliates. This proved to be extremely helpful in that the media was able to help ASU deliver valuable messages to constituents who had not heard about or were trying to gather additional information about the outage.
One of the most effective communication tools was social media (Facebook and Twitter). These platforms had not been compromised and allowed for broad communication. It should be noted, however, that the use of these tools created additional work, as postings had to be monitored to ensure that there were no malicious or inaccurate posts, e.g., someone incorrectly posting that classes were cancelled.
In the end, ASU’s communication succeeded because it used many methods, including print and televised media, email, mass telephone messaging, community announcements, social media, and word of mouth, to spread the news about the outage and the process for restoring the IT system.
As a large portion of the email system of the university had been shut down (impacting more than 100,000 email accounts and interactive sites for staff, faculty, and students), one of the first questions the IC faced was how to prioritize the return of people back to the system when it became operational again. Re-establishing more than 100,000 email accounts at one time was not a realistic option. The system had to be brought back systematically to avoid the likelihood of it crashing.
Since normal ICS principles (life safety, scene stabilization, and property preservation) had already been addressed, the IC was asked to create a prioritized list after hearing from those affected by the outage. This type of information should come from a pre-incident business impact analysis, but the importance of a business process can change, and the specific impacts of an incident often force immediate or timely decisions based on only a limited amount of information.
Ultimately, it was decided that groups would be reconnected in the following order:
- Executive staff (President and senior administrators)
- Emergency response/management personnel
- Online students and the faculty to support those classes
- Additional faculty, students, staff, and other stakeholders or university constituents.
The prioritization list was created with the sense that executive and other key staff were needed as part of the business continuity model. Emergency response personnel were next so they could manage further incidents related to this attack or other situations that might arise. Online faculty and students were placed high in the prioritization in an attempt to keep their courses current. In the early stages of the outage, staff recognized that the payroll and human resources systems were not compromised. Therefore, those systems were ranked near the bottom of the list.
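The phased return described above can be sketched in a few lines of Python. This is only an illustration, not ASU's actual procedure: the group labels, the `restoration_batches` function, and the idea of a fixed batch size are all assumptions introduced here to show how a priority order and a cap on simultaneous reconnections might be combined.

```python
from itertools import islice

# Restoration tiers in the order given in the article (lower = earlier).
# The group labels themselves are hypothetical.
PRIORITY = {
    "executive": 0,
    "emergency_response": 1,
    "online_student_or_faculty": 2,
    "other": 3,
}


def restoration_batches(accounts, batch_size):
    """Yield lists of account IDs, highest-priority tier first.

    accounts: iterable of (account_id, group) pairs.
    batch_size: assumed cap on how many accounts are re-enabled at once.
    Unknown groups fall into the lowest tier. A batch may span two
    tiers where one tier ends and the next begins.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # sorted() is stable, so accounts keep their input order within a tier.
    ordered = sorted(
        accounts, key=lambda a: PRIORITY.get(a[1], PRIORITY["other"])
    )
    ids = iter(account_id for account_id, _ in ordered)
    while batch := list(islice(ids, batch_size)):
        yield batch


if __name__ == "__main__":
    sample = [
        ("s1", "other"),
        ("p1", "executive"),
        ("e1", "emergency_response"),
        ("o1", "online_student_or_faculty"),
        ("s2", "other"),
    ]
    for number, batch in enumerate(restoration_batches(sample, 2), start=1):
        print(f"batch {number}: {batch}")
```

Keeping the priority order in a single table makes it easy to swap in a different order when circumstances change, for example on the day before payroll.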
- It is very important that administrators and directors be trained in ICS so that when an incident arises, everyone understands their role and authority. Such training also aids in the effective communication of the information needed to manage an incident.
- A list of alternate email addresses for emergency response and critical personnel should be maintained for use when primary addresses are not available.
- An established list of institutional priorities should be created now, before an incident occurs. When possible, build flexibility into the results of the business impact analysis. For example, an IT incident on the day before payroll or between semesters might yield different priorities. The prioritization list should be used to help guide ICS decisions and the response process.
- It was critical that ASU had a robust and redundant process for communicating with students, faculty, staff, and other affiliates. Staff should be trained to proficiency in social media and in their communication roles.
- An established bridge-line was extremely valuable in the management process.
- IT outage scenarios must be tested as part of business continuity plan evaluations. It is valuable to have plans “on the shelf” for a variety of outage situations/scenarios. The plans should include a restoration prioritization schedule.
Allen Clark is a 23-year veteran of the Arizona State University Police Department and recently retired as an assistant chief. Clark started his law enforcement career more than 25 years ago when he joined the U.S. Air Force, Air National Guard and was trained as a law enforcement specialist. He was activated for Operation Desert Storm and spent active duty time guarding jets. Chief Clark is well versed in emergency management as both a first responder and as a member of an incident management team. He has spoken many times in numerous states regarding emergency preparedness and management. He holds a BA from Ottawa University in management and is a published author in emergency management. In July 2012 Chief Clark became the Director of Emergency Preparedness at ASU. Clark can be contacted at: firstname.lastname@example.org; Office: 480-965-6328.
Ed Copp, ARM, CBCP, founded Coaching For Resiliency LLC, a consulting firm, after working over 30 years in corporate positions. Ed has been involved in managing a variety of response and recovery efforts including power blackouts, flooding, employee fatalities, labor strikes and protests, and computer interruptions. He has extensive training in ICS and has served on various local and state planning teams. He served as Executive Manager of the Business Emergency Coordination Center during the 2011 Arizona statewide exercise. He had a seat in the state EOC during TOPOFF 4 and assisted in shelter operations for Hurricane Katrina evacuees. Ed has testified on Emergency Preparedness to a state Joint Senate and House of Representatives committee. Ed is known as the Resiliency Coach and he is passionate about educating individuals and families in emergency preparedness and writes and speaks on that topic.
Deborah Lou Roepke, MPA, is Executive Director of the Coyote Crisis Collaborative, a nonprofit, multi-disciplinary center designed to promote and support local community disaster response planning and exercises within the State of Arizona since its early inception in 2004. Prior to this, she served as Director of External and Integrative Relations at Scottsdale Healthcare and Vice President of Government and Foundation Gifts at Scottsdale Healthcare Foundation. She has more than 25 years of experience developing relationships and collaborations with military, healthcare, academic, corporate, and non-military government medical leaders. She served as editor for numerous Arizona Town Halls and has co-written numerous articles, the most recent pertaining to hospital disaster response for the Journal of Trauma Nursing and the American Journal of Disaster Medicine.