Risks of Cyber Attack to Supervisory Control and Data Acquisition for Water Supply

 

A Thesis

Presented to

the Faculty of the School of Engineering and Applied Science,

University of Virginia

 

In Partial Fulfillment

of the Requirements for the Degree

Master of Science (Systems Engineering)

by

Captain Barry C. Ezell

United States Army

 

Thesis Advisor

 

Yacov Y. Haimes,

Quarles Professor of Engineering

and Applied Science and

Director, Center for Risk Management of Engineering Systems

 

May 1998

 

 


 

 

ABSTRACT

 

Supervisory control and data acquisition (SCADA) allows a utility operator to monitor and control processes that are distributed among various remote sites. The goal of this thesis is to develop a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful threats to water utility SCADA systems. This framework can assist decisionmakers in understanding the risks of cyber intrusion, their consequences and tradeoffs in order to maximize the survivability of the system. Surety, a measure of survivability, is defined as a measure of system performance under an unusual loading. A survey is conducted to understand the current state of SCADA in water utilities, to document information on cyber intrusion, and to determine the concerns of administrators on system security. Using hierarchical holographic modeling (HHM), sources of cyber risk to SCADA are identified. Event trees and fault trees are used to model the probabilistic consequences of cyber intrusion on water supply systems. Cost, surety, expected level of percentage of water flow reduction, and conditional expected level of percentage of water flow reduction are introduced as performance measures to evaluate policy options. Alternatives are generated and then compared using multiobjective tradeoff analysis. Lastly, a prototype city is analyzed to demonstrate the applicability of the developed methodology. The methodological framework for managing cyber risk to water utility SCADA systems constitutes the major contribution of the thesis.

TABLE OF CONTENTS

    ABSTRACT

    CHAPTER 1 INTRODUCTION

    1.1 Cyber Attacks

    1.2 Stakeholders

    1.3 Statement of Need

    1.4 Thesis Tasks

    1.5 Thesis Overview

    CHAPTER 2 SUPERVISORY CONTROL AND DATA ACQUISITION

    2.1 Introduction

    2.2 Master Terminal Unit

    2.3 Remote Terminal Unit

    2.4 SCADA History

    2.5 Telemetry and the Mainframe Era

    2.6 SCADA and Micro Computer Era

    2.7 SCADA or Distributed Control Systems (DCS)

    2.8 Trends in SCADA and the Internet

    2.9 Estimating Attacks and Incidents from the Internet

    2.10 Denial of Service

    CHAPTER 3 REVIEW OF RISK AND SYSTEMS ENGINEERING

    3.1 Systems Approach

    3.2 Total Risk Management

    3.3 Alternative Approaches and Methods

    3.4 Probabilistic Risk Assessment (PRA) and Management (PRAM)

    CHAPTER 4 FRAMEWORK FOR SCADA UTILITY SURVIVABILITY MODELING

    4.1 Risk Modeling

    4.2 Internet Survey

    4.3 Survivability

    4.4 Taxonomy for Assessing Computer Security

    4.5 Definitions and Terms for a Taxonomy

    4.6 Understanding the Taxonomy

    4.7 Hierarchical Holographic Modeling (HHM)

    4.8 Recent Uses of the HHM in Identifying Risks

    4.9 Risk Modeling Using HHM

    4.10 Goal Development and Indices of Performance

    4.11 Event Tree and Fault Tree Analysis

    4.12 Distributions from Event Tree Analysis

    4.13 Partitioned Multiobjective Risk Method

    4.14 Multiobjective Tradeoff Analysis

    4.15 Evaluation

    CHAPTER 5 APPLICATION OF THE METHODOLOGY

    5.1 Introduction

    5.2 Experts for Elicitation

    5.3 Problem Definition for City XYZ

    5.4 Identifying Sources of Risk (Phase I)

    5.4.1 Scenario One (Disgruntled Employee)

    5.4.2 Scenario Two (Hacker)

    5.5 Indices of Performance (IP)

    5.6 Assess the Risks (Phase II)

    5.6.2.1 Scenario One Event-tree (Current System)

    5.6.2.1.1 PDF/CDF and Exceedance Probability graphs

    5.6.2.1.2 Calculations of Benchmarks for Indices of Performance

    5.6.2.2 Scenario Two Event-tree (Current System)

    5.7 Generate Alternatives for System (Phase III)

    5.7.1 Analysis of Results for Scenario One

    5.8 Draw Conclusions through Tradeoff analysis (Phase IV)

    5.9 Sensitivity Analysis

    5.10 Summary of Methodology

    CHAPTER 6 CONCLUSIONS

    6.0 Summary

    6.1 Contributions

    6.2 Future Work

    REFERENCES

    APPENDIX A SURVEY

    APPENDIX B EVENT TREES FROM EXPERT ONE

    ACKNOWLEDGMENTS

    I would like to thank my advisor Professor Yacov Y. Haimes for the many hours of his time that he has shared with me. His ability to coach, teach, and mentor has enabled me to chart a course for two years at the University of Virginia. His dedication to students is an example that I will try to emulate when I become an instructor at West Point. Secondly, I would like to thank Professor James H. Lambert for the hours of valuable help in understanding the math and science presented in this thesis.

    I am especially thankful to my mother, who assisted in editing the thesis and survey. Also, Bruce Freer, Senior Area Manager, Rockwell Automation, provided a valuable service by teaching me about the software components of programmable logic controllers and communication. I am grateful to the SCADA mail list and in particular Ian Wiese, Anthony Nelson, and Cary Hillebrand. They spent countless hours teaching fundamentals and helping me design a worthy example problem to present the methodology.

    Finally, I would like to thank my wife Debbie. She provided emotional support and managed our home front, allowing me to focus on this thesis and graduate school.

    LIST OF FIGURES

    Figure 2-1 Generic SCADA System (Boyer 1993)

    Figure 2-2 Water Distribution System and SCADA (Wiese and Ezell 1997)

    Figure 2-3 Inputs & Outputs for MTU (Boyer 1993)

    Figure 2-4 Inputs & Outputs for RTU (Boyer 1993)

    Figure 2-5 DISA Vulnerability Assessments (GAO 1996)

    Figure 2-6 Difficulty vs. Damage in Attacking Networks

    Figure 3-1 The Steps of Risk Analysis (White and Pooch 1996)

    Figure 3-2 ISS Risk Management Model (ISS 1997)

    Figure 4-1 Risk Management Framework

    Figure 4-2 SCADA Black Box Model

    Figure 4-3 Taxonomy of Computer and Network Attacks (Howard 1995)

    Figure 4-4 HHM for SCADA and Water Utilities

    Figure 4-5 Goals Tree Water Utility SCADA System

    Figure 4-6 Event tree Development (Ang and Tang 1984)

    Figure 4-7 Event Tree for Cyber Intrusion

    Figure 4-8 Fault Tree for Firewall System

    Figure 4-9 Example Consequences from Event Tree

    Figure 4-10 Partitioning on the Probability Axis (Asbeck and Haimes 1984)

    Figure 4-11 Black-box Model of SCADA System

    Figure 5-1 Water Distribution System (Wiese et al. 1997)

    Figure 5-2 SCADA for City XYZ (Wiese et al. 1997)

    Figure 5-3 HHM for City XYZ

    Figure 5-4 Event Tree for Scenario One (Disgruntled Employee)

    Figure 5-5 PDF / CDF for Scenario One (Expert 1)

    Figure 5-6 Exceedance Probability and PMRM Scenario One (All Experts)

    Figure 5-7 Event Tree for Scenario Two (Hacker)

    Figure 5-8 Exceedance Probability for Scenario Two (All Experts)

    Figure 5-9 Exceedance Probability for Each Alternative

    Figure 5-10 Alternatives, ƒ1 vs. ƒ2, ƒ5, and ƒ4

    Figure 5-11 Parameter Sensitivity Comparison

    Figure 5-12 Model Sensitivity Comparison

    LIST OF TABLES

    Table 2-1 Estimates of Cyber Attacks in 1995 (Howard 1995)

    Table 2-2 Summary of Attack Estimates for Utilities

    Table 2-3 Estimation of Risk (Howard 1995)

    Table 3-1 Examples of Likelihood and Outcome (Kumamoto and Henley 1996)

    Table 4-1 List of Terms (Cohen 1995)

    Table 4-2 Alternatives vs. Performance Measures

    Table 5-1 Security Level Function (Boyer 1993)

    Table 5-2 Estimates for Scenario One (Disgruntled Employee)

    Table 5-3 Estimates for Scenario Two (Hacker)

    Table 5-4 City XYZ’s SCADA System Performance for Scenario One

    Table 5-5 Probability and Cost Estimates (Nelson et al. 1998)

    Table 5-6 Results for Each Alternative

    Table 5-7 Parameter Sensitivity Results

    Table 5-8 Model Sensitivity Results

     

    CHAPTER 1 INTRODUCTION

    1.1 Cyber Attacks

    Cyber attacks on supervisory control and data acquisition (SCADA) systems in water supply are uncommon, and discussion of the matter has been limited. The Computer Emergency Response Team (CERT) data collected from 1989 to 1995 make no mention of infrastructure attacks (Howard 1995). However, the President’s Commission on Critical Infrastructure Protection (PCCIP) conducted a year-long study and concluded that cyber threats are a clear danger (risk) to all infrastructures (PCCIP 1997). Estimates by various experts placed the number of cyber attacks in 1995 between 48,000 and 44,000,000 (Howard 1995). General Marsh (retired), chairman of the PCCIP, estimated to Congress that 80 percent of cyber attacks were by vandals or disgruntled employees (Aviation Weekly 1997). Meanwhile, the internet is doubling in size every 12-15 months, and water utilities are becoming increasingly interconnected and interdependent and are moving toward common protocols like Transmission Control Protocol/Internet Protocol (TCP/IP). The purpose of this thesis is to answer the following questions:

    Are supervisory control and data acquisition (SCADA) systems used by water utilities vulnerable to cyber attack in the short term (five years)? And if so, how do we improve the survivability of the system through risk modeling, assessment, and management?

     

    One can argue that assessing vulnerability is simply evaluating where exposure is greatest and access control is weakest (NSTAC 1997). In computer systems like SCADA, exposure is connectivity and visibility. Phone number pools for data transfer, dial-in connections, and internet connectivity are examples of exposure. A water utility with a three-letter domain of .com enjoys less exposure to hackers than one with .gov or .mil. Utilities that share .com are in a pool of millions of hosts that continues to double in size every year, whereas .gov and .mil have leveled off (Engst 1997). Access controls are those active and passive measures taken to control access to all points of connectivity for the computer system.

  •  

    1.2 Stakeholders

    There are four groups that can benefit from this thesis. The first and most important is the public, who expect an uninterrupted flow of clean water. The second is water utilities, which are responsible for providing the service and are the potential target under study in this thesis. The third is industry, which designs the systems and the software for utilities. The last is government at all levels, which is responsible for the public’s water supply. All groups share similar objectives: cost, profit, safety, survivability, security, surety, etc. Many objectives are conflicting. Industry seeks to maximize profit, while a potential client like a water utility or the taxpayer wishes to minimize the cost. Multiple objectives, often noncommensurable and in conflict, along with multiple decisionmakers, set the stage for the multifarious nature of problem solving that this thesis seeks to model.

     

    1.3 Statement of Need

    Despite the existence of probabilistic risk assessment and decision support tools, most approaches in computer security are qualitative, offering no quantitative metrics for modeling and evaluating the problem. One reason is that cyber attacks are rare events. Precise distributions and reliability theory are therefore difficult to apply, and a great deal of uncertainty surrounds the available information, the subjectivity of risk exposure, and true vulnerability.

    This thesis presents a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful intrusion into water utility SCADA systems. This framework can assist decisionmakers in understanding the risks of cyber intrusion, their consequences, and the associated tradeoffs in order to improve the survivability of the system. The development of a framework for cyber risk management to water utility SCADA systems is the centerpiece and major contribution of the thesis.

     

    1.4 Thesis Tasks

    The tasks completed in this thesis are as follows: (1) establishment of historical results of an internet-based survey that details the current state of SCADA in water utilities and documents information on cyber intrusion and system security; (2) research and review of the history of SCADA; (3) research and review of current risk management techniques in computer security; (4) identification of the sources of cyber risk to SCADA; (5) research of techniques in probabilistic risk assessment through event and fault tree analysis; (6) application of extreme event analysis to the assessment of water supply reduction; (7) research of techniques in modeling survivability with the concept of surety; and (8) development of a realistic example problem from data aggregated from the internet survey to apply the framework developed in this thesis.

     

    1.5 Thesis Overview

    Chapter one describes the general information about SCADA, trends, goals, and tasks that set the conditions for developing a methodology to manage cyber attacks. The purpose of chapter two is to introduce the wide range of expert opinion on cyber attacks. It begins by defining and reviewing the history of SCADA. Chapter two highlights a study where the Defense Information Systems Agency (DISA) attacked government systems in order to gauge the effectiveness of their security. It concludes by making a comparison between government computer systems and water utility SCADA systems. The purpose of chapter three is to review and understand the current risk analysis methods used in computer security. After reviewing classic risk analysis and the concept of annualized present value of expected loss, it reviews and compares the qualitative approach used by software companies to probabilistic risk assessment as a more quantitative approach used in nuclear power, transportation, rail, etc. Chapter four is the center of gravity for the thesis, disclosing the major findings and lessons learned from the survey, presenting the major modeling tools used in the thesis, and culminating with a multiobjective framework to manage risk. Chapter four accomplishes the central goal of the thesis -- to present a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful intrusion to water utility SCADA systems. The centerpiece in chapter four is Figure 4-1 Probabilistic Risk Management Framework. It graphically depicts the risk management framework for water utility SCADA systems. Chapter five introduces a prototype city designed from aggregated data from the internet survey in order to apply the methodology described in chapter four. Chapter six conveys the major findings, contributions, and recommendations that result from the thesis. 
Appendix A contains the actual survey that was posted on the internet for water utilities.

     

    CHAPTER 2 SUPERVISORY CONTROL AND DATA ACQUISITION

     

    2.1 Introduction

    Supervisory control and data acquisition (SCADA) is a system that allows an operator to monitor and control processes that are distributed among various remote sites (Boyer 1993). There are many processes that use SCADA systems: hydroelectric, water distribution and treatment utilities, natural gas, etc. SCADA systems allow remote sites to communicate with a control facility and provide the necessary data to control processes. For many of its uses, SCADA provides an economic advantage. As the distance to remote sites increases and access becomes more difficult, SCADA becomes a better alternative to having an operator or repairman visit the site for adjustments and inspections. Distance and remoteness are two major factors for implementing SCADA systems (Boyer 1993).

    There are four major elements to a SCADA system: the operator, master terminal unit (MTU), communications, and remote terminal unit (RTU). The operator exercises control through information that is depicted on a video display unit (VDU). Input to the system normally initiates from the operator via the master terminal unit’s keyboard. The MTU monitors information from remote sites and displays information for the operator (Figure 2-1). The relationship between MTU and RTU is analogous to master and slave. Depending on its complexity or sophistication, the MTU may employ heuristics embedded into its programming that allow it to make modifications to the system to maintain optimality. In the same fashion, the sophistication in the RTU may allow local optimization of functions.

    Figure 2-1 Generic SCADA System (Boyer 1993)

    Figure 2-2 depicts the topology for SCADA and water supply for a small city. Note that it is quite possible that systems employ more than one means to communicate with remote sites. SCADA systems are capable of communicating using a wide variety of media, such as fiber optics, dial-up or dedicated voice grade telephone lines, or radio. Recently, some utilities have employed Integrated Services Digital Network (ISDN) (Lambert 1997). Since the amount of information transmitted is relatively small (less than 50K), voice grade phone lines and radio work well (Boyer 1993).

     

     

    Figure 2-2 Water Distribution System and SCADA (Wiese and Ezell 1997)

     

    2.2 Master Terminal Unit

    At the heart of the system is the master terminal unit (MTU). The master terminal unit initiates all communication, gathers data, stores information, sends information to other systems, and interfaces with operators (Boyer 1993). The major difference between the MTU and RTU is that the MTU initiates virtually all communication, whether through its programming or through its operators (Boyer 1993). The MTU also communicates with other peripheral devices in the facility, such as monitors, printers, or other information systems. The primary interface to the operator is the monitor, which portrays a representation of valves, pumps, etc. As incoming data changes, the screen is updated. Figure 2-3 shows examples of inputs from the MTU and field devices.

    Figure 2-3 Inputs & Outputs for MTU (Boyer 1993)

     

    2.3 Remote Terminal Unit

    Remote terminal units gather information at their remote sites from various input devices, such as valves, pumps, alarms, and meters. Essentially, data is either analog (real numbers), digital (on/off), or pulse data (e.g., counting revolutions of meters). Many remote terminal units hold the information gathered in their memory and wait for a request from the MTU to transmit the data. Other, more sophisticated remote terminal units have microcomputers and programmable logic controllers (PLCs) that can perform direct control over a remote site without the direction of the MTU. Figure 2-4 shows an example of outputs of the RTU to the MTU and field devices.

    Figure 2-4 Inputs & Outputs for RTU (Boyer 1993)

    The RTU central processing unit (CPU) receives a binary data stream in accordance with the communication protocol. Protocols can be open, like Transmission Control Protocol and Internet Protocol (TCP/IP) or proprietary. Data streams generally contain the information that is organized according to the seven layer Open Systems Interconnection Model (OSI Model). The OSI Model is used to set standards in the way information is exchanged with respect to protocols, communication, and data. The RTU receives its information because it sees its identification embedded in the protocol. The data is then interpreted, and the CPU directs the appropriate action at the site.
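    The address-matching behavior described above can be illustrated with a minimal sketch. The frame layout, the function code, and the class and function names below are hypothetical, chosen only for illustration; they do not correspond to any particular SCADA protocol or vendor product.

```python
import struct
from typing import Optional

# Hypothetical frame layout (an assumption, not a real SCADA protocol):
# 1-byte RTU address, 1-byte function code, 2-byte big-endian payload.
FRAME_FORMAT = ">BBH"
FUNC_READ_ANALOG = 0x01  # hypothetical function code


def build_poll(rtu_address: int, function: int, payload: int = 0) -> bytes:
    """MTU side: pack a poll request addressed to a single RTU."""
    return struct.pack(FRAME_FORMAT, rtu_address, function, payload)


class RemoteTerminalUnit:
    def __init__(self, address: int, analog_value: int) -> None:
        self.address = address
        self.analog_value = analog_value  # last reading held in memory

    def handle_frame(self, frame: bytes) -> Optional[bytes]:
        """Reply only when the frame carries this RTU's address."""
        address, function, _payload = struct.unpack(FRAME_FORMAT, frame)
        if address != self.address:
            return None  # frame is meant for a different RTU; ignore it
        if function == FUNC_READ_ANALOG:
            # Return the stored reading in the same frame layout.
            return struct.pack(FRAME_FORMAT, self.address, function,
                               self.analog_value)
        return None  # unknown function code; take no action


# Two RTUs share one channel; the MTU polls the RTU with address 2.
rtus = [RemoteTerminalUnit(1, 512), RemoteTerminalUnit(2, 730)]
poll = build_poll(rtu_address=2, function=FUNC_READ_ANALOG)
replies = [rtu.handle_frame(poll) for rtu in rtus]
```

    In this sketch only the addressed RTU answers, which mirrors the master-slave relationship in which the MTU initiates communication and each RTU waits for a request carrying its own identification.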

     

    2.4 SCADA History

    SCADA can be traced to the development of telemetry in the first half of the twentieth century. The technology of rockets and aircraft afforded the opportunity to investigate weather and planetary data. This required a simple way to obtain data from the atmosphere and from space that observers could not normally collect themselves (Boyer 1993). Manned stations on the surface of the Earth, such as lighthouses, post offices, and weather stations, were able to collect and monitor data on weather. However, for accurate weather prediction, more detailed information was needed from the atmosphere. There were two questions to be answered. How could accurate data be gathered from the atmosphere and communicated back to a facility on the Earth’s surface? And how might data be gathered from a number of sites in one centralized location to record, analyze, and then predict the weather (Boyer 1993)?

     

    2.5 Telemetry and the Mainframe Era

    The solution came from railroad companies that used telemetry devices. Railroads used telemetry to obtain information on train location and switch status. During this time, radio technology improved, removing the requirement to lay hundreds of miles of wire (Boyer 1993). Developments in error correction and data compression allowed more information to be sent reliably via radio. Throughout the century, more industries, such as automation plants and gas, electric, and water utilities, began using telemetry systems to monitor processes and remote sites. Two-way radio communication became common in the early sixties (Boyer 1993). During this time mainframe computing became the paradigm. Terminals with no intelligence relied on the mainframes to perform all calculations and store data. This method changed in the early eighties with the development of the microcomputer.

     

    2.6 SCADA and Micro Computer Era

    This era put information and intelligence at the fingertips of the user. The microcomputer allowed process control to be distributed among the remote sites, reducing dependence on the central facility mainframe. By the late 1980s, industry began shifting to the distributed systems era. This era is characterized by Wide Area Network (WAN) and Local Area Network (LAN) integration, open standards, relational information modeling, and icon-driven applications (Applegate 1996). In the late nineties, a new computing era emerged. Management Information System (MIS) professionals refer to this time as the ubiquitous era (Applegate 1996). This is a time when all types of configurations of intranets, WANs, and LANs are conceivable. The lines among servers with different responsibilities are blurred. During this era, the need for master-slave SCADA has significantly diminished. Programmable logic controllers (PLCs) now have the capability to monitor and control local sites. Users of SCADA have begun changing as well. Industries like electric utilities have retained a centralized philosophy. However, oil and gas production companies have shifted to a more decentralized mode, putting the control of fields back in the hands of field operation specialists (Boyer 1993). There is a new trend emerging among the makers of software for SCADA systems. While current systems tend to have the PLC programming logic located at remote sites, a new method of placing this code back under the control of a central facility is developing. This is accomplished by embedding the PLC code in Microsoft Windows software. Many small companies see this as a way to regain market share from big players like Rockwell Automation and Wonderware. Smaller companies boast that they can provide a much cheaper means to control remote sites if all code is maintained on the master device, commonly referred to as a soft PLC. Smaller companies also argue that personal computers running Windows are better suited (e.g., more memory, more hard drive space, faster processors) to store larger code than a PLC (Freer 1997).

     

    2.7 SCADA or Distributed Control Systems (DCS)

    From its inception in the 1960s, SCADA was understood as a system that was primarily concerned with I/O from Remote Terminal Units. In the early 1970s, DCS was developed. The ISA S5.1 standard defines a distributed control system as a system which while being functionally integrated, consists of subsystems which may be physically separate and remotely located from one another. DCS were originally developed to meet the requirements of large manufacturing and process facilities that required significant amounts of analogue control.

    The major differences between SCADA and DCS are as follows:

    1. Historically, distributed control systems use programmable logic controllers (PLCs) and SCADA uses remote terminal units (RTUs).

    2. A PLC has more intelligence than an RTU.

    3. Unlike an RTU, a PLC is able to control sites without the direction of a master (Byrne 1997).

    The lines between the two have blurred considerably in the late 1990s. SCADA systems have DCS capabilities, and DCS have SCADA capabilities. Systems are tailored to the operation they are intended to control. Water utilities are following other industries and becoming more interconnected with the Internet. Also, the control function of what were once old telemetry systems is becoming more advanced, interconnected, and accessible through the Internet, dial-in connections, etc. This interconnectivity has paralleled the era of the ubiquitous distributed computing environment. For the purposes of this thesis, all systems used to control water supply systems are termed SCADA. Also, this thesis focuses on traditional SCADA that behaves as a central controlling system.

    There is a perception in the SCADA and water business that their systems are secure from cyber intrusion and are not likely targets. However, the President's Commission on Critical Infrastructure Protection (PCCIP) concluded that cyber threats are a clear danger (risk) to all infrastructures (PCCIP 1997).

     

    2.8 Trends in SCADA and the Internet

    The Internet continues to grow at a phenomenal rate, doubling in size every 12-15 months (Engst 1996). By 2001, projections show the Internet to have over 200 million hosts and 2 billion people interconnected (Howard 1995). A host is a domain name that has an Internet Protocol address record associated with it (e.g., virginia.edu). This would be any computer system connected to the Internet (via full or part-time, direct or dial-up connections). A domain has a name server record associated with it. The Internet represents an ever-increasing medium for people to share information and processes, hence the temptation for misuse. With this growth, more industries are executing financial transactions, banking online, and sharing vast amounts of information. Likewise, utilities are beginning to take advantage of this medium, due to deregulation and the Federal Energy Regulatory Commission (FERC) ruling for electric utilities and industry to share information (NSTAC 1997).
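    To put the quoted doubling time in perspective over the five-year horizon considered in this thesis, the growth factor can be worked out with a short sketch. The function name and the normalized starting size of 1.0 are assumptions made for illustration, and the calculation presumes that the doubling rate stays constant, which the cited sources do not guarantee.

```python
def projected_size(initial: float, years: float, doubling_months: float) -> float:
    """Size after `years` if the quantity doubles every `doubling_months` months."""
    doublings = years * 12.0 / doubling_months
    return initial * 2.0 ** doublings


# Growth factor over five years, relative to a normalized starting size of 1.0.
growth_fast = projected_size(1.0, years=5, doubling_months=12)  # five doublings
growth_slow = projected_size(1.0, years=5, doubling_months=15)  # four doublings
```

    Under these assumptions, a doubling time of 12 to 15 months corresponds to a 16-fold to 32-fold increase in five years.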

    Companies like Rockwell International are developing a number of automation control software products that can exploit the interconnectedness of the Internet. RS View 32, RS Portal, and RS Tools represent a portfolio of software that incorporates the newest Microsoft technologies of ActiveX and Object Linking and Embedding (OLE). RS Portal allows a user to connect to a SCADA system from a remote site and download information or control a process.

    Considering the advances in process control and the Internet, executives, information system managers, and system analysts should have a greater understanding of potential misuses of this communication medium. However, in research conducted by R.C. Hollinger, system security did not rank among the top twenty management issues (Gray 1994). This lack of priority indicates a low awareness of the risks of cyber attack and of exposure from the internet and from within a company.

     

    2.9 Estimating Attacks and Incidents from the Internet

    Currently, there is a wide range of results that have been developed for estimating the likelihood of an attack or incident from the Internet. For the purposes of this thesis, a comparison of results from the Defense Information Systems Agency (DISA 1996), Air Force Information Warfare Center Security Posture Studies January 1995 (AFIWC 1995), and Howard’s thesis is used. An attack is a single unauthorized access or use attempt. An incident is a group of attacks that are distinguished by attacker, tools, results, or objectives (Howard 1995).

    The range of values for attacks and incidents is difficult to measure. The size, ubiquity, and complexity of the system do not allow more traditional methods for quantifying the probability of attacks. DISA (1996), AFIWC (1995), and Howard (1995) employed different methods to make their assessments. AFIWC used organized attacks against Air Force bases. DISA conducted vulnerability studies by attempting to penetrate computer systems. Howard (1995) collected data from CERT between 1989 and 1995 and estimated that incidents are approximately 10 percent of attacks.

    Another estimate put the number of attacks in 1995 at 44 million (Cohen 1995). A summary of the results is provided in Table 2-1.

    Table 2-1 Estimates of Cyber Attacks in 1995 (Howard 1995)

    DISA conducted its vulnerability studies from 1992 to 1995, simulating 38,000 attacks. Of these attacks, 35 percent were blocked; 4 percent of the successful attacks were detected, and 27 percent of the detected attacks were reported. Figure 2-5 summarizes the findings. Kyas (1997) states that systems connected to the internet are eight times as likely to suffer from hacking as those not connected to the internet. He also states that, of all hacks of sites, 80 percent were via the internet and 20 percent were internal to the organization.

     

    Figure 2-5 DISA Vulnerability Assessments (GAO 1996)

    Assuming that water utilities are at least as good as the systems DISA attacked, the probability of detecting an attack, given that the attack succeeded, can be estimated by the statistic $\hat{p} = X/n$, the proportion of successful attacks that are detected, where $X$ is the number of attacks detected and $n$ is the total number of successful attacks. Using a confidence interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z_{\alpha/2}$ is the percentile from the standard normal distribution and $1-\alpha$ is the confidence level, gives the estimate contextual meaning (Milton and Arnold 1990). A confidence level of 95 percent means $\alpha = 0.05$ and $z_{\alpha/2} = 1.96$ from the standard normal table. Therefore, the probability of detecting an attack is estimated to be

    $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$. Table 2-2 summarizes the likelihood for detection, reporting, and a successful attack.
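    The interval estimate for the detection probability can be sketched numerically. The counts below are not taken from the thesis tables; they are derived from the percentages quoted in the text under the assumption that 35 percent of the 38,000 simulated attacks were blocked and that 4 percent of the remaining (successful) attacks were detected. The function name is hypothetical, and the result is an illustration of the normal-approximation interval rather than a reproduction of the values in Table 2-2.

```python
import math


def wald_interval(detected: int, successful: int, z: float = 1.96):
    """Normal-approximation confidence interval for a proportion X/n."""
    p_hat = detected / successful
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / successful)
    return p_hat, p_hat - half_width, p_hat + half_width


# Counts derived from the quoted percentages (an assumption, see lead-in).
total_attacks = 38_000
successful = round(total_attacks * (1 - 0.35))  # attacks that were not blocked
detected = round(successful * 0.04)             # successful attacks detected

p_hat, lower, upper = wald_interval(detected, successful)
print(f"p_hat = {p_hat:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

    Because the number of successful attacks is large, the interval around the point estimate is narrow; the uncertainty that matters more is whether the DISA systems are representative of water utilities at all.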

    Table 2-2 Summary of Attack Estimates for Utilities

     

    The typical water utility will probably not be attacked by a cyber terrorist this year, or next year for that matter. It probably will not be exposed to a devastating flood either, nor is a given car likely to be stolen. In at least two of these three cases, people and organizations buy insurance or make arrangements in an attempt to protect themselves from these rare events. The amount of insurance or protective measures depends on public law, policy, or personal risk aversion. How much risk is acceptable? At what level of risk do people insure themselves or their assets? Table 2-3 provides an estimate of attack from the internet and a comparison to other types of rare or extreme events.

    Table 2-3 Estimation of Risk (Howard 1995)

     

    2.10 Denial of Service

    Intuitively, results of an attack are not equal in difficulty for the attacker and in consequences for the utility. Figure 2-7 provides a conceptual relationship between a result and its consequences (damage) to a system.

     

    Figure 2-6 Difficulty vs. Damage in Attacking Networks

    Denial of service is the major potential source of danger for a SCADA system. A denial of service attack intentionally blocks or degrades a computer or network (Amorosa 1994). An attacker makes resources inoperative by monopolizing the shared resources’ time so that other processes are effectively stopped. This can be accomplished by consuming disk space, CPU time slices, network applications, etc. (Garfinkel 1996). Howard (1995) writes:

  • Denial of service attacks over the Internet can be directed against three types of targets: a user, a host computer, or a network. ...an attacker must begin a denial of service attack by using tools to exploit vulnerabilities and then either obtain unauthorized access to an appropriate process or group of processes, or use a process in an unauthorized way. The attacker then completes that attack by using some method to destroy files, degrade processes, degrade storage capability, or cause a shutdown of a process or of the file system.

     

    In summary, there are two types of information used for gauging vulnerability from the internet: attacks and incidents. Attacks are single events directed against a host. Incidents are groupings of attacks that are distinguished by method, attacker, or results. There is no exact information on the probability of an attack on a water utility, and there is a great deal of uncertainty in the estimates. Assuming that a utility is as likely to be attacked as any other host connected to the internet, Howard (1995) estimates that the rate of attack for root break-in could be as high as 1 attack in 10 years. Clearly, hosts are not equally likely to be attacked. Government and military sites are attacked at a considerably higher rate than other three-letter domains. Given access to data, it may be possible to determine the distribution of cyber intrusion with respect to three-letter domain. Table 2-2 sums up the various probabilities associated with an attack, based on the DISA findings (GAO 1996).
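
    A rate such as "1 attack in 10 years" can be turned into a probability of at least one attack over a planning horizon. The sketch below assumes, purely for illustration, that attacks arrive as a Poisson process with a constant rate; this arrival model is an assumption of the example and is not stated by Howard (1995) or by this thesis. The function name `prob_at_least_one` is hypothetical.

```python
import math


def prob_at_least_one(rate_per_year: float, years: float) -> float:
    """P(at least one attack in `years`) under an assumed Poisson process."""
    return 1.0 - math.exp(-rate_per_year * years)


rate = 1 / 10  # upper-bound rate for root break-in quoted from Howard (1995)
for horizon in (1, 5, 10):
    print(f"{horizon:2d} year(s): {prob_at_least_one(rate, horizon):.3f}")
```

    Under this assumed model the one-year probability is a little under 0.10, and even over ten years an attack is likely but not certain.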

     

    CHAPTER 3 REVIEW OF RISK AND SYSTEMS ENGINEERING

     

     

    3.1 Systems Approach

    The systems approach to problem solving is ideally suited for encapsulating Total Risk Management. Systems engineering, unlike other engineering disciplines, starts at the top and decomposes a problem into smaller problems that support the goals of the entire system at every level. This holistic approach has the advantage of unifying the effort of designing complex systems and nesting the goals from one level to the next. Gibson (1991) identified six major phases:

    - determine the goals of the system,

    - establish criteria for ranking alternative candidates,

    - develop alternative solutions,

    - rank alternative solutions,

    - iterate, and

    - action.

    Risk management should be conducted as an integral part of systems engineering. Many of the evaluation criteria might include risk functions such as minimizing the expected value of damage or maximizing the surety of the system. Haimes (1998) developed 13 steps that include one aspect that is missing from Gibson’s approach:

    1. Define and generalize the client’s needs. Consider the total problem environment. Clearly identify the problem.

    2. Help the client determine his or her objectives, goals, performance criteria, and purpose.

    3. Similar to step one, consider the total problem’s environment. Evaluate the situation, constraints, limitations, and available resources.

    4. Study and understand the interactions among the environment, technology, system, and people involved.

    5. Incorporate many models and synthesize. Evaluate the effectiveness and check the validity of the models.

    6. Solve the models through simulation and/or optimization.

    7. Evaluate various feasible solutions, options, and policies. How does the solution fulfill the client’s need? What are the costs, benefits, and risk tradeoffs for each solution?

    8. Evaluate the proposed solution for the long term and the short term.

    9. Communicate the proposed solution to the client in a convincing manner.

    10. Evaluate the impact of current decisions on future options.

    11. Once the client has accepted the solution, work on its implementation. If the solution is rejected, return to the above steps to correct it so that the client’s desires are fulfilled.

    12. Post audit your study.

    13. Iterate at all times.

     

     

     

    3.2 Total Risk Management

    Haimes (1998) defines total risk management (TRM) as a systematic, statistically based, holistic process that builds on formal risk assessment and management. TRM addresses the risk assessment questions (Kaplan 1997),

    - What can go wrong?

    - What is the likelihood that it will go wrong?

    - What are the consequences?

    the risk management questions (Haimes 1998),

    - What can be done?

    - What options are available and what are their associated tradeoffs in terms of cost, risks, and benefits?

    - What are the impacts of current management decisions on future options?

    and the sources of failure (hardware, software, organizational, and human) within a multiobjective framework (Haimes 1998).

     

    3.3 Alternative Approaches and Methods

    In computer security, there are a few documented techniques for evaluating security and risk to computer systems. Although none specifically addresses SCADA, they are relevant because SCADA has many of the same attributes as computer networks. Cooper (1989) has a nine-step approach. He defines risk analysis as "a technique for quantitative assessment of relative values of protective measures". Cooper’s metric is the annual loss expectancy (ALE). The ALE method first values the asset (e.g., a network worth $100,000). Next, the expected loss e is computed as the probability p of loss of the asset per year times the value v of the system:

    ALE = e = p•v

    Cooper’s risk analysis methodology is summarized below:

    1. Identify and value assets. (What must be protected?)

    2. Identify threats. (Protect against what?)

    3. Identify vulnerabilities. (What are the potential ways in which threats can be realized?)

    4. Estimate risks. (What is the probability of a vulnerability?)

    5. Calculate ALE for each vulnerability. (What is the statistically expected loss?)

    6. Identify potential protective measures. (How can assets be protected against threats?)

    7. Estimate ALE reductions for each vulnerability due to each protective measure. (What is the statistically expected amount saved?)

    8. Select cost-effective protective measures. (How are assets best protected against threats?)

    9. Respond to experience by modifying protective measures, by recovering from disasters, and by prosecuting transgressors. (How can feedback be used?)
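
    Steps 5, 7, and 8 of Cooper's method can be sketched in a few lines of Python. Only the $100,000 asset value comes from the text; the loss probabilities, the protective measures, and their costs are hypothetical numbers chosen for illustration, and the names `ale`, `Measure`, and `net_annual_saving` are assumptions of this sketch rather than Cooper's notation.

```python
from dataclasses import dataclass


def ale(p_loss_per_year: float, value: float) -> float:
    """Annual loss expectancy: ALE = e = p * v."""
    return p_loss_per_year * value


@dataclass
class Measure:
    name: str
    annual_cost: float   # hypothetical yearly cost of the protective measure
    p_loss_after: float  # hypothetical loss probability once it is in place


def net_annual_saving(measure: Measure, p_before: float, value: float) -> float:
    """ALE reduction due to the measure (step 7) minus what the measure costs."""
    reduction = ale(p_before, value) - ale(measure.p_loss_after, value)
    return reduction - measure.annual_cost


asset_value = 100_000.0  # example asset value from the text
p_before = 0.10          # hypothetical probability of loss per year

measures = [
    Measure("dial-back modem", annual_cost=2_000.0, p_loss_after=0.04),
    Measure("intrusion detection", annual_cost=9_000.0, p_loss_after=0.03),
]

# Step 8: keep only measures whose ALE reduction exceeds their cost.
cost_effective = [
    m.name for m in measures if net_annual_saving(m, p_before, asset_value) > 0
]
print(cost_effective)
```

    With these made-up figures only the first measure pays for itself; the second reduces the ALE by more but costs more than it saves.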

    White and Pooch (1996) define computer security risk analysis as "the process of identifying and evaluating the risk of being successfully attacked and suffering a loss of data, time, and person-hours versus the cost of preventing such a loss." The ultimate goal of the analysis is to determine the strengths of the computer’s security and the areas that need to be improved. The benefits are improved security and a better understanding of the system and its flaws. The metric compares the burden B of preventing a loss with the probability P of a loss L. White and Pooch (1996) refer to this approach as BPL. In this method the burden is the benchmark for determining if a solution is worthy of consideration. The three-step method is shown below.

     

    Figure 3-1 The Steps of Risk Analysis (White and Pooch 1996)
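
    The BPL comparison can be written as a one-line test. The sketch below assumes that a protective measure is worth considering when its burden B is smaller than the expected loss P times L; that decision rule is an interpretation of the description above, and the function name and numbers are hypothetical.

```python
def worth_considering(burden: float, p_loss: float, loss: float) -> bool:
    """Assumed BPL rule: consider a measure when B < P * L."""
    return burden < p_loss * loss


# Hypothetical example: a 10% chance of a $100,000 loss.
print(worth_considering(5_000.0, 0.10, 100_000.0))   # burden below expected loss
print(worth_considering(15_000.0, 0.10, 100_000.0))  # burden above expected loss
```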

    Internet Security Systems, Inc. (ISS) is a company that specializes in computer and network security. The ISS approach is one of an adaptive security model. Their formula for risk analysis is captured in an equation as a subset of security:


    Security = Risk Analysis + Policy + Implementation + Threat/Vulnerability Monitoring + Threat/Vulnerability Response

    The ISS Adaptive Security Model uses a portfolio of software that automates the security process. It seeks out problems and notifies the system administrator by conducting the following:

    - attack analysis and response,

    - misuse analysis and response,

    - vulnerability analysis and response,

    - configuration analysis and response,

    - risk posture and response,

    - audit and trends analysis, and

    - real-time user awareness support.


    Figure 3-2 ISS Risk Management Model (ISS 1997)

    This is accomplished primarily through their SAFEsuite™ software. ISS states that adaptive security supports a 100 percent solution to computer security, yet admits that risk cannot be reduced to 0 percent. Their risk assessment process appears to be qualitative; no mathematical or quantitative detail is documented in their approach. Figure 3-2 summarizes risk analysis for ISS.

    Cohen (1997) relates classic risk analysis to computer networks as "simply listing the events for the network, determining probabilities for each event, calculating the expected loss for each event and the ROI for each mitigation technique, and doing the arithmetic." Cohen (1997) describes the following:

  • Standard risk analysis asserts that we calculate an expected loss (L) by multiplying the probability of each event (p(e)) that can cause a loss by the expected loss from that event (l(e)) and adding these results for all of the events (all e in E). Mitigation strategies are then optimized by examining each proposed mitigation technique to derive the reduction in expected loss associated with the technique's use, dividing by the cost of the mitigation technique to derive a return on investment (ROI), and applying the most cost effective (i.e., the highest ROI) method first. Apply methods until no technique with a high enough ROI for the organization is left, and you are done (Cohen 1997).

    Cohen (1997) believes that the size, dynamics, and complexity of computer networks do not lend themselves to risk analysis. He states that it is impossible to enumerate all the possibilities and mitigating strategies and to quantify the damage.
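
    The procedure Cohen describes (and then criticizes) can be sketched as follows. The events, probabilities, losses, mitigation costs, and the ROI cutoff are all hypothetical; each mitigation is modeled, as a simplifying assumption, by a factor that scales an event's probability. The ROI is recomputed after each technique is applied, which is one reasonable reading of "applying the most cost effective method first".

```python
def expected_loss(events: dict) -> float:
    """L = sum over events e of p(e) * l(e)."""
    return sum(p * loss for p, loss in events.values())


def apply_mitigation(events: dict, mitigation: dict) -> dict:
    """Return a copy of the events with probabilities scaled by the mitigation."""
    updated = dict(events)
    for name, factor in mitigation["factors"].items():
        p, loss = updated[name]
        updated[name] = (p * factor, loss)
    return updated


def greedy_plan(events: dict, mitigations: dict, min_roi: float):
    """Apply the highest-ROI technique first until none clears min_roi."""
    remaining = dict(mitigations)
    plan = []
    while remaining:
        base = expected_loss(events)
        roi = {
            name: (base - expected_loss(apply_mitigation(events, m))) / m["cost"]
            for name, m in remaining.items()
        }
        best = max(roi, key=roi.get)
        if roi[best] < min_roi:
            break
        plan.append(best)
        events = apply_mitigation(events, remaining.pop(best))
    return plan, expected_loss(events)


# Hypothetical events: name -> (annual probability, loss in dollars).
events = {
    "denial of service": (0.05, 200_000.0),
    "data corruption": (0.02, 500_000.0),
    "virus": (0.20, 10_000.0),
}
# Hypothetical mitigations: cost and probability scaling factors.
mitigations = {
    "backup link": {"cost": 2_000.0, "factors": {"denial of service": 0.5}},
    "antivirus": {"cost": 500.0, "factors": {"virus": 0.25}},
    "audit logging": {"cost": 8_000.0, "factors": {"data corruption": 0.8}},
}

plan, residual = greedy_plan(events, mitigations, min_roi=1.0)
print(plan, residual)
```

    The sketch also makes Cohen's objection concrete: the result is only as good as the enumerated events and the probabilities assigned to them.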

     

    3.4 Probabilistic Risk Assessment (PRA) and Management (PRAM)

    Probabilistic Risk Assessment is a quantitative approach used in many fields like nuclear, transportation, or rail. Kumamoto and Henley (1996) define risk as "a combination of five primitives -- outcome, likelihood, significance, causal scenario, and population affected." Mathematically, they define risk as

    $\text{Risk} \equiv \{(L_i, O_i, U_i, CS_i, PO_i) \mid i = 1, \ldots, n\}$, assuming $n$ potential outcomes ($O_i$), where each outcome has a likelihood ($L_i$), a significance ($U_i$), a causal scenario ($CS_i$), and a population affected ($PO_i$) (Kumamoto and Henley 1996). The purpose of risk assessment is the derivation of risk profiles posed by a given situation (Kumamoto and Henley 1996). "The purpose of risk management is to propose alternatives, evaluate risk profiles, make safety decisions, choose satisfactory alternatives to control the risk, and exercise corrective actions" (Kumamoto and Henley 1996). Assessment and management taken together are probabilistic risk assessment and management (PRAM). The PRAM approach recognizes that there are two ways to express risk profiles (e.g., outcomes versus likelihood). The likelihood of an outcome can be expressed objectively, subjectively, or as a combination of both. Examples of objective likelihood are probabilities, percentages, or distributions. When there is limited information, likelihood may be subjectively evaluated and even assessed as possible, plausible, rare, or frequent (Kumamoto and Henley 1996). To study scenarios and failures, PRA uses event trees and fault trees to show causal scenarios.

    Table 3-1 Examples of Likelihood and Outcome (Kumamoto and Henley 1996)

    Kumamoto and Henley (1996) point out that different PRAs of the same problem can lead to different trees because tree generation is an art, not a science. Unlike Cohen (1997), who is skeptical of PRA because of the limited information, size, dynamics, and complexity of computer systems, Starr (1987) makes these comments:

  • In the nuclear field emphasis on PRA has focused professional concern on the frequency of core melts. The argument to whether a core can actually melt with a projected probability of one in one thousand per year, or in a million per year, represent a misplaced emphasis on these quantitative outcomes. The virtue of risk assessments is the disclosure of the system’s causal relationship and feedback mechanisms, which might lead to technical improvements in the performance and reliability of the nuclear stations. When the probability of extreme events become as small as these analyses indicate, the practical operating issue is the ability to manage and stop the long sequence of events which could lead to extreme end results. Public acceptance of any risk is more dependent on public confidence in risk management than on the quantitative estimates of risk consequences, probabilities, and magnitude (Starr 1987).

    Haimes (1995) would disagree with Cohen as well. Translating Toffler's vision (Alvin Toffler: Powershift, Bantam Books 1990)

  • As we advance into the Terra Incognita of tomorrow, it is better to have a general and incomplete map, subject to revision and correction, than to have no map at all.

    into the risk assessment process implies that a limited database is no excuse for not conducting sound risk assessment. On the contrary, with less knowledge of a system, the need for risk assessment and management becomes more imperative (Haimes 1995).

    Is PRAM the best approach? This type of quantitative risk analysis is an established approach recognized by government and industry. However, research indicates a less rigorous approach by some practitioners, attempting to make computer network systems one hundred percent secure. Some software companies conduct a qualitative risk analysis and describe multiple management strategies. They follow the steps, yet the crucial elements of PRA (risk quantification, modeling, and analysis) are missing. What are the tradeoffs? How much security is enough? Invariably, analysis leads to the same conclusion -- buy their software and the problem is solved. Their approach proceeds from a false notion that their methodology can prevent intrusions and make systems one hundred percent secure.

    Chapter 4 develops the concept of survivability of the system instead of focusing on 100 percent security. It is the center of gravity of the thesis, as it develops the modeling tools that will be applied in Chapter 5.

     

     

     

    CHAPTER 4 FRAMEWORK FOR SCADA UTILITY SURVIVABILITY MODELING

     

    4.1 Risk Modeling

    Security is a relative concept. Systems will never be one hundred percent secure. The larger question is evaluating the survivability of the system. The overall goal is to make the SCADA system more survivable, given a cyber attack. In order to accomplish this goal, a probabilistic risk assessment and management framework is presented. There are four phases to the framework. Phase I begins by determining the risks to the system. This will be accomplished through solicited expert opinion, hierarchical holographic modeling, and the results of an internet-based survey. Phase II focuses on determining the sources of risk, constructing models, and assessing the results. This will be accomplished using event tree and fault tree analysis. The event tree will provide a probability density function and an exceedance probability, and will help to explain how an event occurs and what its consequences are. The fault tree will add insight into why mitigating events on an event tree fail. Phase II also entails the construction of the partitioned multiobjective risk method (PMRM) to provide insight into the conditional expectation of extreme events (Asbeck and Haimes 1984). Phase III begins with the construction of functions for the multiobjective analysis. This phase provides the cost, risk, and objective functions to analyze and compare policy options. This will be accomplished by developing alternatives and examining the changes in the exceedance probability. Phase IV begins by conducting a tradeoff analysis and ends by either drawing conclusions for a decision or returning to Phase I with the information learned from the analysis.

    Figure 4-1 Risk Management Framework
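
    The Phase II quantities can be illustrated with a toy event tree. In the sketch below, an intrusion attempt passes through three mitigating events, each path ends in a percentage of water flow reduction, and the resulting discrete distribution yields the expected value, an exceedance probability, and a conditional expected value for the extreme range in the spirit of the PMRM. Every probability, consequence, and threshold is hypothetical, the barrier names are invented for illustration, and the tree is far simpler than the models developed in this thesis.

```python
from itertools import product

# Mitigating events and the (hypothetical) probability that each succeeds.
barriers = [
    ("access control holds", 0.70),
    ("intrusion detected", 0.40),
    ("manual recovery works", 0.80),
]


def consequence(outcomes) -> float:
    """Hypothetical percent water-flow reduction at the end of one path."""
    held, detected, recovered = outcomes
    if held:
        return 0.0
    if detected and recovered:
        return 5.0
    if detected or recovered:
        return 30.0
    return 80.0


def event_tree(barriers):
    """Yield (outcomes, probability) for every path through the tree."""
    for outcomes in product([True, False], repeat=len(barriers)):
        prob = 1.0
        for (_, p_success), ok in zip(barriers, outcomes):
            prob *= p_success if ok else 1.0 - p_success
        yield outcomes, prob


def distribution(barriers) -> dict:
    """Collapse the paths into a probability mass function over consequences."""
    pmf = {}
    for outcomes, prob in event_tree(barriers):
        x = consequence(outcomes)
        pmf[x] = pmf.get(x, 0.0) + prob
    return pmf


def expected(pmf: dict) -> float:
    return sum(x * p for x, p in pmf.items())


def exceedance(pmf: dict, x: float) -> float:
    """P(flow reduction > x)."""
    return sum(p for v, p in pmf.items() if v > x)


def conditional_expected(pmf: dict, threshold: float) -> float:
    """E[flow reduction | flow reduction >= threshold]."""
    tail = {v: p for v, p in pmf.items() if v >= threshold}
    return sum(v * p for v, p in tail.items()) / sum(tail.values())


pmf = distribution(barriers)
print(f"expected reduction:            {expected(pmf):.2f}%")
print(f"P(reduction > 25%):            {exceedance(pmf, 25.0):.3f}")
print(f"E[reduction | reduction>=25%]: {conditional_expected(pmf, 25.0):.2f}%")
```

    The gap between the expected value and the conditional expected value is the point of the exercise: averaging over all paths hides how severe the rare, extreme outcomes are.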

     

    4.2 Internet Survey

    From June 1997 to January 1998, a survey was posted at "http://watt.seas.virginia.edu/~bce4k/home.html". The purpose of the survey was to gather information about the cyber threat, understand the state of SCADA in water supply systems, document any intrusions in the past year, and analyze trends among the administrators of these systems. A total of 62 individuals responded. Twelve responses were discarded because the respondents' identities or other qualifying information were not provided; fifty were retained for study. Many respondents provided information for more than one water utility. For example, a vice president of a water company provided results for 28 distinct systems. In the United States, 93 cities and 19 counties were represented. Internationally, responses came from Israel, Canada, Colombia, and Australia. Assuming that there are between 6,000 and 10,000 water utilities, the 112 U.S. cities and counties represented correspond to 1.1-1.9% of the population (SCADA 1997). For perspective, CNN routinely surveys 1,000 Americans, or 0.0004%, and presents the results as representative of 258 million.

    The results showed that the disgruntled employee is the number one concern: 47% ranked the disgruntled employee as the primary concern, followed by the hacker at 13%. Fifty-five percent agreed that the ultimate objective of an attacker is damage, followed by challenge or status. Forty-nine percent believed their greatest vulnerability was in the implementation of their system. Only 39% felt their system was safe from unauthorized access and only 37% from unauthorized use. Respondents agreed that denial of service and corruption of information would have the greatest impact on their water system. Ten of the 50 utilities reported attempted or successful unauthorized access to, or use of, their system. Forty-one percent of those surveyed spend less than 10% of their time on system security, and 51% spend no time. Unfortunately, 17% did not know the number of valves and 11% were unclear regarding the number of pumps they controlled. Forty percent allow their operators access to the internet and 66% allow email access via their LAN. Sixty-nine percent have remote access via their LAN and 60% have the ability to control their system from a dial-up connection. Interestingly, only 35% of water utilities describe their system as master-slave SCADA. The tabulated results are provided below. Also, a copy of the actual internet survey is located in Appendix A.
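
    The summary percentages can be checked directly against the tallies. The short sketch below recomputes the coverage figure and a few yes/total percentages; it assumes that the 1.1-1.9% coverage figure counts the 112 U.S. cities and counties represented rather than the 50 retained responses, which is an inference from the arithmetic and not something the survey text states. The dictionary labels are informal.

```python
# Coverage of the survey, using figures quoted in this section.
systems_represented = 93 + 19  # US cities + US counties
utilities_low, utilities_high = 6_000, 10_000

coverage_low = 100 * systems_represented / utilities_high
coverage_high = 100 * systems_represented / utilities_low

# A few yes/total tallies taken from the tabulated results.
tallies = {
    "operators have internet access (Q5)": (19, 48),
    "email via administrative LAN (Q6)": (31, 47),
    "LAN reachable remotely (Q7)": (33, 48),
    "dial-up control (Q8)": (29, 48),
    "master-slave SCADA (Q10)": (16, 46),
}
percentages = {
    label: 100 * yes / total for label, (yes, total) in tallies.items()
}

print(f"coverage: {coverage_low:.1f}% to {coverage_high:.1f}%")
for label, pct in percentages.items():
    print(f"{label}: {pct:.0f}%")
```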

    There were several lessons learned from this survey. The first lesson concerns the purpose of the survey. Early in the research it was determined that a survey was needed because of a lack of information. Unfortunately, limited knowledge of the problem domain produced a survey that was not focused on any particular aspect of the problem. It was unclear to many respondents who exactly should complete the survey. Was the survey intended solely for water utilities or for businesses as well? Another lesson learned was in determining the best way to count SCADA systems. Only 50 surveys were retained, yet this number accounted for 93 cities. Clearly, one respondent could bias the results. An internet-based survey was also not the best medium for a survey of this kind. Several respondents were openly critical of gathering data of this kind over the internet. What could a terrorist do with data on the design or configuration of a particular city’s SCADA? For this reason several questions were intentionally deleted and are not disclosed in this thesis. A better approach would be a survey sponsored by the American Water Works Association (AWWA) or another credible source. A press release followed by a mailing of paper copies of the survey would produce a better representation of the current state of SCADA in water utilities. Lastly, the time required to look up email addresses for thousands of utilities proved considerable. No organization was prepared to share its email list. Support from a major water organization would make mass mailings and press releases on a survey more realistic.

    1. What city or county do you provide water resources? There were 50 responses reporting either as a city, county, or outside the US.

     

    US Cities    US Counties    International Countries
    93           19             4

     

    2. What is your position in the water utility organization?

     

    Position              Responses
    Superintendent        5/48
    Manager               17/48
    Engineer / Analyst    12/48
    Operator              1/48
    Technician            3/48
    Other duty status     10/48

     

    3. How many valves and pumps does your system control?

     

    Total           Valves (responses)    Pumps (responses)
    Less than 50    14/36                 31/45
    More than 50    16/36                 9/45
    Unknown         6/36                  5/45

     

    4. How many Remote Terminal Units do you supervise/control with your SCADA system? There were 47 responses.

     

    Remote Terminal Units    Responses
    Unknown                  7/47
    0-5                      9/47
    6-10                     1/47
    11-20                    2/47
    21-50                    13/47
    51-100                   3/47
    101-200                  10/47
    over 200                 2/47

     

    5. Do you allow access to the internet for your operators? There were 48 responses.

     

    Yes      No
    19/48    29/48

     

    6. Do you or your operators have access to email via an administrative LAN? There were 47 responses.

     

    Yes      No
    31/47    16/47

     

    7. Is the Local Area Network accessible via remote connections? There were 48 responses.

     

    Yes      No
    33/48    15/48

     

    8. Do you have the ability to control your system via a dial-up connection? (e.g. laptop, modem, and dial in to your server). There were 48 responses.

     

    Yes      No       Unknown
    29/48    19/48    0

     

    9. What communication medium do you use? There were 50 responses.

    Medium                   Responses
    Radio                    11/50
    Telephone leased line    10/50
    Telephone party line     1/50
    Combination              13/50
    ISDN dial-up             0
    ISDN dedicated           0
    Other                    15/50

     

    10. Which best describes your SCADA? There were 46 responses.

    Type                    Responses
    Distributed control     22/46
    Master-slave control    16/46
    Other                   8/46

     

    11. What is the speed of your connections in bits per second (bps)? There were 43 responses.

    Speed            Responses
    300 bps          0
    300-2,400 bps    22/43
    4,800 bps        1/43
    9,600 bps        1/43
    14,400 bps       3/43
    28,000 bps       2/43
    56,000 bps       0
    64,000 bps       0
    128,000 bps      1/43
    Other            11/43

     

    12. In your judgment, who do you see as your system’s primary concern from cyber attacks where one is the highest primary concern and 6 is the least? Responses varied from 36 to 46. The number of responses is the denominator for each score.

     

    Concern / Rank            1        2        3        4        5        6
    Hackers                   6/46     7/46     6/46     4/46     11/46    12/46
    Spies                     1/38     8/38     23/38    1/38     3/38     2/38
    Terrorists                3/39     5/39     2/39     6/39     7/39     16/39
    Corporate raiders         2/38     2/38     4/38     8/38     3/38     19/38
    Professional criminals    2/36     2/36     4/36     3/36     11/36    14/36
    Disgruntled employees     18/38    6/38     5/38     6/38     2/38     1/38

     

    13. What do you believe is the ultimate objective of an attacker? There were 45 responses.

    Objective              Responses
    Challenge or status    16/45
    Political gain         1/45
    Financial gain         3/45
    Damage                 25/45

     

    14. What tools do you think a potential threat is most likely to use to attack your system where 1 is the highest primary concern and 6 is the least. The responses varied from 38 to 44.

     

    Concern / Rank       1        2        3        4        5        6
    User command         11/44    4/44     10/44    1/44     5/44     13/44
    Script or program    1/44     8/44     11/44    3/44     6/44     15/44
    Autonomous agent     2/40     5/40     10/40    5/40     2/40     16/40
    Toolkit              1/43     3/43     9/43     4/43     7/43     19/43
    Distributed tool     0/41     5/41     7/41     2/41     5/41     22/41
    Data trap            1/38     3/38     7/38     1/38     9/38     17/38
    HERF attack          3/39     4/39     7/39     0/39     6/39     19/39

     

    15. Please rank which vulnerability is greatest in your SCADA system where 1 is your primary concern and 3 is the least. The responses varied from 37 to 39.

     

    Concern / Rank                  1        2        3
    Design vulnerability            10/39    9/39     20/39
    Configuration vulnerability     9/37     22/37    6/37
    Implementation vulnerability    18/37    10/37    9/37

     

    16. Do you believe the design, configuration, and implementation of your system is safe from unauthorized access or unauthorized use? There were 46 responses.

     

     

    Safe from              Yes      No
    Unauthorized access    18/46    28/46
    Unauthorized use       17/46    29/46

     

    17. Which results from an attack on your SCADA system would have the greatest impact on your water resource system? There were 47 responses.

    Result                       Responses
    Corruption of information    22/47
    Theft of service             1/47
    Disclosure of information    2/47
    Denial of service            22/47


    18. Who would you call in the event your SCADA system was tampered with? There were 49 responses.

    Contact                                    Responses
    CERT (Computer Emergency Response Team)    4/49
    Outsourced security firm                   1/49
    Police department                          22/49
    FBI                                        8/49
    Other                                      14/49

     

    19. Did you experience a computer system intrusion? Indicate the type checking all that apply. There were 10 responses from 50 surveyed.

     

    Type of intrusion                   Responses
    Manipulated data                    3/10
    Installed a sniffer program         0
    Stolen password                     1/10
    Probing/scanning your system        2/10
    Trojan logons                       0
    IP spoofing                         0
    Introduced viruses                  5/10
    Denied use of service               3/10
    Downloaded data                     0
    Compromised information security    1/10
    Compromised email/documents         0
    Publicized intrusions               1/10
    Harassed personnel                  2/10
    Other                               1/10

     

    20. Do you have the capability to detect attempts to gain access to your system? There were 44 responses.

    Yes      No       Unknown
    19/44    16/44    9/44

     

    21. Have you detected any attempts to gain access to your system in the past year? There were 44 responses.

    Yes     No       Unknown
    1/44    35/44    8/44

     

    22. If yes, how many successful unauthorized attempts have you detected in the past 12 months?

    The respondent who answered yes to question 21 did not answer this question.

    23. How much time do you spend on ensuring your network is secure? There were 43 responses to this question.

    Time spent    Responses
    None          22/43
    < 10%         18/43
    10-20%        2/43
    21-30%        0
    31-40%        0
    41-50%        0
    51-60%        0
    61-70%        0
    > 71%         1/43


    4.3 Survivability

    Survivability is defined as the capability of the system to exist, function, and recover in spite of adversity (American Heritage 1996). Matalucci (1998) defines surety as "a level of confidence that a system will perform in acceptable ways in both the expected and unexpected circumstances." Matalucci and Miyoshi (1997) characterize surety as a combination of attributes like safety, security, use control, and reliability. "Surety describes an elevated state of safety and security; a state which is under control and very reliable" (Matalucci 1998). However, no mathematical definition has been discovered to date. There are many ways to measure survivability. Reliability and coverage are well known. Reliability is the probability of correct performance under normal system operation at time $t_n$, given that the system was operational at $t_{n-1}$ (Haimes 1998). Coverage is defined as the ability of the system to automatically recover from a fault during normal system operation (Dugan and Trivedi 1989). Unfortunately, cyber intrusion is an unusual loading on the system and not indicative of normal system operation.

    This thesis introduces a slightly different concept of surety. Surety is defined as a measure of acceptable system performance under an unusual loading. An unusual loading may be characterized as a rare or extreme event that was not envisioned in the design of the system. Examples of unusual loading are physical attack, natural disaster, and cyber intrusion. Surety will be used to quantitatively assess risk. Surety also has a qualitative dimension that is analogous to safety -- the level of risk deemed acceptable. Surety will be used to measure the survivability of a water utility and its SCADA system, given a willful attack on the SCADA system. Survivability is characterized as a vector whose attributes or components are redundancy, robustness, resilience, and security.

    Figure 4-2 models the various inputs and outputs for a SCADA/water supply system. This black box model helps one to visualize all the inputs that may influence the output - survivability.

    Figure 4-2 SCADA Black Box Model

    In this thesis, survivability is assumed to be dependent upon four states of the system, i.e., the state of redundancy, robustness, resilience, and security of the system. Each state variable is dependent upon exogenous, random, and decision variables that affect the output vector survivability.

    Redundancy is defined as the ability of certain components of a system to assume functions of failed components without adversely affecting the performance of the system itself (Matalas and Fiering 1977). Examples of redundancy in water supply systems are additional pumps, valves, water lines, or tanks beyond those needed for normal operation. An example of redundancy in SCADA is additional communication media between the master and the remote terminal unit.

    Robustness is defined as the degree of insensitivity of a system design to errors in the estimates of those parameters affecting design choice (Matalas and Fiering 1977). Robustness reduces the sensitivity of the system to extraordinary conditions. An example of robustness in water utilities might be additional water capacity for periods when demand may otherwise exceed supply. In SCADA, robustness may best be characterized by systems in which remote terminal units continue to operate when communication from the master terminal unit (MTU) is delayed or interrupted. In general, SCADA systems with distributed intelligence are inherently more robust than centrally controlled systems (SCADA 1997). If an intrusion occurs at the MTU and communications become disabled, the remote terminal units are capable of functioning in spite of the intrusion.

    Resilience is defined as the ability of a system to operate close to its optimal design technically and institutionally over a short run after an attack, such that the losses are within manageable limits (Matalas and Fiering 1977). Resilience in water supply systems takes the form of emergency and crisis action plans that provide a utility with the ability to continue to function after an attack. In SCADA, a remote terminal unit’s erasable programmable read-only memory (EPROM) can be reset by cutting and restoring power (Lambert 1997). Another example is a crisis action plan that details how the utility will staff remote sites and manually control their function, given a system-wide SCADA failure. Another component of resilience that is closely aligned with the software of networks and SCADA is valency. Valency is the ability of the system to react to intrusion and to restore normal system operation. The concept of valency is borrowed from its biological meaning and introduced in this thesis. High valency implies a strong capability to recover from unauthorized access or use. Software that actively seeks out computer viruses and removes them is an example of valency or resilience.

    Security is defined as the ability of certain components of the system to deter, detect, and defend against attacks (Haimes et al. 1997). In water supply systems, security is multifaceted. Examples include fences, locks, alarms, and sensors. In SCADA, sensors and alarms provide feedback on water quality. Some SCADA systems also have features that work to prevent unauthorized access or use. In general, secure systems have properties that reduce the likelihood of successful attacks.
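
    As a minimal sketch, the survivability vector described above can be written as a small record with one component per state variable. The 0-to-1 scoring scale, the class name `Survivability`, and the helper `weakest_component` are assumptions made for illustration only; the thesis does not prescribe a scale or an implementation.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Survivability:
    """Survivability as a vector of the four state variables.

    Each component is scored on a hypothetical 0-to-1 scale
    (an assumption of this sketch, not a definition from the thesis).
    """
    redundancy: float
    robustness: float
    resilience: float
    security: float

    def __post_init__(self):
        # Reject scores outside the assumed 0-to-1 scale.
        for name, value in zip(self.__dataclass_fields__, astuple(self)):
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")

    def as_vector(self):
        """Return the four components in a fixed order."""
        return astuple(self)

    def weakest_component(self):
        """Name the state variable with the lowest score."""
        return min(self.__dataclass_fields__, key=lambda n: getattr(self, n))


# Hypothetical scores for an illustrative utility.
example = Survivability(redundancy=0.8, robustness=0.6,
                        resilience=0.4, security=0.7)
```

    Keeping the four components separate, rather than collapsing them into one number, mirrors the treatment of survivability as a vector and makes it easy to see which state most limits the system.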

     

    4.4 Taxonomy for Assessing Computer Security

    Understanding how SCADA systems are vulnerable to cyber threats requires an understanding of the nature of cyber threats. Are water utilities concerned about cyber terrorists? Is the threat primarily from a disgruntled employee? What is the motivation of the attacker? There are many ways to attack computer systems. Depending on the goal of the attacker, the arsenal of tools is large, and the malicious viruses are too numerous to enumerate (Cohen 1994). What is needed is a method for characterizing the threat to SCADA systems. Assessing the threat from the internet has been attempted in a variety of ways. However, until recently, there has been no systematic method. In order to explore these methods, an understanding of some basic types of attack is required.

     

    4.5 Definitions and Terms for a Taxonomy

    A computer virus is defined "as programs that can infect other programs by modifying them to include a possibly evolved version of itself" (Cohen 1994).

    Table 4-1 List of Terms (Cohen 1995)

    Trojan horses

    Toll fraud networks

    Fictitious people

    Infrastructure observation

    Email overflow

    Time bombs

    Get a job

    Protection limit poking

    Infrastructure interference

    Human engineering

    Bribes

    Dumpster diving

    Sympathetic vibration

    Password guessing

    Packet insertion

    Data diddling

    Computer viruses

    Invalid values on calls

    Van Eck bugging

    Packet watching

    PBX bugging

    Shoulder surfing

    Open microphone listening

    Old disk information

    Video viewing

    Backup theft

    Data aggregation

    Use or condition bombs

    Process bypassing

    False update disks

    Input overflow

    Hang-up hooking

    Call forwarding fakery

    Illegal value insertion

    Email spoofing

    Login spoofing

    Induced stressed failures

    Network services attack

    Combined attacks

    etc.

     

    One technique for addressing the threat is to list as many types of attack as can be imagined (Howard 1995). The common terms are listed in Table 4-1. These terms are one-dimensional and fail to address the nature of the threat. Also, this list states the what but provides no insight into who, when, or why.

    Another approach is to build a similar list of characteristics. This has an advantage over listing terms because it does not include hacker jargon. Still another technique is to list attacks by some empirical means. Terms, categories, and lists can be difficult to remember. Also, they may not be as intuitive as required to understand the threat (Amoroso 1994).

    Howard (1995) developed a computer and network attack taxonomy. He uses a philosophy of ways, means, and ends to fully describe the dimensions of a cyber attack. An attacker succeeds by transitioning through the operational sequence of tools, access, and results. His method succeeds in satisfying the goals of a successful taxonomy, described by Amoroso (1994) as the following:

    - mutually exclusive -- classification in one category excludes all others;

    - exhaustive -- taken together, the categories include all possibilities;

    - unambiguous -- precise so that classification is not uncertain, regardless of who is classifying;

    - repeatable -- repeated applications result in the same classification, regardless of who is classifying;

    - accepted -- logical and intuitive so that they can become generally approved;

    - useful -- can be used to gain insight into the field of inquiry.

    Though the classification is not completely successful in capturing every possibility, it provides the answers for who, what, when, and why. Howard’s taxonomy is provided in Figure 4-3. There are five components to his model: attackers, tools, access, results, and objectives. His taxonomy was designed with the formal definition of computer security, "preventing attackers from achieving objectives through unauthorized access or use of computers or networks" (Howard 1995).

     

    4.6 Understanding the Taxonomy

    Howard (1995) divides attackers into six categories:

    - hackers -- challenge and status,

    - spies -- information for political exploitation,

    - terrorists -- fear and political gain,

    - corporate raiders -- a financial advantage,

    - professional criminals -- personal gain, and

    - vandals -- damage.

     

    These categories are generally broad enough to capture every type of person or organization. Access is characterized as either unauthorized use or unauthorized access. Attackers simply exploit vulnerabilities in the system. Cohen writes the following:

    "The point is that viruses do not exploit implementation flaws. They exploit flaws in the security policy. That is, the policy that allows you to share information and interpret it in a general purpose way, allows a virus to spread, regardless of its implementation" (Cohen 1994).

    There are essentially three ways that attackers take advantage of computer systems: software bugs, design flaws, and configuration errors. Software bugs are common in UNIX and NT based machines. Design flaws in systems are the most difficult to overcome. Configuration errors are dangerous because the operator believes the system is in its normal operating condition (Cohen 1994).

     

    Results are described as the damage caused by the attacker to achieve his objective. He accomplishes this by using the tools. Howard (1995) describes four distinct categories:

    - corruption of information -- any unauthorized alteration of files stored on a host computer (Amoroso 1994),

    - disclosure of information -- dissemination of information to anyone not authorized to access it (Howard 1995),

    - theft of service -- unauthorized use of a computer or network without degrading services to other users (Amoroso 1994),

    - denial of service -- the intentional degradation or blocking of computer or network resources (Cohen 1995).

    The tools to conduct an attack complete the taxonomy (Howard 1995):

    - user command -- The attacker enters commands at a command-line interface, for example to guess a password, to enter a long string, or to telnet into a system.

    - script or program -- At the user command interface, attackers can use scripts or programs to automate commands. An example is a "crack" program that determines passwords. Another example is a "Trojan Horse" program that is copied over an existing program. It performs like the program it replaced but also conducts other operations of which the user is unaware, such as erasing files, logging passwords to a file, or corrupting data.

    - autonomous agent -- This is the most widely publicized means of attack. It is similar to a Trojan Horse. The difference is that an autonomous agent contains program logic to make an independent choice of what host to attack (e.g., the computer virus).

    - toolkit -- A grouping of scripts, programs, and autonomous agents into a GUI program (e.g., rootkit).

    - distributed tool -- A tool that attacks a host simultaneously from multiple hosts. Clock time can be used to synchronize the attack.

    - data trap -- The exploitation of the electromagnetic field surrounding a computer. This field contains information about the computer and can reveal data in transit or on the terminal.

    - *HERF Attack -- High Energy Radio Frequency Attack. A device small enough to be hidden in a soda can in a garbage can emits a pulse that could destroy all electronic devices without damaging the building or other structures. (* This form of attack was added by the author.)

     

    Figure 4-3 Taxonomy of Computer and Network Attacks (Howard 1995)

    This taxonomy serves a useful purpose in understanding the ways, means, and ends of an attacker. This taxonomy will provide a meaningful background in identifying the risk to SCADA for water supply systems, using hierarchical holographic modeling (HHM).
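
    One way to see how the taxonomy answers who, what, and why is to encode its components as enumerations and describe a single attack as one choice from each. The sketch below is illustrative only: the class and member names are assumptions of this example, the categories are taken from the lists above, and the sample attack is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Attacker(Enum):
    """Attacker categories; each value is the motivation listed in Section 4.6."""
    HACKER = "challenge and status"
    SPY = "information for political exploitation"
    TERRORIST = "fear and political gain"
    CORPORATE_RAIDER = "a financial advantage"
    PROFESSIONAL_CRIMINAL = "personal gain"
    VANDAL = "damage"


class Tool(Enum):
    USER_COMMAND = "user command"
    SCRIPT_OR_PROGRAM = "script or program"
    AUTONOMOUS_AGENT = "autonomous agent"
    TOOLKIT = "toolkit"
    DISTRIBUTED_TOOL = "distributed tool"
    DATA_TRAP = "data trap"
    HERF = "HERF attack"  # added by the thesis author to Howard's list


class Vulnerability(Enum):
    IMPLEMENTATION = "implementation"  # software bug
    DESIGN = "design"
    CONFIGURATION = "configuration"


class Access(Enum):
    UNAUTHORIZED_ACCESS = "unauthorized access"
    UNAUTHORIZED_USE = "unauthorized use"


class Result(Enum):
    CORRUPTION = "corruption of information"
    DISCLOSURE = "disclosure of information"
    THEFT_OF_SERVICE = "theft of service"
    DENIAL_OF_SERVICE = "denial of service"


@dataclass(frozen=True)
class Attack:
    """One path through the taxonomy: attacker, tool, access, result."""
    attacker: Attacker
    tool: Tool
    vulnerability: Vulnerability
    access: Access
    result: Result

    def summary(self):
        who = self.attacker.name.replace("_", " ").lower()
        return (f"{who} ({self.attacker.value}) uses a {self.tool.value} "
                f"to exploit a {self.vulnerability.value} vulnerability, "
                f"gains {self.access.value}, and causes {self.result.value}")


# Hypothetical attack, chosen only to exercise the structure.
example_attack = Attack(Attacker.VANDAL, Tool.SCRIPT_OR_PROGRAM,
                        Vulnerability.CONFIGURATION,
                        Access.UNAUTHORIZED_ACCESS, Result.DENIAL_OF_SERVICE)
```

    Because each field accepts exactly one member of its enumeration, a classification made this way is mutually exclusive within each component, which is one of the taxonomy goals listed earlier.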

     

    4.7 Hierarchical Holographic Modeling (HHM)

    Howard’s taxonomy uses a philosophy of ways, means, and ends to fully describe the dimensions of cyber intrusion. An attacker succeeds by transitioning through the sequence of tools, access, and results. This approach is useful because it facilitates risk identification when combined with hierarchical holographic modeling (HHM) (Haimes 1981). The HHM is a framework that allows one to identify virtually all sources of risk in a complex, multifarious system (e.g., one with hierarchical non-commensurable objectives, multiple decisionmakers, multiple transcending aspects, and risks) (Haimes 1998). It accomplishes this by decomposing the complex system into smaller subsystems. Its approach is holographic, meaning that one observes the system as if through a lensless camera. This has the advantage of providing a very broad perspective on the nature of the complex system. By analyzing a system along functional, temporal, modal, geographic, political, and other perspectives, one can develop a list that identifies sources of risk with respect to all aspects of the system. Haimes (1998) detailed these advantages of hierarchical decomposition:

    - decomposition methods can reflect the internal hierarchical nature of large-scale systems;

    - trade-off analysis can be performed among subsystems and the overall system;

    - through decomposition, the complexity of a large-scale multiobjective system can be relaxed by solving several smaller problems;

    - decomposition adds both robustness and resilience to modeling by capturing various system aspects and other societal elements;

    - decomposition adds realism to the modeling process by recognizing that the limitations of modeling a complex system with a single model can be circumvented by models that each address specific aspects of the system.

     

    "By considering different hierarchical structures together we can expect synergistic understanding of the overall system and its corresponding sources of risk and uncertainty" (Haimes 1998). HHM allows one to identify events from outside the system (e.g., cyber attacks) that impact on it. It also accounts for interior events that affect the system (e.g., software) (Haimes 1998).

     

    4.8 Recent Uses of the HHM in Identifying Risks

    Recent applications justify the use of this modeling framework. Executive Order 13010, issued on July 15, 1996, established the President’s Commission on Critical Infrastructure Protection (PCCIP). It funded research on eight infrastructures: telecommunications, electric power, gas and oil, banking and finance, transportation, water supply, emergency services, and continuity of government. Universities and industry developed strategies and recommendations to harden the nation's infrastructures. The University of Virginia’s Center for Risk Management of Engineering Systems provided the Commission with a paper detailing the vulnerabilities of water supply systems to terrorist attack. The paper used the HHM to develop a list of real, perceived, or imagined risks, and their corresponding decomposition. The Center outlined over 104 sources of risk to the water supply system (Haimes et al. 1997).

    The HHM was also used to examine the Maumee River Basin, the largest sub-basin of the Great Lakes Basin (Haimes 1998). It identified risks for five planning sub-areas, eight watersheds, seven objectives, and several county, state, political, and geographical interests (Haimes 1998).

     

    4.9 Risk Modeling Using HHM

    Using HHM, the SCADA system was decomposed into eight major categories. These categories represent the major sources of risk to SCADA. In general the major categories are identified as

    A, B, C, …

    And their sub-categories as

    A1, A2, A3, …

    B1, B2, B3, …

    C1, C2, C3, …

    Category A: Function

    Given the importance of the water distribution system, its function is a major source of risk from cyber intrusion. This category may be partitioned into three sub-categories or zones:

    A1 Gathering,

    A2 Transmitting, and

    A3 Distributing.

    Gathering is defined as all actions a SCADA system requires to manage the accumulation of water (SCADA Mail List 1997). Transmitting is defined as the communication between MTU and RTU, whether the medium is telephone, radio, ISDN, or fiber optic (SCADA Mail List 1997). Distributing accounts for the process, direction, and logic a SCADA system employs (SCADA Mail List 1997).

    Category B: Hardware

    The hardware of SCADA is vulnerable to tampering in a variety of configurations. There are nine sub-categories of hardware:

    B1 MTU,

    B2 RTU,

    B3 Modem,

    B4 Telephone Line,

    B5 Radio,

    B6 ISDN,

    B7 Satellite,

    B8 Alarm, and

    B9 Sensor.

    Depending on the tool and skill of an attacker, tampering with these sub-categories could have a significant impact on water flow for a community.

    Category C: Software

    Perhaps the most complex, this category represents the most dynamic aspect of change in water utilities. Software has many components that are sources of risk. There are three sub-categories:

    C1 Controlling,

    C2 Operating System, and

    C3 Communication.

    Figure 4-4 decomposes these sub-categories further and shows other sources of risk.

    Category D: Human

    There are two major sub-categories, employees and attackers:

    D1 Employees --

    D11 Systems Analyst,

    D12 Technician,

    D13 Operators,

    D14 Trainees,

    D15 Manager,

    D2 Attackers --

    D21 Disgruntled Employee,

    D22 Hacker,

    D24 Terrorist,

    D25 Vandal,

    D26 Spy,

    D27 Professional Criminal, and

    D28 Corporate Raider.

    This category addresses a decomposition of who is capable of tampering with a system.

    Category E: Tools

    A distinction is made between the various types of tools an intruder may use. There are six sub-categories:

    E1 User Command,

    E2 Script or Program,

    E3 Autonomous Agent,

    E4 Toolkit,

    E5 Data Trap, and

    E6 High Energy Radio Frequency Weapon (HERF).

    These tools allow an intruder the means to tamper with a system.

    Category F: Access

    An intruder has many paths into a system. An intruder can exploit these vulnerabilities and pose severe risk to the system. There are five sub-categories:

    F1 Implementation Vulnerability,

    F2 Design Vulnerability,

    F3 Configuration Vulnerability,

    F4 Unauthorized Use, and

    F5 Unauthorized Access.

    A system may be safe as designed, yet its installation and use may introduce multiple sources of risk.

    Category G: Geographic

    Location is not relevant for many risks of cyber intrusion. In the context of the HHM, four sub-categories are identified:

    G1 International

    G2 National

    G3 Local

    G4 Internal

    Sources of risk anywhere in the world can tamper with SCADA systems. International borders are irrelevant because of the internet.

    Category H: Temporal

    The temporal category seeks to show how present or future decisions affect the system. Replacing a legacy SCADA system in 10 years may be decided today. Also, threats to systems will change with time. Therefore, the lifecycle of the system is addressed with the temporal category. There are four partitions of this category:

    H1 Long Term > 10 Years

    H2 Short Term < 10 Years

    H3 Near Term < 5 Years

    H4 Today

    The modeling effort begins by looking at all sources of risk to SCADA along functional and modal lines. The holographic approach is used in an attempt to envision every source of risk that can be imagined. Next, the results of an internet-based survey are used to decide which sources of risk the model and analysis should focus on. The Analytic Hierarchy Process (AHP) (Saaty 1980) may also be used to aid a decisionmaker in deciding which risks to focus on. However, this thesis will use the results of the survey to select two sources of risk on which to focus the analysis.
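
    The decomposition above can be held in a small nested structure, which makes it easy to list and count the sources of risk named in the text. In this sketch the labels are copied from the lists above (including the gap between D22 and D24); the `leaves` helper and the pairing of attackers with tools are illustrative assumptions, not a procedure prescribed by the HHM. The further decomposition shown only in Figure 4-4 is not included.

```python
from itertools import product

# Each category maps to (name, sub-categories). A sub-category is either a
# plain name (lowest level) or another (name, sub-categories) pair.
HHM = {
    "A": ("Function", {"A1": "Gathering", "A2": "Transmitting",
                       "A3": "Distributing"}),
    "B": ("Hardware", {"B1": "MTU", "B2": "RTU", "B3": "Modem",
                       "B4": "Telephone Line", "B5": "Radio", "B6": "ISDN",
                       "B7": "Satellite", "B8": "Alarm", "B9": "Sensor"}),
    "C": ("Software", {"C1": "Controlling", "C2": "Operating System",
                       "C3": "Communication"}),
    "D": ("Human", {
        "D1": ("Employees", {"D11": "Systems Analyst", "D12": "Technician",
                             "D13": "Operators", "D14": "Trainees",
                             "D15": "Manager"}),
        "D2": ("Attackers", {"D21": "Disgruntled Employee", "D22": "Hacker",
                             "D24": "Terrorist", "D25": "Vandal",
                             "D26": "Spy", "D27": "Professional Criminal",
                             "D28": "Corporate Raider"}),
    }),
    "E": ("Tools", {"E1": "User Command", "E2": "Script or Program",
                    "E3": "Autonomous Agent", "E4": "Toolkit",
                    "E5": "Data Trap", "E6": "HERF Weapon"}),
    "F": ("Access", {"F1": "Implementation Vulnerability",
                     "F2": "Design Vulnerability",
                     "F3": "Configuration Vulnerability",
                     "F4": "Unauthorized Use", "F5": "Unauthorized Access"}),
    "G": ("Geographic", {"G1": "International", "G2": "National",
                         "G3": "Local", "G4": "Internal"}),
    "H": ("Temporal", {"H1": "Long Term", "H2": "Short Term",
                       "H3": "Near Term", "H4": "Today"}),
}


def leaves(children):
    """Yield (label, name) for every lowest-level source of risk."""
    for label, entry in children.items():
        if isinstance(entry, tuple):  # entry has its own sub-categories
            yield from leaves(entry[1])
        else:
            yield label, entry


all_sources = list(leaves(HHM))
attackers = list(leaves(HHM["D"][1]["D2"][1]))
tools = list(leaves(HHM["E"][1]))

# Viewing the system from two perspectives at once: every attacker-tool pair
# is a candidate scenario to screen (an illustrative use, not from the thesis).
attacker_tool_pairs = list(product(attackers, tools))
```

    Even this partial listing produces dozens of attacker-tool combinations, which illustrates why a survey or a method such as AHP is needed to narrow the analysis to a few sources of risk.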


    Figure 4-4 HHM for SCADA and Water Utilities

     

    4.10 Goal Development and Indices of Performance

    The purpose of goal development is to determine the decisionmaker’s needs for the risk analysis. After consultation with the decisionmaker, a goals tree such as the one in Figure 4-5, together with its indices of performance, might serve this purpose.

     

    Figure 4-5 Goals Tree Water Utility SCADA System

     

    4.11 Event Tree and Fault Tree Analysis

    Event tree analysis asks "what if" to determine the sequence of events that lead to consequences. From the event tree one can construct a probability density and an exceedance probability. Event trees help one to understand how an outcome occurs as it transitions through mitigating events. The consequences are conditioned on the occurrence of the initiating event and subsequent mitigating events (e.g., hacker intrusion through a firewall and disgruntled employee accessing through a dial-in connection). Fault-tree modeling adds insight into how mitigating events fail.
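
    The arithmetic behind a simple sequential event tree can be sketched as follows. The attacker defeats each mitigating event with some probability and is otherwise stopped there; each end state carries a consequence, here a percentage of water flow reduction. All event names, probabilities, and consequence values below are hypothetical and chosen only to make the calculation concrete; they are not results from this thesis.

```python
def event_tree_paths(mitigations, final_consequence):
    """Enumerate the end states of a sequential event tree.

    mitigations: list of (name, p_attacker_succeeds, consequence_if_stopped)
    final_consequence: consequence if every mitigating event is defeated
    Returns a list of (description, probability, consequence), with
    probabilities conditioned on the initiating event having occurred.
    """
    paths = []
    reach = 1.0  # probability that the attacker reaches the current event
    for name, p, consequence in mitigations:
        # The attacker fails here with probability 1 - p and the path ends.
        paths.append((f"stopped at {name}", reach * (1.0 - p), consequence))
        reach *= p
    paths.append(("all mitigations defeated", reach, final_consequence))
    return paths


def expected_consequence(paths):
    """Expected consequence over all end states."""
    return sum(prob * c for _, prob, c in paths)


def exceedance_probability(paths, level):
    """Probability that the consequence exceeds the given level."""
    return sum(prob for _, prob, c in paths if c > level)


# Hypothetical mitigating events: (name, p, % flow reduction if stopped here).
EXAMPLE_MITIGATIONS = [
    ("firewall", 0.5, 0.0),
    ("password protection", 0.4, 10.0),
    ("operator intervention", 0.5, 40.0),
]
example_paths = event_tree_paths(EXAMPLE_MITIGATIONS, final_consequence=100.0)
example_expected = expected_consequence(example_paths)
example_exceed_10 = exceedance_probability(example_paths, level=10.0)
```

    Evaluating `exceedance_probability` over a range of levels traces out the exceedance curve mentioned above, and the list of end-state probabilities is the discrete analogue of the probability density.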

     

    Figure 4-6 Event Tree Development (Ang and Tang 1984)

    Figure 4-6 shows the technique of event tree development (Ang and Tang 1984). At the top of Figure 4-6, redundancy, robustness, resilience, and security have been added. Figure 4-7 gives an example event tree in which a hacker willfully attacks the system with the goal of reducing the water supply to a city. In this example, the hacker transitions through the path of mitigating events, succeeding at each mitigating event with probability p and failing with probability 1-p. The event tree reads as follows: Given a cyber intrusion, does the firewall system protect