Risks of Cyber Attack to Supervisory Control and Data Acquisition for Water Supply

 

A Thesis

Presented to

the Faculty of the School of Engineering and Applied Science,

University of Virginia

 

In Partial Fulfillment

of the Requirements for the Degree

Master of Science (Systems Engineering)

by

Captain Barry C. Ezell

United States Army

 

Thesis Advisor

 

Yacov Y. Haimes,

Quarles Professor of Engineering

and Applied Science and

Director, Center for Risk Management of Engineering Systems

 

May 1998

 

 


 

 

ABSTRACT

 

Supervisory control and data acquisition (SCADA) allows a utility operator to monitor and control processes that are distributed among various remote sites. The goal of this thesis is to develop a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful threats to water utility SCADA systems. This framework can assist decisionmakers in understanding the risks of cyber intrusion, their consequences and tradeoffs in order to maximize the survivability of the system. Surety, a measure of survivability, is defined as a measure of system performance under an unusual loading. A survey is conducted to understand the current state of SCADA in water utilities, to document information on cyber intrusion, and to determine the concerns of administrators on system security. Using hierarchical holographic modeling (HHM), sources of cyber risk to SCADA are identified. Event trees and fault trees are used to model the probabilistic consequences of cyber intrusion on water supply systems. Cost, surety, expected level of percentage of water flow reduction, and conditional expected level of percentage of water flow reduction are introduced as performance measures to evaluate policy options. Alternatives are generated and then compared using multiobjective tradeoff analysis. Lastly, a prototype city is analyzed to demonstrate the applicability of the developed methodology. The methodological framework for managing cyber risk to water utility SCADA systems constitutes the major contribution of the thesis.

TABLE OF CONTENTS

  • ABSTRACT *

    CHAPTER 1 INTRODUCTION *

  • 1.1 Cyber Attacks *

    1.2 Stakeholders *

    1.3 Statement of Need *

    1.4 Thesis Tasks *

    1.5 Thesis Overview *

  • CHAPTER 2 SUPERVISORY CONTROL AND DATA ACQUISITION *

  • 2.1 Introduction *

    2.2 Master Terminal Unit *

    2.3 Remote Terminal Unit *

    2.4 SCADA History *

    2.5 Telemetry and the Mainframe Era *

    2.6 SCADA and Micro Computer Era *

    2.7 SCADA or Distributed Control Systems (DCS) *

    2.8 Trends in SCADA and the Internet *

    2.9 Estimating Attacks and Incidents from the Internet *

    2.10 Denial of Service *

  • CHAPTER 3 REVIEW OF RISK AND SYSTEMS ENGINEERING *

  • 3.1 Systems Approach *

    3.2 Total Risk Management *

    3.3 Alternative Approaches and Methods *

    3.4 Probabilistic Risk Assessment (PRA) and Management (PRAM) *

  • CHAPTER 4 FRAMEWORK FOR SCADA UTILITY SURVIVABILITY MODELING *

  • 4.1 Risk Modeling *

    4.2 Internet Survey *

    4.3 Survivability *

    4.4 Taxonomy for Assessing Computer Security *

    4.5 Definitions and Terms for a Taxonomy *

    4.6 Understanding the Taxonomy *

    4.7 Hierarchical Holographic Modeling (HHM) *

    4.8 Recent Uses of the HHM in Identifying Risks *

    4.9 Risk Modeling Using HHM *

    4.10 Goal Development and Indices of Performance *

    4.11 Event Tree and Fault Tree Analysis *

    4.12 Distributions from Event Tree Analysis *

    4.13 Partitioned Multiobjective Risk Method *

    4.14 Multiobjective Tradeoff Analysis *

    4.15 Evaluation *

  • CHAPTER 5 APPLICATION OF THE METHODOLOGY *

    5.1 Introduction *

  • 5.2 Experts for Elicitation *

    5.3 Problem Definition for City XYZ *

    5.4 Identifying Sources of Risk (Phase I) *

    5.4.1 Scenario One (Disgruntled Employee) *

    5.4.2 Scenario Two (Hacker) *

    5.5 Indices of Performance (IP) *

    5.6 Assess the Risks (Phase II) *

    5.6.2.1 Scenario One Event-tree (Current System) *

    5.6.2.1.1 PDF/CDF and Exceedance Probability graphs *

    5.6.2.1.2 Calculations of Benchmarks for Indices of Performance *

    5.6.2.2 Scenario Two Event-tree (Current System) *

    5.7 Generate Alternatives for System (Phase III) *

    5.7.1 Analysis of Results for Scenario One *

    5.8 Draw Conclusions through Tradeoff analysis (Phase IV) *

    5.9 Sensitivity Analysis *

    5.10 Summary of Methodology *

  • CHAPTER 6 CONCLUSIONS *

  • 6.0 Summary *

    6.1 Contributions *

  • 6.2 Future Work *

    REFERENCES *

    APPENDIX A SURVEY *

    APPENDIX B EVENT TREES FROM EXPERT ONE *

  •  

    ACKNOWLEDGMENTS

    I would like to thank my advisor Professor Yacov Y. Haimes for the many hours of his time that he has shared with me. His ability to coach, teach, and mentor has enabled me to chart a course for two years at the University of Virginia. His dedication to students is an example that I will try to emulate when I become an instructor at West Point. Secondly, I would like to thank Professor James H. Lambert for the hours of valuable help in understanding the math and science presented in this thesis.

    I am especially thankful to my mother, who assisted in editing the thesis and survey. Also, Bruce Freer, Senior Area Manager, Rockwell Automation, provided a valuable service by teaching me about the software components of programmable logic controllers and communication. I am grateful to the SCADA mail list and in particular Ian Wiese, Anthony Nelson, and Cary Hillebrand. They spent countless hours teaching fundamentals and helping me design a worthy example problem to present the methodology.

    Finally, I would like to thank my wife Debbie. She provided emotional support and managed our home front, allowing me to focus on this thesis and graduate school.

  • LIST OF FIGURES

    Figure 2-1 Generic SCADA System (Boyer 1993) *

    Figure 2-2 Water Distribution System and SCADA (Wiese and Ezell 1997) *

    Figure 2-3 Inputs & Outputs for MTU (Boyer 1993) *

    Figure 2-4 Inputs & Outputs for RTU (Boyer 1993) *

    Figure 2-5 DISA Vulnerability Assessments (GAO 1996) *

    Figure 2-6 Difficulty vs. Damage in Attacking Networks *

    Figure 3-1 The Steps of Risk Analysis (White and Pooch 1996) *

    Figure 3-2 ISS Risk Management Model (ISS 1997) *

    Figure 4- 1 Risk Management Framework *

    Figure 4- 2 SCADA Black Box Model *

    Figure 4- 3 Taxonomy of Computer and Network Attacks (Howard 1995) *

    Figure 4- 4 HHM for SCADA and Water Utilities *

    Figure 4- 5 Goals Tree Water Utility SCADA System *

    Figure 4- 6 Event tree Development (Ang and Tang 1984) *

    Figure 4- 7 Event Tree for Cyber Intrusion *

    Figure 4- 8 Fault Tree for Firewall System *

    Figure 4- 9 Example Consequences from Event Tree *

    Figure 4- 10 Partitioning on the Probability Axis (Asbeck and Haimes 1984) *

    Figure 4- 11 Black-box Model of SCADA System *

    Figure 5- 1 Water Distribution System (Wiese et al., 1997) *

    Figure 5- 2 SCADA for City XYZ (Wiese et al. 1997) *

    Figure 5- 3 HHM for City XYZ *

    Figure 5- 4 Event Tree for Scenario One (Disgruntled Employee) *

    Figure 5- 5 PDF / CDF for Scenario One (Expert 1) *

    Figure 5- 6 Exceedance Probability and PMRM Scenario One (All Experts) *

    Figure 5- 7 Event Tree for Scenario Two (Hacker) *

    Figure 5- 8 Exceedance Probability for Scenario Two (All Experts) *

    Figure 5- 9 Exceedance Probability for Each Alternative *

    Figure 5- 10 Alternatives, ƒ1 vs. ƒ2, ƒ5, and ƒ4 *

    Figure 5- 11 Parameter Sensitivity Comparison *

    Figure 5- 12 Model Sensitivity Comparison *

  •  

  • LIST OF TABLES

     

    Table 2-1 Estimates of Cyber Attacks in 1995 (Howard 1995) *

    Table 2-2 Summary of Attack Estimates for Utilities *

    Table 2-3 Estimation of Risk (Howard 1995) *

    Table 3-1 Examples of Likelihood and Outcome (Kumamoto and Henley 1996) *

    Table 4-1 List of Terms (Cohen 1995) *

    Table 4-2 Alternatives vs. Performance Measures *

    Table 5- 1 Security Level Function (Boyer 1993) *

    Table 5- 2 Estimates for Scenario One (Disgruntled Employee) *

    Table 5- 3 Estimates for Scenario Two (Hacker) *

    Table 5- 4 City XYZ’s SCADA System Performance for Scenario One *

    Table 5- 5 Probability and Cost Estimates (Nelson et al. 1998) *

    Table 5- 6 Results for Each Alternative *

    Table 5- 7 Parameter Sensitivity Results *

    Table 5- 8 Model Sensitivity Results *

  •  

     

    CHAPTER 1 INTRODUCTION

  • 1.1 Cyber Attacks
    Cyber attacks on supervisory control and data acquisition (SCADA) systems in water supply are uncommon, and discussion of the matter has been limited. The Computer Emergency Response Team (CERT) data collected from 1989 to 1995 make no mention of infrastructure attacks (Howard 1995). However, the President’s Commission on Critical Infrastructure Protection (PCCIP) conducted a year-long study and concluded that cyber threats are a clear danger (risk) to all infrastructures (PCCIP 1997). Estimates by various experts placed the number of cyber attacks in 1995 between 48,000 and 44,000,000 (Howard 1995). General Marsh (retired), chairman of the PCCIP, estimated to Congress that 80 percent of cyber attacks were by vandals or disgruntled employees (Aviation Weekly 1997). Meanwhile, the internet is doubling in size every 12-15 months, and water utilities are becoming increasingly interconnected and interdependent and are moving toward common protocols like Transmission Control Protocol/Internet Protocol (TCP/IP). The purpose of this thesis is to answer the following questions:

    Are supervisory control and data acquisition (SCADA) systems used by water utilities vulnerable to cyber attack in the short term (five years)? And if so, how do we improve the survivability of the system through risk modeling, assessment, and management?

     

    One can argue that assessing vulnerability is simply evaluating where exposure is greatest and access control is weakest (NSTAC 1997). In computer systems like SCADA, exposure is connectivity and visibility. Phone number pools for data transfer, dial-in connections, and internet connectivity are examples of exposure. A water utility with a three-letter domain of .com enjoys less exposure to hackers than one with .gov or .mil. Utilities that share .com are in a pool of millions of hosts that continues to double in size every year, whereas .gov and .mil have leveled off (Engst 1997). Access controls are those active and passive measures taken to control access to all points of connectivity for the computer system.

  •  

    1.2 Stakeholders

    Four groups can benefit from this thesis. The first and most important is the public, who expect an uninterrupted flow of clean water. The second is water utilities, which are responsible for providing the service and are the potential target under study in this thesis. The third is industry, which designs the systems and the software for utilities. The last is government at all levels, which is responsible for the public’s water supply. All groups share similar objectives -- cost, profit, safety, survivability, security, surety, etc. -- but many of these objectives conflict. Industry seeks to maximize profit, while a potential client like a water utility or the taxpayer wishes to minimize cost. Multiple objectives, often noncommensurable and in conflict, along with multiple decisionmakers, set the stage for the multifarious nature of problem solving that this thesis seeks to model.

     

    1.3 Statement of Need

    Despite the existence of probabilistic risk assessment and decision support tools, most approaches in computer security are qualitative, offering no quantitative metrics for modeling and evaluating the problem. One reason is that cyber attacks are rare events. Rather than precise distributions and reliability theory, analysts face a great deal of uncertainty in information, subjectivity in assessing risk exposure, and uncertainty about true vulnerability.

    This thesis presents a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful intrusion into water utility SCADA systems. This framework can assist decisionmakers in understanding the risks of cyber intrusion, their consequences, and the associated tradeoffs in order to improve the survivability of the system. The development of a framework for cyber risk management to water utility SCADA systems is the centerpiece and major contribution of the thesis.

     

    1.4 Thesis Tasks

    The tasks completed in this thesis are as follows: (1) compilation of the results of an internet-based survey that details the current state of SCADA in water utilities and documents information on cyber intrusion and system security; (2) research and review of the history of SCADA; (3) research and review of current risk management techniques in computer security; (4) identification of the sources of cyber risk to SCADA; (5) research of techniques in probabilistic risk assessment through event and fault tree analysis; (6) application of extreme event analysis to the assessment of water supply reduction; (7) research of techniques in modeling survivability with the concept of surety; and (8) development of a realistic example problem from data aggregated from the internet survey to apply the framework developed in this thesis.

     

    1.5 Thesis Overview

    Chapter one presents general information about SCADA, trends, goals, and tasks that set the conditions for developing a methodology to manage cyber attacks. Chapter two introduces the wide range of expert opinion on cyber attacks. It begins by defining SCADA and reviewing its history. It then highlights a study in which the Defense Information Systems Agency (DISA) attacked government systems in order to gauge the effectiveness of their security, and it concludes by comparing government computer systems with water utility SCADA systems. Chapter three reviews the current risk analysis methods used in computer security. After reviewing classic risk analysis and the concept of annualized present value of expected loss, it compares the qualitative approach used by software companies with probabilistic risk assessment, a more quantitative approach used in nuclear power, transportation, rail, etc. Chapter four is the center of gravity for the thesis: it discloses the major findings and lessons learned from the survey, presents the major modeling tools used in the thesis, and culminates with a multiobjective framework to manage risk. Chapter four accomplishes the central goal of the thesis -- to present a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful intrusion to water utility SCADA systems. The centerpiece of chapter four is Figure 4-1 Probabilistic Risk Management Framework, which graphically depicts the risk management framework for water utility SCADA systems. Chapter five introduces a prototype city, designed from aggregated data from the internet survey, in order to apply the methodology described in chapter four. Chapter six conveys the major findings, contributions, and recommendations that result from the thesis. Appendix A contains the actual survey that was posted on the internet for water utilities.

     

    CHAPTER 2 SUPERVISORY CONTROL AND DATA ACQUISITION

     

    2.1 Introduction

    Supervisory control and data acquisition (SCADA) is a system that allows an operator to monitor and control processes that are distributed among various remote sites (Boyer 1993). Many processes use SCADA systems: hydroelectric, water distribution and treatment utilities, natural gas, etc. SCADA systems allow remote sites to communicate with a control facility and provide the necessary data to control processes. For many of its uses, SCADA provides an economic advantage. As the distance to remote sites increases and the sites become harder to access, SCADA becomes a better alternative to an operator or repairman visiting the site for adjustments and inspections. Distance and remoteness are two major factors for implementing SCADA systems (Boyer 1993).

    There are four major elements to a SCADA system: the operator, master terminal unit (MTU), communications, and remote terminal unit (RTU). The operator exercises control through information that is depicted on a video display unit (VDU). Input to the system normally originates with the operator via the master terminal unit’s keyboard. The MTU monitors information from remote sites and displays information for the operator (Figure 2-1). The relationship between MTU and RTU is analogous to master and slave. Depending on its complexity or sophistication, the MTU may employ heuristics embedded into its programming that allow it to make modifications to the system to maintain optimality. In the same fashion, the sophistication in the RTU may allow local optimization of functions.

    Figure 2-1 Generic SCADA System (Boyer 1993)

    Figure 2-2 depicts the topology for SCADA and water supply for a small city. Note that a system may well employ more than one means to communicate with remote sites. SCADA systems are capable of communicating using a wide variety of media such as fiber optics, dial-up or dedicated voice grade telephone lines, or radio. Recently, some utilities have employed Integrated Services Digital Network (ISDN) (Lambert 1997). Since the amount of information transmitted is relatively small (less than 50K), voice grade phone lines and radio work well (Boyer 1993).

     

     

    Figure 2-2 Water Distribution System and SCADA (Wiese and Ezell 1997)

     

    2.2 Master Terminal Unit

    At the heart of the system is the master terminal unit (MTU). The master terminal unit initiates all communication, gathers data, stores information, sends information to other systems, and interfaces with operators (Boyer 1993). The major difference between the MTU and RTU is that the MTU, through its programming and its operators, initiates virtually all communication (Boyer 1993). The MTU also communicates with other peripheral devices in the facility like monitors, printers, or other information systems. The primary interface to the operator is the monitor that portrays a representation of valves, pumps, etc. As incoming data changes, the screen is updated. Figure 2-3 shows examples of inputs from the MTU and field devices.

    Figure 2-3 Inputs & Outputs for MTU (Boyer 1993)

     

    2.3 Remote Terminal Unit

    Remote terminal units gather information at their remote site from various input devices, like valves, pumps, alarms, meters, etc. Essentially, data is either analog (real numbers), digital (on/off), or pulse data (e.g., counting revolutions of meters). Many remote terminal units hold the information gathered in their memory and wait for a request from the MTU to transmit the data. Other, more sophisticated remote terminal units have microcomputers and programmable logic controllers (PLC) that can perform direct control over a remote site without the direction of the MTU. Figure 2-4 shows an example of outputs of the RTU to the MTU and field devices.
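    The store-and-wait behavior described above can be illustrated with a minimal sketch. The class name, field names, and the choice to reset pulse counters after each poll are assumptions made for illustration only; they are not taken from Boyer (1993) or from any particular vendor's RTU.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteTerminalUnit:
    """Hypothetical RTU that buffers field readings until the MTU asks for them."""
    address: int
    analog: dict = field(default_factory=dict)    # real-valued readings, e.g. tank level
    digital: dict = field(default_factory=dict)   # on/off states, e.g. pump running
    pulse: dict = field(default_factory=dict)     # accumulated counts, e.g. meter revolutions

    def sample(self, kind, name, value):
        """Store a reading locally; nothing is transmitted yet."""
        if kind == "pulse":
            # Pulse inputs accumulate counts between polls.
            self.pulse[name] = self.pulse.get(name, 0) + value
        elif kind == "analog":
            self.analog[name] = float(value)
        elif kind == "digital":
            self.digital[name] = bool(value)
        else:
            raise ValueError(f"unknown input kind: {kind}")

    def respond_to_poll(self):
        """Return buffered data when polled, then reset the pulse counters (assumed policy)."""
        report = {
            "address": self.address,
            "analog": dict(self.analog),
            "digital": dict(self.digital),
            "pulse": dict(self.pulse),
        }
        self.pulse = {name: 0 for name in self.pulse}
        return report


def poll_all(rtus):
    """MTU side: the master initiates communication with each RTU in turn."""
    return [rtu.respond_to_poll() for rtu in rtus]
```

    In this sketch the RTU never transmits on its own; data leaves the remote site only when `poll_all` is called, which mirrors the master-slave relationship in which the MTU initiates communication.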

    Figure 2-4 Inputs & Outputs for RTU (Boyer 1993)

    The RTU central processing unit (CPU) receives a binary data stream in accordance with the communication protocol. Protocols can be open, like Transmission Control Protocol/Internet Protocol (TCP/IP), or proprietary. Data streams generally contain information that is organized according to the seven-layer Open Systems Interconnection (OSI) Model. The OSI Model is used to set standards for the way information is exchanged with respect to protocols, communication, and data. The RTU accepts a message when it sees its own identification embedded in the protocol. The data is then interpreted, and the CPU directs the appropriate action at the site.
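    The addressing step can be sketched as follows. The frame layout (one address byte, one function-code byte, then a payload) and the two function codes are hypothetical and chosen only to illustrate the idea; they do not correspond to any specific SCADA protocol, and real protocols typically add error checking.

```python
FUNCTION_READ = 0x01      # hypothetical code: report buffered data
FUNCTION_SET_PUMP = 0x02  # hypothetical code: first payload byte is pump on/off


def handle_frame(frame: bytes, my_address: int):
    """Return the action for this RTU, or None if the frame is for another unit."""
    if len(frame) < 2:
        raise ValueError("frame too short to carry address and function code")
    address, function, payload = frame[0], frame[1], frame[2:]
    if address != my_address:
        return None  # identification does not match; the RTU ignores the frame
    if function == FUNCTION_READ:
        return ("report_data",)
    if function == FUNCTION_SET_PUMP:
        if len(payload) < 1:
            raise ValueError("set-pump frame needs one payload byte")
        return ("set_pump", bool(payload[0]))
    raise ValueError(f"unsupported function code: {function:#04x}")
```

    Note that this sketch acts on any frame carrying the right address. Nothing in it verifies who sent the frame, which is one way to see why connectivity alone can create exposure.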

     

    2.4 SCADA History

    SCADA can be traced to the development of telemetry in the first half of the twentieth century. The technology of rockets and aircraft afforded the opportunity to investigate weather and planetary data. This required a simple way to obtain data from places that observers could not normally reach (Boyer 1993). Manned stations on the surface of the Earth, such as lighthouses, post offices, weather stations, etc., were able to collect and monitor data on weather. However, for accurate weather prediction, more detailed information was needed from the atmosphere. There were two questions to be answered. How could accurate data be gathered from the atmosphere and communicated back to a facility on the Earth’s surface? And how might data from a number of sites be gathered in one centralized location to record, analyze, and then predict the weather (Boyer 1993)?

     

    2.5 Telemetry and the Mainframe Era

    The solution came from railroad companies that used telemetry devices. Railroads used telemetry that gave information on train location and switch status. During this time, radio technology advanced, removing the requirement to lay hundreds of miles of wire (Boyer 1993). Developments in error correction and data compression allowed more information to be reliably sent via radio. Throughout the century, more industries, such as automation plants, gas, electric, and water utilities, began using telemetry systems to monitor processes and remote sites. Two-way radio communication became common in the early sixties (Boyer 1993). During this time mainframe computing became the paradigm. Terminals with no intelligence used the mainframes to perform all calculations and store data. This method changed in the early eighties with the development of the microcomputer.

     

    2.6 SCADA and Micro Computer Era

    This era put information and intelligence at the fingertips of the user. The microcomputer allowed process control to be distributed among the remote sites, reducing dependence on the central facility mainframe. By the late 1980s, industry began shifting to the distributed systems era. This era is characterized by Wide Area Network (WAN) and Local Area Network (LAN) integration, open standards, relational information modeling, and icon-driven applications (Applegate 1996). In the late nineties, a new computing era emerged. Management Information System (MIS) professionals refer to this time as the ubiquitous era (Applegate 1996). This is a time when all types of configurations of intranets, WANs, and LANs are conceivable. The lines are blurred among servers with different responsibilities. During this era, the need for master-slave SCADA has significantly diminished. Programmable logic controllers (PLC) now have the capability to monitor and control local sites. Users of SCADA have begun changing as well. Industries like electric utilities have retained a centralized philosophy. However, oil and gas production companies have shifted to a more decentralized mode, putting the control of fields back in the hands of field operation specialists (Boyer 1993). A new trend is also emerging among the makers of software for SCADA systems. While current systems tend to have the programming logic for the PLC located at remote sites, a new method is developing that places this code back under the control of a central facility. This is accomplished by embedding the PLC code in Microsoft Windows software, an arrangement commonly referred to as soft PLC. Many small companies see this as a way to regain market share from big players like Rockwell Automation and Wonderware. Smaller companies boast that they can provide a much cheaper means to control remote sites if all code is maintained on the master device. They also argue that personal computers running Windows are better suited (e.g., more memory, more hard drive space, faster) to store larger code than a PLC (Freer 1997).

     

    2.7 SCADA or Distributed Control Systems (DCS)

    From its inception in the 1960s, SCADA was understood as a system that was primarily concerned with I/O from Remote Terminal Units. In the early 1970s, DCS was developed. The ISA S5.1 standard defines a distributed control system as a system which while being functionally integrated, consists of subsystems which may be physically separate and remotely located from one another. DCS were originally developed to meet the requirements of large manufacturing and process facilities that required significant amounts of analogue control.

    The major differences between SCADA and DCS are as follows:

    1. Historically, Distributed Control Systems use programmable logic controllers (PLC) and SCADA uses Remote Terminal Units.

    2. A PLC has more intelligence than a Remote Terminal Unit.

    3. Unlike an RTU, a PLC is able to control sites without the direction of a master (Byrne 1997).

    The lines between the two have blurred considerably in the late 1990s. SCADA systems have DCS capabilities, and DCS have SCADA capabilities. Systems are tailored to the operation they are intended to control. Water utilities are following other industries and becoming more interconnected with the Internet. Also, the control function of what were once old telemetry systems is becoming more advanced, interconnected, and accessible through the Internet, dial-in connections, etc. This interconnectivity has paralleled the era of the ubiquitous distributed computing environment. For the purposes of this thesis, all systems used to control water supply systems are termed SCADA. Also, this thesis focuses on traditional SCADA that behaves as a central controlling system.

    There is a perception in the SCADA and water business that their systems are secure from cyber intrusion and are not likely targets. However, the President's Commission on Critical Infrastructure Protection (PCCIP) concluded that cyber threats are a clear danger (risk) to all infrastructures (PCCIP 1997).

     

    2.8 Trends in SCADA and the Internet

    The Internet continues to grow at a phenomenal rate, doubling in size every 12-15 months (Engst 1996). By 2001, projections show the Internet having over 200 million hosts and 2 billion people interconnected (Howard 1995). A host is a domain name that has an Internet Protocol address record associated with it (e.g., virginia.edu). This would be any computer system connected to the Internet (via full or part-time, direct or dial-up connections). A domain has a name server record associated with it. The Internet represents an ever-increasing medium for people to share information and processes, hence the temptation for misuse. With this growth, more industries are executing financial transactions, banking online, and sharing vast amounts of information. Likewise, utilities are beginning to take advantage of this medium, due to deregulation and the Federal Energy Regulatory Commission (FERC) ruling for electric utilities and industry to share information (NSTAC 1997).
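    To show how sensitive a projection is to the doubling period, the growth claim can be written as N(t) = N0 * 2^(t/T), where T is the doubling time. The sketch below assumes constant exponential growth; the function name and the starting host count used in the usage note are hypothetical and are not figures from Engst or Howard.

```python
def projected_hosts(initial_hosts: float, months_elapsed: float, doubling_months: float) -> float:
    """Project a host count under constant doubling: N(t) = N0 * 2 ** (t / T)."""
    if doubling_months <= 0:
        raise ValueError("doubling period must be positive")
    return initial_hosts * 2 ** (months_elapsed / doubling_months)


# Growth multiple over five years for the two ends of the 12-15 month range.
fast = projected_hosts(1.0, 60, 12)   # doubling every 12 months
slow = projected_hosts(1.0, 60, 15)   # doubling every 15 months
```

    Over a five-year horizon, the 12-month assumption yields twice the growth multiple of the 15-month assumption, so any host-count projection inherits considerable uncertainty from the doubling period alone.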

    Companies like Rockwell International are developing a number of automation control software products that can exploit the interconnectedness of the Internet. RS View 32, RS Portal, and RS Tools represent a portfolio of software that incorporates the newest Microsoft technologies of ActiveX and Object Linking and Embedding (OLE). RS Portal allows a user to connect to a SCADA system from a remote site and download information or control a process.

    Considering the advances in process control and the Internet, executives, information system managers, and system analysts should have a greater understanding of potential misuses of this communication medium. However, in research conducted by R.C. Hollinger, system security did not rank among the top twenty management issues (Gray 1994). This lack of priority indicates a low awareness of the risks of cyber attack and of exposure from the internet and from within a company.

     

    2.9 Estimating Attacks and Incidents from the Internet

    Currently, there is a wide range of results that have been developed for estimating the likelihood of an attack or incident from the Internet. For the purposes of this thesis, a comparison of results from the Defense Information Systems Agency (DISA 1996), Air Force Information Warfare Center Security Posture Studies January 1995 (AFIWC 1995), and Howard’s thesis is used. An attack is a single unauthorized access or use attempt. An incident is a group of attacks that are distinguished by attacker, tools, results, or objectives (Howard 1995).

    The number of attacks and incidents is difficult to measure. The size, ubiquity, and complexity of the system do not allow more traditional methods for quantifying the probability of attacks. DISA (1996), AFIWC (1995), and Howard (1995) employed different methods to make their assessments. AFIWC used organized attacks against Air Force bases. DISA conducted vulnerability studies by attempting to penetrate computer systems. Howard (1995) collected data from CERT between 1989 and 1995 and estimated that incidents are approximately 10 percent of attacks.

    Another estimate had the number of attacks in 1995 at 44 million (Cohen 1995). A summary of the results is provided in Table 2-1.

    Table 2-1 Estimates of Cyber Attacks in 1995 (Howard 1995)

    DISA conducted its vulnerability studies from 1992 to 1995, simulating 38,000 attacks. Of these attacks, 35 percent were blocked; of the attacks that succeeded, 4 percent were detected; and of those detected, 27 percent were reported. Figure 2-5 summarizes the findings. Kyas (1997) states that systems connected to the internet are eight times as likely to suffer from hacking as those not connected to the internet. He also states that, of all hacks of sites, 80 percent were via the internet and 20 percent were internal to the organization.

     

    Figure 2-5 DISA Vulnerability Assessments (GAO 1996)
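    The DISA percentages can be read as a chain of conditional proportions: blocked versus successful, detected given success, and reported given detection. That nesting is an assumption about how the figures relate, and the variable names below are illustrative. Under that reading, a short calculation gives the approximate counts at each stage.

```python
ATTACKS = 38_000                      # simulated attacks, from the DISA study
P_BLOCKED = 0.35                      # share of attacks blocked
P_DETECTED_GIVEN_SUCCESS = 0.04       # assumed conditional on a successful attack
P_REPORTED_GIVEN_DETECTED = 0.27      # assumed conditional on detection

p_success = 1 - P_BLOCKED
p_detected = p_success * P_DETECTED_GIVEN_SUCCESS
p_reported = p_detected * P_REPORTED_GIVEN_DETECTED

successful = round(ATTACKS * p_success)   # attacks that got through
detected = round(ATTACKS * p_detected)    # successful attacks that were noticed
reported = round(ATTACKS * p_reported)    # detected attacks that were reported

for label, count in [("successful", successful), ("detected", detected), ("reported", reported)]:
    print(f"{label:>10}: {count:6d} ({count / ATTACKS:.2%} of all attacks)")
```

    Under these assumptions, fewer than one attack in a hundred is ever reported, which is why counts of reported incidents are a weak proxy for the true attack rate.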

    Assuming that water utilities are at least as good as the systems DISA attacked, the probability of detecting an attack, given that the attack succeeded, can be estimated by the statistic $\hat{p} = X/n$, the proportion of successful attacks that are detected, where $X$ is the number of attacks detected and $n$ is the total number of successful attacks. Using a confidence interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z_{\alpha/2}$ is the upper $\alpha/2$ percentile from the standard normal distribution and $1-\alpha$ is the confidence level, gives the estimate contextual meaning (Milton and Arnold 1990). A confidence level of 95 percent means $\alpha = 0.05$ and $z_{\alpha/2} = 1.96$ from the standard normal table. Therefore, the probability of detecting an attack is estimated to be

    $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$, with $\hat{p} = 0.04$ from the DISA results. Table 2-2 summarizes the likelihood for detection, reporting, and a successful attack.
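    As a minimal sketch of the large-sample interval for a proportion, the function below computes the point estimate and a normal-approximation interval. The counts in the usage lines (988 detected out of 24,700 successful attacks) are derived here from the DISA percentages under the assumption that 65 percent of the 38,000 attacks succeeded and 4 percent of those were detected; they are not values quoted from the thesis tables.

```python
import math


def wald_interval(detected: int, successful: int, z: float = 1.96):
    """Return (p_hat, lower, upper) for a normal-approximation confidence interval."""
    if successful <= 0 or not 0 <= detected <= successful:
        raise ValueError("need 0 <= detected <= successful and successful > 0")
    p_hat = detected / successful
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / successful)
    # Clamp to [0, 1] because a probability cannot fall outside that range.
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Assumed counts derived from the DISA percentages (see the lead-in).
p_hat, lower, upper = wald_interval(detected=988, successful=24_700)
print(f"p_hat = {p_hat:.4f}, 95% interval = [{lower:.4f}, {upper:.4f}]")
```

    With counts this large the interval is narrow (roughly 0.04 plus or minus 0.0024), so the main uncertainty lies less in sampling error than in whether water utilities resemble the systems DISA tested.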

    Table 2-2 Summary of Attack Estimates for Utilities

     

    The typical water utility will probably not be attacked by a cyber terrorist this year, or next year for that matter. It probably will not be exposed to a devastating flood either. Nor is a given car likely to be stolen. In at least two of these three cases, people and organizations buy insurance or make arrangements in an attempt to protect themselves from these rare events. The amount of insurance or protective measures depends on public law, policy, or personal risk aversion. How much risk is acceptable? At what level of risk do people insure themselves or their assets? Table 2-3 provides an estimate of attack from the internet and a comparison to other types of rare or extreme events.

    Table 2-3 Estimation of Risk (Howard 1995)

     

    2.10 Denial of Service

    Intuitively, the results of an attack are not equal in difficulty for the attacker or in consequences for the utility. Figure 2-6 provides a conceptual relationship between a result and its consequences (damage) to a system.

     

    Figure 2-6 Difficulty vs. Damage in Attacking Networks

    Denial of service is the major potential source of danger for a SCADA system. A denial of service attack intentionally blocks or degrades a computer or network (Amorosa 1994). An attacker makes resources inoperative by monopolizing the shared resources' time so that other processes are effectively stopped. This can be accomplished by taking up disk space, CPU time slices, network applications, etc. (Garfinkel 1996). Howard (1995) writes:

        Denial of service attacks over the Internet can be directed against three types of targets: a user, a host computer, or a network. ...an attacker must begin a denial of service attack by using tools to exploit vulnerabilities and then either obtain unauthorized access to an appropriate process or group of processes, or use a process in an unauthorized way. The attacker then completes that attack by using some method to destroy files, degrade processes, degrade storage capability, or cause a shutdown of a process or of the file system.

     

    In summary, there are two types of information used for gauging vulnerability from the internet: attacks and incidents. Attacks are single events directed against a host. Incidents are groupings of attacks that are distinguished by method, attacker, or results. There is no exact information on the probability of an attack on a water utility, and there is a great deal of uncertainty in the estimates. Assuming that utilities are as likely to be attacked as any other host connected to the internet, Howard (1995) estimates that the rate of attack for root break-in could be as high as 1 attack in 10 years. Clearly, hosts are not equally likely to be attacked. Government and military sites are attacked at a considerably higher rate than other three-letter domains. Given access to data, it may be possible to determine the distribution of cyber intrusion with respect to three-letter domain. Table 2-3 sums up the various probabilities associated with an attack, based on the DISA findings (GAO 1996).

     

    CHAPTER 3 REVIEW OF RISK AND SYSTEMS ENGINEERING

     

     

    3.1 Systems Approach

    The systems approach to problem solving is ideally suited for encapsulating Total Risk Management. Systems engineering, unlike other engineering disciplines, starts at the top and decomposes a problem into smaller problems that support the goals of the entire system at every level. This holistic approach has the advantage of unifying the effort of designing complex systems and nesting the goals from one level to the next. Gibson (1991) identified six major phases:

    - determine the goals of the system,

    - establish criteria for ranking alternative candidates,

    - develop alternative solutions,

    - rank alternative solutions,

    - iterate, and

    - action.

    Risk management should be conducted as an integral part of systems engineering. Many of the evaluation criteria might include risk functions such as minimizing the expected value of damage or maximizing the surety of the system. Haimes (1998) developed 13 steps that include one aspect that is missing from Gibson’s approach:

    1. Define and generalize the client’s needs. Consider the total problem environment. Clearly identify the problem.

    2. Help the client determine his or her objectives, goals, performance criteria, and purpose.

    3. Similar to step one, consider the total problem’s environment. Evaluate the situation, constraints, limitations, and available resources.

    4. Study and understand the interactions among the environment, technology, system, and people involved.

    5. Incorporate many models and synthesize. Evaluate the effectiveness and check the validity of the models.

    6. Solve the models through simulation and/or optimization.

    7. Evaluate various feasible solutions, options, and policies. How does the solution fulfill the client’s need? What are the costs, benefits, and risk tradeoffs for each solution?

    8. Evaluate the proposed solution for the long term and the short term.

    9. Communicate the proposed solution to the client in a convincing manner.

    10. Evaluate the impact of current decisions on future options.

    11. Once the client has accepted the solution, work on its implementation. If the solution is rejected, return to the above steps to correct it so that the client’s desires are fulfilled.

    12. Post audit your study.

    13. Iterate at all times.

     

     

     

    3.2 Total Risk Management

    Haimes (1998) defines total risk management (TRM) as a systematic, statistically based, holistic process that builds on formal risk assessment and management. TRM answers the risk assessment questions (Kaplan 1997),

    - What can go wrong?

    - What is the likelihood that it will go wrong?

    - What are the consequences?

    the risk management questions (Haimes 1998),

    - What can be done?

    - What options are available and what are their associated tradeoffs in terms of cost, risks, and benefits?

    - What are the impacts of current management decisions on future options?

    and addresses the sources of failures (hardware, software, organization, and human) within a multiobjective framework (Haimes 1998).

     

    3.3 Alternative Approaches and Methods

    In computer security, there are a few documented techniques for evaluating security and risk to computer systems. Although none specifically addresses SCADA, they are relevant because SCADA has many of the same attributes as computer networks. Cooper (1989) has a nine-step approach. He defines risk analysis as "a technique for quantitative assessment of relative values of protective measures". Cooper’s metric is the annual loss expectancy (ALE). The ALE method first seeks to value the asset (e.g., a network worth $100,000). Next, the expected value e is set equal to the probability p of loss of the asset per year times the value v of the system.

    ALE = e = p•v

    Cooper’s risk analysis methodology is summarized below:

    1. Identify and value assets. (What must be protected?)

    2. Identify threats. (Protect against what?)

    3. Identify vulnerabilities. (What are the potential ways in which threats can be realized?)

    4. Estimate risks. (What is the probability of a vulnerability?)

    5. Calculate ALE for each vulnerability. (What is the statistically expected loss?)

    6. Identify potential protective measures. (How can assets be protected against threats?)

    7. Estimate ALE reductions for each vulnerability due to each protective measure. (What is the statistically expected amount saved?)

    8. Select cost-effective protective measures. (How are assets best protected against threats?)

    9. Respond to experience by modifying protective measures, by recovering from disasters, and by prosecuting transgressors. (How can feedback be used?)
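
    Steps 5 through 8 of Cooper's list can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the $100,000 asset value is the one used in the text, while the loss probabilities, the measure's annual cost, and the function names are hypothetical.

```python
def annual_loss_expectancy(p_loss_per_year, asset_value):
    """ALE = p * v: yearly probability of loss times the value of the asset."""
    if not 0.0 <= p_loss_per_year <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return p_loss_per_year * asset_value


def net_annual_benefit(p_before, p_after, asset_value, annual_cost):
    """ALE reduction achieved by a protective measure minus its yearly cost."""
    reduction = (annual_loss_expectancy(p_before, asset_value)
                 - annual_loss_expectancy(p_after, asset_value))
    return reduction - annual_cost


# Hypothetical numbers: a 10% yearly chance of losing a $100,000 network,
# reduced to 2% by a measure that costs $3,000 per year.
baseline = annual_loss_expectancy(0.10, 100_000)
benefit = net_annual_benefit(0.10, 0.02, 100_000, 3_000)
print(f"baseline ALE = {baseline:.0f}, net annual benefit = {benefit:.0f}")
```

    Under these assumed numbers a measure is cost-effective when its net annual benefit is positive; comparing several measures on this figure corresponds to step 8.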

    White and Pooch (1996) define computer security risk analysis as "the process of identifying and evaluating the risk of being successfully attacked and suffering a loss of data, time, and person-hours versus the cost of preventing such a loss." The ultimate goal of the analysis is to determine the strengths of the computer’s security and the areas that need to be improved. The benefits are improved security and a better understanding of the system and its flaws. The metric compares the burden B of preventing a loss with the probability P of a loss L. White and Pooch (1996) refer to this approach as BPL. In this method the burden is the benchmark for determining if a solution is worthy of consideration. The three-step method is shown below.

     

    Figure 3-1 The Steps of Risk Analysis (White and Pooch 1996)
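
    The BPL comparison described above can be written as a one-line test. The sketch below assumes that a protective measure is worth considering when its burden B is smaller than the expected loss P times L; that decision rule, the function name, and the numbers are illustrative assumptions rather than White and Pooch's own formulation.

```python
def worth_considering(burden, p_loss, loss):
    """Return True when the burden B is below the expected loss P * L."""
    if not 0.0 <= p_loss <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return burden < p_loss * loss


# Hypothetical figures: a 10% chance of a $50,000 loss (expected loss $5,000).
print(worth_considering(burden=1_000, p_loss=0.1, loss=50_000))
print(worth_considering(burden=8_000, p_loss=0.1, loss=50_000))
```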

    Internet Security Systems, Inc. (ISS) is a company that specializes in computer and network security. ISS follows an adaptive security model. Their formula treats risk analysis as one component of security:

    Security = Risk Analysis + Policy + Implementation + Threat/Vulnerability Monitoring + Threat/Vulnerability Response

    The ISS Adaptive Security Model uses a portfolio of software that automates the security process. It seeks out problems and notifies the system administrator by conducting the following:

    - attack analysis and response,

    - misuse analysis and response,

    - vulnerability analysis and response,

    - configuration analysis and response,

    - risk posture and response,

    - audit and trends analysis, and

    - real-time user awareness support.


    Figure 3-2 ISS Risk Management Model (ISS 1997)

    This is accomplished primarily through their SAFEsuite™ software. ISS states that adaptive security supports a 100 percent solution to computer security, yet admits that risk cannot be reduced to 0 percent. Their risk assessment process appears to be qualitative; no mathematical or quantitative detail is documented in their approach. Figure 3-2 summarizes risk analysis for ISS.

    Cohen (1997) relates classic risk analysis to computer networks as "simply listing the events for the network, determining probabilities for each event, calculating the expected loss for each event and the ROI for each mitigation technique, and doing the arithmetic." Cohen (1997) describes the following:

        Standard risk analysis asserts that we calculate an expected loss (L) by multiplying the probability of each event (p(e)) that can cause a loss by the expected loss from that event (l(e)) and adding these results for all of the events (all e in E). Mitigation strategies are then optimized by examining each proposed mitigation technique to derive the reduction in expected loss associated with the technique's use, dividing by the cost of the mitigation technique to derive a return on investment (ROI), and applying the most cost effective (i.e., the highest ROI) method first. Apply methods until no technique with a high enough ROI for the organization is left, and you are done (Cohen 1997).

    Cohen (1997) believes that the size, dynamics, and complexity of computer networks do not lend themselves to risk analysis. He states that it is impossible to enumerate all the possibilities and mitigating strategies and to quantify the damage.
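
    The procedure Cohen describes can be sketched in Python as follows. This is a simplified illustration, not Cohen's implementation: the event names, probabilities, losses, and costs are hypothetical, and each ROI is computed against the same baseline rather than being re-evaluated after every technique is applied.

```python
def expected_loss(events):
    """Sum p(e) * l(e) over all events; `events` maps name -> (p, loss)."""
    return sum(p * loss for p, loss in events.values())


def rank_mitigations(events, mitigations, min_roi):
    """Return (name, roi) pairs with roi >= min_roi, highest ROI first.

    `mitigations` maps name -> (cost, events_after), where events_after is
    the event table assumed to apply once the technique is in place.
    """
    base = expected_loss(events)
    ranked = []
    for name, (cost, events_after) in mitigations.items():
        roi = (base - expected_loss(events_after)) / cost
        if roi >= min_roi:
            ranked.append((name, roi))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical event table: name -> (yearly probability, loss in dollars).
events = {"denial_of_service": (0.05, 200_000),
          "data_corruption": (0.02, 500_000)}

# Hypothetical techniques: name -> (cost, event table with technique applied).
mitigations = {
    "firewall": (4_000, {"denial_of_service": (0.01, 200_000),
                         "data_corruption": (0.02, 500_000)}),
    "backups": (5_000, {"denial_of_service": (0.05, 200_000),
                        "data_corruption": (0.02, 100_000)}),
    "consultant": (20_000, {"denial_of_service": (0.04, 200_000),
                            "data_corruption": (0.02, 500_000)}),
}

ranking = rank_mitigations(events, mitigations, min_roi=1.0)
print(ranking)
```

    The sketch also shows the weakness Cohen points to: the result is only as good as the event table, which must be enumerated and quantified by hand.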

     

    3.4 Probabilistic Risk Assessment (PRA) and Management (PRAM)

    Probabilistic Risk Assessment (PRA) is a quantitative approach used in many fields, such as nuclear power, transportation, and rail. Kumamoto and Henley (1996) define risk as "a combination of five primitives -- outcome, likelihood, significance, causal scenario, and population affected." Mathematically, they define risk as

    $\text{Risk} \equiv \{(L_i, O_i, U_i, CS_i, PO_i) \mid i = 1, \ldots, n\}$,

    assuming n potential outcomes ($O_i$), each with a likelihood ($L_i$), a significance ($U_i$), a causal scenario ($CS_i$), and a population affected by the outcome ($PO_i$) (Kumamoto and Henley 1996). The purpose of risk assessment is the derivation of risk profiles posed by a given situation (Kumamoto and Henley 1996). "The purpose of risk management is to propose alternatives, evaluate risk profiles, make safety decisions, choose satisfactory alternatives to control the risk, and exercise corrective actions (Kumamoto and Henley 1996)." Assessment and management taken together are probabilistic risk assessment and management (PRAM). The PRAM approach recognizes that a risk profile has two dimensions (e.g., outcomes versus likelihood). The likelihood of an outcome can be expressed objectively, subjectively, or as a combination of both. Examples of objective likelihood are probabilities, percentages, or distributions. When there is limited information, likelihood may be subjectively evaluated and even assessed as possible, plausible, rare, or frequent (Kumamoto and Henley 1996). To study scenarios and failures, PRA uses event trees and fault trees to show causal scenarios.

    Table 3-1 Examples of Likelihood and Outcome (Kumamoto and Henley 1996)

    Kumamoto and Henley (1996) point out that different PRAs of the same problem can lead to different trees because tree generation is an art, not a science. Unlike Cohen (1997), who is skeptical of PRA because of the limited information, size, dynamics, and complexity of computer systems, Starr (1987) makes these comments:

        In the nuclear field emphasis on PRA has focused professional concern on the frequency of core melts. The argument to whether a core can actually melt with a projected probability of one in one thousand per year, or in a million per year, represent a misplaced emphasis on these quantitative outcomes. The virtue of risk assessments is the disclosure of the system’s causal relationship and feedback mechanisms, which might lead to technical improvements in the performance and reliability of the nuclear stations. When the probability of extreme events become as small as these analyses indicate, the practical operating issue is the ability to manage and stop the long sequence of events which could lead to extreme end results. Public acceptance of any risk is more dependent on public confidence in risk management than on the quantitative estimates of risk consequences, probabilities, and magnitude (Starr 1987).

    Haimes (1995) would disagree with Cohen as well. Translating Toffler's vision (Alvin Toffler: Powershift, Bantam Books 1990)

        As we advance into the Terra Incognito of tomorrow, it is better to have a general and incomplete map, subject to revision and correction, than to have no map at all.

    into the risk assessment process implies that a limited database is no excuse for not conducting sound risk assessment. On the contrary, with less knowledge of a system, the need for risk assessment and management becomes more imperative (Haimes 1995).

    Is PRAM the best approach? This type of quantitative risk analysis is an established approach recognized by government and industry. However, research indicates a less rigorous approach by some practitioners, attempting to make computer network systems one hundred percent secure. Some software companies conduct a qualitative risk analysis and describe multiple management strategies. They follow the steps, yet the crucial elements of PRA (risk quantification, modeling, and analysis) are missing. What are the tradeoffs? How much security is enough? Invariably, analysis leads to the same conclusion -- buy their software and the problem is solved. Their approach proceeds from a false notion that their methodology can prevent intrusions and make systems one hundred percent secure.

    Chapter 4 develops the concept of survivability of the system instead of the focus on 100 percent security. It is the center of gravity of the thesis as it develops the modeling tools that will be applied in chapter 5.

     

     

     

    CHAPTER 4 FRAMEWORK FOR SCADA UTILITY SURVIVABILITY MODELING

     

    4.1 Risk Modeling

    Security is a relative concept. Systems will never be one hundred percent secure. The larger question is how to evaluate the survivability of the system. The overall goal is to make the SCADA system more survivable, given a cyber attack. In order to accomplish this goal, a probabilistic risk assessment and management framework is presented. There are four phases to the framework. Phase I begins by determining the risks to the system. This will be accomplished by soliciting expert opinion, by hierarchical holographic modeling, and by using the results of an internet-based survey. Phase II focuses on determining the sources of risk, model construction, and assessing the results. This will be accomplished using event tree and fault tree analysis. The event tree will provide a probability density function and an exceedance probability, and will help to understand how an event occurs and what its consequences are. The fault tree will add insight into why mitigating events on an event tree fail. Phase II also entails the construction of the partitioned multiobjective risk method (PMRM) to provide insight into the conditional expectation of extreme events (Asbeck and Haimes 1984). Phase III begins with the construction of functions for the multiobjective analysis. This phase provides the cost, risk, and objective functions needed to analyze and compare policy options. This will be accomplished by developing alternatives and examining the changes in the exceedance probability. Phase IV begins by conducting a tradeoff analysis and ends by either drawing conclusions for a decision or returning to Phase I with the information learned from the analysis.

    Figure 4-1 Risk Management Framework
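
    The Phase II quantities can be illustrated with a small event tree in Python. This is a sketch under stated assumptions, not the model developed later in the thesis: the three mitigating events, their success probabilities, the independence between them, and the percent water flow reduction assigned to each path are all hypothetical.

```python
from itertools import product

# Hypothetical mitigating events with assumed success probabilities.
BARRIERS = (
    ("access control blocks the intrusion", 0.35),
    ("intrusion is detected", 0.50),
    ("operators restore control", 0.80),
)


def flow_reduction(blocked, detected, restored):
    """Assumed percent water flow reduction at the end of one path."""
    if blocked:
        return 0.0
    if detected and restored:
        return 10.0
    if detected:
        return 40.0
    return 80.0


def enumerate_paths():
    """Return (probability, consequence) for every path through the tree."""
    paths = []
    for outcome in product((True, False), repeat=len(BARRIERS)):
        prob = 1.0
        # Barriers are treated as independent, which is an assumption.
        for success, (_, p_success) in zip(outcome, BARRIERS):
            prob *= p_success if success else 1.0 - p_success
        paths.append((prob, flow_reduction(*outcome)))
    return paths


def expected_value(paths):
    """Unconditional expected consequence."""
    return sum(p * c for p, c in paths)


def exceedance_probability(paths, level):
    """P(consequence > level)."""
    return sum(p for p, c in paths if c > level)


def conditional_expected_value(paths, threshold):
    """E[consequence | consequence >= threshold], in the spirit of the PMRM."""
    tail = [(p, c) for p, c in paths if c >= threshold]
    tail_prob = sum(p for p, _ in tail)
    return sum(p * c for p, c in tail) / tail_prob


paths = enumerate_paths()
print(expected_value(paths),
      exceedance_probability(paths, 30.0),
      conditional_expected_value(paths, 40.0))
```

    With these assumed inputs the conditional expected value of the severe outcomes is much larger than the unconditional expected value, which is the kind of information about extreme events that the expected value alone hides.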

     

    4.2 Internet Survey

    From June 1997 to January 1998 a survey was posted at "http://watt.seas.virginia.edu/~bce4k/home.html". The purpose of the survey was to gather information about the cyber threat, understand the state of SCADA in water supply systems, document any intrusions in the past year, and analyze trends among the administrators of these systems. A total of 62 individuals responded. Twelve responses were discarded because identities or other qualifying information were not provided. Fifty were retained for study. Assuming that the number of water utilities is between 6,000 and 10,000, the survey represents 1.1-1.9% of the population (SCADA 1997). For perspective, CNN routinely surveys 1,000 Americans, or 0.0004%, and presents the results as representative of 258 million. Many respondents provided information for more than one water utility. For example, a vice president of a water company provided results for 28 distinct systems. In the United States, 93 cities and 19 counties were represented. Internationally, responses included Israel, Canada, Colombia, and Australia. The results overwhelmingly showed that the disgruntled employee is the number one concern, followed by internet hackers. In other results, 41% of those surveyed spend less than 10% of their time on system security, and 51% spend no time. Ten of the 50 utilities reported attempted or successful unauthorized access to, or use of, their system. Corruption of information and denial of service were seen by respondents as the major concerns from a cyber intrusion. Unfortunately, 17% did not know the number of valves and 11% were unclear regarding the number of pumps they controlled. Forty percent allow their operators access to the internet and 66% allow email access via their LAN. Sixty-four percent have remote access via their LAN and 60% have the ability to control their system from a dial-up connection. Interestingly, only 35% of water utilities describe their system as master-slave SCADA.

    Forty-seven percent felt that the disgruntled employee was the primary concern, followed by the hacker at 13%. Fifty-five percent agreed that the ultimate objective of an attacker is damage, followed by challenge or status. Only 49% believed their greatest vulnerability was in the implementation of their system. Also, 39% felt their system was safe from unauthorized access and only 37% from unauthorized use. Respondents agreed that denial of service and corruption of information would have the greatest impact on their water system. The tabulated results are provided below. Also, a copy of the actual internet survey is located in Appendix A.

    There were several lessons learned from this survey. The first lesson concerns the purpose of the survey. Early in the research it was determined that a survey was needed because of a lack of information. Unfortunately, limited knowledge of the problem domain produced a survey that was not focused on any particular aspect of the problem. It was unclear to many respondents who exactly should complete the survey. Was the survey intended solely for water utilities or for businesses as well? Another lesson learned was in determining the best way to count SCADA systems. Only 50 surveys were retained, yet this number accounted for 93 cities. Clearly, one respondent could bias the results. An internet-based survey was not the best medium for a survey of this kind. Several respondents were openly critical of gathering data of this kind over the internet. What could a terrorist do with data on the design or configuration of a particular city’s SCADA? For this reason several questions were intentionally deleted and not disclosed in this thesis. A better approach would be a survey sponsored by the American Water Works Association (AWWA) or another credible source. A press release followed by a mailing of paper copies of the survey would produce a better representation of the current state of SCADA in water utilities. Lastly, looking up email addresses for thousands of utilities proved to be very time-consuming. No organization was prepared to share its email list. Support from a major water organization would make mass mailings and press releases on a survey more realistic.

    1. What city or county do you provide water resources? There were 50 responses reporting either as a city, county, or outside the US.

     

    US Cities    US Counties    International Countries
    93           19             4

     

    2. What is your position in the water utility organization?

     

    Superintendent    Manager    Engineer / Analyst    Operator    Technician    Other duty status
    5/48              17/48      12/48                 1/48        3/48          10/48

     

    3. How many valves and pumps does your system control?

     

    Valves

     

    Total / Responses

    Less than 50

    14/36

    More than 50

    16/36

    Unknown

    6/36

     

    Pumps

     

    Less than 50

    31/45

    More than 50

    9/45

    Unknown

    5/45

     

    4. How many Remote Terminal Units do you supervise/control with your SCADA system? There were 47 responses.

     

    Unknown    0-5     6-10    11-20    21-50    51-100    101-200    over 200
    7/47       9/47    1/47    2/47     13/47    3/47      10/47      2/47

     

    5. Do you allow access to the internet for your operators? There were 48 responses.

     

    Yes      No
    19/48    29/48

     

    6. Do you or your operators have access to email via an administrative LAN? There were 47 responses.

     

    Yes      No
    31/47    16/47

     

    7. Is the Local Area Network accessible via remote connections? There were 48 responses.

     

    Yes      No
    33/48    15/48

     

    8. Do you have the ability to control your system via a dial-up connection? (e.g. laptop, modem, and dial in to your server). There were 48 responses.

     

    Yes      No       Unknown
    29/48    19/48    0

     

    9. What communication medium do you use? There were 50 responses.

    Radio

    11/50

    Telephone leased line

    10/50

    Telephone party line

    1/50

    Combination

    13/50

    ISDN dial-up

    0

    ISDN dedicated

    0

    Other

    15/50

     

    10. Which best describes your SCADA? There were 46 responses.

    Distributed control

    22/46

    Master-slave control

    16/46

    Other

    8/46

     

    11. What is the speed of your connections in bits per second (bps)? There were 43 responses.

    300 bps

    0

    300-2,400 bps

    22/43

    4,800 bps

    1/43

    9600 bps

    1/43

    14,400 bps

    3/43

    28,000 bps

    2/43

    56,000 bps

    0

    64,000 bps

    0

    128,000 bps

    1/43

    Other

    11/43

     

    12. In your judgment, who do you see as your system’s primary concern from cyber attacks, where 1 is the highest primary concern and 6 is the least? Responses varied from 36 to 46. The number of responses is the denominator for each score.

     

    Concern / Rank            1        2        3        4        5        6
    Hackers                   6/46     7/46     6/46     4/46     11/46    12/46
    Spies                     1/38     8/38     23/38    1/38     3/38     2/38
    Terrorists                3/39     5/39     2/39     6/39     7/39     16/39
    Corporate raiders         2/38     2/38     4/38     8/38     3/38     19/38
    Professional criminals    2/36     2/36     4/36     3/36     11/36    14/36
    Disgruntled employees     18/38    6/38     5/38     6/38     2/38     1/38

     

    13. What do you believe is the ultimate objective of an attacker? There were 45 responses.

    Challenge or status

    16/45

    Political gain

    1/45

    Financial gain

    3/45

    Damage

    25/45

     

    14. What tools do you think a potential threat is most likely to use to attack your system, where 1 is the highest primary concern and 6 is the least? The responses varied from 38 to 44.

     

    Concern / Rank       1        2        3        4       5       6
    User command         11/44    4/44     10/44    1/44    5/44    13/44
    Script or program    1/44     8/44     11/44    3/44    6/44    15/44
    Autonomous agent     2/40     5/40     10/40    5/40    2/40    16/40
    Toolkit              1/43     3/43     9/43     4/43    7/43    19/43
    Distributed tool     0/41     5/41     7/41     2/41    5/41    22/41
    Data trap            1/38     3/38     7/38     1/38    9/38    17/38
    HERF attack          3/39     4/39     7/39     0/39    6/39    19/39

     

    15. Please rank which vulnerability is greatest in your SCADA system where 1 is your primary concern and 3 is the least. The responses varied from 37 to 39.

     

    Concern / Rank                  1        2        3
    Design vulnerability            10/39    9/39     20/39
    Configuration vulnerability     9/37     22/37    6/37
    Implementation vulnerability    18/37    10/37    9/37

     

    16. Do you believe the design, configuration, and implementation of your system is safe from unauthorized access or unauthorized use? There were 46 responses.

     

     

                           Yes      No
    Unauthorized access    18/46    28/46
    Unauthorized use       17/46    29/46

     

    17. Which results from an attack on your SCADA system would have the greatest impact on your water resource system? There were 47 responses.

    Corruption of information

    22/47

    Theft of service

    1/47

    Disclosure of information

    2/47

    Denial of service

    22/47

    18. Who would you call in the event your SCADA system was tampered with? There were 49 responses.

    CERT (Computer Emergency Response Team)

    4/49

    Outsourced security firm

    1/49

    Police department

    22/49

    FBI

    8/49

    Other

    14/49

     

    19. Did you experience a computer system intrusion? Indicate the type checking all that apply. There were 10 responses from 50 surveyed.

     

    Manipulated data

    3/10

    Installed a sniffer program

    0

    Stolen password

    1/10

    Probing/scanning your system

    2/10

    Trojan logons

    0

    IP spoofing

    0

    Introduced viruses

    5/10

    Denied use of service

    3/10

    Downloaded data

    0

    Compromised information security

    1/10

    Compromised email/documents

    0

    Publicized intrusions

    1/10

    Harassed personnel

    2/10

    Other

    1/10

     

    20. Do you have the capability to detect attempts to gain access to your system? There were 44 responses.

    yes

    19/44

    no

    16/44

    unknown

    9/44

     

    21. Have you detected any attempts to gain access to your system in the past year? There were 44 responses.

    Yes

    1/44

    No

    35/44

    Unknown

    8/44

     

    22. If yes, how many successful unauthorized attempts have you detected in the past 12 months?

    The respondent from question 21 did not answer this question.

    23. How much time do you spend on ensuring your network is secure? There were 43 responses to this question.

    none

    22/43

    < 10%

    18/43

    10-20%

    2/43

    21-30%

    0

    31-40%

    0

    41-50%

    0

    51-60%

    0

    61-70%

    0

    > 71%

    1/43
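
    The percentages quoted in the survey summary follow directly from the tabulated counts. The short Python sketch below recomputes a few of them; the counts are copied from the tables above, while the dictionary labels and the one-decimal rounding are choices made for this illustration.

```python
def percent(count, responses):
    """Share of responses as a percentage, rounded to one decimal place."""
    return round(100.0 * count / responses, 1)


# (count, number of responses) copied from the tabulated survey results.
SURVEY_COUNTS = {
    "operators have internet access (Q5)": (19, 48),
    "email via administrative LAN (Q6)": (31, 47),
    "dial-up control (Q8)": (29, 48),
    "master-slave SCADA (Q10)": (16, 46),
    "no time spent on security (Q23)": (22, 43),
}

summary = {label: percent(count, n) for label, (count, n) in SURVEY_COUNTS.items()}
for label, value in summary.items():
    print(f"{label}: {value}%")
```

    Because the number of responses differs from question to question, each percentage uses its own denominator rather than the 50 retained surveys.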

    4.3 Survivability

    Survivability is defined as the capability of the system to exist, function, and recover in spite of adversity (American Heritage 1996). Matalucci (1998) defines surety as "a level of confidence that a system will perform in acceptable ways in both the expected and unexpected circumstances." Matalucci and Miyoshi (1997) characterize surety as a combination of attributes like safety, security, use control, and reliability. "Surety describes an elevated state of safety and security; a state which is under control and very reliable" (Matalucci 1998). However, no mathematical definition has been discovered to date. There are many ways to measure survivability. Reliability and coverage are well known. Reliability is the probability of correct performance under normal system operation at time $t_n$, given that the system was operational at $t_{n-1}$ (Haimes 1998). Coverage is defined as the ability of the system to automatically recover from a fault during normal system operation (Dugan and Trivedi 1989). Unfortunately, cyber intrusion is an unusual loading on the system and not indicative of normal system operation.

    This thesis introduces a slightly different concept of surety. Surety is defined as a measure of acceptable system performance under an unusual loading. An unusual loading may be characterized as a rare or extreme event that was not envisioned in the design of the system. Examples of unusual loading are physical attack, natural disaster, and cyber intrusion. Surety will be used to quantitatively assess risk. Surety also shares a qualitative dimension that is analogous to safety -- the level of risk deemed acceptable. Surety will be used to measure the survivability of a water utility and its SCADA system, given a willful attack on the SCADA system. Survivability is characterized as a vector whose attributes or components are redundancy, robustness, resilience, and security.

    Figure 4-2 models the various inputs and outputs for a SCADA/water supply system. This black box model helps one to visualize all the inputs that may influence the output - survivability.

    Figure 4-2 SCADA Black Box Model

    In this thesis, survivability is assumed to be dependent upon four states of the system, i.e., the state of redundancy, robustness, resilience, and security of the system. Each state variable is dependent upon exogenous, random, and decision variables that affect the output vector survivability.

    Redundancy is defined as the ability of certain components of a system to assume functions of failed components without adversely affecting the performance of the system itself (Matalas and Fiering 1977). Examples of redundancy in water supply systems are additional pumps, valves, water lines or tanks beyond those needed for normal operation. An example of redundancy in SCADA is additional communication mediums between the master and remote terminal unit.
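
    One simple way to see the effect of redundant communication media is to compute the probability that at least one link between the master and a remote terminal unit remains available. The Python sketch below is an illustration only; it assumes that links fail independently, and the availability figures are hypothetical rather than taken from the thesis.

```python
import math


def availability_with_redundancy(link_availabilities):
    """P(at least one link works), assuming links fail independently."""
    for a in link_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("each availability must lie in [0, 1]")
    p_all_fail = math.prod(1.0 - a for a in link_availabilities)
    return 1.0 - p_all_fail


# Hypothetical availabilities: a radio link alone, then radio plus leased line.
single = availability_with_redundancy([0.90])
redundant = availability_with_redundancy([0.90, 0.80])
print(f"single link: {single:.2f}, with backup link: {redundant:.2f}")
```

    If one attack can disable both media at once, the independence assumption no longer holds and the benefit of the second link is smaller than this sketch suggests.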

    Robustness is defined as the degree of insensitivity of a system design to errors in the estimates of those parameters affecting design choice (Matalas and Fiering 1977). Robustness reduces the sensitivity of the system to extraordinary conditions. An example of robustness in water utilities might be additional water capacity for periods when demand may otherwise exceed supply. In SCADA, robustness may best be characterized by systems whose remote terminal units continue to operate when communication from the master terminal unit (MTU) is delayed or interrupted. In general, SCADA systems with distributed intelligence are inherently more robust than centrally controlled systems (SCADA 1997). If intrusion occurs at the MTU and communications become disabled, remote terminal units are capable of functioning in spite of the intrusion.

    Resilience is defined as the ability of a system to operate close to its optimal design, technically and institutionally, over a short run after an attack, such that the losses are within manageable limits (Matalas and Fiering 1977). In water supply systems, resilience takes the form of emergency and crisis action plans that allow a utility to continue to function after an attack. In SCADA, a remote terminal unit’s erasable programmable read-only memory (EPROM) can be reset by cutting and restoring power (Lambert 1997). Another example is a crisis action plan that details how the utility will cover remote sites and manually control their function, given a system-wide SCADA failure. Another component of resilience that is closely aligned with the software of networks and SCADA is valency. Valency is the ability of the system to react to intrusion and to restore normal system operation. The concept of valency is borrowed from its biological meaning and introduced in this thesis. High valency implies a strong capability to recover from unauthorized access or use. Software that actively seeks out computer viruses and removes the corruption is an example of valency or resilience.

    Security is defined as the ability of certain components of the system to deter, detect, and defend against attacks (Haimes et al. 1997). In water supply systems, security is multifaceted. Examples include fences, locks, alarms, and sensors. In SCADA, sensors and alarms provide feedback on water quality. Some SCADA systems also have features that work to prevent unauthorized access or use. In general, secure systems have properties that reduce the likelihood of successful attacks.

     

    4.4 Taxonomy for Assessing Computer Security

    Understanding how SCADA systems are vulnerable to cyber threats requires an understanding of the nature of cyber threats. Are water utilities concerned about cyber terrorists? Is the threat primarily from a disgruntled employee? What is the motivation of the attacker? There are many ways of attacking computer systems. Depending on the goal of the attacker, the arsenal of tools is large, and the malicious viruses are too numerous to enumerate (Cohen 1994). What is needed is a method for characterizing the threat to SCADA systems. Assessing the threat from the internet has been attempted in a variety of ways. However, until recently, there has been no systematic method. In order to explore these methods, an understanding of some basic types of attack is required.

     

    4.5 Definitions and Terms for a Taxonomy

    A computer virus is defined "as programs that can infect other programs by modifying them to include a possibly evolved version of itself" (Cohen 1994).

    Table 4-1 List of Terms (Cohen 1995)

    Trojan horses

    Toll fraud networks

    Fictitious people

    Infrastructure observation

    Email overflow

    Time bombs

    Get a job

    Protection limit poking

    Infrastructure interference

    Human engineering

    Bribes

    Dumpster diving

    Sympathetic vibration

    Password guessing

    Packet insertion

    Data diddling

    Computer viruses

    Invalid values on calls

    Van Eck bugging

    Packet watching

    PBX bugging

    Shoulder surfing

    Open microphone listening

    Old disk information

    Video viewing

    Backup theft

    Data aggregation

    Use or condition bombs

    Process bypassing

    False update disks

    Input overflow

    Hang-up hooking

    Call forwarding fakery

    Illegal value insertion

    Email spoofing

    Login spoofing

    Induced stressed failures

    Network services attack

    Combined attacks

    etc.

     

    One technique for addressing the threat is to list as many types of attack as can be imagined (Howard 1995). The common terms are listed in Table 4-1. These terms are one-dimensional and fail to address the nature of the threat. Also, this list states the what but provides no insight into who, when, or why.

    Another approach is to build a similar list of characteristics. This has an advantage over listing terms because it does not include hacker jargon. Still, another technique is to list attacks by some empirical means. Terms, categories, and lists can be difficult to remember. Also, they may not be as intuitive as required to understand the threat (Amoroso 1994).

    Howard (1995) developed a computer and network attack taxonomy. He uses a philosophy of ways, means, and ends to fully describe the dimensions of a cyber attack. An attacker succeeds by transitioning through the operational sequence of tools, access, and results. His method succeeds in satisfying the goals of a successful taxonomy, described by Amoroso (1994) as the following:

    - mutually exclusive -- classification in one category excludes all others;

    - exhaustive -- the sum of categories includes all possibilities;

    - unambiguous -- precise so that classification is not uncertain, regardless of who is classifying;

    - repeatable -- repeated applications result in the same classification, regardless of who is classifying;

    - accepted -- logical and intuitive so that they can become generally approved;

    - useful -- can be used to gain insight into the field of inquiry.

    Though the classification is not completely successful in capturing every possibility, it provides the answers for who, what, when, and why. Howard’s taxonomy is provided in Figure 4-3. There are five components to his model: attackers, tools, access, results, and objectives. His taxonomy was designed with the formal definition of computer security, "preventing attackers from achieving objectives through unauthorized access or use of computers or networks" (Howard 1995).

     

    4.6 Understanding the Taxonomy

    Howard (1995) divides attackers into six categories:

    - hackers -- challenge and status,

    - spies -- information for political exploitation,

    - terrorists -- fear and political gain,

    - corporate raiders -- a financial advantage,

    - professional criminals -- personal gain, and

    - vandals -- damage.

     

    These categories are generally broad enough to capture every type of person or organization. Access is characterized as either unauthorized use or unauthorized access. Attackers simply exploit vulnerabilities in the system. Cohen writes the following:

    "The point is that viruses do not exploit implementation flaws. They exploit flaws in the security policy. That is, the policy that allows you to share information and interpret it in a general purpose way, allows a virus to spread, regardless of its implementation" (Cohen 1994).

    There are essentially three ways that attackers take advantage of computer systems: software bugs, design flaws, and configuration errors. Software bugs are common in UNIX- and NT-based machines. Design flaws in systems are the most difficult to overcome. Configuration errors are dangerous because the operator believes that the system is operating under normal conditions (Cohen 1994).

     

    Results are described as the damage caused by the attacker to achieve his objective. He accomplishes this by using the tools. Howard (1995) describes four distinct categories:

    - corruption of information -- any unauthorized alteration of files stored on a host computer (Amoroso 1994),

    - disclosure of information -- dissemination of information to anyone not authorized to access it (Howard 1995),

    - theft of service -- unauthorized use of a computer or network without degrading services to other users (Amoroso 1994),

    - denial of service -- the intentional degradation or blocking of computer or network resources (Cohen 1995).

    The tools to conduct an attack complete the taxonomy (Howard 1995):

    - user command -- A command entered directly by the attacker, used for example to guess a password, or to enter a long string and telnet into the system.

    - script or program -- At the user command interface, attackers can make use of scripts or programs to automate commands. An example would be a "crack" program to determine passwords. Another example is a "Trojan Horse" program that is copied over an existing program. It performs like the program it replaced but also conducts other operations of which the user is unaware, such as erasing files, logging passwords to a file, or corrupting data.

    - autonomous agent -- This is the most widely publicized means of attack. It is similar to a Trojan Horse. The difference is that an autonomous agent contains program logic to make an independent choice of what host to attack (e.g., the computer virus).

    - toolkit -- A grouping of scripts, programs, and autonomous agents into a GUI program (e.g., rootkit).

    - distributed tool -- A tool that attacks a host simultaneously from multiple hosts. Clock time can be used to synchronize the attack.

    - data trap -- The exploitation of the electromagnetic field surrounding a computer. This field carries information that can reveal data in transit or on the terminal.

    - *HERF Attack -- High Energy Radio Frequency Attack. A device that could be hidden in a soda can in a garbage can emits a pulse that could destroy all electronic devices without damaging the building or other structures. (* This form of attack was added by the author.)

     

    Figure 4-3 Taxonomy of Computer and Network Attacks (Howard 1995)

    This taxonomy serves a useful purpose in understanding the ways, means, and ends of an attacker. This taxonomy will provide a meaningful background in identifying the risk to SCADA for water supply systems, using hierarchical holographic modeling (HHM).
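    As an illustration only, the components of the taxonomy described above can be written as a small record type. The sketch below uses Python; the class names, field names, and the example attack are hypothetical and are not part of Howard's work. The category values are taken from the lists in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Attacker(Enum):
    # Each value is the objective the text associates with that attacker.
    HACKER = "challenge and status"
    SPY = "information for political exploitation"
    TERRORIST = "fear and political gain"
    CORPORATE_RAIDER = "a financial advantage"
    PROFESSIONAL_CRIMINAL = "personal gain"
    VANDAL = "damage"


class Tool(Enum):
    USER_COMMAND = "user command"
    SCRIPT_OR_PROGRAM = "script or program"
    AUTONOMOUS_AGENT = "autonomous agent"
    TOOLKIT = "toolkit"
    DISTRIBUTED_TOOL = "distributed tool"
    DATA_TRAP = "data trap"
    HERF = "HERF attack"  # added to Howard's list by the thesis author


class Vulnerability(Enum):
    SOFTWARE_BUG = "software bug"
    DESIGN = "design"
    CONFIGURATION = "configuration"


class Access(Enum):
    UNAUTHORIZED_USE = "unauthorized use"
    UNAUTHORIZED_ACCESS = "unauthorized access"


class Result(Enum):
    CORRUPTION_OF_INFORMATION = "corruption of information"
    DISCLOSURE_OF_INFORMATION = "disclosure of information"
    THEFT_OF_SERVICE = "theft of service"
    DENIAL_OF_SERVICE = "denial of service"


@dataclass(frozen=True)
class Attack:
    """One path through the taxonomy: attacker, tool, access, result."""
    attacker: Attacker
    tool: Tool
    vulnerability: Vulnerability
    access: Access
    result: Result

    @property
    def objective(self) -> str:
        # The objective follows from the attacker category.
        return self.attacker.value

    def describe(self) -> str:
        return (f"{self.attacker.name.lower()} uses a {self.tool.value} to exploit a "
                f"{self.vulnerability.value} vulnerability, gains {self.access.value}, "
                f"and causes {self.result.value} in pursuit of {self.objective}")


# Hypothetical example: a hacker running a password-cracking script.
example = Attack(Attacker.HACKER, Tool.SCRIPT_OR_PROGRAM,
                 Vulnerability.CONFIGURATION, Access.UNAUTHORIZED_ACCESS,
                 Result.DENIAL_OF_SERVICE)
print(example.describe())
```

    Writing an attack as a record makes the who (attacker), how (tool and access), what (result), and why (objective) explicit for each scenario under study.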

     

    4.7 Hierarchical Holographic Modeling (HHM)

    Howard’s taxonomy uses a philosophy of ways, means, and ends to fully describe the dimensions of cyber intrusion. An attacker succeeds by transitioning through the sequence of tools, access, and results. This approach is useful because it facilitates risk identification when combined with hierarchical holographic modeling (HHM) (Haimes 1981). The HHM is a framework that allows one to identify virtually all sources of risk in a complex, multifarious system (e.g., one with hierarchical non-commensurable objectives, multiple decisionmakers, multiple transcending aspects, and risks) (Haimes 1998). It accomplishes this by decomposing the complex system into smaller subsystems. Its approach is holographic: like a hologram, which is recorded by lensless photography, it captures a system from more than one perspective. This has the advantage of providing a very broad perspective on the nature of the complex system. By analyzing systems along functional, temporal, modal, geographic, political, and other perspectives, one can develop a list that identifies sources of risk with respect to all aspects of the system. Haimes (1998) detailed these advantages of hierarchical decomposition:

    - decomposition methods can reflect the internal hierarchical nature of large-scale systems;

    - trade-off analysis can be performed among subsystems and the overall system;

    - through decomposition, the complexity of a large-scale multiobjective system can be relaxed by solving several smaller problems;

    - decomposition adds both robustness and resilience to modeling by capturing various system aspects and other societal elements;

    - decomposition adds more realism to the entire modeling process, because the limitations of modeling a complex system via a single model are circumvented by models that each address specific aspects of the system.

     

    "By considering different hierarchical structures together we can expect synergistic understanding of the overall system and its corresponding sources of risk and uncertainty" (Haimes 1998). HHM allows one to identify events from outside the system (e.g., cyber attacks) that impact on it. It also accounts for interior events that affect the system (e.g., software) (Haimes 1998).

     

    4.8 Recent Uses of the HHM in Identifying Risks

    Recent applications justify the use of this modeling framework. Executive Order 13010, issued on July 15, 1996, established the President’s Commission on Critical Infrastructure Protection (PCCIP). It funded research on eight infrastructures: telecommunication, electric power, gas and oil, banking and financing, transportation, water supply, emergency services, and continuity of government. Universities and industry developed strategies and recommendations to harden our infrastructures. The University of Virginia’s Center for Risk Management of Engineering Systems provided the Commission with a paper detailing the vulnerabilities of water supply systems to terrorist attack. The paper used the HHM to develop a list of real, perceived, or imagined risks, and their corresponding decomposition. The Center outlined over 104 sources of risk to the water supply system (Haimes et al. 1997).

    The HHM was also used to examine the Maumee River Basin, the largest sub-basin of the Great Lakes Basin (Haimes 1998). It identified risks for five planning subareas, eight watersheds, seven objectives, and several counties, states, and political and geographical interests (Haimes 1998).

     

    4.9 Risk Modeling Using HHM

    Using HHM, the SCADA system was decomposed into eight major categories. These categories represent the major sources of risk to SCADA. In general the major categories are identified as

    A, B, C, …

    And their sub-categories as

    A1, A2, A3, …

    B1, B2, B3, …

    C1, C2, C3, …

    Category A: Function

    Given the importance of the water distribution system, its function is a major source of risk from cyber intrusion. This category may be partitioned into three sub-categories or zones:

    A1 Gathering,

    A2 Transmitting, and

    A3 Distributing.

    Gathering is defined as all actions a SCADA system requires to manage the accumulation of water (SCADA Mail List 1997). Transmitting is defined as the communication between MTU and RTU, whether the medium is telephone, radio, ISDN, or fiber optic (SCADA Mail List 1997). Distributing accounts for the process, direction, and logic a SCADA system employs (SCADA Mail List 1997).

    Category B: Hardware

    The hardware of SCADA is vulnerable to tampering in a variety of configurations. There are nine sub-categories of hardware:

    B1 MTU,

    B2 RTU,

    B3 Modem,

    B4 Telephone Line,

    B5 Radio,

    B6 ISDN,

    B7 Satellite,

    B8 Alarm, and

    B9 Sensor.

    Depending on the tool and skill of an attacker, tampering with these sub-categories could have a significant impact on water flow for a community.

    Category C: Software

    Perhaps the most complex, this category represents the most dynamic aspects of change in water utilities. Software has many components that are sources of risk. There are three sub-categories:

    C1 Controlling,

    C2 Operating System, and

    C3 Communication.

    Figure 4-4 shows additional decomposition of these sub-categories that reveals other sources of risk.

    Category D: Human

    There are two major sub-categories, employees and attackers:

    D1 Employees --

    D11 Systems Analyst,

    D12 Technician,

    D13 Operators,

    D14 Trainees,

    D15 Manager,

    D2 Attackers --

    D21 Disgruntled Employee,

    D22 Hacker,

    D24 Terrorist,

    D25 Vandal,

    D26 Spy,

    D27 Professional Criminal, and

    D28 Corporate Raider.

    This category addresses a decomposition of who is capable of tampering with a system.

    Category E: Tools

    A distinction is made between the various types of tools an intruder may use. There are six sub-categories:

    E1 User Command,

    E2 Script or Program,

    E3 Autonomous Agent,

    E4 Toolkit,

    E5 Data Trap, and

    E6 High Energy Radio Frequency Weapon (HERF).

    These tools allow an intruder the means to tamper with a system.

    Category F: Access

    An intruder has many paths into a system. An intruder can exploit these vulnerabilities and pose severe risk to the system. There are five sub-categories:

    F1 Implementation Vulnerability,

    F2 Design Vulnerability,

    F3 Configuration Vulnerability,

    F4 Unauthorized Use, and

    F5 Unauthorized Access.

    A system may be safe by design, yet its installation and use may introduce multiple sources of risk.

    Category G: Geographic

    Location is not relevant for many risks of cyber intrusion. In the context of the HHM, four sub-categories are identified:

    G1 International

    G2 National

    G3 Local

    G4 Internal

    There are clearly sources of risks around the world that can tamper with SCADA systems. International borders are irrelevant because of the internet.

    Category H: Temporal

    The temporal category seeks to show how present or future decisions affect the system. Replacing a legacy SCADA system in 10 years may be decided today. Also, threats to systems will change with time. Therefore, the lifecycle of the system is addressed with the temporal category. There are four partitions of this category:

    H1 Long Term > 10 Years

    H2 Short Term < 10 Years

    H3 Near Term < 5 Years

    H4 Today

    The modeling effort begins by looking at all sources of risk to SCADA along functional and modal lines. The holographic approach is used in an attempt to envision all sources of risk that can be imagined. Next, the results of an internet-based survey are used to decide which sources of risk the model and analysis should focus on. The Analytic Hierarchy Process (AHP) (Saaty 1980) may also be used to aid a decisionmaker in deciding which risks to focus on. However, this thesis will use the results of the survey to select two sources of risk on which to focus the analysis.

    Figure 4-4 HHM for SCADA and Water Utilities
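    The decomposition listed in Section 4.9 can also be held as a simple data structure. The sketch below, in Python, records only the eight categories and their first-level sub-categories as given in the text; the variable and function names are hypothetical. Pairing sub-categories from two perspectives, as the second function does, is shown only as one simple way such a structure might be used to enumerate candidate risk scenarios; it is not a procedure prescribed by the HHM literature cited here.

```python
from itertools import product

# Categories A-H and their first-level sub-categories, as listed in Section 4.9.
# D1 (Employees) and D2 (Attackers) are decomposed further in the text.
HHM = {
    "A Function": ["Gathering", "Transmitting", "Distributing"],
    "B Hardware": ["MTU", "RTU", "Modem", "Telephone Line", "Radio",
                   "ISDN", "Satellite", "Alarm", "Sensor"],
    "C Software": ["Controlling", "Operating System", "Communication"],
    "D Human": ["Employees", "Attackers"],
    "E Tools": ["User Command", "Script or Program", "Autonomous Agent",
                "Toolkit", "Data Trap", "HERF"],
    "F Access": ["Implementation Vulnerability", "Design Vulnerability",
                 "Configuration Vulnerability", "Unauthorized Use",
                 "Unauthorized Access"],
    "G Geographic": ["International", "National", "Local", "Internal"],
    "H Temporal": ["Long Term", "Short Term", "Near Term", "Today"],
}


def count_sources(hhm):
    """Number of first-level sub-categories across all categories."""
    return sum(len(subs) for subs in hhm.values())


def cross_perspective(hhm, first, second):
    """All pairs of sub-categories drawn from two categories."""
    return list(product(hhm[first], hhm[second]))


print(count_sources(HHM))
for tool, hardware in cross_perspective(HHM, "E Tools", "B Hardware")[:3]:
    print(f"{tool} applied against {hardware}")
```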

     

    4.10 Goal Development and Indices of Performance

    The purpose of goal development is to determine the decisionmaker’s needs for the risk analysis. After a consultation with the decisionmaker, a goals tree such as the one in Figure 4-5, together with its indices of performance, might serve this purpose.

     

    Figure 4-5 Goals Tree Water Utility SCADA System

     

    4.11 Event Tree and Fault Tree Analysis

    Event tree analysis asks "what if" to determine the sequence of events that leads to consequences. From the event tree one can construct a probability density and an exceedance probability. Event trees help one to understand how an outcome occurs as it transitions through mitigating events. The consequences are conditioned on the occurrence of the initiating event and subsequent mitigating events (e.g., a hacker intruding through a firewall, or a disgruntled employee gaining access through a dial-in connection). Fault-tree modeling adds insight into how mitigating events fail.

     

    Figure 4-6 Event Tree Development (Ang and Tang 1984)

    Figure 4-6 shows the technique of event tree development (Ang and Tang 1984). At the top of Figure 4-6, redundancy, robustness, resilience, and security have been added. Figure 4-7 shows an example event tree in which a hacker willfully attacks the system with the goal of reducing the water supply to a city. In this example, the hacker transitions through the path of mitigating events, succeeding at each mitigating event with probability p and failing with probability 1-p. The event tree reads as follows: Given a cyber intrusion, does the firewall system protect against it? The answer is yes with probability p1 and no with probability p2, where p2 = 1-p1. If the firewall fails, the second mitigating event protects the system with probability p3. When either mitigating event one or mitigating event two answers yes, the consequences on water supply are described as none, with probability p1 + p2p3. By following the path from intrusion to consequences, one can determine the likelihood of each path and its associated consequences. The consequences of each path may be provided through expert elicitation. Subsequently, simulated attacks on systems may be useful in understanding the consequences for a particular water utility.

    Figure 4-7 Event Tree for Cyber Intrusion
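    The path arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration for a tree with two mitigating events, not the model used in the thesis; the function names and the numerical probabilities are hypothetical, and p3 is taken to be the probability that the second mitigating event protects the system given that the firewall has failed.

```python
def event_tree_paths(p1, p3):
    """Path probabilities for a two-stage event tree.

    p1: probability that the firewall protects against the intrusion
    p3: probability that the second mitigating event protects the system,
        given that the firewall has failed (assumed meaning of p3)
    """
    p2 = 1.0 - p1  # firewall fails
    return {
        ("firewall protects",): p1,
        ("firewall fails", "second event protects"): p2 * p3,
        ("firewall fails", "second event fails"): p2 * (1.0 - p3),
    }


def probability_of_no_consequence(p1, p3):
    """P(no reduction in water supply) = p1 + p2 * p3."""
    p2 = 1.0 - p1
    return p1 + p2 * p3


# Hypothetical values chosen only to show the calculation.
paths = event_tree_paths(p1=0.9, p3=0.5)
for path, probability in paths.items():
    print(" -> ".join(path), round(probability, 4))
print("no consequence:", round(probability_of_no_consequence(0.9, 0.5), 4))
```

    Because the paths are mutually exclusive and exhaustive, their probabilities sum to one, which is a convenient check on any event tree built this way.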

    Fault tree analysis seeks to understand why a mitigating event fails. It starts with the mitigating event as the top event and enumerates, through the use of Boolean logic, all the possible failure modes that contribute to the failure of that mitigating event. The mitigating event firewall protection (Figure 4-7) is modeled as a firewall system fault tree in Figure 4-8. The firewall system fails with probability p2. For example, assume that a firewall system has only a packet filter firewall protecting the SCADA system, represented by Firewall A. The failure of the firewall system is the joint event E(A1 and A2 and A3), so that P(A1 and A2 and A3) = P[(A1)(A2)(A3)] = p2. As Firewalls B, C, ... n are added to this example, the probability p2 decreases. These additions represent potential policy options for a decisionmaker. Each addition decreases risk but increases cost, so the two objectives are in conflict. Indeed, for the event E(Firewall A and B and C), the probability p2 is P(A and B and C) = P[(A)(B)(C)].

    Figure 4-8 Fault Tree for Firewall System
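    The effect of adding firewalls can be illustrated numerically. The Python sketch below evaluates the product form P[(A)(B)(C)] under the additional assumption, not stated in the text, that the firewalls fail independently of one another, so that the joint probability is the product of the individual failure probabilities. The function name and the failure probabilities are hypothetical.

```python
from math import prod


def and_gate(failure_probabilities):
    """Probability that every input event occurs, assuming independence."""
    return prod(failure_probabilities)


# Hypothetical failure probabilities for Firewalls A, B, and C.
firewall_failure = {"A": 0.10, "B": 0.20, "C": 0.50}

# p2 for a system protected by A alone, by A and B, and by A, B, and C.
names = list(firewall_failure)
for count in range(1, len(names) + 1):
    installed = names[:count]
    p2 = and_gate(firewall_failure[name] for name in installed)
    print(" and ".join(installed), round(p2, 4))
```

    If failures are correlated, for example because two firewalls share the same software flaw, the product understates p2; the independence assumption should be examined before such numbers are used to compare policy options.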

     

    4.12 Distributions from Event Tree Analysis

    The event-tree model of SCADA ultimately produces a probability density function that assigns a probability to the consequences of each path. Figure 4-9 shows an example of the PDF and exceedance probability (1 - CDF) generated by an event tree. The exceedance probability for the current system, compared with that of future policy options, can serve as a useful model for understanding measures of outcome.

     

    Figure 4-9 Example Consequences from Event Tree
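    The construction behind a plot of this kind can be sketched as follows. In this Python illustration, each event-tree path is reduced to a pair (percent water flow reduction, path probability); paths with the same consequence are merged into a probability mass function, and the exceedance probability is computed as one minus the cumulative distribution. The function names and the path values are hypothetical.

```python
from collections import defaultdict


def pmf_from_paths(paths):
    """Merge (damage, probability) pairs into a PMF sorted by damage."""
    pmf = defaultdict(float)
    for damage, probability in paths:
        pmf[damage] += probability
    return dict(sorted(pmf.items()))


def exceedance(pmf):
    """Return {q: P(Q > q)}, that is, 1 - CDF evaluated at each damage level."""
    result = {}
    cumulative = 0.0
    for q in sorted(pmf):
        cumulative += pmf[q]
        result[q] = 1.0 - cumulative
    return result


# Hypothetical event-tree output: (percent water flow reduction, probability).
paths = [(0, 0.90), (0, 0.05), (20, 0.03), (60, 0.015), (100, 0.005)]

pmf = pmf_from_paths(paths)
for q, p_exceed in exceedance(pmf).items():
    print(f"P(Q > {q}%) = {p_exceed:.3f}")
```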

     

    4.13 Partitioned Multiobjective Risk Method

    The Partitioned Multiobjective Risk Method (PMRM) (Asbeck and Haimes 1984) provides a useful tool for analyzing the risks of extreme events, which have low probability and high consequences. "Catastrophic, or extreme events, are defined as having a low probability of occurrence but a high damage level. Proper risk management must focus on the management of extreme events -- for it is these catastrophic disasters, not the more common 'expected value' events that cause grave harm to the system. A risk measure associated with extreme events -- the conditional expectation -- can be useful in management of extreme events as it does not average out catastrophic events with more high-frequency, low-consequence events" (Asbeck and Haimes 1984).

    Let X be a continuous random variable of damage or consequences (e.g., percent water supply reduction) that has a cumulative distribution function F(x) and a probability density function f(x), which are defined by the relationships $F(x) = P(X \le x)$ and $f(x) = dF(x)/dx$. The CDF represents the nonexceedance probability of x, the probability that X is observed to be less than or equal to some value x (Asbeck and Haimes 1984). The exceedance probability of x is defined as the probability that X is observed to be greater than x, and is equal to one minus the CDF evaluated at x (Asbeck and Haimes 1984). The expected value, average, or mean value of the continuous random variable X is defined by $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$. In the partitioned multiobjective risk method (PMRM), the expected value is extended to generate conditional expected values for damages. This allows a decisionmaker to see not only the expected value of damage but also the expected damage of low-probability, high-damage events. It also serves as a useful tool to demonstrate the surety of a system.

    PMRM reserves ƒ2 through ƒ5 for risk functions that partition the damage axis. ƒ1 is reserved for cost (Haimes 1998):

    ƒ2 -- high probability, low consequences

    ƒ3 -- medium probability, moderate consequences

    ƒ4 -- low probability, large consequences

    ƒ5 -- expected value of damage

     

    Figure 4-10 Partitioning on the Probability Axis (Asbeck and Haimes 1984)

    For this thesis, the damage axis is the percentage of water flow reduction, denoted by q, and α denotes the partition on the probability axis. The function ƒ4 is the expected value of Q, given that q is greater than $F^{-1}(1-\alpha)$:

    $f_4 = E[Q \mid q > F^{-1}(1-\alpha)]$

    Thus, for a particular policy option, there is the additional measure of risk ƒ4, in addition to the expected value E[Q], denoted by ƒ5.

    For the purpose of the thesis, ƒ2 will be introduced to demonstrate surety in the system. Mathematically, the surety function ƒ2 is the expected value of Q given that q is less than $F^{-1}(1-\alpha)$. The objective is to maximize the surety of the system. Using the PMRM, this is accomplished by "minimizing the unsurety" of the system. For a given partition, 1-α is the probability that damages are less than or equal to $F^{-1}(1-\alpha)$. Damages from 0 to $F^{-1}(1-\alpha)$ connote the surety region. The partition 1-α is a qualitative assessment that a decisionmaker makes. For example, the partition at 1-α may connote to the decisionmaker the surety of the system. Thus, for the decisionmaker, the surety is the conditional expected value of damage, ƒ2.

    $f_2 = E[Q \mid q < F^{-1}(1-\alpha)]$
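    A small numerical example may help to show why the conditional expectations add information beyond the expected value. The Python sketch below computes ƒ2, ƒ4, and ƒ5 for a discrete distribution of percent water flow reduction. The distribution is hypothetical, and beta is a hypothetical name for the partition point on the damage axis (the inverse-CDF value in the expressions above). Because the distribution is discrete, the sketch assigns the boundary value itself to the surety region, which is an assumption of this illustration.

```python
def expected_value(pmf):
    """f5: unconditional expected damage."""
    return sum(q * p for q, p in pmf.items())


def conditional_expectation(pmf, condition):
    """Expected damage given that the damage level satisfies `condition`."""
    mass = sum(p for q, p in pmf.items() if condition(q))
    if mass == 0:
        raise ValueError("conditioning event has zero probability")
    return sum(q * p for q, p in pmf.items() if condition(q)) / mass


def pmrm_measures(pmf, beta):
    """Return (f2, f4, f5) for a damage-axis partition point beta."""
    f2 = conditional_expectation(pmf, lambda q: q <= beta)  # surety region
    f4 = conditional_expectation(pmf, lambda q: q > beta)   # extreme region
    f5 = expected_value(pmf)
    return f2, f4, f5


# Hypothetical PMF: percent water flow reduction -> probability.
pmf = {0: 0.95, 20: 0.03, 60: 0.015, 100: 0.005}

f2, f4, f5 = pmrm_measures(pmf, beta=20)
print(f"f2 (surety region)  = {f2:.3f}%")
print(f"f4 (extreme region) = {f4:.3f}%")
print(f"f5 (expected value) = {f5:.3f}%")
```

    In this example the expected reduction is small, while the conditional expectation in the extreme region is far larger, which is the situation the PMRM is intended to expose.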

     

    4.14 Multiobjective Tradeoff Analysis

    In order to analyze and compare alternatives, four objective functions will be used in this thesis to model system survivability with respect to the state variable security v. Figure 4-11 shows the black box model of the SCADA system that this thesis will address in Chapter 5. The thesis will focus on cyber intrusion as one random input r, decision variables for components of alternatives xi, and exogenous variables of costs ci; the output will be the percent water flow reduction q. The state variable v is modeled as a one-to-one functional relationship with the output function ƒ(q). The state of security v is improved by minimizing the percent water flow reduction q, thus improving the system survivability. The concept is to input cyber intrusion as an initiating event into the event tree and determine the percent water flow reduction for the current system. Alternatives, denoted sn, will comprise decision variables xi. Alternative scenarios will be constructed from decision variables to improve the capability of mitigating events. Each alternative will be measured using the four objective functions below. Figure 4-11 describes the scope of the mathematical model addressed in the thesis. As more information is understood regarding the relationships of the remaining state variables and inputs, the mathematical model may be improved.

     

     

     

    Figure 4-11 Black-box Model of SCADA System

    1. Cost. The cost function for a proposed alternative is ƒ1, where ci is the cost associated with each decision variable and xi are the decision variables for a given alternative. For example, decision variables might include a new firewall, a dial-back modem, or a communication protocol. The objective is to minimize the cost of an alternative, where

    $f_1 = \sum_i c_i x_i$.

    2. Expected Value of Damage. The risk function for a proposed alternative is ƒ5, where the objective is to minimize the expected percent water flow reduction q. For the discrete case, ƒ5 is

    $f_5 = E[Q] = \sum_q p(q)\, q$

    where p(q) is the probability mass function and q is the average damage. For the continuous case, $f_5 = E[Q] = \int_{-\infty}^{\infty} q f(q)\,dq$, where f(q) is the probability density function.

    3. Conditional Expected Value of Damage. The risk function for a proposed alternative is ƒ4, where the objective is to minimize the conditional expected value of percent water flow reduction q. For a 1/n probability partition, this is the worst case expectation. With α denoting the partition on the probability axis,

    $f_4 = E[Q \mid q > F_Q^{-1}(1-\alpha)]$.

    4. Surety. The surety function for a proposed alternative is ƒ2, where the objective is to minimize the conditional expected value of percent water flow reduction q within the surety region. For a 1/n probability partition, this is the best case expectation.

    $f_2 = E[Q \mid q < F_Q^{-1}(1-\alpha)]$

     

    4.15 Evaluation

    Each alternative si has various options that affect the outcome of each objective function. A table similar to the one below will be used to record the results. Lastly, the objective functions will be plotted against one another to understand the tradeoffs. In Chapter Five, a prototype city is introduced that serves as a useful vehicle for applying the methodology of this thesis.

    Table 4-2 Alternatives vs. Performance Measures
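    Once the performance measures have been tabulated for each alternative, a common first step in tradeoff analysis is to set aside alternatives that are dominated, that is, alternatives for which some other alternative is at least as good on every objective and strictly better on at least one. The Python sketch below illustrates this step under the convention of this chapter that all four objectives are minimized. The alternatives and their values are hypothetical and are not results of the thesis; the type and function names are likewise assumptions of the sketch.

```python
from typing import NamedTuple


class Alternative(NamedTuple):
    name: str
    f1_cost: float          # cost of the alternative
    f2_surety: float        # conditional expected damage in the surety region
    f4_conditional: float   # conditional expected damage in the extreme region
    f5_expected: float      # expected damage


def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    a_values, b_values = a[1:], b[1:]
    no_worse = all(x <= y for x, y in zip(a_values, b_values))
    better = any(x < y for x, y in zip(a_values, b_values))
    return no_worse and better


def non_dominated(alternatives):
    """Alternatives that no other alternative dominates."""
    return [a for a in alternatives
            if not any(dominates(b, a) for b in alternatives if b is not a)]


# Hypothetical values for illustration only.
alternatives = [
    Alternative("s0", 0.0, 1.0, 80.0, 5.0),    # current system
    Alternative("s1", 50.0, 0.5, 60.0, 3.0),
    Alternative("s2", 70.0, 0.6, 65.0, 3.5),   # worse than s1 on every measure
    Alternative("s3", 120.0, 0.2, 30.0, 1.0),
]

for alternative in non_dominated(alternatives):
    print(alternative)
```

    The remaining alternatives form the set over which the decisionmaker weighs cost against the risk measures; the choice among them is a matter of preference rather than calculation.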

     

    CHAPTER 5 APPLICATION OF THE METHODOLOGY

     

    5.1 Introduction

    Chapter five introduces a prototype city designed from aggregated data from the internet survey in order to apply the methodology described in chapter four. The purpose of this chapter is to conduct a risk analysis focusing on the risk management framework presented in chapter four. Three experts have assisted in the design of the city, policy options, and performance assessments. All assessments of the performance of policy options will be based on research and expert opinion. After analyzing the city, this chapter will present the results of the analysis.

    There are two scenarios under study. Scenario one is a disgruntled employee who seeks to disrupt water supply for City XYZ. As a recent employee, she is familiar with the SCADA system and its capabilities. Scenario two is similar to scenario one except that the intruder is a hacker who attempts to gain access to the utility’s SCADA via the internet. He attempts to penetrate the firewall system and tamper with the SCADA system. These scenarios were chosen for three reasons. First, the survey shows that these scenarios are the most dangerous and the most likely that a utility will face. Second, expert opinion confirms the concern, especially in the case of the disgruntled employee. Lastly, there is documented evidence in the survey of attempted and, to a lesser degree, successful intrusions by these kinds of intruders.

     

    5.2 Experts for Elicitation

    Three experts were used for the purposes of assessing likelihood and outcomes. They share expertise as management consultants specializing in business automation and information security, with decades of combined experience in the field. Expert One (Nelson 1997) has worked with a wide range of applications, from business and government accounting through technical applications such as Electronic and Mechanical Computer Assisted Drafting. He has worked on a variety of standard and proprietary platforms including UNIX, PC, DOS, and networks. He has experience with computer security issues both for PC networks and for UNIX multi-user systems. He has been involved with implementing security measures at the computer hardware level, the operating systems level, and the applications level. In this connection, he has been involved in a number of disaster recovery projects. Expert One’s distinguishing talent is in computer fraud tracing and intrusion testing for internet firewalls and mission critical applications like SCADA (Nelson 1997).

    Expert Two (Wiese 1997) has experience in information technology and SCADA in both the oil and gas industry and the water industry. He has extensive experience in the development of large information technology systems, including groundwater modeling, Monte Carlo simulation of water supply sources, pipe network analysis, water resource information systems, oil production recording, and oil well drilling information systems. Expert Two has worked with an international oil company in Brunei (Borneo), successfully introducing three very large SCADA systems to the offshore operations. Currently, Expert Two is responsible for the SCADA program of the Water Corporation of Western Australia, installing eight medium to large SCADA systems and a host of smaller ones. Responsibilities include project management, user requirements analysis, project justification, design, tender specification and analysis, and development of technical standards. Expert Two has in-depth knowledge of water supply, SCADA communications issues, the application of SCADA to water supply and wastewater, and the use of SCADA in water supply operations.

    Expert Three (Hillebrand 1997) has significant experience in consulting for SCADA system development and employment. During the past ten years, he has specialized in the planning and design of SCADA based and distributed control systems for regional and municipal water distribution networks and water/waste water treatment plants. His professional duties range from proposal preparation and marketing, through needs analysis and detailed design, to the drawing up of technical specifications and project management services. He has extensive experience in the field, including international work in Israel and participation in World Bank funded projects for Nigeria, Ghana, and El Salvador (Hillebrand 1997).

     

    5.3 Problem Definition for City XYZ

    City XYZ is a very small city of 10,000. It has a water distribution system that accepts processed and treated water "as is" from an adjacent city. XYZ’s water utility is primarily responsible for the uninterrupted flow of water to its customers. The SCADA system uses a master-slave relationship, relying on the total control of the SCADA Master. Remote terminal units are dumb. They accept instructions and perform their functions in accordance with their programming.

    There are two tanks and two pumping stations. See Figure 5-1. The first tank serves the majority of customers. The second tank serves relatively fewer customers in a high-level area. Tank Two is at a point higher than the highest customer served. The function of the tanks is to provide a buffer and to allow the pumps to be sized less than peak instantaneous demand. They also allow customers to be provided with water at a constant head. Tank capacity has two component segments. One is reserve storage that allows the tank to operate over a peak week when demand exceeds pumping capacity. The other component is control storage. This is the portion of the tank between the pump cut out level and the pump cut in level. Visually the control storage is the top portion of the tank. When the water level falls to the tank cut-in-level the pump will begin to operate. The cut-in-level refers to a level in the tank that water reaches and triggers the pump to start. If demand is less than pump rate, (low demand periods) the level will rise until it reaches the pump cut-out-level. If the demand is greater than the pump rate, the level will continue to fall, and fall into the reserve storage. The tank level will stay in this area until the demand has fallen for a sufficient time to allow the level to recover. The reserve storage is sized according to demand (e.g., tank one, which serves more customers with a larger reserve storage).
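    The cut-in and cut-out behavior described above amounts to a simple hysteresis controller. The following Python sketch illustrates it under assumed numbers: the level thresholds, pump rate, demand values, and function names are hypothetical and are not taken from City XYZ.

```python
def pump_command(level, pump_on, cut_in, cut_out):
    """Return the new pump state for a tank level (hysteresis control).

    The pump starts when the level falls to cut_in and stops when it
    rises to cut_out; between the two it keeps its previous state.
    """
    if level <= cut_in:
        return True
    if level >= cut_out:
        return False
    return pump_on


def simulate(level, demands, pump_rate, cut_in, cut_out, capacity):
    """Step the tank level through a sequence of hourly demands."""
    pump_on = False
    history = []
    for demand in demands:
        pump_on = pump_command(level, pump_on, cut_in, cut_out)
        inflow = pump_rate if pump_on else 0.0
        # The level is clamped between empty and full.
        level = min(capacity, max(0.0, level + inflow - demand))
        history.append((round(level, 2), pump_on))
    return history


# Hypothetical values: levels in metres, rates in metres of level per hour.
low_demand = simulate(level=8.0, demands=[0.5] * 6, pump_rate=1.0,
                      cut_in=7.0, cut_out=9.0, capacity=10.0)
high_demand = simulate(level=8.0, demands=[1.5] * 6, pump_rate=1.0,
                       cut_in=7.0, cut_out=9.0, capacity=10.0)
```

    With demand below the pump rate, the level cycles inside the control storage; with demand above the pump rate, the level keeps falling into the reserve storage even though the pump is running, which is the condition the intruder in scenario one tries to create.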

    The SCADA Master communicates directly with pumping stations one and two and instructs them when to start and stop. The operating levels are kept in the SCADA Master. Pump Station One boosts the flow of water from the supply beyond the rate that can be supplied by gravity. The primary operational goal is to maximize gravity flow and, as necessary, to pump off peak as much as possible. Chlorine residual is monitored at this site to verify quality of supplied water. Pumping Station Two’s function is to pump off peak. This pumping station pumps water from tank number one to tank number two. The primary operational goal is to pump off peak as far as possible. The pumping stations receive a start command from the SCADA Master (via MTU) and attempt to start the duty pump. At each tank there are separate inlets (from the source) and outlets (to the customer). Tank level and flow in and out are measured at each. There is an altitude control valve that shuts the inlet when the tank is full. The tank’s full position is defined above the pump cut-out level so there is no danger of shutting the valve while pumping. If something goes wrong and the pump does not shut off, the altitude valve will close, and the pump will stop on overpressure on delivery, to prevent the main from bursting (Wiese et al. 1997).

    Figure 5-1 Water Distribution System (Wiese et al. 1997)

    The SCADA system is always dependent on the communications network of the MTU and the SCADA Master. See Figure 5-2. The SCADA Master regularly polls all remote sites. Remote terminal units respond only when polled, to ensure no contention on the communication network. The system operates automatically; the decision to start and stop pumps is made by the SCADA Master and not by an operator sitting at the terminal (Wiese et al. 1997). The system has the capability to contact operations staff after hours via a paging system in the event of an alarm.

    Figure 5-2 SCADA for City XYZ (Wiese et al. 1997)
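    The polled master-slave arrangement can be pictured with a short Python sketch. It is only an illustration of the idea that remote sites never transmit on their own initiative; the class name, command strings, and level values are assumptions and do not describe the proprietary protocol used in City XYZ.

```python
class RemoteTerminalUnit:
    """A 'dumb' remote site: it never transmits unless the master polls it."""

    def __init__(self, name, reading=None):
        self.name = name
        self.reading = reading      # e.g. a tank level, if the site measures one
        self.pump_running = False

    def respond(self, command=None):
        # Apply the master's instruction, if any, then report status.
        if command == "START":
            self.pump_running = True
        elif command == "STOP":
            self.pump_running = False
        return {"site": self.name, "reading": self.reading,
                "pump_running": self.pump_running}


def poll_cycle(tank, pump_station, cut_in, cut_out):
    """One master poll cycle: read the tank, then instruct the pump station."""
    level = tank.respond()["reading"]
    if level <= cut_in:
        command = "START"
    elif level >= cut_out:
        command = "STOP"
    else:
        command = None              # no change; just collect status
    return pump_station.respond(command)


# Hypothetical sites and levels (metres).
tank_one = RemoteTerminalUnit("Tank One", reading=6.8)
station_one = RemoteTerminalUnit("Pump Station One")
status = poll_cycle(tank_one, station_one, cut_in=7.0, cut_out=9.0)
```

    Because every decision is taken in `poll_cycle`, which stands in for the SCADA Master, anyone who controls the master controls every remote site. That is the property the scenarios in Section 5.4 exploit.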

    The staff has dial-in access. If contacted, they can dial in from home and diagnose the extent of the problem. The dial-in system has a dedicated PC that is connected to the Internet and the office’s Local Area Network (LAN). There is a packet filter firewall protecting the LAN and the SCADA system (Figure 5-3). The SCADA Master commands and controls the entire system. The communications protocols in use for the SCADA communications are proprietary. The LAN and the connection to the Internet use Transmission Control Protocol/Internet Protocol (TCP/IP). Dial-in connections also use TCP/IP. Instructions to the SCADA system are encapsulated in TCP/IP. Once the instructions arrive at the LAN, the SCADA Master de-encapsulates them, leaving the proprietary terminal emulation protocols for the SCADA system.

    The central facility is organized into different access levels for the system. Depending on need, an operator or technician has a certain level of access. Table 5-1 summarizes the control center’s policy on access.

    Table 5-1 Security Level Function (Boyer 1993)

    Security Level   Employee                 Functions Available
    A                All                      View any screen
    B                Operator Trainee         All "A" functions, change controller set points, acknowledge alarms, & turn equipment on/off
    C                Qualified Operators      All "B" functions, change alarm points, & disable controllers and alarms
    D                Instrument Technicians   All "C" functions, view any screen, tune controllers, analyze all alarm reports, & simple configuration
    E                Systems Engineer         All "D" functions, complex configuration, adjust accounting factors, & assign security codes
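    The cumulative structure of Table 5-1, in which each level inherits the functions of the level below it, can be expressed as a small lookup. The Python sketch below is illustrative only; the function names are hypothetical, and the function strings are transcribed from the table rather than from any real SCADA product.

```python
# Functions added at each security level, transcribed from Table 5-1.
ADDED_FUNCTIONS = {
    "A": {"view any screen"},
    "B": {"change controller set points", "acknowledge alarms",
          "turn equipment on/off"},
    "C": {"change alarm points", "disable controllers and alarms"},
    "D": {"tune controllers", "analyze all alarm reports",
          "simple configuration"},
    "E": {"complex configuration", "adjust accounting factors",
          "assign security codes"},
}
LEVEL_ORDER = "ABCDE"


def functions_for(level):
    """All functions available at a level, including inherited ones."""
    allowed = set()
    for lower in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        allowed |= ADDED_FUNCTIONS[lower]
    return allowed


def is_allowed(level, function):
    """True if the given security level may perform the function."""
    return function in functions_for(level)
```

    Under this policy, disabling alarms requires at least level C, which is why the amount of access actually granted to operators matters in the scenario that follows.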

     

    5.4 Identifying Sources of Risk (Phase I)

    The City’s SCADA system is modeled using HHM. The HHM identifies 60 sources of risk for a centrally controlling SCADA system. The access points for the system are the dial-in connection points and the firewall that connects the utility to the internet.

    The most likely course of action for an intruder is to use a password, access the system, and control devices (Nelson 1997). Beyond this, one must assume the intruder’s intentions are to gain notoriety, attract attention, or cause damage. Since physical damage to equipment from dial-in is inherently difficult due to analog fail-safes, the intruder’s probable goal is to manipulate the system in such a way as to affect the flow of water (Nelson 1997). For example, creating water hammers may burst mains and damage customers’ pipes. Or, the intruder can shut off valves and pumps to reduce water flow. The decisionmaker concludes that the scenarios should focus on an intruder’s ability to manipulate the system with the goal of reducing the flow of water for the city. After discussing the potential threats from intruders, the decisionmaker also concludes that he is most concerned with the prospect of a disgruntled employee tampering with the SCADA system in a way that would affect the water flow for the City. His secondary concern is the prospect of a hacker intruding into the system for the same purpose as the disgruntled employee. To focus the analysis, two plausible scenarios are developed. For the purposes of the thesis, five alternatives will be analyzed using scenario one -- a disgruntled employee.

    Figure 5-3 HHM for City XYZ

     

    5.4.1 Scenario One (Disgruntled Employee)

    The water utility control center personnel are organized by security level function as shown in Table 5-1 (Boyer 1993). This is a theoretical arrangement that differs in practice. The operators have worked in the control center for 10 years. They have been allowed greater access than policy permits. Passwords are supposed to be updated by the engineer every 90 days, but this policy is not enforced because many employees use the same password for the system as they do in other aspects of their lives (e.g., ATM card, savings account, checking account). Downsizing recently required the water utility supervisor to lay off one of the three operators. The utility purchased a pager system to replace the night-time operator of the system. The operator was quite upset because, in her opinion, the water utility supervisor and systems engineer are needless overhead for such a small facility. In her opinion, her job should have been spared and the duties of the engineer and the supervisor combined into one position. After 60 days, unemployment checks have been exhausted and the operator has fallen on hard times. She rationalizes that if she can cause a major disruption of water flow in the City, the utility will have to hire her back. The City will probably blame the cutback of the night-time position at the facility for the incident. Her plan is to break into the SCADA system via the dial-in connection to the master terminal unit and manipulate the remote terminal units. Her goal is to reduce water flow by 50-100 percent. She reasons that the passwords are the same ones that have been in place for months. She thinks her chances of getting remote access are 95 percent (Nelson 1998). Her knowledge of the system and the ease of use of the laptop emulation software will allow her to make the necessary changes. After accessing the system, she will disable the alarms and shut down each remote site.

    She reasons that she could probably respond to the alarms from her laptop if they do go off, and prevent a page of the duty person. She knows that the system’s software will allow her to turn off the alarms. She believes that her chances of defeating the alarms are 80 percent (Nelson 1998). Another challenge will be manipulating the fail-safes. The system is designed in a way that prevents her from controlling physical fail-safes by remote access. However, her workaround is to prevent the upper storage tanks from filling overnight. She estimates that her chances are 70 percent (Nelson 1998). In the morning, daytime usage will quickly exceed supply. The tanks will empty, and the fail-safes will not respond until the tanks have emptied. Due to system design, the physical fail-safes will only respond if the tanks become too full or empty. Afterwards, once the water levels are down, she will log back on to the operating system and trash the SCADA system disk, thereby destroying all traces and making it necessary to revert to manual operation of the system (Wiese 1998). This will prevent the paging system from alerting the supervisor or the engineer. She estimates her chances of preventing others from being notified via the paging system at 80 percent (Nelson 1998).

     

    5.4.2 Scenario Two (Hacker)

    Sam is an operator at the control center. Sam maintains his personal finances on his computer at work. The systems engineer does not mind, because the space for storing the data is really quite small. Misreading this allowance as permission for other personal activity, Sam also creates a homepage that is maintained on the utility web server. Being a computer enthusiast, he enjoys downloading shareware. One download required a serial number. Sam searched for a hack site to get the serial number. He discovered a new site at alt.hackushackyou. This site was very intriguing because it had downloads of hundreds of passwords for many of the new games. He emailed the webmaster for the password to get to the file transfer protocol (ftp) site. The next day he received an email back from the webmaster of the site with the code. The webmaster pointed Sam to the correct ftp site and Sam began his download.

    The webmaster noticed the email from Sam. The webmaster was surprised that Sam was actually sending him an email from work with a homepage in the signature block. The webmaster now had access to personal information about an employee with access to an important computer at an American water utility. He believed that he had sufficient information to scan the firewall and guess Sam’s password. The webmaster decides to use a trojan attack. He reasons that Sam will probably try to run some of his acquired software on his office computer, if nothing else just to check that it works before taking it home. If Sam executes a file containing a trojan gateway, it will give the webmaster a session on Sam’s work computer from which he can roam the internal network. Depending upon the authentication mechanisms in place, the session on Sam’s computer may also give him proxy access to the SCADA systems. If this is true, the webmaster (Jim) can attempt to disrupt the flow of water. For this scenario, Jim’s goal is to reduce water flow 50 percent for as long as he can prevent the utility from discovering his actions (Nelson 1998). It is surmised that a hacker has access to a multitude of wardialer programs and can quickly break Sam’s password. Therefore, Jim’s ability to defeat the packet filter firewall is assessed at 80 percent (Nelson 1998). Given that Jim gains access through the OS, his chances of getting into the SCADA system are 80 percent (Nelson 1998). This assumes that Jim finds the emulation software that is residing on the server. For the purposes of the thesis, the scenario concludes with the expert’s assessment of the hacker’s ability to defeat alarms at 5 percent, defeat fail-safes at 0.1 percent, and prevent notification at 30 percent (Nelson 1998). The reduced chance of success is due to a hacker’s limited knowledge of the system. Tables 5-2 and 5-3 summarize the estimates of success for both scenarios.

    Expert Three’s assessments were significantly different for scenario two. In his judgment, the hacker described in the example, coupled with the description of the city, warranted significantly different estimates for this scenario.

    Table 5-2 Estimates for Scenario One (Disgruntled Employee)

               Firewall     OS           SCADA      Alarms     Fail-   Operator   Water-flow
               Penetrated   Penetrated   Accessed   Defeated   Safes   Notified   Reduced (%)
    Expert 1   N/A          0.95         0.95       0.80       0.70    0.80       100
    Expert 2   N/A          0.90         0.90       0.99       0.70    0.80       < 100
    Expert 3   N/A          0.80         0.80       0.80       0.70    0.70       < 100

    Table 5-3 Estimates for Scenario Two (Hacker)

               Firewall     OS           SCADA      Alarms     Fail-   Operator   Water-flow
               Penetrated   Penetrated   Accessed   Defeated   Safes   Notified   Reduced (%)
    Expert 1   0.80         0.80         0.80       0.05       0.01    0.30       < 50
    Expert 2   0.80         0.80         0.80       0.01       0.01    0.20       < 1
    Expert 3   0.60         0.80         0.90       0.95       0.95    0.99       < 25
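    One way to read the estimates in Table 5-2 is as conditional success probabilities along a single path of an event tree. The Python sketch below multiplies them to obtain the probability that the intruder succeeds at every stage. It assumes the stages are sequential and independent, and it interprets each column as the intruder's chance of success at that stage (for example, the "Operator Notified" column is read as the chance of preventing notification, as in the scenario text). The mapping of paths to consequence levels in the thesis event trees is not reproduced here.

```python
from math import prod

# Stage names are this sketch's reading of the Table 5-2 columns.
STAGES = ["OS penetrated", "SCADA accessed", "Alarms defeated",
          "Fail-safes defeated", "Notification prevented"]

# Scenario one estimates transcribed from Table 5-2 (firewall stage is N/A).
SCENARIO_ONE = {
    "Expert 1": [0.95, 0.95, 0.80, 0.70, 0.80],
    "Expert 2": [0.90, 0.90, 0.99, 0.70, 0.80],
    "Expert 3": [0.80, 0.80, 0.80, 0.70, 0.70],
}


def full_success_probability(stage_probs):
    """Probability that the intruder succeeds at every stage in sequence."""
    return prod(stage_probs)


def stopped_at_stage(stage_probs):
    """Probability that the intrusion is first stopped at each stage."""
    stopped = []
    reached = 1.0
    for p in stage_probs:
        stopped.append(reached * (1.0 - p))
        reached *= p
    return stopped


for expert, probs in SCENARIO_ONE.items():
    print(expert, round(full_success_probability(probs), 3))
```

    Under these assumptions the full-success path for Expert 1 has probability of about 0.40. The probabilities of being stopped at each stage plus the full-success probability sum to one, which is a quick consistency check on any event tree built this way.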

     

     

    5.5 Indices of Performance (IP)

    For each risk function, the decisionmaker wants to know the percent water flow reduction at probability 0.90 and the extreme damage at probability 0.01. To accomplish this, the PMRM partitions the probability axis at $1-\alpha = 0.90$ and $1-\beta = 0.01$ for all alternatives.

    1. Cost. The cost function for a proposed alternative is ƒ1, where $c_i$ is the cost associated with each decision variable and $x_i$ are the decision variables for a given alternative. The objective is to minimize the cost of an alternative, where

    $f_1 = \sum_i c_i x_i$.

    2. Expected Value of Damage. The risk function for a proposed alternative is ƒ5, where the objective is to minimize the percent water flow reduction q. For the discrete case, ƒ5 is

    $f_5 = E[Q] = \sum_q p(q)\, q$,

    where p(q) is the probability mass function and q is the average damage. For the continuous case, with p(q) the probability density function, $f_5 = E[Q] = \int q\, p(q)\, dq$.

    3. Conditional Expected Value of Damage. The risk function for a proposed alternative is ƒ4, where the objective is to minimize the conditional expected value of percent water flow reduction q. For a 1/n probability partition, this is the worst case expectation.

    $f_4 = E[Q \mid q > F_Q^{-1}(\beta)]$, where $1-\beta = 0.01$.

    4. Surety. The surety function for a proposed alternative is ƒ2, where the objective is to minimize the conditional expected value of percent water flow reduction q within the surety region. For a 1/n probability partition, this is the best case expectation.

    $f_2 = E[Q \mid q < F_Q^{-1}(\alpha)]$, where $1-\alpha = 0.90$.

     

    5.6 Assess the Risks (Phase II)

    The next two sections contain event trees for each scenario. They represent the current system’s state of performance given the experts’ assessments of the intruder’s ability to transition through the mitigating events of the event tree.

     

    5.6.2.1 Scenario One Event-tree (Current System)

    Figure 5-4 shows the event tree for scenario one (disgruntled employee) from expert one. The initiating event, cyber intrusion, negotiates each mitigating event and culminates in consequences at the end of each probability path.

     

    Figure 5- 4 Event Tree for Scenario One (Disgruntled Employee)

     

    5.6.2.1.1 PDF/CDF and Exceedance Probability graphs

    Assuming a uniform distribution of damages for each level of consequences, a probability density is generated. See Figure 5-5. For instance, small consequences have a range of 0-5 percent. The "distribution of damage" is uniform from 0-5 U~(0-5), 5-25 U~(5-25), 25-50 U~(25-50), and 50-100 U~(50-100) percent. This PDF is converted to a cumulative distribution function (CDF). Lastly, the exceedance probability is calculated by plotting one minus the CDF to analyze the risk function using partitioned multiobjective risk method (PMRM). Figure 5-6 shows a summary for each expert.
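    The construction described above (uniform damage within each consequence range, then a CDF, then an exceedance curve) can be sketched in a few lines of Python. The four damage ranges come from the text; the probability assigned to each range is hypothetical, chosen only for illustration, since the path probabilities behind Figure 5-5 are not reproduced here.

```python
# Damage ranges (percent water flow reduction) taken from the text.
BINS = [(0.0, 5.0), (5.0, 25.0), (25.0, 50.0), (50.0, 100.0)]
# HYPOTHETICAL probability of ending in each range; not from the thesis.
MASSES = [0.40, 0.10, 0.10, 0.40]


def pdf(q):
    """Piecewise-constant density: each range's mass spread uniformly over it."""
    for (lo, hi), mass in zip(BINS, MASSES):
        if lo <= q < hi:
            return mass / (hi - lo)
    return 0.0


def cdf(q):
    """P(Q <= q), accumulated range by range."""
    total = 0.0
    for (lo, hi), mass in zip(BINS, MASSES):
        if q >= hi:
            total += mass
        elif q > lo:
            total += mass * (q - lo) / (hi - lo)
    return total


def exceedance(q):
    """P(Q > q) = 1 - CDF(q), the curve that the PMRM partitions."""
    return 1.0 - cdf(q)


def expected_value():
    """E[Q]: each range contributes its mass times its midpoint."""
    return sum(mass * (lo + hi) / 2.0 for (lo, hi), mass in zip(BINS, MASSES))
```

    With the illustrative masses above, the expected value is 36.25 percent. That figure is close to, but not a reproduction of, the thesis value, because the masses were chosen by hand.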

     

    Figure 5- 5 PDF / CDF for Scenario One (Expert 1)

     

    Figure 5- 6 Exceedance Probability and PMRM Scenario One (All Experts)

    Using the PMRM, the exceedance probability is partitioned for surety on the probability axis at $1-\alpha = 0.90$. The worst case conditional expected value (ƒ4) is partitioned at $1-\beta = 0.01$. These partition points will be used throughout the analysis to show how the value of each risk function changes for each policy option sn. The calculations of $F^{-1}(\alpha)$ and $F^{-1}(\beta)$, and of the corresponding best case and worst case conditional expected values for the current system, follow.

     

    5.6.2.1.2 Calculations of Benchmarks for Indices of Performance

    The current system is evaluated using the equations for each IP in order to benchmark the system using expert one. For the current system, cost (ƒ1) is not calculated because there are no new costs for the current system. Cost is incurred as alternatives are developed. Therefore, cost is $f_1 = \sum_i c_i x_i = \$0.00$ for the current system.

    The worst case conditional expected value of percent water flow reduction is calculated by first determining the partition on the damage axis that corresponds to the exceedance probability $1-\beta = 0.01$, which is $F^{-1}(\beta) = 98.7\%$. In general, one can read this from the graph by inspection, or one may interpolate. The conditional expected value for this new region is

    $f_4 = E[Q \mid q > 98.7] = 99.5\%$.

    There is a 0.01 probability (1 chance in 100) that the water flow reduction will be greater than 98.7 percent, and the expected level of water flow reduction in this region is 99.5%.

    Surety is calculated by partitioning on the probability axis at $1-\alpha = 0.90$ and determining the partition on the damage axis, $F^{-1}(\alpha) = 1.20\%$. Therefore, the best case conditional expected level of water flow reduction in this surety region is

    $f_2 = E[Q \mid q < 1.20] = 0.60\%$. The expected value of water flow reduction is $f_5 = E[Q] = 36.9\%$.
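    The partitioning steps can be illustrated with a short Python sketch that inverts a piecewise-uniform CDF and then takes conditional expectations in the surety and extreme regions. The damage ranges are those given in the text, but the probability assigned to each range is hypothetical, so the results approximate rather than reproduce the benchmark values above. Function and variable names are illustrative.

```python
BINS = [(0.0, 5.0), (5.0, 25.0), (25.0, 50.0), (50.0, 100.0)]
MASSES = [0.40, 0.10, 0.10, 0.40]   # HYPOTHETICAL, for illustration only


def inverse_cdf(p):
    """Damage level q with P(Q <= q) = p (the partition point on the damage axis)."""
    cumulative = 0.0
    for (lo, hi), mass in zip(BINS, MASSES):
        if mass > 0 and p <= cumulative + mass:
            return lo + (p - cumulative) / mass * (hi - lo)
        cumulative += mass
    return BINS[-1][1]


def conditional_expectation(a, b):
    """E[Q | a < Q < b] for the piecewise-uniform distribution."""
    weighted, weight = 0.0, 0.0
    for (lo, hi), mass in zip(BINS, MASSES):
        left, right = max(lo, a), min(hi, b)
        if right > left:
            part = mass * (right - left) / (hi - lo)
            weighted += part * (left + right) / 2.0
            weight += part
    return weighted / weight


q_surety = inverse_cdf(0.10)     # exceedance probability 0.90
q_extreme = inverse_cdf(0.99)    # exceedance probability 0.01
f2 = conditional_expectation(0.0, q_surety)
f4 = conditional_expectation(q_extreme, 100.0)
```

    With these illustrative masses the partition points fall at about 1.25 and 98.75 percent. Because damage is uniform within a range, each conditional expectation is simply the midpoint of the region it covers, which is why the thesis value of ƒ2 = 0.60 % is half of the 1.20 % partition point.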

    The current system’s percent water flow reduction is summarized in Table 5-4.

     

    Table 5-4 City XYZ’s SCADA System Performance for Scenario One

    Cost ($) ƒ1   Best Case 1/10 Water Flow   Worst Case 1/100 Water Flow   Expected Value of Water Flow
                  Reduction (%) ƒ2            Reduction (%) ƒ4              Reduction (%) ƒ5
    0.00          0.60                        99.5                          36.9

     

     

    5.6.2.2 Scenario Two Event-tree (Current System)

    Figure 5-7 contains the event tree for scenario two (hacker). In this scenario, the hacker’s point of entry is through the utility’s firewall system. Note that in this example, the hacker’s probabilities of successfully defeating the alarms and fail-safes are lower than those of a disgruntled employee. Expert Three’s estimates are provided below. In this scenario the firewall system is a very simple packet filter firewall. It is easily scanned and penetrated. There are many alternatives in firewall protection. For the rest of the analysis, the focus in generating alternatives will be on scenario one and expert one. Figure 5-8 shows the results of an exceedance plot on a log scale. This was necessary because the extreme event damage has such low probability. In a normal plot the tail does not adequately show the information for a partition at 0.01. The log plot provides more insight into what the tail looks like at the partition of 0.01.

    Figure 5-7 Event Tree for Scenario Two (Hacker)

     

    Figure 5- 8 Exceedance Probability for Scenario Two (All Experts)

     

    5.7 Generate Alternatives for System (Phase III)

    Alternative one consists of three components: (1) Outsource the web page hosting x1. (2) Policy on password sharing (enforced) e.g. no sharing x2. There is no single system password. (3) Policy on no personal internet usage x3. All internet usage logged and randomly checked by administrator. The estimated cost to implement is $1,000. The yearly operating cost is $7,200 (Expert 2).

    Alternative two consists of three components: (1) Implement a new application filter firewall that isolates the internal SCADA system from the web server x4. (2) Dialback modem configured for call back with logging of all phone calls with second person after normal business hours x5. (3) Terminated employee’s access canceled immediately upon separation. The estimated cost to implement is $15,500. The yearly operating cost is $10,000 (Expert 1).

    Alternative three is the same as alternative two but adds a token based authentication server with dial-in x6. The estimated cost to implement is $32,500. The yearly operating cost is $6,000 (Expert 1).

    Alternative four consists of three components: (1) Re-program system to restrict actions that can be done by remote dialin users x7. (2) Re-program system to restrict access to user ID and time of day by duty roster file to control "out of hours" access control x8. (3) Re-program system to alarm the action of someone trying to suppress the system alarms x9. This must be acknowledged by a second person. The estimated cost to implement is $50,000. The yearly operating cost is $6,000 (Expert 2).

    Alternative five consists of four components: (1) Add alarm limits to tanks that will allow earlier warning x10. (2) Add alarm to notify unusual periods of excessive pumping x11. (3) Make the system enforce password changes x12. (4) Separate operations from administrative roles. Admin access is designated at a specific terminal x13. The estimated cost to implement is $8,000. The yearly operating cost is $1,200 (Expert 2).

    Alternative six is simply the status quo; that is, do nothing and accept the current risk. There is no cost for this alternative ($0.00). Table 5-5 summarizes the new probability estimates from expert one and the net present value of each alternative at a 7 % discount rate over a 10 year maintenance horizon. See the Appendix B event trees for every estimate from expert one. For the purposes of the thesis ƒ1 was not directly calculated. For the value of ƒ1, each alternative’s cost was simply estimated as the cost to implement the entire alternative plus the cost of yearly maintenance over a 10 year time horizon.

    Table 5-5 Probability and Cost Estimates (Nelson et al. 1998)

            OS           SCADA      Alarms     Fail-Safes   Operator Notified   Discount rate of 7%
            Penetrated   Accessed   Defeated   (averted)    (paged)             for 10 years ($)
    Alt 1   0.70         0.80       0.80       0.70         0.80                51,504
    Alt 2   0.30         0.40       0.80       0.70         0.80                84,721
    Alt 3   0.01         0.01       0.80       0.70         0.80                72,515
    Alt 4   0.50         0.50       0.30       0.10         0.10                88,870
    Alt 5   0.75         0.75       0.20       0.10         0.10                15,904
    Alt 6   0.95         0.95       0.80       0.70         0.80                0.00
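    The cost column of Table 5-5 can be approximated with a standard present-value calculation. The Python sketch below is an inference rather than the author's documented method: it assumes that the implementation cost is paid at the end of year one and that the operating cost is paid at the end of each of the ten years, because that timing matches the tabulated figures to within one dollar. The yearly operating cost of alternative two is taken as $10,000, the value consistent with the table.

```python
RATE = 0.07
YEARS = 10

# (implementation cost, yearly operating cost) in dollars, from Section 5.7.
ALTERNATIVES = {
    "Alt 1": (1_000, 7_200),
    "Alt 2": (15_500, 10_000),
    "Alt 3": (32_500, 6_000),
    "Alt 4": (50_000, 6_000),
    "Alt 5": (8_000, 1_200),
    "Alt 6": (0, 0),
}


def present_value_cost(implement, yearly, rate=RATE, years=YEARS):
    """Discounted cost of one alternative.

    ASSUMED timing: implementation is paid at the end of year 1 and the
    operating cost at the end of each of the years 1..years.
    """
    operating = sum(yearly / (1.0 + rate) ** t for t in range(1, years + 1))
    return implement / (1.0 + rate) + operating


costs = {name: present_value_cost(*c) for name, c in ALTERNATIVES.items()}
```

    For example, `costs["Alt 1"]` evaluates to roughly 51,504, matching the first row of the table.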

    5.7.1 Analysis of Results for Scenario One

    Table 5-6 shows the results for each risk function and its associated cost.

    Table 5-6 Results for Each Alternative

    Performance Measures                            Alternatives
                                                    1        2        3        4        5        6
    ƒ1 Cost ($)                                     51,504   84,721   72,515   88,870   15,904   0.00
    ƒ2 Best Case for 1/10 (%)                       0.5      0.17     0.20     0.18     0.18     0.60
    ƒ5 Expected Value of Water Flow Reduction (%)   24.1     7.5      3.02     3.73     4.0      36.9
    ƒ4 Worst Case for 1/100 (%)                     99.5     95.6     42.5     56.1     42.3     99.8

     

    Figure 5-9 shows a graph of the exceedance probability for each alternative. A log graph was chosen to better illustrate the behavior of the distribution at extremely low probability.

    Figure 5- 9 Exceedance Probability for Each Alternative

     

    5.8 Draw Conclusions through Tradeoff analysis (Phase IV)

    Figure 5-10 shows the tradeoff for each alternative identified by sn. The values of ƒ2 for all alternatives except s6, the status quo, were very close to one another. The expected value of water flow reduction within the surety region varied 0.06-0.17 percent. Alternatives s3 and s4 were dominated by s2, s5, and s6.

    Alternative s3 had the best overall score for the expected level of water flow reduction ƒ5. However, s5 performed almost as well and cost $56,500 less than s3. Alternatives s1, s2, and s4 were dominated by s3, s5, and s6. Alternative s3 scored the same as s5 for the conditional expected value of water flow reduction ƒ4. However, it cost $56,500 more than s5. Alternatives s1, s2, s3, and s4 were dominated by s5 and s6. For each risk function, alternatives s5 and s6 remained on the Pareto optimal frontier.
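    The dominance comparisons above can be checked mechanically. The Python sketch below applies a plain Pareto filter (both objectives minimized) to the cost, ƒ5, and ƒ4 values transcribed from Table 5-6. It is a generic non-dominated sort written for illustration, not the tradeoff procedure used to draw Figure 5-10.

```python
# (cost f1 in dollars, expected reduction f5 in %, worst case f4 in %),
# transcribed from Table 5-6.
RESULTS = {
    "s1": (51_504, 24.1, 99.5),
    "s2": (84_721, 7.5, 95.6),
    "s3": (72_515, 3.02, 42.5),
    "s4": (88_870, 3.73, 56.1),
    "s5": (15_904, 4.0, 42.3),
    "s6": (0, 36.9, 99.8),
}


def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(points):
    """Names of the points that no other point dominates (all objectives minimized)."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )


cost_vs_f5 = pareto_frontier({k: (v[0], v[1]) for k, v in RESULTS.items()})
cost_vs_f4 = pareto_frontier({k: (v[0], v[2]) for k, v in RESULTS.items()})
```

    The filter returns s3, s5, and s6 for cost versus ƒ5, and s5 and s6 for cost versus ƒ4, in agreement with the frontiers described in the text.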

     

    5.9 Sensitivity Analysis

    An important aspect of this analysis is sensitivity. The decisionmaker might ask, "how sensitive are the results to changes in parameter and model changes?" The following sections address sensitivity. For this analysis there are two guiding assumptions provided by the decisionmaker. The decisionmaker believes that the estimates on the mitigating event fail-safes (ME4) underestimates an intruder’s ability. He directs that this parameter be changed by two orders of magnitude, and , to see the changes in values for ƒ4 and ƒ5 for each alternative. He also believes the structure of the event tree might affect the model’s output. He questions the value of mitigating event -- OS password protection (ME1). Therefore, he directs that ME1 be removed and to compare the changes in ƒ4 and ƒ5 for alternatives s5 and s6.

    Table 5-7 and Figure 5-11 shows the results for parameter sensitivity. Essentially, the parameter changes for and shifted the Pareto optimal frontier to the left. Figure 5-12 shows the model sensitivity frontier for ƒ1 versus ƒ5.

    There was a significant shift in the expected value for and . Alternatives s3, s5, and s6 remained on the Pareto optimal frontier for both orders of magnitude change. There was a significant drop in the expected value of damage for all alternatives. For example, alternative s5 expected value decreased to 3 percent.

     

     

    Figure 5- 10 Alternatives, ƒ1 vs. ƒ2, ƒ5, and ƒ4

     

    Table 5-7 Parameter Sensitivity Results

    sn   ƒ1 ($)   ƒ4     ƒ4()   ƒ4()   ƒ5     ƒ5()   ƒ5()
    s1   51,504   99.5   90.5   90.5   24.1   5.1    3.2
    s2   84,721   95.6   57.2   5.54   7.5    3.5    3.0
    s3   72,515   42.5   15.5   57.1   3.0    3.0    3.0
    s4   88,870   56.1   20.4   20.4   3.7    3.1    3.0
    s5   15,904   42.3   20.2   5.5    4.0    3.1    3.0
    s6   0        99.8   NA     NA     36.9   NA     NA

     

    Table 5-8 compares the original results for alternatives s5 and s6 with the changes from removing ME1. To provide additional insight into the model’s sensitivity, the alternatives are also compared to an event tree with every estimate set to 0.5.

    Table 5-8 Model Sensitivity Results

         Results from Table 5-6              ME1       ME1       All ME
                                             Removed   Removed   set to 0.5
         s1     s2     s3     s4     s5      s5        s6        sn
    ƒ4   99.5   95.6   42.5   56.1   42.3    83.5      96.5      92.5
    ƒ5   24.1   7.5    3.0    3.7    4.0     4.3       11.2      7.1

     

    Figure 5-12 shows the sensitivity of ƒ4 and ƒ5 to changes in the event-tree model. The value of ƒ5 for alternative s5 essentially remained unchanged (4.0 versus 4.3 percent), yet the value of ƒ4 increased by about 40 percentage points (from 42.3 to 83.5 percent). Overall, removing a mitigating event may change the results.

     

    Figure 5- 11 Parameter Sensitivity Comparison

     

    Figure 5- 12 Model Sensitivity Comparison

     

    5.10 Summary of Methodology

    Chapter five has synthesized the concepts presented in chapters three and four to illustrate how the risk management framework might be used to help decisionmakers determine a preferred solution to the cyber intruder threat. The framework was applied to a small city from information learned from the internet survey and input from experts. This example city provided a useful means to demonstrate the applicability of the methodology.

    Multiobjective tradeoff analysis provides a decisionmaker with a tool that graphically shows the cost-risk-benefit for an alternative. Alternative s5 achieved a 67 percent decrease in risk with respect to surety. That is, at probability 0.90, the best case conditional expected value of percent water flow reduction is less than 0.20 percent. For every test of sensitivity alternatives s5 and s6 remained on the Pareto optimal frontier. Chapter 6 reviews the methodology presented in this thesis and the contributions to systems engineering. Lastly, it presents ideas on future work with SCADA systems.

     

     

     

    CHAPTER 6 CONCLUSIONS

     

     

    6.0 Summary

    This thesis has presented a risk management framework for analyzing risks to supervisory control and data acquisition systems. The thesis focused on master-slave centrally controlling systems but may apply to other control systems as well. The goal of this thesis was to develop a risk management framework that uses existing probabilistic risk assessment (PRA) methodology to quantify the risks of willful threats to water utility SCADA systems. This framework can assist decisionmakers in understanding the risks of cyber intrusion, their consequences, and tradeoffs in order to maximize the survivability of the system. Surety, a measure of survivability, was defined as a measure of acceptable system performance under an unusual loading. A survey was conducted to understand the current state of SCADA in water utilities and to document information on cyber intrusion and the concerns of system security administrators. Using hierarchical holographic modeling (HHM), sources of cyber risk to SCADA were identified. Event trees and fault trees were used to model the probabilistic consequences of cyber intrusion on water supply. Cost and risk were introduced as the performance measures to evaluate policy options using the partitioned multiobjective risk method. Lastly, a prototype city was analyzed to demonstrate the applicability of the methodology.

     

     

    6.1 Contributions

    There are seven contributions to the field of systems engineering.

    • Developed the first risk management framework for cyber risk to water utility SCADA systems.

    • Developed the first mathematical formulation of surety using the partitioned multiobjective risk method.

    • Established the first historical results of an internet-based survey that details the current state of SCADA in water utilities and documents information on cyber intrusion and system security.

    • Used HHM for identifying sources of cyber risk to SCADA.

    • Applied the first use of extreme event analysis to the assessment of water supply reduction.

    • Developed the first conceptual black box model of survivability for SCADA systems.

    • Applied the risk management framework to a realistic example problem.


    6.2 Future Work

    There are many areas where this research may be extended. One such area is modeling the system’s supply of and demand for water to determine the effect of SCADA failure on water supply. Under normal conditions supply exceeds demand. However, if a cyber intrusion reduces the flow of water, there exists a probability that demand will exceed supply. Another area is modeling different types of control systems. For example, how would distributed control systems that allow remote sites a greater degree of local control be modeled using event trees and fault trees? In one sense, distributed control systems (DCS) may not be vulnerable to an attack on the central site, yet remote sites might be vulnerable. A future effort might also include an assessment of the different classes of systems (SCADA, DCS, Soft PLC, etc.) to determine which option better resists cyber intrusion. This endeavor would significantly help utilities make better decisions about the type of system they buy to gather, deliver, and transmit water.
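
    As one possible starting point for the supply and demand extension, the sketch below estimates by Monte Carlo simulation the probability that demand exceeds supply. It rests on assumptions that are not part of this thesis: demand is normally distributed, an intrusion occurs independently in each period with a fixed probability, and a successful intrusion removes a fixed fraction of capacity. All parameter names and values are hypothetical.

```python
import random


def shortfall_probability(capacity, demand_mean, demand_sd,
                          p_intrusion, flow_reduction,
                          trials=10_000, seed=0):
    """Estimate P(demand > supply) under the simplifying assumptions above.

    capacity       -- supply with SCADA working normally (any flow unit)
    demand_mean/sd -- parameters of the assumed normal demand
    p_intrusion    -- assumed probability of a successful intrusion per period
    flow_reduction -- assumed fraction of capacity lost after an intrusion
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    shortfalls = 0
    for _ in range(trials):
        attacked = rng.random() < p_intrusion
        demand = max(0.0, rng.gauss(demand_mean, demand_sd))
        supply = capacity * (1.0 - flow_reduction) if attacked else capacity
        if demand > supply:
            shortfalls += 1
    return shortfalls / trials


# Hypothetical comparison of a lightly and a heavily threatened system.
low_threat = shortfall_probability(100.0, 80.0, 10.0, 0.05, 0.4)
high_threat = shortfall_probability(100.0, 80.0, 10.0, 0.50, 0.4)
```

    A fuller model would replace the fixed flow reduction with the distribution of percent flow reduction obtained from the event trees, so that the shortfall estimate reflects the same scenarios used in the risk assessment.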

     

     

    REFERENCES

    Applegate, Lynda, M. Corporate Information System Management: Text and Cases, Fourth Edition, Irwin, Inc. Boston, MA. 1996.

     

    Amoroso, Edward, Fundamentals of Computer Security Technology, Prentice-Hall PTR, Upper Saddle River, NJ, 1994.

     

    Ang, Alfredo H-S. and Tang, Wilson H. Probability Concepts in Engineering Planning and Design: Volume II Decision, Risk, and Reliability, John Wiley and Sons, New York, NY. 1984.

     

    Asbeck, E., and Y. Y. Haimes. "The Partitioned Multiobjective Risk Method," Large Scale Systems 6(1), 13-38, 1984.

     

    Behar, Richard, "Who’s Reading Your Email?" Fortune Text Edition, February 3, 1997, http://www.pathfinder.com/@@uxdr5ayaeny7duax/fortune/1997/97020 (July 1, 1997).

     

    Boyer, Stuart, A. SCADA: Supervisory Control and Data Acquisition, Instrument Society of America, Research Triangle, NC. 1993.

     

    Brown, Eryn, "The Myth of Email Privacy", Fortune Text Edition, February 3, 1997, http://www.pathfinder.com/@@uxdr5ayaeny7duax/fortune/1997/97020 (July 1, 1997).

     

    Bryne, Bob, SCADA Mail List, "Email with Mr. Byrne", SCADA expert, scada@gospel.iinet.au (May 1997).

     

    Cohen, Frederick, B. A Short Course on Computer Viruses: Second Edition, John Wiley and Sons, Inc., 1994.

     

    Cohen, Frederick, B. "Relativistic Risk Analysis", Network Security Magazine, http://all.net (June 5, 1997).

     

    Cohen, Frederick, B. "Risk Management or Risk Analysis?", Network Security Magazine http://all.net (March 10, 1997).

     

    Cohen, Frederick, B. "Where Should We Concentrate Protection?", Network Security Magazine, http://all.net (December 12, 1996).

     

    Cooper, James, A. Computer and Communications Security: Strategies for the 1990s, McGraw-Hill, New York City, NY. 1989.

     

    Denning, Peter, J. Computers Under Attack: Intruders, Worms, and Viruses, Association for Computing Machinery Press, New York City, NY. 1990.

     

     

    Dugan, Joanne Bechta and Trivedi, Kishor S., "Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems", IEEE Transactions on Computers, Vol. 38, No. 6, 1989.

     

    Elgamal, Taher, "Securing Communications on the Intranet and Over the Internet", Netscape Communication Corporation, http://home.netscape.com/newsref/ref/128bit.html (June 30, 1997).

     

    Engst, Adam, C. Internet Starter Kit, Macmillan Computer Publishing, Indianapolis, IN, 1996.

     

    Garfinkel, Simson and Spafford, Gene, Practical Unix and Internet Security: Second Edition, O’Reilly and Associates, Inc., New York, NY. 1996.

     

    Gray, Paul, Management of Information Systems: Second Edition, The Dryden Press, New York City, NY. 1994.

     

    Gibson, John E., How to Do a Systems Analysis, University of Virginia Printing and Copying Services, Charlottesville, VA. 1996.

     

    Haimes, Yacov, Y., "Hierarchical Holographic Modeling", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-11, No. 9, 1981.

     

    Haimes, Yacov, Y., Matalas, Nicholas, C., Lambert, James H., Brown, Donald, E., Johnson, Barry, W., Fellows, James, F.R., and Jackson, Bronwyn A., "Reducing the Vulnerability of Water Supply Systems to Attack", Center of Risk Management, University of Virginia, June 11, 1997.

     

    Haimes, Yacov, Y., Risk Modeling, Assessment, and Management, John Wiley and Sons, Inc., New York, 1998.

     

    Hanson, Royce, Editor, Perspectives on Urban Infrastructure, National Academy Press, Washington D.C. 1984.

     

    Haslam, Edward, P. "Indiana WETnet: A Virtual Water Resource", Water Resources Update, Universities Council on Water Resources, Issue Number 99, Spring 1995.

     

    Howard, John, D. "An Analysis of Security Incidents on the Internet", Carnegie Mellon University, April 7, 1997, http://www.cert.org/research/JHThesis/Start.html (May 28, 1997).

     

    Hillebrand, Cary, Expert Three, Technical expert specializing in the planning and design of SCADA based and distributed control systems for regional and municipal water distribution networks and water/waste water treatment plants, 1997.

     

    Internet Security Solutions, "The Right Answer-Adaptive Security", ISS White Papers, http://www.iss.net (October 5, 1997).

     

    Kaplan, Stan, "The Words of Risk Analysis", Risk Analysis, Vol. 17, No. 4, 1997.

     

    Kline-Robach, "Embracing Information Technology: A New Era for Water Resources Programming", Water Resources Update, Universities Council on Water Resources, Issue Number 99, Spring 1995.

     

    Kumamoto, Hiromitsu, and Henley, Ernest J. Probabilistic Risk Assessment and Management for Engineers and Scientists, IEEE Press, New York, 1996.

     

    Kyas, Othmar, Internet Security: Risk Analysis Strategies and Firewalls, International Thomson Publishing, 1997.

     

    Lambert, James, H. "Probabilistic Risk Assessment of Water Systems Vulnerable to Tampering", Risk-Based Decision Making in Water Resources VIII. Y.Y. Haimes, D.A. Moser, and E.Z. Stakhiv, Eds., New York: American Society of Civil Engineers (to appear), March 1998.

     

    Levin, Richard, B. The Computer Virus Handbook, Osborne McGraw-Hill, Berkeley, CA. 1990.

     

    Lambert, Robert, "An Interview in Newport News, Va.", President, Automation, Inc., June 12, 1997.

     

    Matalas, N.C. and M.B. Fiering, "Water-Resource System Planning", in Climate, Climate Change and Water Supply, Studies in Geophysics, National Research Council, National Academy of Sciences, Washington, D.C., 1977.

     

    Matalucci, Rudolph V. Personal communication of a course on architectural surety designed by Matalucci at the Department of Civil Engineering, University of New Mexico and Sandia National Laboratory, Albuquerque, NM., March 1998.

     

    Matalucci, Rudolph V. and Miyoshi, Dennis S. "An Introduction to the Architectural Surety Program", Assuring the Performance of Buildings and Infrastructures Conference, Sandia National Laboratory, http://www.sandia.gov/archsur/archs~10.htm (February 18, 1998).

     

    McAfee, John, Computer Viruses, Worms, Data Diddlers, Killer Programs, and Other Threats to Your System, Saint Martin’s Press, New York City, NY. 1989.

     

    Milton, J.S., and Arnold, J.C., Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences, 2nd Ed., McGraw-Hill, New York City, NY. 1990.

     

    Mungo, Paul and Clough, Bryan, Approaching Zero: The Extraordinary Underworld of Hackers, Phreakers, Virus Writers, and Keyboard Criminals, Random House, New York City, NY. 1992.

     

    National Security Telecommunications Advisory Committee (NSTAC), "Information Assurance Task Force Risk Assessment", http://www.ncs.gov/n5_hp/reports/EPRA.html (October 10, 1997).

     

    Nelson, Anthony, Expert One, Technical expert for computer fraud tracing and intrusion testing for internet firewalls and mission critical applications like SCADA, 1997.

     

    Nelson, Anthony, Expert One, Personal email on scenario and estimates, 1998.

     

    Nothdurft, William, E. Renewing America: Natural Resource Assets and State Economic Development, Council of State Planning Agencies, Washington D.C. 1984.

     

    President’s Commission on Critical Infrastructure Protection (PCCIP), http://www.pccip.gov/summary.html (October 23, 1997).

     

    Saaty, Thomas, L. The Analytic Hierarchy Process, McGraw-Hill Book Company, New York City, NY. 1980.

     

    Starr, C. "Risk Management, Assessment, and Acceptability." In Uncertainty in Risk Assessment, Risk Management, and Decision Making, edited by V.T. Covello, et al. Plenum Press, New York City, NY. 1987.

     

    White, Gregory B., Fisch, Eric, A., and Pooch, Udo, W. Computer System and Network Security, CRC Press, New York City, NY. 1996.

     

    Wiese, Ian, Hillebrand, Cary, and Ezell, Barry, "Scenarios One and Two: Source to No 1 PS to No 1 Tank to No 2 PS to No 2 tank (High level) for a Master-Slave SCADA System", SCADA Consultants, SCADA Mail List, scada@gospel.iinet.au (August 1997).

     

    Wiese, Ian, Expert Two, Technical expert in information technology, SCADA, and in both the oil and gas industry, and the water industry. He has extensive experience in the development of large information technology systems, including groundwater modeling, Monte Carlo simulation of water supply sources, pipe network analysis, water resource information systems, oil production recording, and oil well drilling information systems, 1997.

     

    Vaughan, Roger, J. and Pollard, Robert, Rebuilding America Volume One and Two: Planning and Managing Public Works in the 1980s, Council of State Planning Agencies, Washington D.C. 1984.

     

    APPENDIX A SURVEY

    This survey is also available on the internet at: http://watt.seas.virginia.edu/~bce4k/home.html. Feel free to fill out the survey at the web link above if you prefer, or mail this survey to:

    Barry Ezell, 11 Tennis Dr, Charlottesville, VA 22901.

     

    A SURVEY

    ON SUPERVISORY CONTROL AND DATA ACQUISITION SYSTEMS FOR

    WATER SUPPLY AND ITS VULNERABILITY TO CYBER RISKS

    by:

    Barry C. Ezell

    Graduate Student, Systems Engineering Department, University of Virginia

    Advisor

    Professor Yacov Y. Haimes

    Lawrence R. Quarles Professor of Systems Engineering and Civil Engineering

    Director, Center for Risk Management of Engineering Systems

    University of Virginia

    Charlottesville, VA 22901

    August 25, 1997

     

    Background:

     

    I am doing my thesis on assessing the vulnerabilities of supervisory control and data acquisition (SCADA) systems to cyber terrorist and network attacks. SCADA is a system that allows an operator to monitor and control processes that are distributed among various remote sites. The Center of Risk Management at the University of Virginia, along with industry and other universities, is researching the vulnerabilities of the water resources infrastructure. A subset of that research is the cyber terrorist component, which is my domain. My thesis is written to answer the following question:

     

    Are SCADA systems for water supply systems vulnerable to cyber terrorism in the near term (five years)? And if so, what is the nature of the threat?

     

    Purpose:

    The purpose of this survey is to gather information about the cyber threat. The ultimate goal of this research is to make our water system more survivable to a cyber attack. Note: All references to cities and people will be eliminated from the thesis. Also, data will be aggregated to protect cities.

     

    Scope:

    The survey is constructed to provide feedback with respect to risk assessment and management. We are interested in assessing the redundancies, robustness, and resiliency of current SCADA systems. In order to accomplish this, we are very interested in the following:

    1. Redundancy of the system. Redundancy refers to the ability of certain components of a system to assume functions of failed components without adversely affecting the performance of the system itself.

    2. Robustness of the system. Robustness refers to the degree of insensitivity of a system design to errors in the estimates of those parameters affecting design choice. It also covers those properties that make the system less vulnerable to attack (stability).

    3. Resilience of the system. Resilience is the ability of a system to operate close to its closest possible design technically and institutionally over a short run after an attack, such that the losses are within manageable limits.

    If you have questions or comments regarding this work, please email me at bce4k@virginia.edu or bcezell@aol.com. If you feel uncomfortable or unqualified answering a question, please feel free to leave it blank. My phone number is 804 975 3525.

     

    Section One (administrative): Name: ___________________

    phone number: __________________ or email: _________________

    1. For what city or county do you provide water resources?

    2. What is your position in the water utility organization?

    3. Please describe the scope of your SCADA operation (e.g., number of treatment plants, sensors, sewage, etc.).

    4. How many valves and pumps does your system control?

    5. How many Remote Terminal Units do you supervise/control with your SCADA system?

    6. Do you allow access to the internet for your operators?

    Yes, no

    7. Do you or your operators have access to email via an administrative LAN?

    Yes, no

    8. Is the Local Area Network accessible via remote connections?

    Yes, no

    9. Do you have the ability to control your system via a dial-up connection? (e.g., laptop, modem, and dial in to your server)

    Yes, no, unknown

    10. If question 9 does not adequately address your remote capabilities, please describe below how you accomplish remote access (e.g., intranet, Wide Area Network, etc.).

    11. What type of communication protocol do you use?

    Transmission Control Protocol/Internet Protocol (TCP/IP)

    other (please describe)

    12. Do you use:

    radio

    telephone leased line

    telephone party line

    combination

    ISDN dial-up

    ISDN dedicated

    other (please describe)

    13. What is the speed of your connections in bits per second (bps)?

    300 bps

    300-2400 bps

    4800 bps

    9600 bps

    14,400 bps

    28,000 bps

    56,000 bps

    64,000 bps

    128,000 bps

    other

    14. Please describe how data is sent from Remote Terminal Unit to the Master Terminal Unit.

    15. Which best describes your SCADA?

    Distributed control

    Master-slave control

    Other (please explain)

    16. Please provide any additional information about your system that you deem is important.

     

    Section Two (Survivability of the System):

    17. In your judgment, who do you see as your system’s primary concern from cyber attacks? Please rank 1-6 where one is the highest primary concern and 6 is the least:

    Hackers 1 2 3 4 5 6

    Spies 1 2 3 4 5 6

    Terrorists 1 2 3 4 5 6

    Corporate Raiders /Trusted Insiders 1 2 3 4 5 6

    Professional Criminal 1 2 3 4 5 6

    Vandals / Disgruntled employees 1 2 3 4 5 6

    none

    18. What do you believe is the ultimate objective of an attacker?

    Challenge or Status 1 2 3 4

    Political Gain 1 2 3 4

    Financial Gain 1 2 3 4

    Damage 1 2 3 4

    19. Indicate what tools you think a potential threat is most likely to use to attack your system. Please rank 1-6 where one is the highest primary concern and 6 is the least:

    User command 1 2 3 4 5 6

    To guess the password or enter a long string and telnet into system.

    Script or program 1 2 3 4 5 6

    At the User Command interface, attackers can make use of scripts or programs to automate commands. An example would be a "crack" program to determine passwords. Another example is a "Trojan Horse" program that is copied over an existing program. It performs like the program it replaced but also conducts other operations that the user is unaware of, such as erasing files, logging passwords to a file, or corrupting data.

    Autonomous agent 1 2 3 4 5 6

    This is the most widely publicized means of attack. It is similar to a Trojan Horse. The difference is that an Autonomous Agent contains program logic to make an independent choice of what host to attack (e.g., the computer virus).

    Toolkit 1 2 3 4 5 6

    A grouping of scripts, programs, and autonomous agents into a GUI program (e.g., rootkit).

    Distributed tool 1 2 3 4 5 6

    A tool that attacks a host simultaneously from multiple hosts. Clock time can be used to synchronize the attack.

    Data Trap 1 2 3 4 5 6

    The exploitation of the electromagnetic field surrounding a computer. This field contains information about the computer and can reveal data in transit or on the terminal.

    HERF Attack: 1 2 3 4 5 6

    HERF: High Energy Radio Frequency Attack. The ability to emit a pulse from a device that could be hidden in a coke can in a garbage can that could destroy all electronic devices, but not damage the building or other structures.

    20. Please rank which vulnerability is greatest in your SCADA system:

    Design Vulnerability 1 2 3

    Configuration Vulnerability 1 2 3

    Implementation Vulnerability 1 2 3

    21. Do you believe the design, configuration, and implementation of your system are safe from:

    Unauthorized Access yes no

    Unauthorized Use yes no

    22. Which results from an attack on your SCADA system would have the greatest impact on your water resource system:

    Corruption of Information 1 2 3 4

    Disclosure of Information 1 2 3 4

    Theft of Service 1 2 3 4

    Denial of Service 1 2 3 4

    23. Who would you call in the event your SCADA system was tampered with? (check all that apply)

    CERT (Computer Emergency Response Team)

    outsourced security firm

    Police department

    FBI

    Other (please specify)

    24. If you experienced a computer system intrusion, indicate the type (check all that apply):

    manipulated data

    installed a sniffer program

    stolen password

    probing/scanning your system

    Trojan logons

    IP spoofing

    Introduced viruses

    denied use of service

    downloaded data

    compromised information security

    compromised email/documents

    publicized intrusions

    harassed personnel

    other (please specify)

    25. Do you have the capability to detect attempts to gain access to your system?

    Yes, no

    26. Have you detected any attempts to gain access to your system in the past year?

    Yes, no, unknown

    27. If yes, how many successful unauthorized attempts have you detected in the past 12 months?

    1-10, 11-20, 21-30, 31-40, 41-50, > 50

    28. How much time do you spend on ensuring your network is secure?

    None, < 10%, 10-20%, 21-30%, 31-40%, 41-50%, 51-60%, 61-70%, > 70%

     

    APPENDIX B EVENT TREES FROM EXPERT ONE

    Alternative One from Expert One

    Alternative Two from Expert One

     

    Alternative Three from Expert One

     

    Alternative Four from Expert One

     

     

    Alternative Five from Expert One