Timing Performance Profiling of Substation Control Code for IED Malware Detection Julian L. Rrushi Department of Computer Science Western Washington University Bellingham, WA 98225
[email protected]
ABSTRACT We present a binary static analysis approach to detect intelligent electronic device (IED) malware based on the time requirements of electrical substations. We explore graph theory techniques to model the timing performance of an IED executable. Timing performance is subsequently used as a metric for IED malware detection. More specifically, we perform a series of steps to reduce a part of the IED malware detection problem into a classical problem of graph theory, namely finding single-source shortest paths on a weighted directed acyclic graph (DAG). Shortest paths represent execution flows that take the longest time to compute. Their clock cycles are examined to determine if they violate the real-time nature of substation monitoring and control, in which case IED malware detection is attained. We did this work with particular reference to implementations of protection and control algorithms that use the IEC 61850 standard for substation data representation and network communication. We tested our approach against IED exploits and malware, network scanning code, and numerous malware samples involved in recent ICS malware campaigns.
CCS CONCEPTS • Security and privacy → Malware and its mitigation; • Computer systems organization → Embedded and cyber-physical systems;
KEYWORDS Intelligent electronic devices, control system malware, graph theory, binary code analysis ACM Reference format: Julian L. Rrushi. 2017. Timing Performance Profiling of Substation Control Code for IED Malware Detection. In Proceedings of ACM Conference, Washington, DC, USA, July 2017 (Conference’17), 9 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn
1 INTRODUCTION
Because intelligent electronic devices (IED) are the machines that interact directly with the physical equipment of electrical substations, they are the ultimate target of industrial control system (ICS)
malware that attack the electrical power grid. ICS malware may control IEDs over the network by compromising human-machine interface (HMI) machines, engineering servers, or any other machines with access to a substation network. Possible intermediate targets include general-purpose machines in the enterprise network of a power utility company with virtual private network (VPN) access to substation networks. Once landing on one of these machines, ICS malware use rogue client software to send commands to IEDs, similarly to the HMI software of a legitimate operator. The ICS malware involved in the BlackEnergy [1] and Dragonfly [2] campaigns operated this way. Some of the malware samples of Dragonfly did not initiate physical damage of substation equipment, but instead spied on Object Linking and Embedding (OLE) for Process Control (OPC) servers [3], which in turn commonly exchange data with IEDs through an IEC 61850 driver. While the network control of IEDs could enable ICS malware to open and close circuit breakers, and hence cut power to entire regions, its potential to lead to physical damage of substation equipment is more limited. This is because the protection and control algorithms that run on IEDs have a strong sense of safety. Some of those algorithms react to malware’s destructive actions by protecting substation equipment and stabilizing the power system. This is why ICS malware may push further and compromise IEDs. As discussed in [4], hacking into an IED enables ICS malware to affect substation equipment through its own code, as well as modify or even eliminate the IED’s protection and control algorithms. Since ICS malware would control all I/O on the compromised IED, code integrity based on hashing techniques is ineffective. We seek a differentiator between the code of protection and control algorithms and IED-bound malware code, which in this paper we simply refer to as IED malware. 
The time critical operation of IEDs is of paramount importance to an electrical substation. All faults must be sensed, processed, and addressed within a few milliseconds, which requires protection and control code to run in nearly real time. The substation computing and communication standard IEC 61850 specifies the tolerated delays for application recovery, as well as the required communication recovery times, for various protection schemes [5]. The limits on application recovery time range from 800 ms for supervisory control and data acquisition (SCADA) to 400 µs for Sampled Values, while the limits on communications recovery time range from 400 ms for SCADA to nearly 0 for Sampled Values [6]. Functional control applications on IEDs do not exceed the tolerated delays. Any violations of the timing requirements disturb the physics of substation equipment and result in physical damage.
Contribution. We present a binary static analysis approach, which leverages the real-time characteristics of the power monitoring and control computations conducted at an electrical substation to detect IED malware. We formulate the timing performance of IED binary code in a way that enables us to model it using graph theory techniques. The timing performance is then used as a metric for IED malware detection, which takes the form of finding the shortest path on a weighted directed acyclic graph (DAG). Our approach was initially guided, and subsequently evaluated, by analyzing the binary code of substation protection algorithms that use the IEC 61850 standard for substation data representation and network communication.
Novelty. To the best of our knowledge, this is the first work to counter IED malware by profiling the timing performance of IED binary code and correlating it with the timing requirements of electrical substations. As of this writing, no other works have captured substation-relative timing deviations by IED or field device malware in general. While graph theory has been widely applied to solve various computer security problems, the graph model that we developed in this work is original in its ability to reduce a part of the IED malware detection problem into a classical problem of graph theory, namely finding single-source shortest paths on DAGs.
Organization. The remainder of this paper is organized as follows. In Section 2 we provide background on ICS malware campaigns and attack code. Section 3 describes the graph-theoretical approach to analyzing the timing performance of IED binary code. Section 4 describes an empirical evaluation of our work against live ICS malware and test attack code in a testbed that resembles the industrial computer environment of an electrical substation. In Section 5 we discuss related work on anti-malware techniques for field devices. Section 5 also compares our approach with malware for field devices ethically developed by related academic works. Section 6 summarizes our findings and concludes the paper.
2 ICS MALWARE
The ICS malware used in BlackEnergy and Dragonfly targeted electrical substations at various ranges. They penetrated the networks of energy companies through e-mail spear phishing attacks supported by spam campaigns. Phishing e-mails contained malicious attachments, such as Portable Document Format (PDF) files containing embedded JavaScript code, or Microsoft Office files that leveraged the macro functionality. Both resulted in code execution on the target computer. The attackers also ran watering hole attacks by injecting an HTML iframe into several websites related to energy. The HTML iframe redirected visitors to another website, which the attackers had compromised as well. That other website in turn ran the LightsOut exploit kit, which executed exploits against a visitor’s browser or browser plugins to install the malware. In both cases, once inside the networks of the power company, the malware executed ICS-specific code to search for and later access target ICS servers. Lastly, the attackers compromised the websites of three different ICS equipment providers, and subsequently trojanized the software bundles that were available for download. The installation of those software bundles can bring ICS malware directly onto human-machine interface (HMI) machines, engineering machines, and even protective relays.
Stuxnet was able to propagate over the network by exploiting a vulnerability in the Print Spooler service, and a vulnerability in the Server service [7]. The reader is referred to the Microsoft security bulletins MS10-061 and MS08-067, respectively, for a description of the root cause of those vulnerabilities. Stuxnet was also able to propagate through network shares. The ICS worm may travel a shorter range when the target industrial facility is predetermined and located ahead of time. Stuxnet could land on the networks of a target industrial facility right away if an insider purposely plugged a Stuxnet-infected removable flash drive into a Windows machine in those networks. Stuxnet modified the Siemens ICS engineering software SIMATIC WinCC/Step 7, more specifically the dynamic-link library s7otbxdx.dll of SIMATIC WinCC/Step 7, to inject attack MC7 code into target programmable logic controllers (PLCs). MC7 is the compiled assembly of code written in the Structured Text (ST) programming language. ST in turn is one of the five programming languages supported by the IEC 61131-3 standard to program PLCs. The MC7 code injection was done over the network via the S7comm protocol, i.e., a Siemens proprietary protocol used for PLC programming, data exchange between PLCs, and PLC data access from SCADA systems [8]. Stuxnet turned a compromised machine into a modified S7comm client and was mostly selective when searching for its targets, although unintended machines were compromised during the attacks.
Although the ICS attack code may be inside the networks of an industrial facility, there is still a way to go. A machine that the ICS attack code has just compromised may not be usable to attack or manipulate field devices. Stuxnet, for example, needed to compromise the so-called SIMATIC Field PGs, which are Windows machines used to program PLCs. Stuxnet spread within the networks of a target industrial facility.
It hopped from machine to machine until landing on a SIMATIC Field PG, from whence it injected attack code into PLCs and hence sabotaged the physical processes they were controlling. Another ICS attack code, namely IronGate, uses attack techniques that are similar to those of Stuxnet [9]. IronGate operates at medium range, and searches for very specific target machines, as Stuxnet did. Nevertheless, IronGate was deemed to be simply a proof of concept or research activity, given that it is not associated with any attack campaigns or threat actors [9].
3 APPROACH
3.1 Performance Graph
Derived from the control-flow graph. The binary code of an IED executable is initially viewed as a control-flow graph between sources and sinks. A control-flow graph is a graph that models all paths that might be traversed by a program during its execution [10]. A source is an instruction address in the code at which the executable reads input data. These input data may originate in sensors that measure power parameters such as voltages, currents, and frequencies. They may also be delivered to the executable over the network as Pieces of Information for Communication (PICOMs), in which case they may be sensor measurements or data computed by other IEDs. For instance, a composite monitoring and control function is collaboratively implemented by two or more IEDs, which need to communicate with each other over the network to exchange data and jointly exercise the function.
Figure 1: From a control-flow graph to a performance graph.
A sink in this context is an instruction address in the code at which the executable writes data. These output data may be actuator data to open or close a circuit breaker, or computed data that are to be processed by another IED. They may directly update the attributes of logical nodes in the switchgear group, such as virtual circuit breakers (XCBR) and virtual circuit switches (XSWI), or participate in computations that lead to the update of the attributes of those logical nodes. The control-flow graph is revised to remove cycles and thus become a DAG, while retaining the original functionality of the executable. The reader is referred to [11] for techniques to remove cycles from a control-flow graph, which at this point is comprised of code blocks that are connected with each other in a directed way by conditional or unconditional jumps, as well as by function calls. An excerpt from an example control-flow graph is depicted on the right part of Figure 1.
Its structure models execution time. Each of the aforementioned code blocks is represented by a vertex in the performance graph. A vertex is labeled by the address of the branch instruction in the corresponding code block. An edge goes from a vertex ν to a vertex υ if the code block of ν could eventually jump to the code block of υ, conditionally or unconditionally. The execution time of a given code block is modeled by the weight of the edge that lands onto its corresponding vertex. We determine the weight in question by considering the total number of clock cycles consumed by the instructions of the code block at hand. The clock cycles per instruction depend on the processor of the IED. Table 1 recalls the number of clock cycles for the instructions of an Intel instruction set referenced in the code blocks of Figure 1. Those instructions generally take 1 or 2 clock cycles.
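The clock-cycle count of a code block can be sketched as follows. This is only an illustration under stated assumptions: the latencies are the ones listed in Table 1, while the names CLOCK_CYCLES, block_cycles, and example_block are hypothetical, and the instruction mix of the example block is invented rather than taken from Figure 1.

```python
# Clock-cycle latencies per mnemonic, as listed in Table 1 of the paper.
CLOCK_CYCLES = {
    "mov": 1, "cmp": 2, "jnz": 1, "jz": 1, "ret": 2,
    "dec": 1, "sub": 1, "test": 1, "jge": 1,
}


def block_cycles(mnemonics):
    """Return the total clock cycles consumed by one code block."""
    return sum(CLOCK_CYCLES[m] for m in mnemonics)


# Hypothetical code block; the instruction mix is invented for illustration.
example_block = ["mov", "cmp", "jnz"]
example_weight = block_cycles(example_block)
print(example_weight)  # 1 + 2 + 1 = 4 clock cycles
```

The resulting count is what would be attached to the edge that lands on the vertex of that code block, before any conversion into a probability.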
We now look at the instructions per second (IPS) metric of the IED processor, as measured by Dhrystone computing benchmark programs under realistic workloads. Dhrystone reasons in terms of millions of instructions per second (MIPS). Its measurements are cognizant of instruction pipelining. An Intel Core i7, for instance, has a MIPS of 4,800, which is about 4,800 instructions per microsecond. Counting an average of 1.5 clock cycles per instruction, the processor of this example consumes approximately 7,200 clock cycles per microsecond. A time threshold, which is based on tolerated delays as specified by IEC 61850, is converted into a clock cycles threshold. No path of the performance graph from a source vertex to a sink vertex should exceed the threshold for the IED. For SCADA, we choose an approximate time threshold of 2,000 ms, which includes the limit of 800 ms on application recovery time, the limit of 400 ms on communications recovery, and an additional tolerance of 800 ms to minimize the risk of false positives and increase the confidence of detection. A time threshold of 2,000 ms is approximately equivalent to $2000 \times 10^{3} \times 7200 = 144 \times 10^{8}$ clock cycles. When a probability of 1.0 corresponds to $144 \times 10^{8}$ clock cycles, which is a way of stating that the executable is certainly malicious, then each clock cycle consumed by the IED processor makes a contribution of $\frac{1.0}{144 \times 10^{8}} = 0.00000000006944$ to the cutoff likelihood for the executable to be deemed malicious. We refer to this probability contribution as contribution to maliciousness (CtM). The CtM of each instruction is the product of its clock cycles and the CtM of a clock cycle, i.e. $\#\text{clock cycles} \times 0.00000000006944$. The executable is deemed malicious if an execution flow from a source to a sink consists of instructions with CtMs that amount to at least 1.0. Lower cumulative CtMs do not suffice to conclude that the executable under analysis is malicious. A similar calculation method applies to Sampled Values. The approximate time threshold is set to 800 µs, which includes the limit of 400 µs on application recovery time, and an additional tolerance of 400 µs. The threshold of 800 µs is approximately equivalent to $800 \times 7200 = 576 \times 10^{4}$ clock cycles. The CtM of each clock cycle is equal to $\frac{1.0}{576 \times 10^{4}} = 0.00000017361111$, which can be used to calculate the CtM of each instruction.
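The arithmetic above can be checked with a short sketch. It assumes the figures quoted in the text (4,800 MIPS, 1.5 clock cycles per instruction, and thresholds of 2,000 ms and 800 µs); the function and constant names are hypothetical.

```python
MIPS = 4_800                      # instructions per microsecond, from the text
AVG_CYCLES_PER_INSTRUCTION = 1.5  # average assumed in the text
CYCLES_PER_US = MIPS * AVG_CYCLES_PER_INSTRUCTION  # 7,200 cycles per microsecond

SCADA_THRESHOLD_US = 2_000 * 1_000  # 2,000 ms expressed in microseconds
SV_THRESHOLD_US = 800               # Sampled Values threshold in microseconds


def cycle_threshold(threshold_us):
    """Convert a time threshold in microseconds into a clock-cycle threshold."""
    return threshold_us * CYCLES_PER_US


def ctm_per_cycle(threshold_us):
    """CtM of one clock cycle: a probability of 1.0 spread over the threshold."""
    return 1.0 / cycle_threshold(threshold_us)


def instruction_ctm(clock_cycles, threshold_us):
    """CtM of an instruction: its clock cycles times the CtM of a clock cycle."""
    return clock_cycles * ctm_per_cycle(threshold_us)


print(cycle_threshold(SCADA_THRESHOLD_US))           # 14400000000.0, i.e. 144 * 10^8
print(f"{ctm_per_cycle(SCADA_THRESHOLD_US):.14f}")   # 0.00000000006944
print(cycle_threshold(SV_THRESHOLD_US))              # 5760000.0, i.e. 576 * 10^4
print(f"{ctm_per_cycle(SV_THRESHOLD_US):.14f}")      # 0.00000017361111
```

Swapping in the MIPS figure and average cycle count of another processor would yield the thresholds for that IED.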
As an illustration, Table 1 indicates the CtMs of the instructions referenced in the code blocks of Figure 1, for both SCADA and Sampled Values.

Table 1: Examples of latencies (taken from [12]) of some instructions of an Intel processor, and their respective CtM probabilities.

| Instruction | Clock Cycles | CtM for SCADA | CtM for Sampled Values |
|---|---|---|---|
| mov | 1 | 0.00000000006944 | 0.00000017361111 |
| cmp | 2 | 0.00000000013888 | 0.00000034722222 |
| jnz | 1 | 0.00000000006944 | 0.00000017361111 |
| jz | 1 | 0.00000000006944 | 0.00000017361111 |
| ret | 2 | 0.00000000013888 | 0.00000034722222 |
| dec | 1 | 0.00000000006944 | 0.00000017361111 |
| sub | 1 | 0.00000000006944 | 0.00000017361111 |
| test | 1 | 0.00000000006944 | 0.00000017361111 |
| jge | 1 | 0.00000000006944 | 0.00000017361111 |

Weights are expressed as negative log probabilities. So far, the weight of an edge has reached a CtM form. Let the CtM of a path be the sum of the CtMs of its edges. Since CtMs are probabilities, any path from a source vertex to a sink vertex with a CtM of at least 1.0 leads to detection. In other words, the longest paths have the potential to indicate whether or not the executable is malicious. Given that we need to formulate the IED malware detection problem as a shortest path problem on a DAG, we define weights as negative log probabilities. The weight of an edge is set to $-\ln \mathrm{CtM}$. The base of the logarithm is Euler’s number $e$. Because a CtM is a value between 0.0 and 1.0, its logarithm is a negative value, except for 1.0. A positive value is obtained when we negate the logarithm of a CtM. The CtMs of the edges of the example performance graph of Figure 1 are converted into negative log probabilities in Table 2 for Sampled Values. The negative log probabilities for SCADA are calculated using the same techniques. The higher the CtM of an edge is, the lower its negative log probability becomes. With the negative log probability transformation in place, it is now the shortest path that represents an execution flow which could exceed the time thresholds, and hence lead to detection. For the purpose of illustration, most of the elements of the performance graph derived from the example control-flow graph of Figure 1 are depicted in Figure 1 as well, on the left part of it. An identical reasoning can be applied to other IED processors, such as ARM and PowerPC, to construct performance graphs for binary code analysis.
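The conversion of an edge's clock cycles into a negative log probability can be sketched as follows, assuming the Sampled Values threshold of $576 \times 10^{4}$ clock cycles derived earlier. The names CTM_PER_CYCLE_SV and edge_weight are hypothetical.

```python
import math

# CtM of one clock cycle for Sampled Values: 1.0 / (800 us * 7,200 cycles/us).
CTM_PER_CYCLE_SV = 1.0 / (800 * 7200)


def edge_weight(clock_cycles, ctm_per_cycle=CTM_PER_CYCLE_SV):
    """Weight of an edge: the negative natural log of the code block's CtM."""
    return -math.log(clock_cycles * ctm_per_cycle)


weight_4_cycles = edge_weight(4)
weight_3_cycles = edge_weight(3)
print(round(weight_4_cycles, 2))  # 14.18, as in Table 2
print(round(weight_3_cycles, 2))  # 14.47, as in Table 2
```

The code block with more clock cycles receives the smaller weight, which is the inversion that turns the search for the longest-running flow into a search for a shortest path.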
3.2 Malicious Flows as Shortest Paths
Edge relaxation is the key. We now apply the graph algorithms of [13] to compute shortest paths on performance graphs. For each vertex of the performance graph, we maintain two attributes, namely a shortest path estimate d, as well as an attribute π, which we call predecessor. The shortest path estimate of a vertex ν is an upper bound on the weight of the shortest path from a source vertex s to ν. Shortest path estimates are expressed as negative log probabilities, just like the weights of the edges of a performance graph. The predecessor of a vertex ν refers to another vertex that is known to be on the best known shortest path from a source vertex s to ν. The chain of predecessors originating at vertex ν runs backward along that specific shortest path. Algorithm 1 is the engine of the shortest path estimation. Of paramount importance is how we calculate the shortest path estimate of a vertex ν. Simply adding the weight of an edge υ -> ν to the shortest path estimate of vertex υ would be incorrect due to both these quantities being negative log probabilities. To solve this issue, we rely on a fundamental property of logarithms, namely $\log(x + y) \approx \log(e^{\log(x)} + e^{\log(y)})$. $\upsilon.d$ is equal to the negative logarithm of some CtM, which in turn is equal to the product of a number of clock cycles, say $k_1$, and the CtM of a clock cycle, say $\mathrm{CtM}_{cycle}$. $\omega(\upsilon,\nu)$ is also the negative logarithm of some CtM, which in turn is similarly equal to the product of some number of clock cycles $k_2$ and $\mathrm{CtM}_{cycle}$. Thus $\nu.d$ is equal to the negative logarithm of some CtM, which in turn is equal to $(k_1 + k_2) \times \mathrm{CtM}_{cycle}$ and hence $(k_1 \times \mathrm{CtM}_{cycle}) + (k_2 \times \mathrm{CtM}_{cycle})$. Explicitly, $\nu.d = -\ln\big((k_1 \times \mathrm{CtM}_{cycle}) + (k_2 \times \mathrm{CtM}_{cycle})\big) \approx -\ln\big(e^{-\upsilon.d} + e^{-\omega(\upsilon,\nu)}\big)$.
A concrete example helps. Algorithm 1 uses these techniques to calculate all candidate shortest path estimates, and at the end chooses the smallest one. For example, let us calculate the shortest path estimate of vertex 0x4130d0 with reference to the example performance graph of Figure 1. There are two edges under consideration, namely (0x4130c8 -> 0x4130d0) and (0x4130c2 -> 0x4130d0). It is given that the shortest path estimate of vertex 0x4130c8 is 13.62, while $\omega(\mathrm{0x4130c8}, \mathrm{0x4130d0})$ is 14.18. Computing $-\ln(e^{-13.62} + e^{-14.18}) = -\ln(0.000001215931632 + 0.000000694551169)$ yields approximately 13.17. Now the other candidate. It is given that the shortest path estimate of vertex 0x4130c2 is 14.18, while $\omega(\mathrm{0x4130c2}, \mathrm{0x4130d0})$ is 14.18 as well. Computing $-\ln(e^{-14.18} + e^{-14.18}) = -\ln(0.000000694551169 + 0.000000694551169)$ yields approximately 13.49. Given that 13.17 < 13.49, we set the shortest path estimate of vertex 0x4130d0 to 13.17. Furthermore, we set the predecessor of vertex 0x4130d0 to 0x4130c8. Table 3 provides the shortest path estimates of the other vertices of the example performance graph of Figure 1 for Sampled Values.
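The two candidate estimates for vertex 0x4130d0 can be recomputed with a few lines. This is a sketch of the log-domain combination only, using the rounded values given in the text; the helper name combine is hypothetical.

```python
import math


def combine(estimate_u, weight_uv):
    """Negative-log estimate of the path that extends vertex u by edge (u, v).

    Both arguments are negative log probabilities, so they are mapped back
    to CtMs, summed, and mapped to a negative log probability again.
    """
    return -math.log(math.exp(-estimate_u) + math.exp(-weight_uv))


via_0x4130c8 = combine(13.62, 14.18)  # predecessor 0x4130c8
via_0x4130c2 = combine(14.18, 14.18)  # predecessor 0x4130c2

print(round(via_0x4130c8, 2))  # 13.17
print(round(via_0x4130c2, 2))  # 13.49

# The smaller estimate wins, so 0x4130c8 becomes the predecessor of 0x4130d0.
best_predecessor = "0x4130c8" if via_0x4130c8 < via_0x4130c2 else "0x4130c2"
print(best_predecessor)  # 0x4130c8
```

The smaller estimate corresponds to the path that consumes more clock cycles, which is why it is the one retained.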
Calculating those shortest path estimates for SCADA follows the same technique. The only difference would be the CtM per clock cycle, which would have a different value.

Table 2: Negative log calculations of the weights of the edges of the graph of Figure 1 for Sampled Values.

| Edge | Clock Cycles | CtM | -ln(CtM) |
|---|---|---|---|
| s -> 0x4130c2 | 4 | 0.00000069444444 | 14.18 |
| 0x4130c2 -> 0x4130d0 | 4 | 0.00000069444444 | 14.18 |
| 0x4130c2 -> 0x4130c8 | 3 | 0.00000052083333 | 14.47 |
| 0x4130c8 -> 0x4130d7 | 3 | 0.00000052083333 | 14.47 |
| 0x4130c8 -> 0x4130d0 | 4 | 0.00000069444444 | 14.18 |

Algorithm 1: Algorithm to relax the edges of the performance graph of the binary code of an IED executable. Adapted from [13].

Function Relax(υ, ν, ω)
Input: Vertices υ and ν, and weight function ω.
Output: A possible update of the attributes d and π of vertex ν.
  if $\nu.d > -\ln(e^{-\upsilon.d} + e^{-\omega(\upsilon,\nu)})$ then
    $\nu.d = -\ln(e^{-\upsilon.d} + e^{-\omega(\upsilon,\nu)})$;
    $\nu.\pi = \upsilon$;
  else
    Leave ν.d and ν.π unchanged;
  end

The shortest path from a source vertex to a sink vertex in a performance graph is computed by Algorithm 2. The vertices of the performance graph are initially sorted via the topological sort algorithm. The shortest path estimates d of those vertices, except the source vertex of course, are set to +∞, while their predecessor attributes π are set to nil. This is a way of saying that at this point we have no knowledge of the shortest path in question whatsoever. Now, relaxing all the edges of the performance graph as per Algorithm 2 leads to revising the predecessor attributes π of all vertices, including the sink vertex. Those predecessor attributes π, in turn, allow for walking backward from the sink vertex all the way to the source vertex along their shortest path. If the clock cycles of the execution flow along the shortest path at hand reach a CtM of at least 1.0, the executable is deemed to be malicious. The shortest path computation and the respective CtM check are done for every (source vertex, sink vertex) pair connected by paths. One single violation suffices to mark the executable as malicious.

Algorithm 2: Algorithm to calculate the shortest path from a source vertex to a sink vertex, as in [13], in the performance graph of the binary code of an IED executable.

Function ShortestPath(DAG, ω, s)
Input: Performance graph, weight function ω, and source vertex s.
Output: Update the predecessor attributes of the vertices of the performance graph.
  Topological sort of the vertices of the performance graph;
  Initialize the vertices of the performance graph;
  for each vertex υ taken in order do
    for each vertex ν such that there is an edge from υ to ν do
      Relax(υ, ν, ω);
    end
  end
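Algorithms 1 and 2 can be put together in a short runnable sketch. It is an illustration rather than the author's implementation: the adjacency-dictionary layout and all function names are hypothetical, and the sketch assumes that the source vertex has accumulated a CtM of 0 (no clock cycles consumed yet), which is the reading under which the estimates of Table 3 are reproduced from the clock cycles of Table 2.

```python
import math


def topological_order(adj):
    """Depth-first topological sort of a DAG given as {vertex: {successor: weight}}."""
    seen, postorder = set(), []

    def visit(u):
        if u in seen:
            return
        seen.add(u)
        for v in adj.get(u, {}):
            visit(v)
        postorder.append(u)

    for u in adj:
        visit(u)
    return postorder[::-1]


def shortest_path(adj, source):
    """Relax every edge reachable from the source, in topological order."""
    order = topological_order(adj)
    d = {v: math.inf for v in order}   # shortest path estimates
    pi = {v: None for v in order}      # predecessors
    d[source] = 0.0                    # value listed for s in Table 3
    reached = {source}
    for u in order:
        if u not in reached:
            continue
        # Assumption: the source contributes a CtM of 0 to any path.
        ctm_u = 0.0 if u == source else math.exp(-d[u])
        for v, weight in adj.get(u, {}).items():
            candidate = -math.log(ctm_u + math.exp(-weight))
            if d[v] > candidate:       # Relax, as in Algorithm 1
                d[v], pi[v] = candidate, u
            reached.add(v)
    return d, pi


# Edges of the example performance graph with their clock cycles (Table 2).
CTM_PER_CYCLE = 1.0 / (800 * 7200)     # Sampled Values
cycles = {
    ("s", "0x4130c2"): 4,
    ("0x4130c2", "0x4130d0"): 4,
    ("0x4130c2", "0x4130c8"): 3,
    ("0x4130c8", "0x4130d7"): 3,
    ("0x4130c8", "0x4130d0"): 4,
}
adj = {}
for (u, v), k in cycles.items():
    adj.setdefault(u, {})[v] = -math.log(k * CTM_PER_CYCLE)

d, pi = shortest_path(adj, "s")
print(round(d["0x4130d0"], 2), pi["0x4130d0"])  # 13.17 0x4130c8

# Walk the predecessors backward from the sink to recover the flow.
path, v = [], "0x4130d0"
while v is not None:
    path.append(v)
    v = pi[v]
path.reverse()
print(path)  # ['s', '0x4130c2', '0x4130c8', '0x4130d0']

# Detection check: the flow is malicious only if its CtM reaches 1.0.
malicious = math.exp(-d["0x4130d0"]) >= 1.0
print(malicious)  # False
```

In a full analysis this computation would be repeated for every (source vertex, sink vertex) pair, with a single violation being enough to flag the executable.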
Table 3: Shortest path parameters for the vertices of the graph of Figure 1 for Sampled Values.

| Vertex | Clock Cycles | CtM | Estimate |
|---|---|---|---|
| s | 0 | 0 | 0 |
| 0x4130c2 | 4 | 0.00000069444444 | 14.18 |
| 0x4130c8 | 7 | 0.00000121527777 | 13.62 |
| 0x4130d7 | 10 | 0.0000017361111 | 13.26 |
| 0x4130d0 | 11 | 0.00000190972221 | 13.17 |

4 EVALUATION
Testbed resembled an electrical substation. We created a computer environment that is similar to that of a real-world electrical substation. A part of the testbed is shown in Figure 2. The protective relays in the testbed were all real. Their main characteristics are summarized in Table 4. More specifically, the SEL-487E-3 is a protective relay that can monitor and protect a power transformer from electrical faults. It runs intelligent algorithms to detect various types of faults, and can take action in a timely manner by operating electrical circuit breakers and disconnect switches. The SEL-421-4 can perform industrial automation functions. It includes 32 programmable elements for local control, remote control, automation latching, and protection latching. A SEL-421-4 can also conduct various functions to protect overhead electrical transmission lines and underground cables. The SEL-3355 conducts multiple substation functions too. It has an integrated human-machine interface (HMI), with a local display port. In this work, the SEL-3355 was used as a real-time automation controller. The SEL-3355 polls the SEL-487E-3 and SEL-421-4 to collect substation data from them. The network communications between these ICSs take place over the IEC 61850 protocol. We integrated an engineering server into the testbed. The engineering server hosted an OPC server, which in turn ran an IEC 61850 protocol driver to get substation data from the protective relays. The IEC 61850 protocol driver subsequently stored those substation data in the OPC server as data items. We added an OPC client machine to the testbed. The OPC client application has a graphical user interface for a human operator to enter commands using a keyboard and mouse. All machines in the testbed were connected on a local area network. Effectiveness. Our approach was tested in the lab against emulated malware propagation and operation on a substation automation network. 
We emulated the vulnerability exploitation phase of an ICS malware attack, which involved code that in testing terms
Conference’17, July 2017, Washington, DC, USA
J. Rrushi
Figure 2: Some of the IEDs and other machines in our substation testbed. exploited several memory vulnerabilities with shellcode injection and heap spraying on the IEDs. The test exploits were preceded by a scanning with the nmap tool of a range of internet protocol (IP) addresses in the lab, and subsequently a scan of the network services that were running on the IEDs. The vulnerability exploitation on the IEDs was emulated under two different scenarios, namely the malware has no prior knowledge of the power system state, and the malware has managed to obtain full knowledge of the power system state prior to launching the exploits. We also emulated the malware installation phase, in which the exploit injects and runs a dropper on a compromised IED. As in traditional malware, the dropper is responsible for installing the ICS malware modules on the compromised IED. We emulated both a single-stage dropper and a dual-stage dropper. The single-stage dropper incorporated the emulated malware modules that it aimed at installing on the compromised IED, whereas the dual-stage dropper downloaded those modules over the network from another compromised machine. Experimentation included malware samples involved in the Dragonfly cyber espionage campaign. There are many versions of those malware. We obtained those malware samples from public research malware repositories, and thus used those versions in our experiments. The various versions of the Havex ICS plugin [14] are of particular interest to this work, since they all use OPC. All attack code exhibited visible deviations from the timing performance expected from any functional implementation of protection and control algorithms. Figure 3 shows the timing anomalies of network scanning tools, while the timing deviations of exploit
Figure 3: Timing anomalies in the performance graph of network scanning code. code are show in Figure 4. Some of the timing anomalies of test malware are given in Figure 5. Countering mimicry attacks. We assessed the possibility that an attacker could add dummy I/O operations to IED malware such as to create performance graphs that do not exceed time thresholds. Nevertheless, various forms of dummy I/O are easy to detect even via static analysis. Dummy I/O can be hidden via deceptive
Timing Performance Profiling of Substation Control Code for IED Malware Detection Conference’17, July 2017, Washington, DC, USA Table 4: The principal testbed machines and their main industrial communication protocols. Machine SEL-3555 SEL-487E-3 SEL-421-4 General-purpose Windows server General-purpose Windows client General-purpose Windows machine
ICS function Automation controller, HMI visualization, data concentrator Transformer protection relay Protection, automation, and control system OPC server OPC client Development and testing
ICS protocol IEC 61850 IEC 61850 IEC 61850 OPC, IEC 61850 OPC OPC, IEC 61850
Figure 4: Timing anomalies in the performance graph of exploit code.
Figure 5: Timing anomalies in the performance graph of test malware.
computations; however, those computations take away clock cycles that are otherwise needed by the malware instructions. The more computations are done to hide dummy I/O operations, the less room is left for malware code before the time thresholds are reached. Furthermore, we can leverage the fact that IED malware executables often travel through general-purpose machines before they land on field devices, as was the case with BlackEnergy and Stuxnet. We can run our approach on the executables of general-purpose machines, especially those located in the enterprise networks of power utility companies. The goal is to intercept malicious executables en route to IEDs. This time we look for paths that do comply with the time thresholds of IEDs. We used our approach to analyze the executable files of university computers. Most of their performance graphs did not comply with any real-time model. Consequently, if we find that an unknown executable on a general-purpose machine has a performance graph that perfectly matches the time thresholds of IEDs, we can flag that executable as malicious.
Limitations. Firstly, implementations of generator synchronization algorithms proved challenging for our approach. These algorithms measure the parameters of the power system, as well as those of the power generator, and operate a circuit breaker when the two are in synchrony. The actions on the circuit breaker do not depend on time, but only on the states of the power system and generator. Our approach reported code of this kind as malicious, while clearly it is not. Secondly, when analyzing executables that traverse a general-purpose machine, our approach had trouble with VoIP, chat, instant messaging, and videoconferencing applications. While these programs have non-real-time features, such as those that pertain to software configuration, they also contain functions that operate mostly in real time. Further work on these situations is needed to optimize our approach.
Applicability. Although in this paper we referred to x86 assembly when illustrating the main ideas behind our approach, this work applies to MC7 and other assembly languages as well. Code in any of these languages has a control-flow graph from which we can derive a performance graph. Whether the industrial communication protocol is S7comm or IEC 61850/MMS [15], our approach works the same because sources and sinks can be tied to any industrial communication protocol.
Efficiency. Given that our approach can be used as a static analysis tool, it can run on any general-purpose machine, including those that are not involved in substation monitoring and control. Consequently, our approach imposes no overhead on IEDs and hence is safe to use.
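The two detection modes discussed in this section (flagging code on an IED whose slowest path exceeds the time threshold, and flagging code on a general-purpose machine whose slowest path complies with it) can be illustrated with a minimal Python sketch. It is not the tool used in this work. It assumes that the performance graph is given as a list of (from, to, clock cycles) edges, that the slowest path is obtained by negating the weights before running single-source shortest paths on the DAG, and that the node names, the threshold of 100 cycles, and the functions longest_path_cycles and is_flagged are hypothetical names chosen for illustration.

```python
from collections import defaultdict, deque


def longest_path_cycles(edges, source):
    """Map each node reachable from source to the cycle count of its slowest path.

    Assumption of this sketch: weights are negated, so that single-source
    shortest paths on the DAG correspond to the slowest execution flows.
    """
    graph = defaultdict(list)
    indegree = defaultdict(int)
    nodes = {source}
    for u, v, cycles in edges:
        graph[u].append((v, -cycles))  # negated clock-cycle weight
        indegree[v] += 1
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in graph[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; expected a DAG")

    # Relax edges in topological order (shortest paths on negated weights).
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0
    for u in order:
        if dist[u] == float("inf"):
            continue  # not reachable from source
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return {n: -d for n, d in dist.items() if d != float("inf")}


def is_flagged(edges, source, sinks, threshold_cycles, on_ied):
    """Flag slow code on an IED, or threshold-compliant code en route to one."""
    worst = longest_path_cycles(edges, source)
    sink_costs = [worst[s] for s in sinks if s in worst]
    if not sink_costs:
        return False  # no source-to-sink path to judge (a choice of this sketch)
    complies = max(sink_costs) <= threshold_cycles
    return (not complies) if on_ied else complies


# Hypothetical performance graphs; cycle counts are made up for illustration.
control_code = [("read_sv", "check", 4), ("check", "trip", 6)]
with_scanner = control_code + [("check", "scan_net", 900), ("scan_net", "trip", 50)]

print(is_flagged(with_scanner, "read_sv", ["trip"], 100, on_ied=True))   # True
print(is_flagged(control_code, "read_sv", ["trip"], 100, on_ied=True))   # False
print(is_flagged(control_code, "read_sv", ["trip"], 100, on_ied=False))  # True
```

The last call shows the en-route case: the same threshold-compliant graph that passes on an IED is flagged when it is found on a general-purpose machine. This also reproduces the generator synchronization limitation noted above, since the sketch judges code only by its cycle counts.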
5
RELATED WORK
Yoon et al. proposed Memory Heat Map (MHM) as a machine learning approach that discovers memory patterns in the operating system of a real-time embedded machine. The authors indicated that multiple attack scenarios, which included kernel rootkits and shellcode, had memory usages that deviated from normal patterns. MHM is based on the premise that real-time embedded applications are of a predictable nature [16]. IEDs may not quite match that profile. Various protection functions are only exercised when the electrical substation requires a safety intervention. Those events make computations on an IED quite intense. Their occurrence and the memory usage they cause are unpredictable. IEDs that use IEC 61850 may have complex configurations, which create highly diverse memory usages. A human operator may also interact with an IED, which makes memory usage exhibit unpredictable patterns.
Zimmer et al. detect the exploitation phase with checkpoints based on timing bounds, which are set in advance on code sections of legitimate applications on real-time systems. The authors' work focuses on code injection exploits, and is proposed as an alternative to address-space layout randomization (ASLR) and canary approaches such as StackGuard, which the authors deem to be ineffective on real-time systems [17]. Nevertheless, data injection exploits, such as those based on return-oriented programming (ROP), are able to use existing application code and thus comply with checkpoints and their timing bounds. The authors' work relies on code instrumentation and time validation checks, both of which add runtime overhead that breaks the timing of Sampled Values and possibly of SCADA in electrical substations.
Power usage monitoring is another technique proposed to detect ICS malware. The idea is to characterize the power consumption of normal applications, and then look for fluctuations. This technique is based on the premise that ICS malware will have a power consumption that breaks pre-established patterns [18]. An IED, however, can be highly dynamic and variable in the operations that it runs; therefore, its power consumption may vary in an unpredictable way.
Several academic ICS malware samples are discussed in the literature. Garcia et al. designed Harvey as a PLC rootkit. Harvey performs a man-in-the-middle interception of sensor measurements and actuator commands. Harvey internally simulates the power system and thus is aware of its physics. Harvey does so for two reasons: firstly, to change actuator commands so as to maximize the physical damage to the power system; secondly, to change sensor measurements so that human operators receive data that they would expect to see based on the history of their interaction with the power system [19]. PLC-Blaster by Spenneberg et al. is a PLC worm that can modify I/O and render a compromised PLC dysfunctional by violating its cycle time limit [20]. On-the-fly modification of I/O can be done in a few clock cycles, which may not cause time thresholds to be exceeded. Nevertheless, our approach may tackle these PLC malware samples when they perform auxiliary computations. Harvey's simulation of the power system may involve extensive computing, which is when our approach could detect the timing anomalies. PLC-Blaster communicates with a command and control server and also searches for targets over the network. These are all activities that break the time thresholds.
6
CONCLUSION
Time factors have clear potential to help differentiate between IED malware and the binary code of legitimate protection and control algorithms. Their ability as an anomaly detector stems from the tight coupling of protection and control algorithms with the physics of electrical substations. These algorithms are designed with substation safety in mind, which makes them fully compliant with the timing demands of electrical substations. IED malware, on the other hand, directs its computations towards violating substation safety, which disconnects its timing from the needs of electrical substations. In this work we intercepted those time discrepancies by transforming IED malware detection into a shortest path problem on DAGs, which can be solved efficiently and without adding any overhead to IEDs. While there is still a road ahead, as time factors have much more to offer to antimalware capabilities for IEDs, this work aimed at changing the rules of the game by confronting IED malware with a dilemma: operate as before and get detected on IEDs, or fix the timing and get detected en route to IEDs.
ACKNOWLEDGEMENTS
This work was supported in part by Defence R&D Canada, and in part by the U.S. Office of Naval Research through a DURIP grant with contract number N00014-15-1-2891. All views are those of the authors only.
REFERENCES
[1] R. M. Lee, M. J. Assante, and T. Conway, "Analysis of the Cyber Attack on the Ukrainian Power Grid", defense use case white paper, March 2016, Available online at https://ics.sans.org/media/E-ISAC_SANS_Ukraine_DUC_5.pdf
[2] Symantec, "Dragonfly: Cyberespionage Attacks Against Energy Suppliers", July 2014, Available online.
[3] J. Lange, F. Iwanitz, and T. Burke, "OPC - From Data Access to Unified Architecture", 4th edition, VDE VERLAG GMBH, 2010.
[4] M. Zeller, "Myth or Reality - Does the Aurora Vulnerability Pose a Risk to My Generator?", 64th Annual Conference for Protective Relay Engineers, College Station, Texas, April 2011.
[5] International Electrotechnical Commission, "IEC 61850: Communication Networks and Systems in Substations", Technical Committee 57, 2004.
[6] R. Hunt and B. Popescu, "Comparison of PRP and HSR Networks for Protection and Control Applications", Western Protective Relay Conference, Spokane, WA, October 2015.
[7] N. Falliere, L. O. Murchu, and E. Chien, "W32.Stuxnet Dossier", Symantec Security Response, version 1.4, 2011, Available online.
[8] Siemens, "What properties, advantages and special features does the S7 protocol offer?", Available online at https://support.industry.siemens.com/cs/document/26483647/what-properties-advantages-and-special-features-does-the-s7-protocol/-offer-?dti=0&lc=en-WW
[9] J. Homan, S. McBride, and R. Caldwell, "IronGate ICS malware - Nothing to see here... Masking malicious activity on SCADA systems", FireEye threat research blog, 2016, Available online at https://www.fireeye.com/blog/threat-research/2016/06/irongate_ics_malware.html
[10] F. E. Allen, "Control Flow Analysis", Symposium on Compiler Optimization, pp. 1–19, Urbana-Champaign, Illinois, July 1970.
[11] A. F. Donaldson, L. Haller, D. Kroening, and P. Rummer, "Software Verification Using k-Induction", Lecture Notes in Computer Science, vol. 6887, pp. 351–368, Springer, Berlin, 2011.
[12] A. Fog, "Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs", Technical University of Denmark, December 2016.
[13] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, "Introduction to Algorithms", 3rd edition, MIT Press, July 2009.
[14] D. Hentunen and A. Tikkanen, "Havex Hunts for ICS/SCADA Systems", Available online at https://www.f-secure.com/weblog/archives/00002718.html
[15] International Organization for Standardization, Technical Committee 184, "Manufacturing Message Specification", Available online at https://www.iso.org
[16] M. K. Yoon, S. Mohan, J. Choi, J. E. Kim, and L. Sha, "Memory Heat Map: Anomaly Detection in Real-Time Systems using Memory Behavior", Design Automation Conference, San Francisco, CA, USA, June 2015.
[17] C. Zimmer, B. Bhat, F. Mueller, and S. Mohan, "Time-based Intrusion Detection in Cyber-Physical Systems", ACM/IEEE International Conference on Cyber-Physical Systems, pp. 109–118, Stockholm, Sweden, April 2010.
[18] PFP security, "Analyzing Power Consumption to Detect Malware", SCADA Security Scientific Symposium, Miami, FL, USA, January 2017.
[19] L. Garcia, F. Brasser, M. H. Cintuglu, A. R. Sadeghi, O. Mohammed, and S. A. Zonouz, "Hey, my malware knows physics! Attacking PLCs with physical model aware rootkit", Annual Network & Distributed System Security Symposium, San Diego, CA, USA, February 2017.
[20] R. Spenneberg, M. Brüggemann, and H. Schwartke, "PLC-Blaster: A Worm Living Solely in the PLC", Black Hat, Las Vegas, Nevada, USA, July 2016.