This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2016.2633983, IEEE Access

IEEE Access 2016-02235


Optimal Strategy Selection for Moving Target Defense Based on Markov Game

Cheng Lei, Ma Duo-He, Zhang Hong-Qi

Abstract—With the evolution of research on network moving target defense (MTD), the selection of the optimal strategy has become one of the key problems in the field. To address the improper defensive strategy selection caused by inaccurately characterizing the attack-defense game in moving target defense, an optimal strategy selection approach for moving target defense based on Markov game is proposed to balance hopping defensive revenue against network service quality. On the one hand, the traditional matrix game structure often fails to describe the MTD confrontation accurately. To remedy this inaccuracy, a moving target defense model based on Markov game is constructed: a Markov decision process characterizes the transitions among multiple network states, and a dynamic game characterizes the multiple phases of attack and defense under MTD. Besides, the model converts all attack and defense actions into changes in the attack surface or the exploration surface, which improves its universality. On the other hand, traditional models pay little attention to defense cost during optimal strategy selection. After comprehensively analyzing the impact of defense cost and defense benefit on strategy selection, an optimal strategy selection algorithm is designed to prevent the selected strategies from deviating from actual network conditions, thus ensuring the correctness of optimal strategy selection. Finally, the simulation and deduction of the proposed approach are given in a case study to demonstrate its feasibility and effectiveness.

Index Terms—Moving Target Defense; Markov Game; Optimal Strategy Selection; Attack Surface; Exploration Surface

I. INTRODUCTION

With the evolution of network attacks, internet security, challenged by attacks such as zero-day exploitation and Advanced Persistent Threats (APT), now faces a serious predicament of "easy to attack and hard to defend" [1]. On the one hand, with the advantages of time and information asymmetry, attackers can scan, collect and exploit the resource vulnerabilities of targeted network systems over a long period of time. On the other hand, based on prior knowledge, existing network defense techniques, such as firewalls, intrusion detection and anti-virus, are confined by cognitive limitations and consequently lag behind network attack techniques. Generally speaking, the main problems are as follows: the existence of network vulnerabilities is inevitable, since the security of a designed network architecture is hard to prove; the deterministic and static nature of network architecture gives attackers enough time to launch reconnaissance and attacks; and the isomorphism of network architecture grants attackers the advantage of low attack cost, so that once an attack succeeds, attackers can expand its scope at little additional cost. Therefore, with network attacks gradually moving toward combination and automation, it is difficult for existing defense methods to cope effectively with increasingly complex network intrusions, worsening the asymmetry between network attack and defense.

To extricate defenders from this predicament, moving target defense (MTD) has been proposed. It changes the attributes of network elements in a manner controlled by defenders, making the targeted network random, dynamic and heterogeneous. By destroying the attack chain's dependency on the determinacy, static state and isomorphism of the network environment, MTD raises the difficulty for attackers. Although existing research has proposed numerous MTD techniques and implementation methods for different network security threats [2], a simple combination of different MTD techniques tremendously increases the performance overhead of network systems [3]. Defense "at all costs" cannot be applied in practice.

Cheng Lei, born in 1989, is a doctoral candidate at the China National Digital Switching System Engineering & Technological Research Center. His main research interests include network security, secure net-flow exchange and moving target defense techniques. (Email: [email protected])
Ma Duo-He, born in 1982, Doctor, is a research assistant at the State Key Laboratory of Information Security, Institute of Information Engineering. His main research interests include application security, moving target defense and cloud security. (Email: [email protected])
Zhang Hong-Qi, born in 1962, is a professor and PhD supervisor at the China National Digital Switching System Engineering & Technological Research Center. His main research interests include network security and classification protection. (Email: [email protected])
Therefore, how to select the optimal defense strategy under limited network resources, so as to balance network performance overhead against MTD defensive revenue, has become one of the hotspots of current MTD research. Game theory [4] is regarded as a premier theory of decision analysis: it determines countermeasures by weighing pros and cons based on an analysis of the confrontation situation. This goal is consistent with the optimal hopping strategy selection of MTD in terms of defense cost and defense benefit. Besides, in the MTD confrontation between offense and defense, 1) the purposes of attack and defense are opposite. Attackers attack targets by scanning and exploiting network resource vulnerabilities and network configuration attributes,

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


while defenders try to increase attack difficulty by shifting the attack surface and expanding the exploration surface. 2) The strategies of attack and defense are interdependent. The selection of attack and defense strategies depends not only on a player's own behavior but also on the strategies chosen by the opponent. 3) Both offense and defense are non-cooperative. Attackers and defenders both want to maximize the effect of their strategies by selecting optimal ones. In summary, the features of opposed purposes, strategic interdependence and non-cooperation in MTD attack-defense are highly compatible with the characteristics of game theory. Therefore, game theory can play an important role in selecting the MTD optimal defense strategy so as to balance network security and performance overhead [5].
Motivated by existing research, optimal strategy selection for moving target defense based on Markov game (MG-MTD) is proposed to address the shortcomings of current work. The main contributions are as follows:
(1) In terms of model construction, the mono-phase or mono-state attack-defense game is extended to multiple states and multiple phases by combining the Markov decision process with stochastic game theory, which makes the constructed game model more consistent with the features of attack-defense in MTD. Besides, the attack and defense behaviors in MTD are converted into the shifting of the attack surface and the expansion of the exploration surface based on our earlier work [6], ensuring model universality.
(2) In terms of optimal strategy selection, the criterion function and revenue function in MG-MTD take both defensive cost and benefit into account, which makes the optimal strategy selection more practical. On this basis, a nonlinear programming method for solving MG-MTD optimal strategies is given, enhancing the model's usability.
The remainder of this paper is organized as follows. In Section 2, the basic principle and hopping factors of MTD are expounded from the viewpoint of the attack surface and exploration surface, and the related concepts of game theory and the progress of related work are given. Section 3 analyzes the game categories of MTD, constructs the MTD model based on Markov game, analyzes the existence of the MG-MTD optimal strategy, and designs the selection algorithm. In Section 4, a case study illustrates the effectiveness of the proposed model and the practical significance of the selected optimal defensive strategy. Finally, we conclude our work and outline future research directions in Section 5.

II. BACKGROUND KNOWLEDGE AND RELATED WORK

A. MTD fundamental principle and hopping factors

Moving target defense develops the concept of a moving target. In Trustworthy Cyberspace: Strategic Plan for the Federal Cybersecurity Research and Development Program, published by the Executive Office of the President, National Science and Technology Council, in December 2011 [7], moving target defense is defined as enabling us to "create, analyze, evaluate, and deploy mechanisms and strategies that are diverse and that continually shift and change over time to increase


complexity and cost for attackers, limit the exposure of vulnerabilities and opportunities for attack, and increase system resiliency". The basic architecture of moving target defense is shown in Figure 1. It keeps the resource vulnerabilities of the protected network systems moving by randomly shifting the configuration and status of network components, such as IP addresses, ports, and system fingerprints. In this way, the exposed attack surface and the exploration surface attackers must probe appear chaotic and changeable over time. As a result, the attackers' reconnaissance is deceived and confused, and the effort required to successfully launch attacks increases remarkably. The fundamental workflow is as follows:
- Formulate the network security policy and functional tasks, and initialize network resources;
- Select the hopping elements and hopping period according to the pre-defined security policy; the hopping implementation is configured by hopping configuration management;
- Deploy the configured hopping scheme in the protected network system through hopping implementation;
- Apply the hopping scheme to legitimate users in the protected network after the hopping strategy is received;
- Feed the network security situation back to the hopping triggering component by perceiving and analyzing the current network status in the analysis engine;
- Determine the hopping strategy for the next hopping period in hopping triggering, based on the current network security status.

Fig.1 Moving target defense architecture
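The workflow above can be sketched as a toy control loop. Everything here (the element pool, the period policy, the status values) is an illustrative stand-in, not the paper's implementation:

```python
import random

# Hypothetical pool of hopping elements and their hopping spaces.
ELEMENT_POOL = {"ip": ["10.0.0.5", "10.0.0.9", "10.0.0.23"],
                "port": [21, 2121, 8021]}

def select_hopping_strategy(security_status):
    """Steps 2 and 6: choose a hopping element and period from the fed-back status."""
    element = "ip" if security_status == "alert" else random.choice(sorted(ELEMENT_POOL))
    period = 5 if security_status == "alert" else 30   # seconds, illustrative
    return element, period

def hop(config, element):
    """Steps 3 and 4: deploy a fresh value for the chosen element."""
    new_config = dict(config)
    candidates = [v for v in ELEMENT_POOL[element] if v != config[element]]
    new_config[element] = random.choice(candidates)
    return new_config

config = {"ip": "10.0.0.5", "port": 21}                # step 1: initialize resources
history = [dict(config)]
for status in ["normal", "alert", "normal"]:           # step 5: analysis-engine feedback
    element, period = select_hopping_strategy(status)
    config = hop(config, element)
    history.append(dict(config))
```

Each pass through the loop changes exactly one element of the configuration, which is the single-element hopping case discussed later in this section.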

Since the selection of the MTD hopping strategy directly determines the attack surface and exploration surface, and the changes of these surfaces reflect the effectiveness of MTD defense, existing research [1,8,9] uses the attack surface and the exploration surface to depict the effectiveness of an MTD hopping strategy.
Definition 1: The Attack Surface [8] (AS) is the set of network system properties that defenders need to protect at a certain time t so as to prevent their use in attacks. It consists of the attack surface dimension (ASD) and the attack surface value (ASV), i.e., AS(t) = ∪_i ASD_i^t × ASV_i^t. The ASD consists of the available network resource set, such as an ftp service, and network configuration attributes, such as IP address and port; it can be presented as ASD_i^t = {asd_1^t, asd_2^t, ..., asd_k^t, ...}. The ASV is the value of the ASD at time t, presented as ASV_i^t = {asv_1^t, asv_2^t, ..., asv_l^t}. In the ASV, "0" means the corresponding ASD does not belong to the protected network system at time t.
Definition 2: The Exploration Surface [9] (ES) is the set of network system properties that attackers need to explore in



order to find available network vulnerabilities for launching attacks at a certain time t. It consists of the exploration surface dimension (ESD) and the exploration surface value (ESV), i.e., ES(t) = ∪_i ESD_i^t × ESV_i^t. The ESD is the set of network resources attackers need to explore at time t, presented as ESD_i^t = {esd_1^t, esd_2^t, ..., esd_k^t, ...}. The ESV is the value range of the ESD at time t, presented as ESV_i^t = {esv_1^t, esv_2^t, ..., esv_l^t}. In the ESV, "0" means the corresponding ESD does not belong to the protected network system at time t.
In the AS and ES, the dimension represents the hopping elements, and the value represents the hopping space of the corresponding hopping elements, as detailed below. Apart from that, the AS and ES have the following properties:
Property 1: Different network system configurations may have the same ASD (ESD), but the ASV (ESV) may differ. This can be presented as ∃ asd_i^t ∈ AS_S1(t) ∩ AS_S2(t) s.t. AS_S1(t){asv_i^t} ≠ AS_S2(t){asv_i^t}, and ∃ esd_i^t ∈ ES_S1(t) ∩ ES_S2(t) s.t. ES_S1(t){esv_i^t} ≠ ES_S2(t){esv_i^t}.
Property 2: In a network system, the AS and ES change with time. This can be presented as ∃ Δt > 0, AS_S(t) ≠ AS_S(t + Δt) and ES_S(t) ≠ ES_S(t + Δt).
On the other hand, studies [9, 10] show that an MTD strategy is a combination of hopping method, hopping elements and hopping period (Figure 2), and that it has a mapping relationship with the changes of the attack surface and exploration surface: the time at which the AS and ES change is the hopping period; the chosen dimensions of the AS and ES are the hopping elements; and the way the AS and ES change is the hopping method. Therefore, MTD strategy selection is equivalent to changing the AS and ES.

Fig. 2 MTD hopping strategy and AS/ES change
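Definitions 1 and 2 and the two properties can be mirrored with a toy data structure: a surface at time t as a mapping from dimensions (ASD/ESD) to sets of values (ASV/ESV). The dimension and value names below are illustrative, not taken from the paper:

```python
# Two hypothetical system configurations at the same time t: identical
# dimensions (a shared ASD, as in Property 1), but a differing "ip" value set.
as_s1_t = {"ip": {"10.0.0.5"}, "port": {21, 22}, "service": {"ftp"}}
as_s2_t = {"ip": {"10.0.0.9"}, "port": {21, 22}, "service": {"ftp"}}

def shared_dims_with_different_values(surf1, surf2):
    """Property 1: dimensions present in both surfaces whose value sets differ."""
    return {d for d in surf1.keys() & surf2.keys() if surf1[d] != surf2[d]}

# Property 2: the same system's surface differs between t and t + dt after a hop.
as_s1_t_plus_dt = dict(as_s1_t, ip={"10.0.0.23"})
```

Here only the "ip" dimension distinguishes the two configurations, while the hop changes the same system's surface over time, matching the two properties above.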

(1) Hopping method
The hopping method is the way MTD implements defense. In [11], the exploration surface and attack surface are defined from the perspectives of the attacker and the defender respectively. The exploration surface is regarded as the set of network system properties that the attacker needs to explore in order to find available network vulnerabilities for launching attacks. It can be expanded by deploying honeypots [12], increasing the heterogeneity of the protected network system [13], and so on. The attack surface is regarded as the set of network system properties that the defender needs to protect so as to prevent their exposure. It can be shifted by changing network attributes, altering network configuration, and so on [14]. Hence, MTD hopping methods can be divided into attack surface shifting, exploration surface expanding and mixed strategies, as illustrated in Figure 3.

Fig.3 Basic method of MTD hopping

Definition 3: Attack surface shifting means the network resources of network system S satisfy either of the following two conditions:
(1) ∃ t1 ≠ t2, ∃ asd: (asd ∈ AS_S(t1) ∧ asd ∉ AS_S(t2)) ∨ (asd ∉ AS_S(t1) ∧ asd ∈ AS_S(t2)), i.e., the ASD is changed;
(2) ∃ t1 ≠ t2, ∃ asd: (asd ∈ AS_S(t1) ∩ AS_S(t2)) ∧ (AS_S(asv_t1) ≠ AS_S(asv_t2)), i.e., the ASV is changed.
Definition 4: Exploration surface enlarging means the network resources of network system S satisfy either of the following two conditions:
(1) ∃ t1 ≠ t2, ESD_t1 ⊂ ESD_t2, i.e., the ESD is enlarged;
(2) ∃ t1 ≠ t2, ESD_t1 = ESD_t2 and ESV_t1 ⊂ ESV_t2, i.e., the ESV is expanded.
Therefore, MTD can improve the randomness of the protected network system through attack surface shifting, which improves the unpredictability of network resource vulnerabilities. Moreover, MTD can improve the heterogeneity of the protected network system through exploration surface expansion, which enlarges the moving space of the network resource vulnerabilities. In this way, MTD achieves the randomness, dynamics and heterogeneity of the protected network.
(2) Hopping element
Hopping elements are the set of network resources changed in MTD hopping, i.e., the dimensions of the AS and ES. The selection of hopping elements can be divided into five categories according to the network layer [15]: the data layer, application software layer, runtime environment layer, system platform layer and communication network layer. On the other hand, existing MTD hopping mechanisms can be divided into independent single-element hopping and collaborative multi-element hopping according to the number and layer of the selected hopping elements. Independent single-element hopping [16] means only one hopping element is selected during each hopping period. Collaborative multi-element hopping [17] means more than one hopping element is selected during each hopping period, and the selected hopping elements are mutually orthogonal.
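Under the same toy representation used earlier (a surface as a mapping from dimensions to value sets), Definitions 3 and 4 become simple predicates. This is one straightforward reading of the conditions, not the paper's formal machinery:

```python
def attack_surface_shifted(as_t1, as_t2):
    """Definition 3: some ASD appears or disappears, or a shared ASD's ASV changes."""
    dims_changed = set(as_t1) != set(as_t2)
    vals_changed = any(as_t1[d] != as_t2[d] for d in set(as_t1) & set(as_t2))
    return dims_changed or vals_changed

def exploration_surface_enlarged(es_t1, es_t2):
    """Definition 4: the ESD strictly grows, or the ESD is unchanged and some ESV expands."""
    if set(es_t1) < set(es_t2):
        return True
    return set(es_t1) == set(es_t2) and any(es_t1[d] < es_t2[d] for d in es_t1)

# Illustrative port-hopping example (values are made up).
before = {"ip": {"10.0.0.5"}, "port": {21}}
after_hop = {"ip": {"10.0.0.5"}, "port": {2121}}
```

The port hop above satisfies Definition 3 (the ASV of the "port" dimension changed), while adding decoy ports to the exploration surface would satisfy Definition 4.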



(3) Hopping period
The hopping period is the frequency of MTD hopping. On the one hand, if the hopping frequency is too low, the attacker has enough time to conduct reconnaissance and launch attacks, making MTD hopping meaningless. On the other hand, if the hopping frequency is too high, the corresponding network overhead inevitably degrades the service quality of the network. Current MTD hopping mechanisms are divided into fixed and varying hopping periods. A fixed hopping period [14] is a parameter pre-set before implementation; it does not change while MTD hopping runs. A varying hopping period [16] can change randomly according to changes in the network security status or pre-defined parameters.
In summary, MTD hopping strategy selection jointly involves the hopping elements, hopping method and hopping period, and is reflected in the changes of the attack surface and exploration surface. Moreover, the judgement of MTD strategy selection should comprehensively analyze the defense cost and the defense benefit of the selected hopping elements, hopping method and hopping period. Strategy selection is thus equivalent to weighing the cost and benefit of MTD in changing the attack surface and exploration surface.

B. Game theory concepts

Game theory [5], as a mathematical tool, is used to describe and solve games. It describes a game by specifying the players involved, their possible actions, the order in which they act, and each player's payoff after acting. Note that game theory selects the best-reward strategy in a setting where players' interests are mutually influenced, under the rational-player assumption. The so-called mutual influence means that any player in the game is subject to the impact of the other players' actions.
The so-called rationality refers to the players involved in the game trying to take the actions most beneficial to themselves. Besides, since the players are interdependent, the rational selection of a strategy must be based on a prediction of the other players' reactions. The basic elements of game theory are as follows:
(1) Player: the entities involved in a game, usually regarded as rational, who can interact with each other. Rationality can be subdivided into complete rationality and finite rationality. In MTD attack-defense, the two sides can be considered completely rational [5, 18], meaning attackers will not launch unprofitable attacks, while defenders will not defend at all costs.
(2) Policy/Action: a player's strategy is its plan of action, specifying which action to take based on prior knowledge of the action history. In MTD attack-defense, the defender can either select existing passive defensive methods or select MTD strategies by combining different hopping elements, hopping methods and hopping periods.
(3) Action sequence: the order of taking actions. When there are many independent players in the game, they can act simultaneously to ensure fairness and rationality. Otherwise, the game has an action sequence, which should be defined by the game model. Even with the same set of policies, different action sequences can lead to different revenues. In MTD hopping, actions can be considered to be taken simultaneously by both sides, since both update their strategies according to the network state in a given period of the game [21,25].
(4) Payoff/Revenue: after all players have acted, each gets either a negative or a positive return, the quantitative result of its action. In MTD, both the attacker and the defender need to take cost and benefit into consideration.
Once the four elements above are determined, the basic architecture of a game is determined. Since every player wants to maximize its benefit, there is an optimal strategy selection problem, in other words an equilibrium problem. A Nash equilibrium [4] is a combination of the optimal policies or actions of all players in the game. It depicts a stable state among players based on rational choice, in which no player can unilaterally change its strategy to increase its revenue.

C. Related work

Based on the analysis of Sections II-A and II-B, selecting appropriate hopping elements, hopping methods and hopping periods is of fundamental importance for balancing network performance and MTD defense. Since the MTD attack-defense confrontation is highly compatible with the characteristics of game theory, research on game-theoretic MTD optimal hopping strategy selection has grown tremendously. It is summarized as follows:
Manadhata et al. [18] proposed optimal attack surface shifting strategy selection based on a complete and perfect information dynamic game, formalizing the MTD confrontation as a two-person stochastic game.
The balance between MTD defense and network performance is obtained by reaching the equilibrium. Colbaugh et al. [19] analyzed MTD defense strategies against self-learning attackers and concluded that a uniformly randomized defense strategy is optimal. However, a mono-phase game can hardly characterize the continuous change of MTD. To this end, Zhu et al. [20] formalized MTD attack-defense as a multi-phase feedback game model and analyzed the optimal MTD hopping strategy selection under the assumption that the attack-defense game is zero-sum. Carter et al. [21] analyzed the dynamic transformation of system platforms under different attack threat conditions. The results qualitatively show that maximizing platform difference can effectively increase attack difficulty, while simultaneously increasing the probability of vulnerability exploitation; the optimal MTD hopping strategy is therefore a balance between network performance and defense benefit. Prakash et al. [22] analyzed game-theoretic optimal hopping strategies under 72 different configurations of network settings and attack-defense costs and benefits. The results show that



compared with proactive MTD, reactive MTD hopping triggering has higher defense benefit, which, however, depends on the defender's detection capability. Vadlamudi et al. [23] used a Bayesian Stackelberg game to formalize the MTD attack-defense process, specifically for Web platforms. The experimental results confirm the effect of vulnerability importance and attacker sensitivity on the game results. However, the policies of both sides change the network system states, which makes the attack-defense game multi-state. Apart from that, the next state depends only on the current state and the strategies of the attacker and defender, i.e., it has the Markov property. It can be concluded that a mono-phase or mono-state game model can hardly describe the continuous transition of network system states and the multiple phases of attack-defense strategy selection according to those states; this limitation reduces the value and practicality of the corresponding results. To this end, Miehling et al. [24] proposed an optimal MTD strategy selection method based on a Bayesian attack graph, which describes the relationships among the resource vulnerabilities attackers exploit, the observable attack behavior, and the network security state. On this basis, the MTD confrontation process is regarded as a partially observable Markov decision process, from which the optimal hopping strategy is selected. However, the partially observable Markov decision process is only used to describe the MTD confrontation, and game theory is not adopted for optimal strategy selection. Valizadeh et al. [25] proposed an MTD game model based on the Markov decision process, comparing single-IP hopping with multi-target IP hopping mechanisms using a Markov game model. The results show that multi-element selection can effectively improve the hopping defense benefit.
However, the strategy construction process of these methods does not convert the attacker's and defender's exploitation of network resource vulnerabilities into attack surface and exploration surface changes, leading to poor model universality. On the other hand, since defense cost is not taken into consideration in [25], and no concrete optimal strategy selection algorithm is given, it is difficult to make accurate optimal strategy selections under different specific conditions. The above analysis shows that game-theoretic MTD optimal strategy selection has made certain achievements, but the following problems remain:
(1) In terms of game model construction, a mono-phase or mono-state game model can hardly describe the multiple phases and multiple states of MTD hopping. In addition, building game models for specific MTD scenarios leads to poor model universality.
(2) In terms of optimal strategy selection, since defense cost is not taken into account in the revenue function and criterion function, the selected strategies can hardly guide MTD hopping accurately. Besides, no concrete optimal strategy selection algorithm is given, which reduces the practicality of the proposed models.
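To make the equilibrium computation behind these surveyed approaches concrete in the simplest possible setting: a 2x2 zero-sum matrix game without a saddle point has a closed-form mixed-strategy Nash equilibrium. The sketch below is a textbook illustration, not the MG-MTD algorithm developed in this paper:

```python
def solve_2x2_zero_sum(A):
    """Mixed equilibrium of a 2x2 zero-sum game; A holds the row (attacker)
    payoffs and is assumed to have no saddle point."""
    (a, b), (c, d) = A
    denom = a - b - c + d
    p = (d - c) / denom          # probability the row player plays row 0
    q = (d - b) / denom          # probability the column player plays column 0
    value = (a * d - b * c) / denom
    return p, q, value
```

For matching pennies, [[1, -1], [-1, 1]], this yields p = q = 0.5 and value 0: neither player can gain by deviating unilaterally, which is exactly the Nash equilibrium condition recalled in Section II-B.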


III. MTD GAME MODEL CONSTRUCTION AND OPTIMAL STRATEGY SELECTION

A. The game categories of the MTD confrontation
The randomness, dynamics and heterogeneity of MTD lead to multi-state transitions in the network. Combined with different game categories, the game-theoretic character of the MTD attack-defense confrontation is analyzed as follows:
(1) Non-cooperative: in the MTD confrontation, neither side informs the other of its strategy in advance. The attacker's goal is to scan the resource vulnerabilities in the exploration surface so as to launch attacks; the defender's goal is to avoid or reduce the resource vulnerabilities exposed in the attack surface in order to improve the security of the protected network. Therefore, both sides hope to select the optimal strategies that maximize their own benefit.
(2) Dynamic: since the hopping method and hopping elements change at the beginning of each hopping period, the MTD network confrontation can be regarded as a multi-phase dynamic event over a discrete time sequence. In each phase, players take appropriate strategies based on their previous experience and the current network state. Since both sides gain different benefits after acting, they in turn adjust their strategies based on the gained benefit and the observable network state.
(3) Markov property: in the process of MTD hopping, both the attack-defense confrontation and network tasks lead to random transfer of network states, and the offensive and defensive strategies of the next hopping period are selected based on the transferred network state. Therefore, the multi-state transition feature of the network can be characterized by a Markov decision process.

Fig. 4 The mapping relationship between MTD confrontation and game theory

From the above analysis, different hopping periods divide MTD hopping into different confrontation phases. On the one hand, the change of network states depends on the choice of both offensive and defensive strategies; on the other hand, the transition of network states in turn affects the selection of those strategies. Therefore, the MTD confrontation is multi-phase and multi-state, and the revenue matrix and network state differ in each confrontation phase. Since a dynamic game model can depict the multi-phase MTD confrontation, and a Markov decision process can depict the MTD confrontation under multi-state network transitions, the Markov game model is constructed by combining the Markov decision process with stochastic game theory (Figure 4).
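To illustrate this combination, the sketch below runs value iteration on a tiny two-state Markov game. For brevity both players are restricted to pure strategies (a maximin backup rather than a full matrix-game solve per state), and all states, actions, payoffs and transitions are invented for illustration:

```python
GAMMA = 0.5                      # discount across hopping periods
STATES, ATTACKS, DEFENSES = ["S0", "S1"], ["a0", "a1"], ["d0", "d1"]

# Immediate attacker payoff R[s][(a, d)] and deterministic next state
# NEXT[s][(a, d)]; the defender minimizes what the attacker maximizes.
R = {"S0": {("a0", "d0"): 2, ("a0", "d1"): 0, ("a1", "d0"): -1, ("a1", "d1"): 3},
     "S1": {("a0", "d0"): 1, ("a0", "d1"): -2, ("a1", "d0"): 0, ("a1", "d1"): 1}}
NEXT = {"S0": {("a0", "d0"): "S1", ("a0", "d1"): "S0", ("a1", "d0"): "S0", ("a1", "d1"): "S1"},
        "S1": {("a0", "d0"): "S0", ("a0", "d1"): "S1", ("a1", "d0"): "S1", ("a1", "d1"): "S0"}}

def backup(V):
    """One maximin Bellman backup: attacker max over a, defender min over d."""
    return {s: max(min(R[s][(a, d)] + GAMMA * V[NEXT[s][(a, d)]]
                       for d in DEFENSES)
                   for a in ATTACKS)
            for s in STATES}

V = {s: 0.0 for s in STATES}
for _ in range(100):             # gamma-contraction: converges geometrically
    V = backup(V)
```

The resulting V is a fixed point of the backup, i.e., a per-state value of the multi-phase, multi-state confrontation; a full Markov game solver would replace the max/min pair with the mixed-strategy value of each state's payoff matrix.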

Fig. 5 Illustration of the Markov game based MTD

As illustrated in Figure 5, in a specific network system the attacker initially has access rights to endpoints A and C, while the defender deploys defensive mechanisms at G, K, and I. When the first hopping period expires, the strategies of both the attacker and the defender change, which alters the attack surface and the exploration surface, and the network state transfers from S0 to S1. In state S1, the attacker attacks endpoint E through B instead of through endpoint C and gains its permission privileges, while the defender moves the defensive mechanism from endpoint I to endpoint H. The selection of attack and defense strategies changes the network state step by step; after a finite number of attack-defense confrontations the network state reaches Sn, where both sides achieve equilibrium at endpoints C and F by adjusting their strategy selection.

B. The construction of the MTD confrontation model based on Markov game

The MTD confrontation model based on Markov game (MG-MTD) is constructed as follows.

Definition 5: The moving target defense model based on Markov game (MG-MTD) is a six-tuple $(N, S, P, T, R, U)$, where:

(1) $N=\{N_A, N_D\}$ is the player set. MG-MTD assumes that all players are completely rational, and there are only two players: the attacker $N_A$ and the defender $N_D$.

(2) $S=\{S_1, S_2, \dots, S_k\}$ is the network state set. Assume there are $k$ different network states in total; each represents the network security state within a certain hopping period. The transition among the network states depends on the confrontation policies of the attacker and the defender, which are reflected in changes of the attack surface and the exploration surface.

(3) $P=\{P^A, P^D\}$ is the strategy set of MG-MTD; each strategy consists of a hopping method, hopping elements, and a hopping period. $P^A=\{P_1^A, P_2^A, \dots, P_m^A\}$ is the strategy set the attacker can select from, containing $m$ strategies, where $P_i^{A_j}$ is the attack strategy selected in network state $S_i$ and satisfies $P_i^{A_j} \in P^A$. $pr_i^{A_j}$ is the probability that the attacker selects strategy $P_i^{A_j}$ $(0 \le j \le m)$, which satisfies $\sum_{j=1}^{m} pr_i^{A_j} = 1$. Similarly, $P^D=\{P_1^D, P_2^D, \dots, P_l^D\}$ is the strategy set the defender can select from, containing $l$ strategies, where $P_i^{D_j}$ is the defensive strategy selected in network state $S_i$ and satisfies $P_i^{D_j} \in P^D$. $pr_i^{D_j}$ is the probability that the defender selects strategy $P_i^{D_j}$ $(0 \le j \le l)$ in state $S_i$, which satisfies $\sum_{j=1}^{l} pr_i^{D_j} = 1$.

(4) $T=\{P(a_j \mid S_i),\ P(d_j \mid a_j),\ P(S_{i'} \mid d_j)\}$ is the network state transition probability, divided into three parts: a) $P(a_j \mid S_i)$ is the probability of selecting attack strategy $a_j$ in network state $S_i$; b) $P(d_j \mid a_j)$ is the probability of selecting defensive strategy $d_j$ after the attacker uses strategy $a_j$; c) $P(S_{i'} \mid d_j)$ is the probability that the network transitions to state $S_{i'}$ under defensive strategy $d_j$. Therefore, $T$ can be presented as $S \times P(a_j \mid S_i) \times P(d_j \mid a_j) \times P(S_{i'} \mid d_j) \to S'$. State transitions occur at the alternation of confrontation phases, and this alternation follows the hopping period. The transition probability depends both on the strategies of the attacker and the defender and on network attributes such as the network configuration and the endpoint system platform.

(5) $R=\{R_A, R_D\}$ is the revenue set of the attacker and the defender. It is determined jointly by the players, since both sides must take cost and benefit into consideration when selecting a strategy. According to Section 1.1, the cost and benefit of the attacker and the defender can be converted into changes of the attack surface and the exploration surface, as shown in (1) and (2), where $F$ is the change of network features, $PC$ is the performance cost of MTD hopping, $\Delta ES$ is the change of the exploration surface, and $\Delta AS$ is the change of the attack surface. The attacker exploits network resource vulnerabilities by exploring the exploration surface, which can make system functions unavailable or dramatically increase the network performance overhead; the defender enlarges the exploration surface or shifts the attack surface through its hopping strategy, which improves network security while keeping network functions operating normally. Hence, a general-sum game is used to describe the revenues of the attacker and the defender:

$$R_D(S, P_i^A, P_i^D) = F - PC + \Delta ES - \Delta AS \qquad (1)$$

$$R_A(S, P_i^A, P_i^D) = -PC + \Delta AS - \Delta ES \qquad (2)$$

(6) $U$ is the criterion function, used to judge the strategies selected by the two sides. Commonly used criterion functions [18] are the discounted expected revenue criterion and the average expected revenue criterion. Since the value of network system information is time dependent in MTD confrontation, the discounted expected revenue criterion is used in MG-MTD, as shown in (3). Here $\beta$ is the discount rate, indicating that future benefit is not treated as equal to present benefit, and $\beta \sum_{S'} T(S, P^A, P^D, S') U_{S'}$ is the discounted future revenue when the attacker selects strategy $P^A$ and the defender selects strategy $P^D$:

$$U_S(P^A, P^D) = R_S(P^A, P^D) + \beta \sum_{S'} T(S, P^A, P^D, S') U_{S'} \qquad (3)$$

Based on the MG-MTD model, the existence of the optimal strategy is analyzed in Section 2.3, and the selection algorithm is designed in Section 2.4.

C. The existence analysis of the optimal strategy

Since MG-MTD is a multi-phase, multi-state matrix game model, it can be regarded as a finite matrix game whenever the network is in a given state; because the network state set and the policy sets are finite, MG-MTD is a finite Markov game. Following the idea of [26], Theorem 1 proves the existence of the optimal strategy of MG-MTD.

Theorem 1. If $S$ and $P$ are finite sets, then MG-MTD $(N, S, P, T, R, U)$ has a Nash equilibrium in mixed strategies.

Proof: By (3), the total revenue of each player $n \in N$ in the game, starting from network state $S_i$, can be presented as

$$G_i^n = R_i^n(\mathbf{P}_i^n) + \beta \sum_{j=1}^{k} T(S_i, \mathbf{P}, S_j) R_j^n(\mathbf{P}_j^n) + \beta^2 \sum_{j=1, u=1}^{k} T(S_i, \mathbf{P}, S_j) T(S_j, \mathbf{P}, S_u) R_u^n(\mathbf{P}_u^n) + \cdots$$

Let the nonempty convex compact set of a linear normed space be $I = [-x, x]\ (x > 0)$. The offensive and defensive strategy vectors $P^A$ and $P^D$ are each $k$-dimensional; let the strategy vectors $\mathbf{P}$ and $\bar{\mathbf{P}}$ be $2k$-dimensional, satisfying $\mathbf{P}, \bar{\mathbf{P}} \in \Phi_1^A \times \Phi_1^D \times \cdots \times \Phi_k^A \times \Phi_k^D$. In any network state $S_i$, the policy space of the players is $\Phi_{S_i}^A \times \Phi_{S_i}^D$. Similarly, the offensive and defensive revenue vectors $R^A$ and $R^D$ are each $k$-dimensional; the revenue vectors $\mathbf{R}$ and $\bar{\mathbf{R}}$ are $2k$-dimensional, satisfying $\mathbf{R}, \bar{\mathbf{R}} \in I_1^A \times I_1^D \times \cdots \times I_k^A \times I_k^D$. On this basis, define $K = I_1^A \times I_1^D \times \cdots \times I_k^A \times I_k^D \times \Phi_{S_i}^A \times \Phi_{S_i}^D$.

Since $\Phi_1^A \times \Phi_1^D \times \cdots \times \Phi_k^A \times \Phi_k^D$ and $I_1^A \times I_1^D \times \cdots \times I_k^A \times I_k^D$ are closed sets, $K$ is a compact convex set in a locally convex space. Define on $K$ the mapping $\psi(\mathbf{R}, \mathbf{P}(K)) = (\bar{\mathbf{R}}, \varphi(\mathbf{R}, \mathbf{P}))$ with

$$\bar{R} = \sup_{\bar{P}_i} \Big[ R_i^n(\bar{P}_i, P_i^n) + \beta^t \sum_{j=1}^{k} T(S_i, \bar{P}_i, P_i^n, S_j) R_j^n(\bar{P}_j, P_j^n) \Big] \qquad (4)$$

so that the strategy revenue in its range satisfies

$$\bar{R}_i^n \ge R_i^n(\bar{P}_i, P_i^n) + \beta^t \sum_{j=1}^{k} T(S_i, \bar{P}_i, P_i^n, S_j) \bar{R}_j^n(\bar{P}_j, P_j^n).$$

From the above analysis, since these sets are closed, there exist $\{\bar{\mathbf{R}}\}, \{\mathbf{R}\}, \{\mathbf{P}\}$ satisfying $\bar{\mathbf{R}} \to \mathbf{R}$, $\bar{\mathbf{R}} \ge \mathbf{R}$, $\bar{\mathbf{P}} \to \mathbf{P}$, and $\mathbf{P} = \psi(\mathbf{R}, \mathbf{P})$, and by (4) the revenue vectors $\bar{\mathbf{R}}$ and $\mathbf{R}$ satisfy the constraint above. The mapping of the strategy set is $\mathbf{P}(\psi(\mathbf{R}, \mathbf{P})) = \bar{\mathbf{P}}$, and the range of $\psi(\cdot)$ on $K$ is upper semi-continuous. By the Ky Fan fixed-point theorem [27], there exist a revenue vector and a strategy vector with $(\mathbf{R}, \mathbf{P}) = \psi(\mathbf{R}, \mathbf{P})$ satisfying

$$\bar{R}_i^{n*} = R_i^n(\bar{P}_i, P_i^n) + \beta^t \sum_{j=1}^{k} T(S_i, \bar{P}_i, P_i^n, S_j) \bar{R}_j^{n*}(\bar{P}_j, P_j^n) \qquad (5)$$

Let $P_i^n = R_i^n(\bar{P}_i, P_i^n) + \beta^t \sum_{j=1}^{k} T(S_i, \bar{P}_i, P_i^n, S_j) \bar{R}_j^n(\bar{P}_j, P_j^n)$. For any network state $S_i$ in MTD hopping, $P_i^n \le \bar{R}_i^{n*}$. Iterating (5) together with this definition, it can be concluded that $\bar{R}_i^{n*} = G^n(S_i, \bar{P}^n, P^n)$. Hence, when $\bar{R}_i^{n*} = G(P_i^{A*}, P_i^{D*})$ holds, equality holds in (5), and $G(P_i^{A*}, P_i^{D*}) \ge G(P_i^n, P_i^n)$. In summary, the optimal strategy exists in MG-MTD.

Because the optimal strategy exists in MG-MTD, when the network state is $S_i$ and the policy sets of the offensive and defensive sides are $\{P_i^A\}$ and $\{P_i^D\}$ respectively, the necessary and sufficient conditions for a strategy pair $(P_i^{A*}, P_i^{D*})$ to be the optimal strategy are:

$\forall P_i^A \in \{P_i^A\},\ R_A(S, P_i^{A*}, P_i^{D*}) \ge R_A(S, P_i^A, P_i^{D*})$;
$\forall P_i^D \in \{P_i^D\},\ R_D(S, P_i^{A*}, P_i^{D*}) \ge R_D(S, P_i^{A*}, P_i^D)$.

The criterion function shows that in MG-MTD the revenue function of each subgame is influenced by the state, which is in turn affected by past actions. It is known from [28] that if either player takes a Markov strategy, the other player will also take an optimal Markov strategy. The optimal strategy of MG-MTD is the Markov strategy combination that achieves a Nash equilibrium in every subgame. In other words, for every player, if the optimal strategy is $\{P_i^{n*}\}$, the following condition should be satisfied:

$$\forall t \in Z_0, \quad U_{S_t}^{P^{n*}} = R_{S_t}(P^{n*}) + \beta^t \sum_{i=1}^{k} T_{S_t}(P, S_i) U_{S_i}^{P^{n*}} \qquad (6)$$
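The discounted criterion (3) is a Bellman-style fixed point; for a fixed strategy pair it can be evaluated by simple iteration. The two-state chain below is illustrative (the states, rewards, and transition probabilities are made up for the sketch, not taken from the paper):

```python
# Minimal sketch of evaluating U_S = R_S + beta * sum_{S'} T(S,S') * U_{S'}
# (criterion (3)) for a *fixed* strategy pair, by fixed-point iteration.

def evaluate_policy(states, reward, trans, beta=0.7, iters=200):
    """Iterate the discounted fixed point until (approximate) convergence.
    trans[s] maps each successor state to its transition probability."""
    u = {s: 0.0 for s in states}
    for _ in range(iters):
        # synchronous update: every state's value is refreshed from the old u
        u = {s: reward[s] + beta * sum(p * u[t] for t, p in trans[s].items())
             for s in states}
    return u

# illustrative two-state chain (not the paper's nine-state case study)
states = ["S1", "S2"]
reward = {"S1": 10.0, "S2": 0.0}
trans = {"S1": {"S1": 0.5, "S2": 0.5}, "S2": {"S1": 1.0}}
u = evaluate_policy(states, reward, trans)
```

At the fixed point, U(S1) = 10 + 0.7(0.5 U(S1) + 0.5 U(S2)) and U(S2) = 0.7 U(S1), so U(S1) = 10/0.405.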



Therefore, a Nash equilibrium strategy satisfying (6) exists in MG-MTD.

D. The design of the optimal strategy selection algorithm

Since solving MG-MTD is a PSPACE problem [29], directly using the Shapley algorithm [30] incurs high computational complexity. The optimal strategy selection of MG-MTD can therefore be equivalently converted into solving the optimal value of a nonlinear program; Theorem 2 proves the equivalence. For a given MG-MTD model, if the deterministic and stable Markov strategy $P_f^*$ is its optimal strategy and the corresponding stable revenue $U^*$ is its optimal revenue, then solving the optimal strategy and revenue of MG-MTD can be transformed into the following nonlinear program (NLP2) over $P_f=\{P_f(P_i^n) \mid n\in N, S_i\in S, P_i^n\in P^n\}$ and $U=\{U_i^n \mid n\in N, S_i\in S\}$.

Objective function:

$$\min \sum_{n\in N}\sum_{S_i\in S}\Big[U_i^n-R_i^n(P_f)-\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n\Big]$$

Constraint conditions:

(1) $U_i^n \ge R_i^n(P_f)+\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n$, $\forall n\in N,\ S_i\in S,\ P_i^n\in P^n$;
(2) $\sum_{P_i^n\in P^n}P_f(P_i^n)=1$, $\forall n\in N,\ S_i\in S$;
(3) $P_f(P_i^n)\ge 0$, $\forall n\in N,\ S_i\in S,\ P_i^n\in P^n$.

Theorem 2. The stable Markov strategy $P_f^*$ and its corresponding stable revenue $U^*$ are the optimal strategy and optimal revenue of MG-MTD $(N,S,P,T,R,U)$ if and only if $P_f^*$ and $U^*$ are the optimal value of the nonlinear program under constraint conditions (1)-(3).

Proof: Sufficiency: If $P_f^*$ and $U^*$ are the optimal value of NLP2, then $U_i^n-R_i^n(P_f)-\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n=0$ can be proved by Theorem 1. It follows from the constraints that for every player $n\in N$, the optimal strategy and optimal revenue satisfy constraint (1). Besides, the optimal strategy satisfies the following condition:

$$\forall S_i\in S,\quad U_i^n=R_i^n(P_f)+\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n \qquad (7)$$

Therefore, for each player $n$, once the strategies of the other players are selected, strategy $P_f^n$ must be optimal for the corresponding Markov process. In other words, $P_f^*$ is the optimal strategy and $U^*$ is the corresponding revenue.

Necessity: If $P_f^*$ and $U^*$ exist under discount rate $\beta$, then by the stability of the Markov strategy $P_f^*$, constraint conditions (2) and (3) of NLP2 hold. Since each player follows a Markov decision process with discount rate $\beta$ under the optimal strategy, for every $n\in N$, if the strategies of the other players are determined, the strategy of player $n$ is the optimal strategy of that Markov decision process, which means constraint condition (1) holds. On the other hand, the optimal stable strategy of each player satisfies (7). According to $U_i^n=R_i^n(P_f)+\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n$, the objective function value of the nonlinear program is 0. Therefore, $P_f^*$ and $U^*$ are the optimal value of NLP2.

Based on the above analysis, the optimal strategy selection algorithm of MG-MTD is designed. Its time complexity is dominated by step 5, which is $O(k^2(m+l)^2)$, and its storage consumption concentrates on the intermediate results produced while solving the optimal strategy and revenue, which is $O(k^2 ml)$. The details are as follows:

Input: Multi-layer network resource graph
Output: Optimal defensive strategy
BEGIN
1. Initialize the parameters of MG-MTD; // the network state space is $S=\{S_1,S_2,\dots,S_k\}$ and $\beta$ is the discount rate
2. Construct the offensive and defensive strategy sets $P^A=\{P_1^A,P_2^A,\dots,P_m^A\}$ and $P^D=\{P_1^D,P_2^D,\dots,P_l^D\}$;
3. Obtain the network state transition probabilities $T=\{P(a_j \mid S_i), P(d_j \mid a_j), P(S_{i'} \mid d_j)\}$;
4. Obtain the revenues $R_A$ and $R_D$ of the strategies $\{P_i^A, P_j^D\}$ selected by the attacker and the defender;
5. Construct the objective function $\min \sum_{n\in N}\sum_{S_i\in S}\big[U_i^n-R_i^n(P_f)-\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n\big]$;
6. Let $\min \sum_{n\in N}\sum_{S_i\in S}\big[U_i^n-R_i^n(P_f)-\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n\big]\to 0$ and calculate the optimal strategy under the constraint conditions $\sum_{P_i^n\in P^n}P_f(P_i^n)=1$, $U_i^n\ge R_i^n(P_f)+\beta\sum_{S_t}T_t(S_i,P_f,S_t)U_t^n$, $P_f(P_i^n)\ge 0$;
7. Output the optimal strategy of MG-MTD;
END

TABLE I COMPARISON OF EXISTING MTD GAME MODELS

| Literature | Game theory type | Dynamicness | Universality | Optimal solving | Application condition |
| [18] | Complete information dynamic game | mono-phase, mono-state | Good | Simple | MTD |
| [20] | Multi-phased game | mono-state, multi-phases | Poor | Detailed | MTD |
| [25] | Markov game | multi-states, multi-phases | Poor | Simple | IP hopping |
| Our paper | Markov game | multi-states, multi-phases | Good | Detailed | MTD |
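The paper solves NLP2 with Matlab's fmincon. As a stdlib-only stand-in, the sketch below replaces the nonlinear program with a coarse grid search over the defender's mixed strategy for a single two-action stage game and, unlike the general-sum MG-MTD model, assumes a zero-sum payoff for simplicity:

```python
# Simplified stand-in for the NLP2 solver: grid-search the defender's mix
# for one stage game, assuming (unlike MG-MTD) a zero-sum payoff matrix.

def best_mixed_defense(payoff, step=0.05):
    """payoff[i][j]: attacker gain when the attacker plays i and the
    defender plays j. Return the defender mix (over two actions) that
    minimizes the attacker's best-response value."""
    n = len(payoff[0])
    best_mix, best_val = None, float("inf")
    k = int(round(1 / step))
    for a in range(k + 1):            # enumerate 2-action simplex points
        q = [a / k, 1 - a / k]
        attacker_val = max(sum(row[j] * q[j] for j in range(n)) for row in payoff)
        if attacker_val < best_val:
            best_mix, best_val = q, attacker_val
    return best_mix, best_val

# matching-pennies-like stage game: the optimal defense is the 50/50 mix
mix, val = best_mixed_defense([[1.0, 0.0], [0.0, 1.0]])
```

The grid search scales poorly with the number of actions, which is exactly why the paper converts the problem to a constrained program instead.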

The comparison among existing MTD game models is shown in Table I. Since MTD confrontation is multi-phase and multi-state, the multi-phased game used in [20] describes the MTD hopping process better than the approach in [18]; however, neither [18] nor [20] can describe the multi-state network transitions caused by offensive and defensive strategies, as both models assume a single state. The approach proposed in [25] uses a Markov game to describe the MTD confrontation process for the first time, but it does not convert the use of network resources by the attacker and the defender into changes of the attack surface and exploration surface, leading to poor universality. Moreover, [25] does not take the hopping cost into consideration and gives no detailed optimal strategy selection method, so it cannot accurately guide MTD hopping under specific conditions. In comparison, in terms of model construction, our paper uses a Markov dynamic model to describe the non-cooperative, multi-phase, and multi-state MTD confrontation, and converts the use of network resources by the attacker and the defender into changes of the attack surface and exploration surface, thus improving the universality of the proposed game model. In terms of optimal strategy selection, by comprehensively analyzing the effect of the hopping cost and benefit on strategy selection, the optimal strategy selection algorithm is designed, making MG-MTD applicable to different specific conditions.

IV. CASE STUDY

A. Case study of MG-MTD

The case study validates the effectiveness of the proposed MG-MTD and the designed optimal strategy selection algorithm. The typical network topology is shown in Figure 6 [6,25]. There are four hosts $H_1$, $H_2$, $H_3$ and $H_4$ in the network, whose basic configuration is shown in Table II. Connectivity is limited by the configured access control policy, as shown in Table III, and Table IV lists the network resource vulnerabilities obtained with the Nessus scanner. Assume the attacker has Root access on the Attack Host, which is the starting point of the network attack; the attack goal is to obtain the important information in the Linux database server.

Fig. 6 Experimental topology

TABLE II NETWORK HOST CONFIGURATION

| Host | System information |
| H1: internet server | Windows NT 4.0 |
| H2: intra-domain server | Windows 2000 SP1 |
| H3: client | Windows XP Pro SP2 |
| H4: Linux database | Red Hat 7.0 |

TABLE III FIREWALL POLICY (connectivity matrix among the Attack Host and H1-H4; each host is "local" to itself, and the entries give the services reachable between each pair of hosts, such as IIS, Squid, LICQ, or "All")

TABLE IV NETWORK RESOURCE VULNERABILITIES

| No. | Host | Resource | Port | Vulnerability |
| A | H1 | IIS service | 80 | IIS buffer overflow |
| B | H1 | ftp | 21 | ftp rhost overwrite |
| C | H2 | ssh | 22 | ssh buffer overflow |
| D | H2 | rsh | 514 | rsh login |
| E | H3 | Netbios-ssn | 139 | Netbios-ssn nullsession |
| F | H4 | LICQ | 5190 | LICQ remote-to-user |
| G | H4 | Squid proxy | 80 | Squid port scan |
| H | H4 | Mysql DB | 3306 | local-setuid-bof |
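The vulnerability inventory of Table IV can be encoded as plain data, so the per-host attack surface used in the rest of the case study can be queried programmatically; the field layout and function names are assumptions of this sketch, while the values come from the table:

```python
# Table IV as structured data: (id, host, service, port, vulnerability).

VULNS = [
    ("A", "H1", "IIS service", 80, "IIS buffer overflow"),
    ("B", "H1", "ftp", 21, "ftp rhost overwrite"),
    ("C", "H2", "ssh", 22, "ssh buffer overflow"),
    ("D", "H2", "rsh", 514, "rsh login"),
    ("E", "H3", "Netbios-ssn", 139, "Netbios-ssn nullsession"),
    ("F", "H4", "LICQ", 5190, "LICQ remote-to-user"),
    ("G", "H4", "Squid proxy", 80, "Squid port scan"),
    ("H", "H4", "Mysql DB", 3306, "local-setuid-bof"),
]

def attack_surface(host):
    """List the (service, port) pairs exposed on one host."""
    return [(svc, port) for _, h, svc, port, _ in VULNS if h == host]
```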

The MG-MTD model is constructed and the optimal strategy is selected as follows:

(1) Initializing parameters

The network state set is $S=\{S_1, S_2, \dots, S_9\}$. $S_1$ is the initial network state; $S_2$ is the state where the attacker gains user privileges by exploiting vulnerabilities in $H_1$; $S_3$ is the state where the attacker gains root privileges by exploiting vulnerabilities in $H_1$; $S_4$ is the state where the attacker gets access to the intra-domain server by exploiting vulnerabilities in $H_2$; $S_5$ is the state where the attacker gains user privileges on the intra-domain server by exploiting vulnerabilities in $H_2$; $S_6$ is the state where the attacker gains root privileges on the client by exploiting vulnerabilities in $H_3$; $S_7$ is the state where the attacker gets access to the Linux database by exploiting vulnerabilities in $H_4$; $S_8$ and $S_9$ are the states where the attacker gains user and root privileges, respectively, on the Linux database by exploiting vulnerabilities in $H_4$. The discount rate in MG-MTD is $\beta = 0.7$ [25].

(2) Constructing the strategy space, and obtaining the network state transition probabilities and the offensive and defensive revenue matrices

TABLE V OFFENSIVE AND DEFENSIVE STRATEGIES

| State | Offensive strategies | Defensive strategies |
| S1 | $P_1^A$ {Overflow attack, Data destroy, Non} | $P_1^D$ {Patch upgrade, ASD1 + time, ASD1} |
| S2 | $P_2^A$ {Overflow attack, Data destroy, Privilege gaining} | $P_2^D$ {Patch upgrade, ASD1 + ASD3, ASD1} |
| S3 | $P_3^A$ {Privilege gaining, Injection, Non} | $P_3^D$ {Patch upgrade, ASD3, ASD3 + time} |
| S4 | $P_4^A$ {Data theft, Scanning, Non} | $P_4^D$ {Data deleted, Service close, ASD1 + ASD3} |
| S5 | $P_5^A$ {Data destroy, Information theft, Non} | $P_5^D$ {ESD1 + ASD4, ESD1, ASD4} |
| S6 | $P_6^A$ {Scanning, Injection, Non} | $P_6^D$ {ASD2, Install monitor, ASD5} |
| S7 | $P_7^A$ {Scanning, Non, Non} | $P_7^D$ {Install monitor, Service close, Non} |
| S8 | $P_8^A$ {Overflow attack, Privilege gaining, Non} | $P_8^D$ {Patch upgrade, Non, Non} |
| S9 | $P_9^A$ {Injection, Non, Non} | $P_9^D$ {Install monitor, ESD2 + ASD5, Non} |

TABLE VI THE PROBABILITY OF NETWORK STATE TRANSITION

| State | Transition probabilities |
| S1 | $T(S_1, P_1^{A_1}, P_1^{D_1}, S_2)=0.33$; $T(S_1, P_1^{A_2}, P_1^{D_2}, S_3)=0.45$; $T(S_1, P_1^{A_2}, P_1^{D_3}, S_1)=0.7$ |
| S2 | $T(S_2, P_2^{A_1}, P_2^{D_2}, S_4)=0.8$; $T(S_2, P_2^{A_2}, P_2^{D_2}, S_6)=0.11$; $T(S_2, P_2^{A_3}, P_2^{D_3}, S_7)=0.4$ |
| S3 | $T(S_3, P_3^{A_1}, P_3^{D_3}, S_1)=0.85$; $T(S_3, P_3^{A_2}, P_3^{D_2}, S_9)=0.23$ |
| S4 | $T(S_4, P_4^{A_1}, P_4^{D_1}, S_5)=0.9$ |
| S5 | $T(S_5, P_5^{A_1}, P_5^{D_1}, S_6)=0.9$; $T(S_5, P_5^{A_2}, P_5^{D_2}, S_3)=0.85$ |
| S6 | $T(S_6, P_6^{A_1}, P_6^{D_1}, S_5)=0.37$; $T(S_6, P_6^{A_3}, P_6^{D_3}, S_9)=0.92$ |
| S7 | $T(S_7, P_7^{A_1}, P_7^{D_1}, S_8)=0.6$ |
| S8 | $T(S_8, P_8^{A_1}, P_8^{D_1}, S_2)=0.95$; $T(S_8, P_8^{A_2}, P_8^{D_2}, S_8)=0.88$ |
| S9 | $T(S_9, P_9^{A_1}, P_9^{D_1}, S_8)=0.52$; $T(S_9, P_9^{A_1}, P_9^{D_1}, S_9)=0.2$ |
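The transition structure $T$ can be held as a mapping keyed by (state, attack strategy, defense strategy). The two entries below follow Table VI's rows for S1; the short strategy names and the complementary probability mass are illustrative assumptions of this sketch, since the table lists only one successor per row:

```python
import random

# Two S1 rows of Table VI; the remaining mass (0.67 and 0.30) is assigned
# to a successor here purely for illustration -- it is not in the table.
T = {
    ("S1", "A1", "D1"): {"S2": 0.33, "S1": 0.67},
    ("S1", "A2", "D3"): {"S1": 0.70, "S3": 0.30},
}

def next_state(state, atk, dfn, rng):
    """Sample the successor state from the distribution in T."""
    dist = T[(state, atk, dfn)]
    r, acc = rng.random(), 0.0
    for s, p in dist.items():
        acc += p
        if r < acc:
            return s
    return state  # guard against floating-point round-off
```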

Table V shows the offensive and defensive strategies in each network state. $ASD = \{ASD_1, ASD_2, ASD_3, ASD_4, ASD_5\}$ denotes hopping methods that shift the attack surface: $ASD_1 = \{\text{IP, C class}\}$ hops the IP address, selected from a C-class address block; $ASD_2 = \{\text{port}, 64512\}$ hops the port, with a value range of 64512; $ASD_3 = \{\text{protocol}, 5\}$ hops the protocol type, with a value range of 5; $ASD_4 = \{\text{fingerprint}, 128\}$ hops the system fingerprint, with a value range of 128; $ASD_5 = \{\text{data storage}, 2^{12}\}$ hops the data storage position, with a value range of $2^{12}$. On the other hand, $ESD = \{ESD_1, ESD_2\}$ denotes hopping methods that enlarge the exploration surface: $ESD_1 = \{\text{fingerprint}, 256\}$ uses the system fingerprint, with a value range of 256, and $ESD_2 = \{\text{data storage}, 2^{16}\}$ uses the data storage position, with a value range of $2^{16}$. The hopping triggering method is proactive by default with a fixed hopping period, while $ASD_i + \text{time}$ and $ESD_i + \text{time}$ denote reactive triggering with a varying hopping period.

The offensive and defensive revenue matrices are shown in Table VII. The network multi-state transition relationship, derived from the multi-layer network resource graph, is shown in Figure 7, in which attackers are represented by dots and defenders by triangles; the network multi-state transition probabilities are given in Table VI. The effectiveness of the offensive and defensive strategies is evaluated with the method of [31], and the quantitative revenue matrices are calculated by our formal research, so as to guarantee uniform quantitative results.

TABLE VII NETWORK OFFENSIVE AND DEFENSIVE REVENUE MATRICES
(rows: attacker strategies; columns: defender strategies)

| State | Offensive revenue | Defensive revenue |
| S1 | [[15, 20, 17], [15, 20, 15], [0, 0, 0]] | [[-15, -20, -17], [-15, -20, -15], [0, 12, 5]] |
| S2 | [[20, 72, 37], [10, 50, 30], [20, 40, 20]] | [[-20, -72, -37], [-10, -50, -30], [-20, -40, -20]] |
| S3 | [[80, 46, 35], [80, 46, 35], [0, 0, 0]] | [[-80, -46, -35], [-80, -46, -35], [12, 10, 4]] |
| S4 | [[15, 23, 50], [44, 25, 53], [0, 0, 0]] | [[-15, -23, -50], [-44, -25, -53], [0, 0, 10]] |
| S5 | [[22, 45, 45], [5, 20, 15], [0, 0, 0]] | [[-22, -45, -45], [-5, -20, -15], [0, 0, 0]] |
| S6 | [[15, 20, 33], [10, 15, 30], [0, 0, 0]] | [[-15, -20, -33], [-10, -15, -30], [0, 4, 11]] |
| S7 | [[15, 36, 36], [0, 0, 0], [0, 0, 0]] | [[-15, -36, -36], [0, 0, 0], [0, 0, 0]] |
| S8 | [[25, 17, 8], [43, 35, 8], [0, 0, 0]] | [[-25, -17, -8], [-43, -35, -8], [9, 0, 7]] |
| S9 | [[29, 33, 5], [0, 0, 0], [0, 0, 0]] | [[-29, -33, -5], [0, 10, 0], [0, 10, 0]] |
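The element value ranges above can be tabulated, with a combined strategy modeled as the sum of its elements' ranges. Both the additive combination and the 256-address reading of the C-class range for $ASD_1$ are assumptions of this sketch (they match the two-times/three-times hopping-space comparison made in the analysis of state S5):

```python
# Element value ranges from the ASD/ESD definitions; ASD1's 256 assumes
# one C-class block of 256 addresses (an interpretation, not stated).
RANGES = {
    "ASD1": 256,      # IP address, one C-class block
    "ASD2": 64512,    # port
    "ASD3": 5,        # protocol type
    "ASD4": 128,      # system fingerprint
    "ASD5": 2 ** 12,  # data storage position
    "ESD1": 256,      # fingerprint (exploration surface)
    "ESD2": 2 ** 16,  # data storage (exploration surface)
}

def hopping_space(*elements):
    """Combined hopping space, modeled (as an assumption) as the sum of
    the constituent elements' value ranges."""
    return sum(RANGES[e] for e in elements)

# e.g. ESD1 + ASD4 = 384, i.e. three times the space of ASD4 alone
```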



Fig. 7 Network multi-states transition relationship

(3) Calculating and selecting the optimal strategy

As shown in Theorem 1, the optimal strategy exists in MG-MTD. According to Theorem 2, optimal strategy selection can be equivalently transformed into solving the optimal value of NLP2. The function fmincon in Matlab is used to solve the NLP2 problem, and the optimal strategies and their corresponding revenues are shown in Table VIII.

TABLE VIII NETWORK STRATEGY REVENUE

| Network state | Offensive strategy | Defensive strategy | Offensive revenue | Defensive revenue |
| S1 | [0.59, 0.3, 0.11] | [0.06, 0.42, 0.52] | 107.24 | -203.72 |
| S2 | [0.38, 0.3, 0.32] | [0.05, 0.86, 0.09] | 101.19 | -237.27 |
| S3 | [0.6, 0.4, 0] | [0, 0.22, 0.78] | 79.46 | -143.06 |
| S4 | [0.99, 0.01, 0] | [0, 0.3, 0.7] | 95.62 | -179.33 |
| S5 | [0.5, 0.5, 0] | [0.71, 0.21, 0.08] | 84.23 | -153.15 |
| S6 | [0.91, 0.06, 0.03] | [0.87, 0.11, 0.02] | 88.03 | -112.89 |
| S7 | [1, 0, 0] | [0.09, 0.46, 0.45] | 186.78 | -87.98 |
| S8 | [0.3, 0.69, 0.01] | [0.27, 0.45, 0.38] | 216.35 | -91.90 |
| S9 | [0.96, 0.02, 0.02] | [0.68, 0.29, 0.03] | 116.64 | -102.86 |

B. The analysis of results

(1) The analysis of revenue on each attack path

There are three attack paths in this case: ① $H_1 \to H_2 \to H_3 \to H_4$, which contains seven confrontation phases and can be described as $S_1, \{S_2, S_3\}, \{S_4, S_5\}, S_6, S_9$; ② $H_1 \to H_2 \to H_4$, which contains four confrontation phases with four network state transitions, $S_1, S_3, S_6, S_9$; ③ $H_1 \to H_4$, which contains five confrontation phases, with the corresponding state transitions $S_1, S_2, \{S_7, S_8, S_9\}$.

(2) The analysis of strategy selection

① MTD with a varying hopping period has higher defensive benefit and higher cost than MTD with a fixed hopping period. Hence, when the hopping space is limited, the gain in defensive benefit from a varying hopping period far exceeds that of a fixed hopping period, which greatly improves the hopping defensive benefit.

By analyzing Tables V-VIII, in network state $S_1$ the attacker mainly selects strategies $\{P_1^{A_1}, P_1^{A_2}\}$, and the defender mainly selects $\{P_1^{D_2}, P_1^{D_3}\}$. Strategy $P_1^{D_2}$ uses a varying hopping period while $P_1^{D_3}$ uses a fixed one, and both the defensive benefit and the cost of $P_1^{D_2}$ are higher than those of $P_1^{D_3}$. After comprehensively weighing the confrontation, the defender prefers strategy $P_1^{D_3}$, given that the attacker might select the strategy "not attack". Moreover, when the defender uses $P_1^{D_3}$, the network remains in the current state with probability 0.7, i.e., $T(S_1, P_1^{A_2}, P_1^{D_3}, S_1) = 0.7$. However, when the network state is $S_3$, the attacker mainly selects strategies $\{P_3^{A_1}, P_3^{A_2}\}$, and all the defender's strategies are MTD hopping. Similar to $P_1^{D_2}$ in $S_1$, strategy $P_3^{D_3}$ in $S_3$ deploys a varying hopping period; its defensive cost is 7 higher than that of $P_3^{D_2}$, whose hopping period is fixed. On the other hand, since the ASV space of $P_3^{D_2}$ and $P_3^{D_3}$ is only 5, the varying hopping period can greatly increase the defensive benefit, so the defender prefers strategy $P_3^{D_3}$ in $S_3$. What is more, when $P_3^{D_3}$ is selected in $S_3$, the network state goes back to $S_1$ with probability 0.85, i.e., $T(S_3, P_3^{A_1}, P_3^{D_3}, S_1) = 0.85$. Therefore, MTD with a varying hopping period yields higher defensive benefit than MTD with a fixed hopping period, at the price of higher defensive cost. When the ASV space is limited, a varying hopping period increases the attacker's scanning difficulty in both the temporal and the spatial dimension, leading to better defensive effectiveness. This conclusion is consistent with the conclusions drawn in [14,16].

② Multi-element hopping has higher defensive benefit than single-element hopping, but the elements must hop collaboratively rather than being simply superposed.

In network state $S_2$, the attacker launches attacks with the mixed strategy $\{P_2^{A_1}, P_2^{A_2}, P_2^{A_3}\}$, while the defender can select from $\{P_2^{D_1}, P_2^{D_2}, P_2^{D_3}\}$. Although both $P_2^{D_2}$ and $P_2^{D_3}$ are MTD hopping strategies, the defender selects $P_2^{D_2}$ with higher probability. The reason is that $P_2^{D_2}$ hops the IP address and the protocol collaboratively, which increases the defensive benefit far beyond that of $P_2^{D_3}$. Besides, as shown in Table VII, in network states $S_1$ and $S_3$ the defender selects the IP address as a single hopping element in $P_1^{D_3}$ and the protocol as a single

strategy P2D2 selects IP address and protocol to collaboratively hopping, which greatly increases the defensive benefit than those by using strategy P2D3 . Besides, known from Table Ⅶ, in network state S1 and S3 , the defender selects IP address as single hopping element in P1D , and selects protocol as single 3



hopping element in $P_3^{D_3}$, but the defensive benefit is much less

than that in S 2 by using strategy P2D2 . Similarly, when network state is S 4 , the defender prefers to select strategy P4D3 , which is also collaborative hopping of IP address and protocol. The defensive benefit is higher than that in network states S1 and S3 . Moreover, shown from the benefit matrix in network state S 4 , the increase of cost by using multi-elements collaborative hopping is much less than the increased defensive benefit of it. The fundamental reason lies that MTD hopping with multi-elements can increase the difficulty of attack reconnaissance with super-linear growth. In an ideal condition, when there are k elements used in MTD hopping collaboratively, the attack surface value range of each of them is m1, and the exploration surface value range of each of them is m2, the exploration space of the attacker to the targeted network system is O((m1+m2)k ) if the hopping period is fixed. What’s more, if the hopping period is varied, the exploration space of the attacker to the targeted network system is O(t(m1+m2)k ). As a result, the attack cost will also increase with the exponential growth. Therefore, within the permitted network performance overhead, the effective combination of hopping elements, hopping method, and hopping period will tremendously improve the defensive benefit, which is consistent with the result in [17, 32]. ③ Mixed hopping method not only achieves higher network security than those by only shifting attack surface or expanding exploration surface, but also increases the performance overhead. Therefore, only when the hopping space is limited, MTD with mixed hopping method can increase the defensive benefit effectively. In network state S5 , the attacker selects mixed strategy

{P5A1, P5A2} to launch attacks, while the defender selects the mixed defensive strategy {P5D1, P5D2, P5D3}. The analysis shows that the defender mainly deploys strategies P5D1 and P5D2, since all the defensive strategies are MTD hopping. P5D1 uses the mixed change method, and P5D2 uses exploration-surface expansion as its hopping method. The hopping space of P5D2 is two times bigger than that of strategy P5D3, and the hopping space of P5D1 is three times bigger than that of strategy P5D3. As a result, strategies P5D1 and P5D2 dramatically increase the attacker's reconnaissance space, and hence the time complexity of reconnaissance. This is consistent with the result in [12], which increases the difficulty of attack scanning by deploying honeypots.

However, in network state S9, although the defender can select strategies from {P9D1, P9D2, P9D3}, the defender prefers "install monitor" over the others. The reason is that the cost of strategy P9D2 is too high, while its benefit is almost the same as that of P9D1. From the perspective of a rational defender, strategy P9D1 is the preferred choice, which is consistent with common sense.

④ Provided that network threats are analyzed correctly, the MTD defensive mechanism achieves higher security revenue than existing defensive mechanisms. In network states S7 and S8, the defender only uses existing defensive strategies, such as closing the service and upgrading patches. As shown in Table Ⅷ, in this phase of the MTD confrontation the attacker achieves a dramatically higher benefit than in the other network states. At the same time, in network state S6 the defensive benefit of strategy P6D3 is much lower than that of strategy P6D1 or P6D2. The fundamental reason is that the attacker mainly gains privileges through scanning or injection attacks, which enable further attacks, whereas defensive strategy P6D3 selects stored data as its hopping element, which has little impact on the attack's implementation. Moreover, it incurs a higher defensive cost than the "install monitor" strategy. Hence, network security gains nothing from deploying MTD hopping blindly. For each network threat, the hopping elements should be selected accurately so as to achieve continuous, diverse change of the protected network attributes, thus increasing attack cost and complexity. In this way, the asymmetric situation between attack and defense can be mitigated, which is consistent with the conclusion in [3].

In summary, MG-MTD uses a Markov decision process to describe the transitions among network states in the MTD confrontation, and dynamic game theory to describe its multiple offensive and defensive phases; combining the two describes the MTD confrontation precisely.
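The state-S9 reasoning above (between two defenses with nearly equal benefit, a rational defender takes the cheaper one) can be sketched numerically. All benefit and cost figures below are hypothetical, not values from the paper's benefit matrices:

```python
# Toy net-revenue comparison for the defender in state S9 (hypothetical
# numbers). Net revenue = defensive benefit - defensive cost, mirroring the
# idea of folding defensive cost into the revenue function.

def net_revenue(benefit: float, cost: float) -> float:
    return benefit - cost

candidates = {
    "P9D1 (install monitor)": net_revenue(benefit=50.0, cost=10.0),  # cheap
    "P9D2 (MTD hopping)": net_revenue(benefit=52.0, cost=35.0),  # near-equal benefit, high cost
    "P9D3": net_revenue(benefit=30.0, cost=12.0),
}

best = max(candidates, key=candidates.get)
print(best)  # -> P9D1 (install monitor)
```

Once cost enters the revenue function, the slightly higher raw benefit of P9D2 cannot compensate for its cost, so the cheaper "install monitor" strategy wins.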
Besides, the usage of network resources by the attacker and the defender is converted into changes of the attack surface and the exploration surface, which improves the universality of the MG-MTD model. In terms of optimal strategy selection, incorporating the defensive cost into the revenue function gives the strategies selected by MG-MTD more practical significance. By calculating the offensive and defensive strategies in the case study, the effectiveness of MTD is demonstrated. In addition, the defensive benefit and cost of MTD hopping depend on the selection of hopping period, hopping method, and hopping elements, which should be chosen appropriately for the network threats at hand so as to achieve moderate defense.

V. CONCLUSION AND FUTURE WORK

Moving target defense is a revolutionary technique for changing the attack-defense situation, and selecting the optimal hopping strategy has become a key problem in current research. Because existing game models do not describe the MTD confrontation accurately, and therefore have difficulty analyzing and selecting the optimal strategy effectively, an optimal strategy



selection approach for moving target defense based on Markov game is proposed. In terms of model construction, the game-theoretic character of the MTD confrontation is analyzed. By taking the non-cooperative, dynamic, and Markov features of MTD into account, an MTD model based on Markov game is constructed: a Markov decision process describes the transitions among network states during MTD hopping, and dynamic game theory describes the multiple phases of MTD hopping. At the same time, the basic principles and hopping factors of MTD are first expounded from the viewpoint of the attack surface and the exploration surface. On this basis, the usage of network resources by attackers and defenders is converted into changes of the attack surface and the exploration surface, which improves the universality of MG-MTD. In terms of optimal strategy selection, incorporating the defensive cost into the revenue function gives the strategies selected by MG-MTD more practical significance. Based on the formal definition of MG-MTD, an algorithm for selecting the optimal strategy and calculating its corresponding revenue is designed; by converting optimal strategy selection into a nonlinear programming problem, the algorithm reduces the computational complexity. Finally, the case study demonstrates the effectiveness of the constructed model and the practicality of the optimal strategy selection method. In spite of these efforts, setting a more accurate and reasonable MTD defensive strategy still requires many more experiments. In addition, further research is needed on combining MTD with other means of network defense so as to form a more comprehensive defensive strategy.
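The stage-wise solution idea summarized above (a Markov process over network states, with a game solved in each state) can be sketched with Shapley-style value iteration. Everything below is a toy illustration: two states and 2x2 zero-sum stage games with hypothetical payoffs and transition probabilities, each stage game solved in closed form rather than by the paper's nonlinear-programming formulation:

```python
GAMMA = 0.8  # discount factor for future defensive revenue (assumed)

# R[s][i][j]: defender's payoff in state s for defender action i and
# attacker action j (zero-sum); P[s][i][j][t]: probability of moving
# from state s to state t under action pair (i, j). Hypothetical values.
R = [
    [[3.0, -1.0], [0.0, 2.0]],
    [[1.0, 4.0], [2.0, 0.0]],
]
P = [
    [[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]],
    [[[0.6, 0.4], [0.3, 0.7]], [[0.8, 0.2], [0.5, 0.5]]],
]

def solve_2x2(g):
    """Value of a 2x2 zero-sum matrix game for the row (defender) player."""
    maximin = max(min(row) for row in g)
    minimax = min(max(g[0][j], g[1][j]) for j in range(2))
    if maximin == minimax:  # pure-strategy saddle point
        return maximin
    denom = g[0][0] + g[1][1] - g[0][1] - g[1][0]
    if abs(denom) < 1e-12:  # degenerate case: fall back to the security level
        return maximin
    return (g[0][0] * g[1][1] - g[0][1] * g[1][0]) / denom

def shapley_iteration(iters=200):
    """Repeat v(s) <- value of the stage game with discounted continuation."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = [solve_2x2([[R[s][i][j] + GAMMA * sum(P[s][i][j][t] * v[t]
                                                  for t in range(2))
                         for j in range(2)]
                        for i in range(2)])
             for s in range(2)]
    return v

print(shapley_iteration())
```

Because the update is a contraction with factor GAMMA, the state values converge to the discounted value of the game; an MG-MTD-style algorithm plays the same role on the paper's model while also recovering the equilibrium mixed strategies.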
ACKNOWLEDGEMENT

This work was supported by the National Basic Research Program of China (973 Program) (2011CB311801); the National High Technology Research and Development Program of China (863 Program) (2015AA016106); Zhengzhou Science and Technology Talents (131PLKRC644); and the "Strategic Priority Research Program" of the Chinese Academy of Sciences, Grant No. XDA06010701. We thank the reviewers and the editor for their valuable comments.

REFERENCES

[1] Jajodia S, Ghosh A K, Swarup V, et al. Moving Target Defense: Creating Asymmetric Uncertainty for Cyber Threats[M]. Springer Science & Business Media, 2011.
[2] Cai G, Wang B, Hu W, et al. Moving target defense: state of the art and characteristics[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(11): 1122-1153.
[3] Cybenko G, Hughes J. No free lunch in cyber security[C]//Proceedings of the First ACM Workshop on Moving Target Defense. ACM, 2014: 1-12.
[4] Hamilton S N, Miller W L, Ott A, et al. The role of game theory in information warfare[C]//4th Information Survivability Workshop (ISW-2001/2002), Vancouver, Canada, 2002.
[5] Liang X, Xiao Y. Game theory for network security[J]. IEEE Communications Surveys & Tutorials, 2013, 15(1): 472-486.
[6] Lei C, Ma D, Zhang H. Moving target network defense effectiveness evaluation based on change-point detection[J]. Mathematical Problems in Engineering, 2016, Article ID 6391502, 11 pages.
[7] Trustworthy Cyberspace: Strategic Plan for the Federal Cybersecurity Research and Development Program. https://www.whitehouse.gov/sites/default/files/microsites/ostp/fed_cybersecurity_rd_strategic_plan_2011.pdf
[8] Manadhata P K, Wing J M. An attack surface metric[J]. IEEE Transactions on Software Engineering, 2011, 37(3): 371-386.
[9] Zhuang R, DeLoach S A, Ou X. Towards a theory of moving target defense[C]//Proceedings of the First ACM Workshop on Moving Target Defense. ACM, 2014: 31-40.
[10] Hobson T, Okhravi H, Bigelow D, et al. On the challenges of effective movement[C]//Proceedings of the First ACM Workshop on Moving Target Defense. ACM, 2014: 41-50.
[11] Zhuang R, Bardas A G, DeLoach S A, et al. A theory of cyber attacks: a step towards analyzing MTD systems[C]//Proceedings of the Second ACM Workshop on Moving Target Defense. ACM, 2015: 11-20.
[12] Clark A, Sun K, Bushnell L, et al. A game-theoretic approach to IP address randomization in decoy-based cyber defense[C]//International Conference on Decision and Game Theory for Security. Springer International Publishing, 2015: 3-21.
[13] Shin S, Xu Z, Gu G. CloudRand: building heterogeneous and moving-target port interfaces for networked systems[R]. Network and System Security Laboratory Technical Report, Department of Computer Science & Engineering, Texas A&M University, 2011.
[14] Jafarian J H, Al-Shaer E, Duan Q. Adversary-aware IP address randomization for proactive agility against sophisticated attackers[C]//2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2015: 738-746.
[15] Wu J. Meaning and vision of mimic computing and mimic security defense[J]. Telecommunications Science, 2014, 30(7): 1-7.
[16] Jafarian J H H, Al-Shaer E, Duan Q. Spatio-temporal address mutation for proactive cyber agility against sophisticated attackers[C]//Proceedings of the First ACM Workshop on Moving Target Defense. ACM, 2014: 69-78.
[17] Zhou H, Wu C, Jiang M, et al. Evolving defense mechanism for future network security[J]. IEEE Communications Magazine, 2015, 53(4): 45-51.
[18] Manadhata P K. Game theoretic approaches to attack surface shifting[M]//Moving Target Defense II. Springer New York, 2013: 1-13.
[19] Colbaugh R, Glass K. Predictability-oriented defense against adaptive adversaries[C]//2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2012: 2721-2727.
[20] Zhu Q, Başar T. Game-theoretic approach to feedback-driven multi-stage moving target defense[C]//International Conference on Decision and Game Theory for Security. Springer International Publishing, 2013: 246-263.
[21] Carter K M, Riordan J F, Okhravi H. A game theoretic approach to strategy determination for dynamic platform defenses[C]//Proceedings of the First ACM Workshop on Moving Target Defense. ACM, 2014: 21-30.
[22] Prakash A, Wellman M P. Empirical game-theoretic analysis for moving target defense[C]//Proceedings of the Second ACM Workshop on Moving Target Defense. ACM, 2015: 57-65.
[23] Vadlamudi S G, Sengupta S, Taguinod M, et al. Moving target defense for web applications using Bayesian Stackelberg games[C]//Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems. International Foundation for Autonomous Agents and Multiagent Systems, 2016: 1377-1378.
[24] Miehling E, Rasouli M, Teneketzis D. Optimal defense policies for partially observable spreading processes on Bayesian attack graphs[C]//Proceedings of the Second ACM Workshop on Moving Target Defense. ACM, 2015: 67-76.
[25] Maleki H, Valizadeh M H, Koch W, et al. Markov modeling of moving target defense games[J]. Journal of Cryptology, 2016: 47-83.
[26] Nilim A, El Ghaoui L. Robust control of Markov decision processes with uncertain transition matrices[J]. Operations Research, 2005, 53(5): 780-798.
[27] Zhang Y, Tan X, Cui X, et al. Network security situation awareness approach based on Markov game model[J]. Journal of Software, 2011, 22(3): 495-508.
[28] Doraszelski U, Escobar J F. A theory of regular Markov perfect equilibria in dynamic stochastic games: genericity, stability, and purification[J]. Theoretical Economics, 2010, 5(3): 369-402.
[29] Lin C, Wan J, Xiang X, et al. Dynamic optimization in computer systems and computer networks: models, solutions, and applications[J]. Chinese Journal of Computers, 2012, 7: 1339-1357.
[30] Shapley L S. Stochastic games[J]. Proceedings of the National Academy of Sciences, 1953, 39(10): 1095-1100.


[31] Feng X, Wang D, Huang M, et al. A mining approach for causal knowledge in alert correlating based on the Markov property[J]. Journal of Computer Research and Development, 2014, 51(11): 2493-2504.
[32] Zhao Z, Gong D, Lu B, et al. SDN-based double hopping communication against sniffer attack[J]. Mathematical Problems in Engineering, 2016: 1724-1739.

2169-3536 (c) 2016 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
