Science Academy Transactions on Computer and Communication Networks Vol. 1, No. 1, March 2011 Copyright © Science Academy Publisher, United Kingdom
Science Academy Publisher
Potential Game Models for Efficient Resource Allocation in Wireless Networks
Yenumula B. Reddy
Dept. of Math and Comp Science, Grambling State University, Grambling, LA 71245, USA
Abstract – With the increasing demand for data transfer on wireless networks, spectrum has become a scarce resource because of inefficient allocation and management. Recent research has turned to dynamic spectrum access models that make effective use of unused or idle spectrum. These models include overlay and underlay techniques whose frameworks use business and game theory models to improve spectrum efficiency. A survey of these models shows that the key component for efficient utilization is detecting the unused spectrum at any given time, and that allocating the unused spectrum with appropriate techniques (business, game, and hybrid models) produces better results. Among these, game models have been identified as powerful mathematical tools for detecting and allocating the unused spectrum. In this paper, we first discuss the role of game models in wireless communications and the player's strategy selection for better utility, and then propose a correlated equilibrium algorithm for efficient allocation of spectrum. The simulations indicate that mixed strategies are better than pure strategies in resource allocation.
1. Introduction
Efficient allocation of the spectrum requires sensing, managing, and sharing the spectrum and supporting spectrum mobility. Sensing, management, mobility, and sharing are the main functions of cognitive radio (CR). CR is a tool for wireless networks that regulates its reception parameters to communicate efficiently with minimal interference to the licensed users. It is difficult for a CR to sense all N channels in the spectrum at the same time, since the goal of the CR is to track the available channels and opportunistically grab an appropriate channel using various strategies. Among these strategies, game models appear to be a promising way to sense the unused spectrum (available channels) and to help assign an appropriate channel to the cognitive user. Detecting spectrum holes (unused spectrum) without errors (false alarms) and then efficiently allocating the unused spectrum is a critical issue. Once the unused spectrum is detected, the cognitive user (CU) decides to transmit on a selected high-utility channel based on the outcome of the detection analysis. Trusting the detector, minimizing interference, and avoiding collisions are all part of the resource (spectrum) allocation problem. If the detection accuracy is sufficiently high, the channel is allocated using the channel allocation policies, which must avoid collisions and interference. The dynamic spectrum allocation and exclusive use model was discussed in [1].

Spectrum holes are the channels (spectrum) not used by the primary user and available to the CU. The following methods are used to detect the absence (or presence) of the primary user:

Energy detectors (ED) [2] measure the energy of the input signal over a specific interval through a signal strength indicator.

Matched filters (MF) [3] require a dedicated receiver for every primary user class. The receiver needs prior knowledge of the primary user signal at the physical and medium access control layers.

Feature detection (FD) [4] uses cyclostationary signal processing to detect the presence of primary signals. FD is used in military operations and helps to detect weak signals.

Cooperative spectrum sensing (CSS) [5, 6, 7] uses multiple CRs in areas with tall, dense buildings. It is categorized as (a) decentralized-uncoordinated techniques, (b) centralized-coordinated techniques, and (c) decentralized-coordinated techniques. Among these, (b) and (c) perform better. The decentralized-uncoordinated techniques have minimal overhead and poor performance.

There are also hybrid techniques to detect the spectrum holes. The entry and exit of the primary user with minimum false alarms was discussed in [8], and an autocorrelation model was discussed in [9]. The autocorrelation model is an excellent fit for moderate and large cells. Further, the signal in microcellular environments is less accurate due to contamination by multipath fading. Detecting and efficiently utilizing the unused spectrum
will help the CUs. Further, the CR detects the presence of the primary user (PU) in the spectrum space with the help of ED and then determines the geographical information of the PU using collaborative communication [5, 6, 7]. Since the CRs are installed at the secondary user (SU) level, the PUs require appropriate protection in the presence of SUs. For example, a no-talk zone, whose size depends on the CR's transmit power, is required between the PU and the SUs. The no-talk zone must be outside the decoding area. The SUs must respect the protected area during the absence of PUs.

From the above, we conclude that the CR environment is a complex process with states called orient, plan, decide, act, learn, and observe. These states can be mapped onto a Markov game model in which the players' actions (states, strategies, and actions) produce a utility function. Since the learning states (plan, decide, act, learn, and observe) are part of the player's strategies, the player learns from the history of actions and chooses different strategies to maximize the outcome (utility). If a change in any single player's strategy lowers that player's utility relative to the current selection, we say the players are in an equilibrium state. To reach Nash equilibrium (NE), each player must maximize its utility function. If the incentives of all players in the game can be expressed in one global function, the game is a potential game. Potential games are of two types, ordinal and cardinal. In cardinal potential games, a change in the payoff of an individual player equals the corresponding change in the potential function. In ordinal potential games, only the signs of the differences have to be the same.

Some of the game models are:

Repeated game model: Players observe the history of actions, select strategies at each stage, and predict the future actions of opponents. In repeated games, a player plays the same game again, and with every strategy the payoff is greater than the minimax payoff until the game reaches NE. The single-stage (single-shot) game is the stage that is repeated; in each stage, the player uses previous experience to do better until the player reaches the goal.

Potential game model: In potential games, the incentives of all players are mapped into one function called the potential function. The pure NE can be found by locating the local optima of the potential function. The decisions of players change according to the response. A repeated game can be a potential game if it uses the history of better performance.

Congestion games: Congestion game models are a specific class of potential game models that allocate resources among selfish players. The payoff of one player's strategy is determined by how many other players choose an overlapping strategy.

Myopic game model: In myopic games, the system (CR) obtains the current state information and reacts to it. It does not look into the history of the communication channel for selection.

Zero sum game: In a zero sum game, the gain or loss of one player is exactly balanced by the loss or gain of the other player. For example, if two players are bidding for a spectrum, one player will win. Zero sum game models are most useful in intrusion detection problems and two-player games.

Nonzero sum game: In a nonzero sum game, an optimal solution can always be found. The players have some complementary interests and some interests that lead to the overall benefit. Nonzero sum games are useful in congestion control.

Cooperative games: A group of players shows cooperative (coalition or coordination) behavior to achieve the goal. In a cooperative game, the players stay together for the overall benefit of all players. An example is an institutional agreement that binds the players in a game to benefit both the game and the institution.

Non-cooperative games: The players in the game make independent decisions, and any cooperation is self-enforcing. The stable agreement of these players is the Nash equilibrium.

In the current research, the potential game model is selected for the detection and allocation of the spectrum to an appropriate user. The potential game model is a useful tool since a change in any player's payoff is matched by a change in the potential function.

The remainder of the paper describes the relevant literature and recent developments; potential games and correlated equilibrium in the detection and allocation of the spectrum to cognitive users; different ways of selecting the spectrum using game models; correlated equilibrium and the no-regret algorithm; simulations and discussion of results; and finally the concluding remarks and future research.
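The potential-game idea introduced above can be illustrated with a small sketch. It assumes a hypothetical two-user, two-channel congestion game in which users on the same channel share its rate equally; the names `RATES`, `utility`, and `potential` and all numeric values are illustrative assumptions, not taken from the paper. The sketch checks by brute force that the candidate potential changes by exactly the deviating user's change in utility, and then lists the pure NE.

```python
from itertools import product

RATES = [1.0, 0.8]   # hypothetical rate of each channel when used alone
N_PLAYERS = 2        # hypothetical number of cognitive users


def utility(player, profile):
    """Rate seen by `player`: the channel rate is shared equally by its users."""
    channel = profile[player]
    return RATES[channel] / profile.count(channel)


def potential(profile):
    """Rosenthal-style potential: per channel, rate/1 + rate/2 + ... + rate/load."""
    total = 0.0
    for channel, rate in enumerate(RATES):
        load = profile.count(channel)
        total += sum(rate / k for k in range(1, load + 1))
    return total


def all_profiles():
    """Every assignment of a channel to each user."""
    return list(product(range(len(RATES)), repeat=N_PLAYERS))


def unilateral_deviations(profile):
    """Yield (player, new_profile) for every single-user change of channel."""
    for player in range(N_PLAYERS):
        for channel in range(len(RATES)):
            if channel != profile[player]:
                changed = list(profile)
                changed[player] = channel
                yield player, tuple(changed)


def is_exact_potential(tol=1e-9):
    """True if each deviation changes utility and potential by the same amount."""
    for profile in all_profiles():
        for player, new in unilateral_deviations(profile):
            du = utility(player, new) - utility(player, profile)
            dphi = potential(new) - potential(profile)
            if abs(du - dphi) > tol:
                return False
    return True


def pure_nash_equilibria(tol=1e-9):
    """Profiles from which no user gains by changing channel alone."""
    return [
        profile
        for profile in all_profiles()
        if all(
            utility(player, new) <= utility(player, profile) + tol
            for player, new in unilateral_deviations(profile)
        )
    ]


if __name__ == "__main__":
    print("exact potential game:", is_exact_potential())
    print("pure NE:", pure_nash_equilibria())
    print("a maximizer of the potential:", max(all_profiles(), key=potential))
```

In this toy game the profile that maximizes the potential is also a pure NE, which is the property that makes potential games convenient for channel allocation.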
2. Review of Literature
Spectrum opportunity and misdetection (false positives) were discussed in [1, 10-12]. In [1], the authors provided a taxonomy of dynamic spectrum access (DSA) and illustrated spectrum opportunity detection, tracking, and spatial spectrum opportunity sharing. They also discussed the regulatory issues in opportunistic spectrum access (OSA) and the confusion caused by using "cognitive radio" as a synonym for dynamic spectrum access. The work in [10-12] discusses the detection and allocation of the unused spectrum. In [10], the authors proposed a game model with a genetic algorithm to analyze the radio frequency and allocate spectrum efficiently. Drake's equation was introduced in [11] to improve the detection of the primary signal. Efficient utilization of the spectrum with Q-learning was discussed in [12], where the Q-learning algorithm identifies previously known signals and learns to detect new signals. The signal characteristics indicate whether a signal is primary or secondary.

Hoven [13] showed that signal detection and uncertainty in the receiver noise are closely related. Detection can be improved through the use of a pilot tone from the primary transmitter. Further, the cognitive radio depends on weak primary signals only if the location of the primary receivers is unknown [14]. In [15], Wild et al. proposed that the leakage power emitted by a local oscillator helps to detect the radio frequency (the presence of primary signals). Cabric et al. [16] reported that detection of the primary signal improves significantly through matched filters, energy detectors, and cyclostationary feature detection with the cooperation of CRs. In the case of matched filters, a dedicated CR requires knowledge of the primary signal at both the MAC and PHY layers. Energy detectors are highly susceptible to changing noise levels, do not differentiate between modulated signals, noise, and interference, and do not work for spread spectrum signals. Random signals are detected through cyclostationary features. Despite these options, cooperative spectrum sensing is recommended if the neighboring CRs are programmed to collect the information required to detect the spectrum hole.

Spectrum bidding and efficient allocation of unused spectrum use economic, game theory, stochastic, case-based reasoning, Markov, and hybrid models [17-20]. Auction-based dynamic spectrum allocation and leasing of the spectrum to CUs was discussed in [17]. Assignment of the selected channel to the highest-priority user, using case-based reasoning and an automatic collaborative filtering model, was discussed in [18]. A congestion game model for maximizing spectrum utilization by secondary users with minimal interference to primary users was discussed in [19]. Hidden Markov models that use the Baum-Welch procedure to predict future sequences in frequency, and so to predict the primary user and compute channel availability for efficient use of unused spectrum, were discussed in [20]. Spectrum bidding behaviors and pricing models that maximize revenue and improve utilization of the spectrum were discussed in [21]. The taxonomy of dynamic spectrum access in [1] clarifies the various terms used in the literature. Spectrum overlay (opportunistic spectrum access) and spectrum underlay techniques, which are also used for efficient spectrum allocation, were discussed in [19, 22]. An algorithm for underlay and quality of service (QoS) in code division multiple access (CDMA) with minimal interference was discussed in [23]. A market model for overlay was discussed in [24], where the user imposes a spectral mask to generate better spectrum opportunities for secondary users. The model uses market equilibrium (purchase power control for the primary user) while controlling interference.

Game theory issues, tools applied to wireless communications, and the analysis of wireless resource allocation problems were discussed in [25-27]. In [25], the authors discussed the correlated equilibrium concept in stochastic games to solve transmission problems, together with a correlated Q-learning algorithm that learns the Q-function values and calculates the correlated equilibrium policy for a Markovian game. This concept explores intelligent games and is useful for future research in cognitive radios. Congestion game models and their applications are discussed in [19, 28]. Congestion game models are useful for maximizing spectrum utilization: the utility of a player using a resource depends on the other players using the same resource, so the resulting payoff function depends on the number of active users creating congestion. Optimizing the number of active users leads to better utilization of the resource (spectrum). In potential games, the current spectrum allocation decision depends on the history of usage and allocation.

In this paper, we formulate the problem using potential games with correlated equilibrium. The correlated equilibrium concept was discussed in [29, 30, 31]. Ianni's paper [30] is useful for the current research because it uses correlated equilibrium with potential games. It concludes that the players' behavior and their mistakes determine the payoff function. Neyman's [31] result on correlated equilibrium depends on the compactness of the strategy sets and the concavity of the potential.

2.1. The Contribution

The contribution includes the game models for wireless networks and the selection of strategies by a player to maximize the utility function. The paper explains correlated equilibrium (CE), Nash equilibrium (NE), and the role of CE in selecting spectrum with better utility than at NE. Further, the maximization of resource consumption when the system reaches CE is derived. The no-regret algorithm is used to select the better channel through the regret bounds, and it helps the CU to predict the appropriate channel at any given time. The simulations show that the no-regret algorithm helps to generate a better utility function and to select an appropriate channel.
3. Potential Games and Correlated Equilibrium
Detecting spectrum holes and using dynamic spectrum access (DSA) techniques to improve spectrum utilization by cognitive users (CUs) is a key issue. The interest of an individual CU may conflict with the overall network policies. Game models offer a way to reconcile individual and network interests, which include optimum power allocation to CUs with minimal interference. Further, a CU may make decisions using game models without knowing the other players' intentions, with knowledge of the actions of other players (CUs), or with the history of the players' actions. Chess is a game in which a player knows the history of the other player's moves. In a card game, a player does not know the opponent's cards but knows the history of the players' actions in the game. In Sudoku, the current status of the board is known, but the effect of filling in the next number is unknown; the player does not know the history or the future status, so careful planning is required.

In games, Nash equilibrium (NE) is the stable state that a player can reach with maximum utility. Once a player reaches NE, changing his strategy will lower his utility. The potential game model is extremely useful in cognitive radio applications, particularly in efficient spectrum allocation. The incentives of all players (CUs) are mapped into a potential function that converges to local optima. If the game reaches NE, then a change in any player's incentive disturbs the equilibrium state. Therefore, a change of state and action at any given time changes the player's utility and can be represented through a probability function. The dynamics of the actions, state, and time require a specialized decision state that determines the equilibrium. If the current state does not produce NE due to these limitations, then we need to recommend mixed strategies. If none of the players
will deviate from the recommended mixed strategy, then the distribution is called a correlated equilibrium. In a correlated equilibrium (which reaches an equilibrium above NE), each player chooses a strategy within his or her domain, and no other strategy (pure or mixed) produces better utility than the suggested one.

Consider two players, Alice and Bob, who each try to get a channel using their assigned (pure) strategies. Let Alice have n strategies and Bob have m strategies. Let $a_{i,j}$ be Alice's utility when she plays the i-th strategy and Bob plays the j-th strategy. Similarly, let $b_{i,j}$ be Bob's utility when Bob plays the i-th strategy and Alice plays the j-th strategy. If Alice plays the i-th strategy and receives her best utility, so that $a_{ij} \geq a_{i'j}$ for any $i'$, then we say the game reaches NE.

Now let Alice and Bob play mixed strategies. Let $x \in \Delta^n$, with $x_i \geq 0$ for $i = 1, \ldots, n$, be a mixed strategy played by Alice. Similarly, let $y \in \Delta^m$, with $y_j \geq 0$ for $j = 1, \ldots, m$, be the mixed strategy played by Bob. Then, for any mixed strategy $x'$, we conclude that the game reaches correlated equilibrium if

$$\sum_{ij} x_i a_{ij} y_j \geq \sum_{ij} x'_i a_{ij} y_j \qquad (1)$$

for a given $\sum_i x_i = 1$ and $\sum_j y_j = 1$.

The same holds for Bob's mixed strategy. If i is the best response for Alice and j is the best response for Bob, then

$$\sum_i \pi_{ij} a_{ij} \geq \sum_i \pi_{ij} a_{i'j} \qquad (2)$$

where $\pi_{ij}$ is the probability of Alice's i-th strategy over Bob's j-th strategy and

$$\pi_{ij} = x_i y_j \qquad (3)$$

Substituting equation (3) in (2), we get

$$y_j \sum_i x_i a_{ij} \geq y_j \sum_i x_i a_{i'j} \qquad (4)$$

This shows that Alice reaches the equilibrium state, called correlated equilibrium (CE), which is above NE. The term $y_j$ cancels since it appears on both sides of the equation. If a cognitive user reaches the CE, then the channel selected will be the most appropriate for the CU. The concept can be extended to multiple users contesting for the channel (spectrum). If multiple users are involved, we consider Alice as the prime CU and the other CUs as a group of contestants for the resource. Equation (4) is then written as:

$$\sum_i \pi(x_i, x_{-i})\, a_i \geq \sum_i \pi(x_i, x_{-i})\, a_{i'} \qquad (5)$$

The interpretation is that $x_i$ provides the trusted mixed strategy for the i-th user over the other users.

Neyman [31] concluded that the game has a unique correlated equilibrium if the strategy sets are compact and the potential is strictly concave. This principle applies to a family of concave potential games where the individual (CU) preference is high compared to the other members (other CUs) [32]. These users are in a group but selfishly acquire their required resources (spectrum). Let the probability of resource consumption by Alice at time t in an equilibrium state be $\rho_i^t$ with utility $u_i^t$, and let the probability of resource consumption with a mixed strategy be $\rho_{i'}^t$ with utility $u_{i'}^t$. We assume the system reaches the correlated equilibrium if the following condition is satisfied:

$$(\rho_{i'}^t u_{i'}^t - \rho_i^t u_i^t) \geq 0 \qquad (6)$$

Equation (6) is equivalent to (5) except for the maximization of resource consumption. The probability terms $\rho_i^t$ and $\rho_{i'}^t$ include the strategies. Therefore, in correlated equilibrium the payoffs are measurable if the resource consumption is defined as a probability distribution and that distribution depends upon the mixed strategies.

4. Game model for efficient Resource Allocation

Before we select the appropriate game model, let us examine all the possibilities open to the player. First, the player observes the strategy he plays and the payoffs he obtains, and then chooses the best strategy next time. Second, the player compares all possible strategies with their relative payoffs and then selects the appropriate strategy. Third, the player observes the pure and mixed strategies of all players with their matching payoffs, identifies the strategy with the better payoff, and then selects the appropriate strategy. In the efficient resource allocation problem, the player needs to observe the strategies selected by all other players to obtain a better payoff. The game G with P users ($P = p_1, \ldots, p_n$), S strategies ($S = s_1, \ldots, s_m$), and U utilities ($U = u_1, \ldots, u_k$) is represented as:

$$G = (P, S, U) \qquad (7)$$
If the game is of Type-1 (player with matching strategy), then for each player $p_i$ there is a corresponding strategy $s_i$ and payoff $u_i$. In Type-2, a player selects the strategy depending upon the payoff (for example, a player $p_i$ selects strategy $s_j$ to achieve payoff $u_k$ if $u_k$ is a better payoff). In Type-3, the player $p_i$ selects a mixed strategy $s_{jk}$ to produce payoff $u_{jk}$. The suffix jk denotes the mixed strategy generated from strategies j and k, which generates high utility (payoff). We present all three types below with examples.

Type-1: A user $p_i$ selects strategy $s_i$ to get utility $u_i$ at any given time. If the user then selects strategy $s_j$ and gets utility $u_j$ with $u_j > u_i$, the user stores the new strategy together with utility $u_j$. The stored value is useful for future selection. If the user selects strategy $s_k$ and gets a utility $u_k$ better than $u_j$, the user discards the previous strategy $s_j$ and keeps the current one. The process continues through the other strategies, always keeping the best strategy found so far in storage. Selecting strategies at random while keeping the best strategy (used to the player's advantage) as the current one is called 'naïve play'. The best example of Type-1 is a card game: the same card may not be the best in each game, but some cards (trumps) are always helpful.

Figure 1a shows the relationship between the strategies selected by the user and the corresponding utilities. The values in Table 1a are the strategies selected by the user in each attempt and the corresponding utilities. Table 1a and Figure 1a show that the strategy at the fourth attempt has maximum utility. In Figure 1b, the strategy at the ninth attempt has maximum utility (Table 1b corresponds to Figure 1b). This shows that in every attempt the utility changes depending on the other CUs contesting for the same resource as well as on the strategy selected by the user.

Figure 1a: Relation between the users' selected strategies with utilities (axes: Number of attempts by User vs. Utility)

Figure 1b: Relation between the users' selected strategies with utilities (axes: Number of attempts by User vs. Utility)
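The Type-1 'naïve play' described above can be sketched as a short loop that tries strategies at random and remembers the best one seen so far. This is only an illustration: the function names, the strategy labels, and the payoff model in `noisy_channel_utility` (a base value per strategy reduced by random contention from other CUs) are assumptions made for the example and do not reproduce the data of Tables 1a and 1b.

```python
import random


def naive_play(play_once, strategies, attempts, rng):
    """Pick a random strategy in each attempt and keep the best one seen."""
    best_strategy, best_utility = None, float("-inf")
    history = []
    for _ in range(attempts):
        strategy = rng.choice(strategies)
        utility = play_once(strategy, rng)
        history.append((strategy, utility))
        # Store the new strategy only when it beats the stored utility.
        if utility > best_utility:
            best_strategy, best_utility = strategy, utility
    return best_strategy, best_utility, history


def noisy_channel_utility(strategy, rng):
    """Hypothetical payoff: a base value per strategy minus random contention."""
    base = {"s1": 0.7, "s2": 0.9, "s3": 0.8}[strategy]
    return base - 0.2 * rng.random()


if __name__ == "__main__":
    rng = random.Random(0)
    best, value, history = naive_play(
        noisy_channel_utility, ["s1", "s2", "s3"], attempts=10, rng=rng
    )
    for attempt, (strategy, utility) in enumerate(history, start=1):
        print(attempt, strategy, round(utility, 3))
    print("kept:", best, round(value, 3))
```

Because the payoff of the same strategy varies between attempts, the attempt with the highest utility differs from run to run, which mirrors the difference between Figures 1a and 1b.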
Type-2: The user knows the strategies and the corresponding payoffs (the user is an informed player). In Type-2, the strategies and corresponding utilities are represented in table format. At any given time, the user $p_i$ ($P = p_1, \ldots, p_n$) selects a strategy $s_i$ ($S = s_1, \ldots, s_m$), compares it with the matching utilities $u_i$ ($U = u_1, \ldots, u_k$), and then selects the best-valued strategy (refer to Tables 2a and 2b). The advantage of this model is that the user can consult the table. The disadvantage is that the user needs to keep track of other users (opponents) trying for the same resource with the same or a different strategy. If the matrix is static the problems are minimal, but in a dynamic game the table changes with time, and maintaining the table becomes an issue. The best examples are the "prisoners' dilemma" and the "battle of the sexes". In the battle of the sexes, a two-player game, the game is cooperative if the players select the strategy that is advantageous for both; otherwise it becomes a non-cooperative (selfish) game.

Consider Table 2a, which represents the strategies and corresponding utilities. The user $p_i$ selects the strategy $s_i$ to receive the utility $u_i$. To assign the best strategy to the user in the queue, Table 2a needs to be organized in descending order of utilities, as in Table 2b. The best way is first request first assign (FRFA). If two requests are received at the same time, the assignment depends on the previous history of the users. Consider the "prisoners' dilemma" problem in Table 2c. In this example, the two utilities depend upon the two users' decisions, and each user gets an equal or different share. The "battle of the sexes" is a similar example and can be treated in exactly the same way as the prisoners' dilemma.

Type-3: The player observes the strategies of all other players and the corresponding utilities. Further, the player tries to play a combination of strategies (mixed strategies) to generate better utility than playing the pure strategies. The first two types (Type-1 and Type-2) reach equilibrium (NE) at some stage where there is no further advantage in changing the strategy. In Type-3, the mixed strategy, if the advantage is above the NE then we call it a 'correlated equilibrium'. Type-3 is a learning game model for allocating resources efficiently. The algorithm for static correlated equilibrium in resource allocation is given in the next section. In a static game, each user $p_i$ aims to generate a rule to select the strategy $s_i$ from its action set ($S = s_1, \ldots, s_m$) to maximize the utility.
Since each user controls only its own actions, the best strategy selection (optimal policy) depends upon the other users in the game. If the user reaches a stage where no further change of policy helps (changing the strategy lowers the utility), we say the game has reached NE. In the current problem, the probability distribution is used to determine the correlated equilibrium. Refer to the mixed strategies $x \in \Delta^n$ in the previous section and to equation (2). The interpretation is that Bob follows the instructions in his best interest along with the other users in the game. It means that there is no deviation rule that could grant Bob a better expected utility than $x$.

In the dynamic correlation policy, we include time and state, since the state changes (or stays the same) as time changes. At time t, the player $p_i$ plays strategy $x_i$ rather than $s_i$. The mixed strategy $\sigma_i$ is defined as

$$\sigma_i(s_i, x_i \mid s_{-i}) = r_i(s_i, s_{-i}) - r_i(x_i, s_{-i})$$

where

$$r_i(x_i, s_{-i}) = \sum_{s_i \in S_i} x_i(s_i)\, r_i(s_i, s_{-i}) \qquad (8)$$

and $x_i$ is a function of $t$. Note that the terms in equation (8) are therefore also functions of $t$. In mixed strategies, no added power is required, but learning by repeatedly playing the game is needed. The learning part is discussed through the correlated equilibrium algorithm in the next section.
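As a concrete check of the claim that a correlated equilibrium can give better utility than NE, the sketch below tests a joint distribution against the correlated-equilibrium condition in its standard form: for each recommended strategy, the conditional sum runs over the opponent's strategies. The payoff matrices are the textbook game of "Chicken", used here as a hypothetical stand-in for two CUs contesting a channel; the matrices, the function names, and the indexing `b[i][j]` (Bob's utility when Alice plays i and Bob plays j) are assumptions for illustration, not the paper's data.

```python
def is_correlated_equilibrium(pi, a, b, tol=1e-9):
    """Check the CE inequalities for a joint distribution pi[i][j]."""
    n, m = len(a), len(a[0])
    # Alice: when told to play i, switching to i_dev must not pay on average.
    for i in range(n):
        for i_dev in range(n):
            gain = sum(pi[i][j] * (a[i][j] - a[i_dev][j]) for j in range(m))
            if gain < -tol:
                return False
    # Bob: when told to play j, switching to j_dev must not pay on average.
    for j in range(m):
        for j_dev in range(m):
            gain = sum(pi[i][j] * (b[i][j] - b[i][j_dev]) for i in range(n))
            if gain < -tol:
                return False
    return True


def expected_utility(pi, u):
    """Expected payoff of one player under the joint distribution pi."""
    return sum(pi[i][j] * u[i][j] for i in range(len(u)) for j in range(len(u[0])))


# Strategy 0 = "dare" (transmit aggressively), strategy 1 = "yield".
A = [[0, 7], [2, 6]]   # Alice's utilities (hypothetical)
B = [[0, 2], [7, 6]]   # Bob's utilities (hypothetical)

# Correlated recommendation: never tell both users to dare.
PI_CE = [[0.0, 1 / 3], [1 / 3, 1 / 3]]

# Mixed NE of this game: each user dares with probability 1/3, independently.
X = [1 / 3, 2 / 3]
PI_NE = [[X[i] * X[j] for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    print("CE valid:", is_correlated_equilibrium(PI_CE, A, B))
    print("Alice at CE:", expected_utility(PI_CE, A))
    print("Alice at mixed NE:", expected_utility(PI_NE, A))
```

In this example the correlated recommendation yields an expected utility of 5 for each user, compared with 14/3 at the mixed NE, because the recommendation removes the outcome in which both users dare at once.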
Science Academy Transactions on Computer and Communication Networks
5.
Select the MS and play. Calculate the utility Ums each time and store; Keep the best mixed strategy BMS and average of all mixed strategies AVm . Calculate the difference between best mixed strategy and best pure strategy ( BMS BPS )
Correlated equilibrium and no-regret algorithm
In this section, we examine the Type-3 model using noregret algorithm [33-35]. In Type-3 game model, the user knows his own and opponents‟ strategies with respective payoffs. Further, the user observes the opponents‟ actions and plays a mixed policy in each round and observes the reward. The no-regret algorithms are popular learning algorithms holds reward/punishment for all trials from time to time. The algorithm uses a set of policies (strategies) of the game and assumes predictions to play next time. The time factor makes the static policy into dynamic along with change of states. Let xt denotes the transmission policy (mixed strategy) generated through pure policies (strategies) of a user p , time t and state x. The current state of the user p and state of his opponents is represented as
xt , p , xt , p . In a multi-player game, the discounted value D of the user (playing with set policies) at the time t and state x is written as
D E n u p ( x, p) where is the discount factor, 0 1 ,
Calculate the difference between average-mixed strategy and average-pure strategy
( AVm AVp )
If δ>0, select the mixed strategy; If ε>0 select any mixed strategy above the AVp . End-of-algorithm The Algorithm-1 helps the selection of the best strategy in each trial. It consumes time because of the calculation, comparison of all utilities, and to select the best possible strategy. Figures 3a, 3b and Tables 3a, 3b show the selection of strategy (pure or mixed) in each trial in the game. The results conclude that mixed strategy is the best strategy in most of the times but not every time.
(9)
0.4 0.3
E is the error factor; Further, the user updates continuously
0.2
at each state (as time changes) and each user p , updates its value vector from time t to time t+1. The pay of a user p depends upon the mixed strategy and his opponents‟ activity. Therefore, the payoff of a user p with pure () and mixed
0.1
Utility
u is the utility, and
u p ( xt , p , xt , p )
-0.3 -0.4
(10)
-0.5
1
2
3
4
5
6
7
8
9
10
9
10
Trials
(11)
Figure 3a: Trial and Strategy selection 0.4
The discount factor is the difference of equations (10) and (11) is given by
0.3
0.2 0.1
(12) Utility
u p ( xt , p , xt , p ) u p (s xt , p , s xt , p )
0 -0.1 -0.2
() strategies is given by u p ( s xt , p , s xt , p )
26
If 0 with equilibrium state then the user achieves better value with mixed strategies. This stable state is called correlated equilibrium. At correlated equilibrium, the payoff of users is represented through a single potential function. The best strategy calculation is done through Algorithm-1.
Algorithm-1
Set PS as pure strategy and MS as mixed strategy.
Let t1, ..., tn denote the rounds;
AVp is the average value achieved in the last m rounds using pure strategies;
AVm is the average value achieved in the last m rounds using mixed strategies.
Select the PS and play. Calculate the utility Ups each time;
Keep the best pure strategy BPS and the average of all strategies AVp.

Figure 3b: Trial and Strategy selection (horizontal axis: Trials; vertical axis: Utility)

6. Conclusions and Future Research

In this paper, efficient allocation of the spectrum using pure and mixed strategies was discussed. The spectrum allocation is divided into three types. In Type-1, a user selects the strategy randomly and obtains the payoff. Under this random policy, the user has no control over the utility, since the strategy is selected at random. In Type-2, the player knows each strategy and its corresponding outcome (utility). Therefore, the user compares the strategies and their outcomes before selecting a strategy. A problem arises if two users compete for the resource with the same strategy at the same time; this conflict is resolved using the history of the allocation. In Type-3, the player observes the strategies of all other players and plays the pure or mixed strategy that yields the better utility. In general, if the player plays a pure strategy, the player obtains maximum utility at NE. Using the mixed strategy, the system generates better utility than at NE and is stable above NE, at a state called correlated equilibrium. The simulations conclude that the selection of mixed strategies is better than pure strategies.
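The Type-3 selection summarized above, together with the bookkeeping that Algorithm-1 names (Ups, BPS, AVp, AVm), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used for the simulations: the payoff matrix, the opponent's action sequence, the rule of trying the pure strategies in turn, the use of expected rather than sampled utility for the mixed strategy, and the final comparison of AVm against AVp are all assumptions made for the example.

```python
def average_last(values, m):
    """Mean of the last m entries (or of all entries if fewer than m exist)."""
    window = values[-m:]
    return sum(window) / len(window)


def select_strategy_type(payoff, opponent_actions, mixed, m):
    """Compare pure and mixed play over the observed rounds t1..tn.

    payoff[a][b] is the user's utility for own action a and opponent action b.
    Returns (choice, best_pure, av_p, av_m).
    """
    n_actions = len(payoff)
    pure_utils = []                 # Ups recorded in each round
    totals = [0.0] * n_actions      # cumulative utility of each pure strategy
    mixed_utils = []                # expected utility of MS in each round

    for t, opp in enumerate(opponent_actions):
        ps = t % n_actions          # assumption: pure strategies are tried in turn
        pure_utils.append(payoff[ps][opp])
        for a in range(n_actions):
            # The opponent's action is observed, so every pure strategy can be scored.
            totals[a] += payoff[a][opp]
        mixed_utils.append(sum(mixed[a] * payoff[a][opp] for a in range(n_actions)))

    best_pure = max(range(n_actions), key=lambda a: totals[a])  # BPS
    av_p = average_last(pure_utils, m)                          # AVp
    av_m = average_last(mixed_utils, m)                         # AVm
    choice = "MS" if av_m > av_p else "PS"                      # assumed decision rule
    return choice, best_pure, av_p, av_m


# Hypothetical inputs: a 2x2 payoff matrix and six observed opponent actions.
payoff = [[0.0, 7.0],
          [2.0, 6.0]]
opponent_actions = [1, 0, 1, 1, 0, 0]
mixed = [1.0 / 3.0, 2.0 / 3.0]

choice, best_pure, av_p, av_m = select_strategy_type(payoff, opponent_actions, mixed, m=4)
print(choice, best_pure, round(av_p, 3), round(av_m, 3))
```

For these inputs the sketch keeps pure strategy 1 as BPS and reports AVp = 3.75 and AVm of about 3.833 over the last four rounds, so it selects the mixed strategy. A different opponent sequence or window length m can reverse that outcome.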
Acknowledgment

The research work was supported by the Air Force Research Laboratory/Clarkson Minority Leaders Program through contract No. FA8650-05-D-1912. The author wishes to express appreciation to Dr. Connie Walton, Grambling State University, for her continuous support.
References

[1] Q. Zhao and B. Sadler, “A Survey of Dynamic Spectrum Access: Signal Processing, Networking, and Regulatory Policy”, IEEE Signal Processing Magazine, 79, 2007.
[2] B. Wild and K. Ramachandran, “Detecting Primary Receivers for Cognitive Radio Applications”, First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN 2005), Nov. 2005.
[3] A. Sahai, N. Hoven, and R. Tandra, “Some Fundamental Limits on Cognitive Radio”, Proc. of Allerton Conference, Monticello, Oct 2004.
[4] W. A. Gardner, “Signal Interception: A Unifying Theoretical Framework for Feature Detection”, IEEE Trans. on Communications, Vol. 36, No. 8, 1988.
[5] K. Hamdi and K. Lataief, “Cooperative Communications for Cognitive Radio Networks”, The 8th A. Post Graduate Symposium on the Conference of Telecommunications, Networking, and Broad Casting (PG Net 2007), June 2007.
[6] J. Unnikrishnan and V. Veeravalli, “Cooperative Spectrum Sensing and Detection for Cognitive Radio”, IEEE Global Telecommunications Conference (GLOBECOM '07), 2007.
[7] S. Mishra, A. Sahai, and R. Brodersen, “Cooperative Sensing Among Cognitive Radios”, IEEE International Conference on Communications (ICC '06), 2006.
[8] A. Betran-Martinez, O. Simeone, and Y. Bar-Ness, “Detecting Primary Transmitters via Cooperation and Memory in Cognitive Radio”, 41st Annual Conference on Information Sciences and Systems (CISS '07), 14-16 March 2007, Page(s): 369-369, 2007.
[9] M. Gudmundson, “Correlation Model for Shadow Fading in Mobile Radio Systems”, Electronics Letters, Vol. 27, No. 3, 1991.
[10] Y. B. Reddy, N. Gajendar, and S. K. Gupta, “Application of Genetic Algorithms on Game Theory Model for Efficient Spectrum Allocation”, WORLDCOMP'08-ICWN 2008, July 14-17, 2008, Las Vegas, NV, USA.
[11] Y. B. Reddy, “Detecting Primary Signals Using Time and Space Model”, WORLDCOMP'08 (ICWN 2008), July 14-17, 2008, Las Vegas, NV, USA.
[12] Y. B. Reddy, “Detecting Primary Signals for Efficient Utilization of Spectrum Using Q-Learning”, ITNG 2008, April 7-9, 2008, Las Vegas, Nevada, USA.
[13] N. K. Hoven, “On the feasibility of Cognitive Radio”, Master Thesis, UC Berkeley, 2005.
[14] A. Sahai, N. Hoven, and R. Tandra, “Some Fundamental Limits on Cognitive Radio”, 42nd Allerton Conference on Communication, Control, and Computing, Sept 2004.
[15] B. Wild and K. Ramachandran, “Detecting Primary Receivers for Cognitive Radio Applications”, Proc. IEEE DySPAN 2005, pp. 124-130, Nov 2005.
[16] D. Cabric, S. M. Mishra, R. Brodersen, and A. Wolisz, “Implementation Issues in Spectrum Sensing for Cognitive Radios”, Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, October 29 - November 1, 2006.
[17] Y. Wu, B. Wang, K. J. R. Liu, and T. C. Clancy, “Collusion-Resistant Multi-Winner Spectrum Auction for Cognitive Radio Networks”, GLOBECOM, 2008.
[18] Y. B. Reddy, “Efficient Spectrum Allocation Using Case-Based Reasoning and Collaborative Filtering Approaches”, SENSORCOMM 2010, July 18-25, 2010.
[19] Y. B. Reddy and Heather Smith, “Congestion Game Model for Efficient Utilization of Spectrum”, SPIE, 2010.
[20] M. Sharma, A. Sahoo, and K. D. Nayak, “Channel Modeling Based on Interference Temperature in Underlay Cognitive Wireless Networks”, ISWCS, 2008.
[21] S. Gandhi, C. Buragohain, L. Cao, H. Zheng, and S. Suri, “A General Framework for Wireless Spectrum Auctions”, DySPAN 2007.
[22] V. D. Chakravarthy, “Evaluation of Overlay/Underlay Waveform via SD-SMSE Framework for Enhancing Spectrum Efficiency”, Ph.D. Thesis, Wright State University, 2008.
[23] L. B. Le and E. Hossain, “Resource Allocation for Spectrum Underlay in Cognitive Radio Networks”, IEEE Transaction on Wireless Communications, Vol. 7, No. 2, 2008.
[24] Y. Xie, “Competitive Market Equilibrium for Overlay Cognitive Radio”, course project report, Stanford University, 2009.
[25] J. W. Huang and V. Krishnamurthy, “Game Theoretic Issues in Cognitive Radio Systems”, J. of Communications, Vol. 4, No. 10, 2009.
[26] G. He, M. Debbah, and E. Altman, “Game-Theoretic Techniques for Intelligent Wireless Networks”, Cogis, Paris, France, 2009.
[27] Z. Ji and K. J. Ray Liu, “Dynamic Spectrum Sharing: A Game Theoretical Overview”, IEEE Communications Magazine, 2007.
[28] M. Liu and Y. Wu, “Spectrum Sharing as Congestion Games”, 46th Allerton Conf. Comm. Control and Computing, Monticello, IL, Sept. 2008.
[29] R. J. Aumann, “Correlated Equilibrium as an Expression of Bayesian Rationality”, Econometrica, Vol. 55, No. 1, 1987.
[30] A. Ianni, “Learning Correlated Equilibria in Potential Games”, CARESS Working Paper 98-05, 1998, Department of Economics, University of Southampton, Southampton SO17 1BJ, U.K.
[31] A. Neyman, “Correlated equilibrium and potential games”, International Journal of Game Theory, Volume 26, Number 2, 223-227, 1997.
[32] R. Deb, “Independent Preferences, Potential Games and Household Consumption”, January 2008, http://mpra.ub.unimuenchen.de/6818/1/Potential_Games2.pdf.
[33] B. Banerjee and J. Peng, “Efficient No-regret Multiagent Learning”, National Conference on Artificial Intelligence (AAAI), 2005.
[34] G. Gordon, “No-regret algorithms for Online Convex Programs”, NIPS 2006.
[35] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “Gambling in a rigged casino: The adversarial multi-armed bandit problem”, 36th Annual Symposium on Foundations of Computer Science, November, 1995.
Yenumula B. Reddy is a Professor of Computer Science at Grambling State University. He received his PhD in Computer Science from the Indian Institute of Technology, Delhi, India. His current research includes trust-based approaches in Wireless Sensor Networks and Dynamic Spectrum Access using game models. He is a program committee member of conferences and chair of the International Symposium on Networking and Wireless Communication held in connection with the ITNG conference.