A third level of adaptivity for branch prediction

Toni Juan          Sanji Sanjeevan          Juan J. Navarro

Department of Computer Architecture, Univ. Politecnica de Catalunya, 08034 Barcelona (Spain)
(antonioj,sangi,juanjo)@ac.upc.es

Abstract

Accurate branch prediction is essential for obtaining high performance in pipelined superscalar processors that execute instructions speculatively. Some of the best current predictors combine a part of the branch address with a fixed amount of global history of branch outcomes in order to make a prediction. These predictors cannot perform uniformly well across all workloads because the best amount of history to use depends on the code, the input data and the frequency of context switches, so any predictor that uses a fixed history length is unable to perform up to its maximum potential. We introduce a method, called DHLF, that dynamically determines the optimum history length during execution, adapting to the specific requirements of any code, input data and system workload. Our proposal adds an extra level of adaptivity to two-level adaptive branch predictors and can be applied to any predictor that combines global branch history with the branch address. We apply the DHLF method to gshare (dhlf-gshare) and obtain near-optimal results for all SPECint95 benchmarks, with and without context switches. Some results are also presented for gskewed (dhlf-gskewed), confirming that other predictors can benefit from our proposal.

1 Introduction

Branch prediction is a key performance component for superscalar and deeply pipelined processors, where several wrong-path instructions can be in-flight before a branch is resolved. To reduce the number of cycles lost to speculative execution of wrong-path instructions, branch prediction has evolved from static schemes to more flexible predictors that try to adapt to dynamic program behavior. A first level of adaptivity has been the use of 2-bit saturating counters [10]. An additional level of adaptivity has been introduced by using other sources of branch information, such as the history of branch outcomes and the correlation between branches [14], [8], [15], [17], or even by choosing among several predictors designed for different kinds of branch behavior [6], [3]. Some of the best current predictors are based on the two-level adaptive or correlated schemes proposed in [14] and [8]. They combine fixed amounts of global history and program counter (PC) bits to generate an index into one or several pattern history tables (PHTs) of 2-bit saturating counters. Several parameters influence the performance of these predictors, such as the size of the predictor tables, the way the branch history and PC bits are combined, the way the PHT is updated, and the amount of history and PC information used.

Motivation

To show the effect of history length on branch predictor performance, Figure 1a plots the misprediction rates for three two-level adaptive branch predictors: gshare [6] and the recently proposed agree [11] and gskewed [7].

* This work was partially supported by the Ministry of Education and Culture of Spain under grant CICYT TIC-0429/95 and by the CEPBA.

Figure 1: Misprediction rates of three different two-level predictors that use global history, using a 12-bit index (11-bit for gskewed), for go and li. a: Effect of the history length. b: Misprediction range for each benchmark and predictor.

The benchmarks used are go and li from SPECint95. The area occupied by each predictor is 8Kbits for gshare and 12Kbits(1) for agree and gskewed. All three perform very much alike for li, and even though their performance for go differs, all display the same kind of dependence on the history length. In Figure 1b we plot the same results as in Figure 1a, showing with shaded bars the range of misprediction rates when the history length is varied for the three predictors. For all predictors there is a significant range of variation in misprediction rate depending on the history length (e.g., gshare on go varies from 20% to 27%, and from 4% to 13% for li). Two interesting observations can be made from the misprediction ranges of go. First, even though the agree predictor has lower variation as a function of the history length, when optimal history lengths are compared, gshare and gskewed perform better. Second, gskewed achieves the best misprediction rate with a history length of four bits; however, its performance for more than half of all possible history lengths is inferior to that of the best gshare configuration, despite the fact that it occupies 50% more area. From Figure 1 we can conclude that:

- the history length used for a particular code has a significant impact on predictor performance,
- determining the best predictor, for some codes, depends on the history length used,
- for a given predictor, achieving the best prediction accuracy requires the use of different history lengths for each code. All predictors that use a fixed history length are therefore unable to perform up to their maximum potential, and
- the performance of two-level branch predictors can be further optimized by adapting the history length to each code.

(1) agree has the same PHT as gshare (8Kbits) but has 4Kbits extra for the bias bit, and gskewed has three banks of 2-bit saturating counters, each indexed with 11 bits; 12Kbits was the closest value greater than or equal to 8Kbits that could be used.

Figure 2: Misprediction rates of two SPECint95 benchmarks (go and li) using a gshare predictor with a 12-bit index. a: Effect of the history length. b: History length optimized for go, li and using our DHLF method (striped bar).

Dynamic selection of the history length

We present a new method that, when applied to existing two-level predictors, performs very close to the best fixed-history-length configuration on all SPECint95 benchmarks. Our proposal adds another level of adaptivity, trying to better fit the number of history bits needed by each benchmark and input data at execution time. This method can be applied to any member of the family of predictors that combine global branch history with PC bits to form an index into one or several PHTs, such as gshare [6], gselect [8], gskewed [7], agree [11] and bi-mode [5]. We call this method Dynamic History-Length Fitting, or DHLF. As an illustration, Figure 2a shows the same results as Figure 1a but only for gshare. The PHT is indexed using 12 bits of the branch address xor-ed with a number of global history bits varied from 0 to 12. Increasing the history length improves the performance for li, whereas for go the misprediction rate reaches a minimum with a history length of 3 and then degrades rapidly as the number of history bits is increased. The arrows indicate the optimum for each code. Figure 2b shows what happens when gshare uses the history length that is optimal for go (3 bits) or for li (10 bits): in each case, optimizing for one code results in poor performance on the other. Finally, the striped bar of Figure 2b shows the performance achieved with the DHLF method applied to gshare, which we propose and evaluate in this paper. Note that nearly optimal results are obtained for both benchmarks with our method. We will show that when context switches are considered, the history length is even more critical for performance; the DHLF method obtains the best results even in this environment.

Simulation methodology

The simulations have been conducted using ATOM [2] to obtain a trace of conditional branches from all SPECint95 benchmarks using the reference inputs. The benchmarks were first instrumented with ATOM and then executed on a DEC 21164 workstation running Digital UNIX V4.0A. We first looked at the results of simulating 10, 100, 200, 500 and 1000 million dynamic conditional branches for several benchmarks. Since the results were stable after 100 million, we carried out all our simulations up to 200 million conditional branches.


Figure 3: Detail of the gshare implementation evaluated.

Throughout the paper we will use the term 'branches' to refer to 'conditional branches'. In all branch predictors simulated, the branch history register and the PHTs are immediately updated with the true outcome of the branch, instead of using the predicted outcome or waiting for the outcome to be known; it has been shown in [16] that this has little overall effect on prediction accuracy. The study is carried out with gshare as described in [6]. Figure 3 shows how the m bits of global history are xor-ed with the m higher-order bits of the n low-order bits of the PC (after discarding the 2 lowest bits) to generate the index into the PHT. The PHT consists of 2^n two-bit saturating counters initialized to saturated taken. Due to lack of space, we exhaustively study the effect of the different parameters on predictor performance using only two of the SPECint95 benchmarks, go and li. These two benchmarks were chosen because they represent two distinct types of variation in performance with changing history length. After analyzing these results, we fix some of the parameters and present the results for the rest of the benchmarks. Detailed results for all SPECint95 codes are included in Appendix B.
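As a rough software sketch of the index generation just described (our code, assuming the bit layout of Figure 3; not the authors' implementation):

    #include <stdint.h>

    /* n: number of PHT index bits; m: fixed history length (m <= n).
     * The m history bits are xor-ed with the m higher-order bits of the
     * n low-order PC bits, after discarding the 2 lowest PC bits. */
    static inline uint32_t gshare_index(uint32_t pc, uint32_t ghr, int n, int m) {
        uint32_t pc_bits = (pc >> 2) & ((1u << n) - 1);  /* n low-order PC bits */
        uint32_t hist    = ghr & ((1u << m) - 1);        /* m most recent outcomes */
        return pc_bits ^ (hist << (n - m));              /* align history with the top m index bits */
    }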

Paper organization

In section 2 we study the effect of the history length on misprediction rates for gshare as a function of predictor size, in the absence of context switches. Section 3 describes our DHLF method and evaluates its performance. The effect of context switches on predictor performance is studied and evaluated in section 4. In section 5 we apply our method to gskewed and present some results. Section 6 discusses related work and section 7 provides some concluding remarks.

2 Effect of history length on predictor performance

To understand the effect of history length on prediction accuracy, we simulate gshare on all SPECint95 benchmarks for PHTs indexed with 10, 12, 14 and 16 bits and history lengths ranging from 0 to the number of index bits. The misprediction rate for each benchmark is presented in Figure 4; each curve represents a particular predictor size. All graphs are plotted within a window ranging from 0% to 20% misprediction rate, except for go, which is plotted from 12% to 32%. Since the size of the misprediction range shown is the same for all plots, the differences in misprediction rate between any curves can be directly compared across the benchmarks. From Figure 4 we can identify three different behaviors in how the predictability of a code evolves as the history length is increased: the predictability of compress and li improves as more history is used.

Benchmark   Static cond. branches   Branches for 90% of dyn.   % cond. branches
compress              462                       25                   11.7
gcc                 17650                     2816                    6.2
go                   4962                      645                    8.7
ijpeg                1290                       47                   11.3
li                    775                       50                    7.0
m88ksim               885                       88                    7.5
perl                 1586                       65                   10.0
vortex               4381                      392                    7.7

Table 1: Relevant characteristics of the SPECint95 codes: number of static conditional branches, number of static conditional branches responsible for 90% of the dynamic conditional branches, and percentage of conditional branches in the dynamic instruction stream.

ijpeg, m88ksim and especially perl show some irregular behavior for different history lengths. Finally, prediction accuracy for gcc, go and vortex improves with more history bits but quickly starts to degrade. These different behaviors depend on the number of static branches that account for most of the dynamic branches (see Table 1), the degree of correlation between branches, and the predictor size. The range of predictor sizes studied (up to 128K bits) covers the predictor sizes of present and near-future processors: from the MIPS R10000, which has 512 entries of 2-bit counters (1Kbit) indexed only with the PC [13], up to the recently announced DEC AXP 21264, which will have a hybrid predictor with an estimated area of 35Kbits [4]. To the best of our knowledge, except for the 21264, all processors use the equivalent of 16Kbits or less area for their predictors. When the index size of the predictor reaches 16 bits (a PHT of 128Kbits), the optimal results for all benchmarks are achieved with history lengths close to the maximum. However, this requires very big predictors, four times larger than the biggest one announced at present (the DEC AXP 21264). In section 4 we show that in a more realistic environment, where context switches happen quite often and most of the PHT information is lost frequently, even the big predictors behave similarly to the small and medium sized predictors presented in this section.

3 Dynamic history-length fitting

In the previous section it has been shown that each code requires a specific amount of history to give the best results. All known implementations of two-level dynamic predictors that combine global history and PC bits use a fixed history length; consequently, none of these predictors can give the best results across all benchmarks. We propose and evaluate a new implementation of gshare called dhlf-gshare that, instead of always xor-ing a fixed number of history bits, is able to xor any number of history bits with the PC bits of the branch instruction. This predictor tries to dynamically find the amount of history that performs best for each code and input data at execution time, using the best history length required for the different phases of the code's execution. The extra hardware required to select the number of history bits to be xor-ed with the PC is very small. The Branch History Register (BHR) size has to be equal to the maximum history length envisaged. The desired number of PHT entries determines the number of PC bits that have to be used by the predictor (log2[PHT entries]). All bits of the BHR are xor-ed with the PC, but a decoder and a set of two-bit multiplexers select the desired number of BHR bits that hold the past history information. Figure 5a depicts a normal BHR at the bit level and Figure 5b shows the modifications required to select any number of consecutive BHR bits (from 0 to all of them). This additional logic operates in parallel with the xor and therefore does not introduce any extra delay in the index generation.
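In software terms, the selection reduces to masking the BHR to the chosen length before the xor; a minimal sketch, reusing the bit layout assumed in the gshare sketch above (our naming, not the hardware of Figure 5):

    #include <stdint.h>

    /* dhlf-gshare index generation with a selectable history length.
     * `hist_len` may be changed between intervals by the DHLF control. */
    static inline uint32_t dhlf_index(uint32_t pc, uint32_t bhr, int n, int hist_len) {
        uint32_t pc_bits = (pc >> 2) & ((1u << n) - 1);
        uint32_t mask    = (hist_len == 0) ? 0 : ((1u << hist_len) - 1);
        return pc_bits ^ ((bhr & mask) << (n - hist_len));
    }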


Figure 4: Effect of history length on the misprediction rate for all SPECint95 benchmarks using a gshare predictor indexed with 10, 12, 14 and 16 bits.

DHLF works by monitoring the mispredictions during program execution and changing the history length accordingly. We define an 'interval' as a fixed number of consecutive dynamic branches; we call this number step. During the execution of the program, the mispredictions in each interval are counted using a fixed history length. At the end of each interval, the history length to be used for the next interval is determined based on the current number of mispredictions and the minimum value encountered so far.

3.1 Structure and operation

The components of the DHLF control are:

- A misprediction table with one entry per possible history length (that is, the number of PHT index bits plus one). Entry n holds the number of mispredictions that occurred during the last interval in which a history length of n was used.
- A pointer to the table entry that corresponds to the number of history bits currently in use.
- A misprediction counter that counts the mispredictions in the current interval.
- A branch counter that counts the number of predicted branches in the current interval. When this counter reaches the value step, it indicates the end of an interval.

When the program starts execution, all entries of the misprediction table, the misprediction counter and the branch counter are initialized to zero. The number of history bits to be used is also set to zero. During each interval of step branches, the history length remains fixed in order to determine the number of mispredictions for the current phase of execution with the current history length.
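This control state can be modeled with a handful of fields; a minimal sketch in C (our naming and sizing, not the paper's hardware):

    #include <stdint.h>

    #define MAX_INDEX_BITS 16   /* largest PHT index studied in the paper */

    typedef struct {
        uint32_t mispred_table[MAX_INDEX_BITS + 1]; /* one entry per history length 0..index_bits */
        int      hist_len;       /* history length currently in use (the "pointer") */
        uint32_t mispred_count;  /* mispredictions seen in the current interval */
        uint32_t branch_count;   /* branches predicted so far; the interval ends at `step` */
        int      index_bits;     /* number of PHT index bits */
        int      warming_up;     /* set during the interval that follows a history-length change */
    } dhlf_t;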


Figure 5: Hardware at the bit level to generate a 6-bit index into a PHT. a: for gshare with a fixed history length of 4 bits. b: for dhlf-gshare, where the 6 PC bits can be xor-ed with any number of history bits between 0 and 6.

At the end of an interval, the current number of mispredictions is stored in its associated entry of the misprediction table and compared with the minimum value recorded in the table. If the current number of mispredictions is less than or equal to that minimum, the history length is not changed for the next interval; if it is greater, the history length is changed. Finally, the misprediction and branch counters are reset before starting a new interval.

Changing the history length could be done in two different ways. One way would be to directly set the history length to the one corresponding to the minimum in the misprediction table. The other is to move towards it, increasing or decreasing the current history length by one. We chose the latter option because it enables the testing of in-between history lengths that may not have been tried for some time and might even yield fewer mispredictions. Finding the entry that contains the minimum number of mispredictions can be done easily: during each interval all entries of the misprediction table remain unchanged, so at smaller periods (each step/index-bits branches, for instance) the control can test one of the entries, and by the time step branches have completed, the entry of the misprediction table that holds the minimum is already known. Note that all possible history lengths will be tested at least once, because all table entries are initialized to zero.

Each time the history length is changed, the index generated for a given branch PC with the same pattern of history bits changes. This means that a different entry of the PHT will be used to make the prediction, introducing aliases in the PHT. For this reason, when the history length is changed, most of the state in the PHT is lost and must be regenerated before reaching a stable state. The increased amount of aliasing in the PHT immediately after a history length change introduces extra mispredictions that would distort the measured performance of the current history length. The solution is to allow some adequate warm-up time before starting to count the mispredictions for the current history length; after testing various values, we chose this warm-up time to be equal to one interval, step. Consequently, the control treats the interval that follows a change in history length in a special way: during this interval, the misprediction counter and table are not updated, and at its end no comparison is made between the current number of mispredictions and the minimum value in the misprediction table. Then, a normal interval begins with the same history length.
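The end-of-interval decision just described can be summarized in a short routine; a sketch built on the dhlf_t state above (our code; the linear scan stands in for the incremental minimum search done by the hardware during the interval):

    /* Called every `step` predicted branches. */
    static void dhlf_end_of_interval(dhlf_t *d) {
        if (d->warming_up) {
            /* Interval right after a history-length change: used only to rebuild
             * the PHT, so its mispredictions are neither recorded nor compared. */
            d->warming_up = 0;
        } else {
            d->mispred_table[d->hist_len] = d->mispred_count;
            int best = 0;                     /* history length with fewest recorded mispredictions */
            for (int i = 1; i <= d->index_bits; i++)
                if (d->mispred_table[i] < d->mispred_table[best])
                    best = i;
            if (d->mispred_count > d->mispred_table[best]) {
                /* move one step towards the best length seen so far */
                d->hist_len += (best > d->hist_len) ? 1 : -1;
                d->warming_up = 1;
            }
        }
        d->mispred_count = 0;
        d->branch_count  = 0;
    }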


Figure 6: Evolution of the history length during the execution of go and li when DHLF is applied to gshare with an index of 12 bits and a step value of 16K. Also shown is the percentage of total time spent at each possible history length.

3.2 Tradeoffs

The step parameter controls the number of branches between possible history length changes. A small step value allows DHLF to react to small changes in the predictability of the code; if the program has several phases with differing requirements, it could even behave better than gshare with the best fixed history length. Moreover, with a small step value the optimum history length can be found faster. However, the step value has to be big enough to count the mispredictions over a representative part of the code. Large PHTs require a large number of updates to reach a stable state, which implies that larger predictor tables will perform better with larger step values. The extra mispredictions due to a history length change can reduce the benefits of DHLF; the DHLF control therefore has to test the possible history lengths often enough to find the best one, but as rarely as possible so that these extra mispredictions are minimized. An alternative approach to reducing the number of mispredictions caused by a change in history length is described in Appendix A.

Figure 6 shows how the history length evolves over time during the execution of go and li using dhlf-gshare. The PHT is indexed with 12 bits and the step value is 16K. For both benchmarks there are several history length changes during the initial phase of execution, and the history length then stabilizes around the optimal static value during the latter phase. On the right of the figure, a histogram shows the percentage of branches predicted at each history length.

3.3 DHLF area requirements

The DHLF control mechanism increases the area required to implement gshare. For an index of 10 bits and a step value of 16K, it requires a misprediction table with 11 entries that are each 13 bits wide (assuming that the misprediction rate will be lower than 50%).


Figure 7: Misprediction rate of dhlf-gshare for several step values compared to the misprediction range of gshare with xed history lengths. The results are presented for go and li with 10, 12, 14 and 16 index bits. 13 bits wide (assuming that the misprediction rate will be lower than 50%). This works out to 11  13 = 143 bits extra, which is less than a 7% increase in area. For the case of an index of 16 bits, the extra area will be 17  13 = 221 bits, an increase in predictor area of less than 0.02%. For small PHTs (e.g 10 bits of index) the area required for the control can be minimized by storing the number of mispredictions divided by a power of two. We tested this by storing values divided by 2, 4, 8 up to 64, and there were no signi cant di erences in the results. Another way to reduce the area required to implement DHLF would be to reduce the number of history lengths allowed. For example, by using only even history lengths the number of entries in the misprediction table is halved with little e ect on DHLF performance.

3.4 DHLF evaluation

The parameter that has to be determined for the DHLF control is the step value. In Figure 7 we plot the results of dhlf-gshare for step values from 4K up to 128K, applied to go and li. The shaded bars represent the range of misprediction rates obtained using gshare with fixed history lengths; the maximum and minimum are obtained by running the simulation once for each history length, up to the number of index bits (see Figure 4). Also marked are the arithmetic mean values of each misprediction range. On the shaded bars we have superimposed the misprediction rates obtained using DHLF with different step values.

From Figure 7 it can be seen that for li the step value has no impact for any predictor size. In the case of go, as the predictor size increases, the effect of the step value on performance becomes more noticeable. In general, large predictors need a longer warm-up time after a history length change and hence benefit from larger step values. In order to simplify the presentation we selected a step value of 16K for the rest of the simulations, even though this is not the best choice for large predictors. The DHLF method can effectively overcome the significant dependence on history length that prevents current predictors from achieving the best performance. DHLF introduces a new parameter, step; as shown, the chosen value for step (above a certain threshold) does not affect the overall performance.

In Figure 8 we plot the dhlf-gshare results for all SPECint95 benchmarks, using a step value of 16K. As in Figure 7, we have superimposed the dhlf-gshare misprediction rate over the range computed for a gshare predictor. In almost all benchmarks, dhlf-gshare obtains near-optimal results in comparison to using gshare with the best fixed history length for each benchmark.


Figure 8: Misprediction rate of all SPECint95 codes using dhlf-gshare with a step value of 16K compared to the misprediction range of gshare with fixed history lengths (no context switches).

In two particular cases (compress with a 14-bit index and m88ksim with a 12-bit index) the performance is even better than for any fixed-length gshare. This confirms our intuition that in some cases the DHLF method can respond better to the history length requirements of the different phases of execution of the same benchmark. Only perl exhibits irregular behavior, and optimal results are obtained for just one predictor size (the 14-bit index). The reason could be the non-uniform variation of the history length requirements of this benchmark, as shown by the spikes in Figure 4.

4 Considering context switches

In real-world computing environments, context switches occur due to end of quantum, I/O, etc. It has been shown that the performance of even very accurate branch predictors degrades considerably when context switches are considered [3]. The main reason is that the information maintained by the PHT is lost periodically with every context switch. Large PHTs and long history lengths require longer warm-up times because more PHT entries are indexed by each static branch; predictors with shorter warm-up times will have a higher prediction accuracy immediately after a context switch.

Context switches introduce an additional restriction on predictors that use fixed amounts of history: the best history length for a given code can also change depending on the frequency of context switches. Since this frequency depends on many unknown factors, such as the load of the system at a given time, it would be impossible to design a single predictor that uses a fixed history length and always achieves optimum performance for even one benchmark. This highlights the importance of being able to change the history length dynamically. The results presented in the next section show that the DHLF method is able to find the best history length independently of the code, input data and context-switch frequency, for all predictor sizes.

4.1 Simulation methodology

To simulate the effect of context switches we flush the contents of the PHT each time a context switch occurs, as in [3], reinitializing the PHT entries to saturated taken. We also tested reinitializing the PHT entries with random values and the results were very similar. We define the context-switch distance as the number of branches executed between context switches. The context-switch distance depends on the percentage of branches in the code, the system configuration and its load at execution time, and it could even change during the execution of a code under real conditions. To simplify the evaluation, however, we study the effect of fixed context-switch distances, from 8K up to 256K. For the SPECint95 codes this translates to between 40K and more than 3000K instructions between context switches (we have found that from 5% up to 12% of the instructions in the dynamic instruction stream are conditional branches). The context-switch distances selected are similar to those used in [3].
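A minimal sketch of this simulation setup (our code; the predict/update details are omitted):

    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    static void flush_pht(uint8_t *pht, size_t entries) {
        memset(pht, 3, entries);            /* 2-bit counters back to saturated taken */
    }

    static void simulate(uint8_t *pht, size_t entries, long cs_distance, long total_branches) {
        for (long b = 0; b < total_branches; b++) {
            if (b > 0 && b % cs_distance == 0)
                flush_pht(pht, entries);    /* model the predictor state lost on a context switch */
            /* ... predict and update the counters for branch b ... */
        }
    }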

4.2 Effect of history length on prediction performance

Figure 9 shows the effect of the context-switch distance on the misprediction rate for go and li using the gshare predictor. The PHT is indexed with 16 bits while the history length varies statically from 0 up to 16 bits. Each curve corresponds to a different context-switch distance, ranging from 8K up to 256K; the gray curve shows the performance of the same predictor when context switches are not considered. The curves show the same dependence on the context-switch distance for all history lengths and, as expected, the worst performance is obtained when the context-switch distance is the lowest. We have plotted Figure 9 with 16 bits of PHT index to highlight the effect of context switches on big predictors. The arrows indicate the best history length for each context-switch distance. The best history length for go with the same input data varies from 2 up to 12 bits depending on the context-switch distance. Moreover, there is a large variation in performance depending on the context-switch distance and the history length (a 14% misprediction rate for go with no context switches compared to 36% for the same history length with a context switch every 8K branches; the latter data point is off the scale).

In Figure 10 we present the effect of context switches on misprediction rates for all SPECint95 benchmarks using gshare. As before, the index lengths studied are 10, 12, 14 and 16 bits and the history lengths are varied from 0 to the number of index bits for each curve. We fix the context-switch distance at 64K to simplify the presentation, since it is an intermediate value. Unlike in Figure 4 (the case without context switches), gcc, go and vortex now show the same behavior for all predictor sizes. As before, all codes have differing history length requirements, but now this holds for all predictor sizes studied. It can also be seen that for some codes, increasing the predictor size brings no improvement because of the overhead introduced by context switches.

4.3 DHLF operation under context switching

The DHLF method retains the same predictor control described in section 3 when there are context switches. A few extra items have to be considered to allow dhlf-gshare to find the best history length across context switches:

- The current value of the misprediction counter is discarded when a context switch occurs in the middle of an interval; it is not stored in the misprediction table.
- The misprediction table and the current history length must be saved each time a context switch occurs. This means saving from 143 bits for a 10-bit index up to 221 bits for a 16-bit index, assuming a step value of 16K, which requires 13-bit misprediction counters (given that the misprediction rate will always be lower than 50%), i.e., the equivalent of saving four 64-bit registers.
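A sketch of this per-process bookkeeping, reusing the dhlf_t state from section 3.1 (our naming; where the saved bits live is only an assumption):

    typedef struct {
        uint32_t mispred_table[MAX_INDEX_BITS + 1];
        int      hist_len;
    } dhlf_saved_t;                          /* roughly 143-221 bits of state per process */

    static void dhlf_switch_out(dhlf_t *d, dhlf_saved_t *save) {
        d->mispred_count = 0;                /* the partial interval is discarded */
        d->branch_count  = 0;
        for (int i = 0; i <= d->index_bits; i++)
            save->mispred_table[i] = d->mispred_table[i];
        save->hist_len = d->hist_len;
    }

    static void dhlf_switch_in(dhlf_t *d, const dhlf_saved_t *save) {
        for (int i = 0; i <= d->index_bits; i++)
            d->mispred_table[i] = save->mispred_table[i];
        d->hist_len   = save->hist_len;
        d->warming_up = 1;                   /* first `step` branches only rebuild the flushed PHT */
    }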


Figure 9: Effect of history length on the misprediction rate of go and li using a gshare predictor indexed with 16 bits. The number of conditional branches between context switches varies from 8K up to 256K. Also shown is the misprediction rate when there are no context switches.


Additionally, the first step branches after a context switch are not considered, to avoid the effect of the PHT reconstruction, just as is done immediately after a history length change.

4.4 Evaluation

Figure 11 shows the performance of dhlf-gshare for all SPECint95 codes using 10, 12, 14 and 16 index bits. The history length was dynamically adjusted every 16K branches, as in the previous figures. Once more, we have superimposed the dhlf-gshare results over the range of values obtained for gshare. The context-switch distance used was 70K, because the behavior is almost the same as for 64K but 70K is not a multiple of 16K, the step value used by dhlf-gshare. This allows us to test dhlf-gshare under unfavorable conditions and shows that having a context switch within an interval does not reduce the performance of the method. From Figure 11 we see that near-optimal results are obtained for all benchmarks except perl. It should be noted that dhlf-gshare achieves the best performance in most codes for all predictor sizes considered, using the same step value.

5 Applicability to other predictors

The DHLF method can be applied to any predictor that combines history and PC bits. As an example we present some results for one of the latest predictors, gskewed. The skewed branch predictor uses an odd number of PHTs and indexes each PHT using a different and independent hash function. All hash functions are computed from the same vector of PC and global history information. Predictions are read from each PHT and a majority vote decides the final outcome. Details about the predictor and the hash functions can be found in [7]. We evaluated gskewed with three PHTs and a partial update policy on go and li. Since gskewed requires three tables, we have studied the performance for indices of 9, 11, 13 and 15 bits.
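The read path just described amounts to a three-way majority vote; a minimal sketch (our code, and the per-bank hashes below are placeholders, not the skewing functions of [7]):

    #include <stdint.h>

    #define BANK_BITS 11
    #define BANK_SIZE (1u << BANK_BITS)

    static uint8_t pht[3][BANK_SIZE];        /* three banks of 2-bit saturating counters */

    static uint32_t bank_hash(int bank, uint32_t pc, uint32_t ghr) {
        /* placeholder per-bank hash; the real predictor uses inter-bank skewing functions */
        uint32_t v = (pc >> 2) ^ ghr ^ (0x9e3779b9u * (uint32_t)(bank + 1));
        return v & (BANK_SIZE - 1);
    }

    static int gskewed_predict(uint32_t pc, uint32_t ghr) {
        int votes = 0;
        for (int b = 0; b < 3; b++)
            votes += (pht[b][bank_hash(b, pc, ghr)] >> 1) & 1;  /* MSB of each 2-bit counter */
        return votes >= 2;                                      /* majority decides taken */
    }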


Figure 10: Effect of history length on the misprediction rates of all SPECint95 benchmarks using a gshare predictor with 10, 12, 14 and 16 bits of index and context switches occurring every 64K conditional branches.

These index sizes represent predictor sizes from 3Kbits up to 192Kbits, similar to the sizes used for gshare in the previous sections. The shaded bars in Figure 12 show the range of misprediction rates for gskewed depending on the history length used to calculate the indices into the PHTs, in the absence of context switches. As we saw in previous sections for gshare, gskewed also exhibits a significant range of misprediction rates for a given predictor size due to the effect of the history length. We have superimposed the dhlf-gskewed results with a step value of 16K branches over the gskewed range. For li, dhlf-gskewed always achieves optimal performance for all predictor sizes. Applied to go, dhlf-gskewed achieves near-optimal performance with small and medium sized predictors. For bigger predictors the performance obtained with dhlf-gskewed is quite good but not optimal, mainly because we use the same step value for all predictor sizes; better accuracy is obtained for big predictors by using a larger step value, such as 32K or even 64K. In Appendix B, the dhlf-gskewed results for all SPECint95 benchmarks are shown. Figure 13 is similar to Figure 12 with the exception that we simulate context switches every 70K branches. As we pointed out before, context switching has a bigger negative impact on large predictors. Note that when context switches are simulated, dhlf-gskewed performs near-optimally for all predictor sizes on both benchmarks.

6 Related work

Several studies have looked at the effect of using different history lengths. Our DHLF proposal tries to find the best history length dynamically at execution time. To the best of our knowledge, only two proposals, [1] and [12], have tried to adjust the amount of history used.


Figure 11: Misprediction rate of all SPECint95 codes using dhlf-gshare with a step of 16K compared to the misprediction range of gshare with all possible fixed history lengths. The PHT is flushed every 70K conditional branches to account for the effect of context switches.

In [1] the static branches are classified depending on the bias of their behavior: highly biased branches require a few bits of history, whereas less biased branches require a large history length. They propose a hybrid predictor with two components, one with a few history bits and another with a large number; only two possible history lengths are considered and, further, these values are fixed. Their alternative proposal consists of profile-guided static prediction for highly biased branches and dynamic prediction for the rest. The study in [12] extends the work of [1] and tries to determine the exact history length for each static branch at compile time. These proposals differ from ours in significant ways. Since they focus on each specific branch instruction, both studies require a complex profiling step to determine the amount of history to be used for each static branch. Further, both proposals would require modifications to the instruction set to be able to use the information gathered in the profiling phase at execution time. Finally, neither of these proposals is able to react to different input data or system workloads.

Other articles, such as [6], [9], [7] and [5], have studied the effect of history length to find the best static combination of PC and history bits or to better understand the behavior of their predictors. In general, almost all studies based on two-level adaptive branch predictors assume that the final implementation will have a fixed number of history bits and usually select as many history bits as index bits. As we have seen in Figures 4 and 10, this can be the worst case for benchmarks such as gcc, go or vortex.


Figure 12: Misprediction rate of go and li using dhlf-gskewed with a step value of 16K compared to the misprediction range of gskewed with all possible fixed history lengths.

7 Summary

Almost all recently proposed predictors combine, in a fixed way, information from the branch address with the history of branch outcomes to predict the direction of conditional branches. We have shown that the performance of this type of predictor displays, for different codes, significant variations depending on the history length used. Predictors that combine PC and history in a fixed way lose a large part of their potential performance, because the best history length depends on several factors that are only known at execution time: the code to be executed, the input data, and the number of conditional branches that can be executed between context switches in time-shared environments. More important than choosing one predictor or another is using, for any given predictor, the best history length for each code.

We have presented DHLF, a method that finds the best history length for a given code at execution time, with and without context switches. The evaluation of DHLF has been carried out by applying it to gshare (dhlf-gshare). All SPECint95 codes were run with this new predictor and the results confirm that it achieves performance very close to the optimum, compared to the best fixed-history gshare configuration for each code. DHLF can be applied to any predictor that combines global history with PC bits; as an example, we show some results with dhlf-gskewed, where DHLF also obtains near-optimal performance for each specific benchmark. DHLF has a low area cost, does not affect the predictor critical path, and requires neither profiling nor instruction set modification. We believe that using DHLF with any of the two-level branch predictors will yield better prediction accuracy across a variety of codes, input data and context-switch frequencies.

Figure 13: Misprediction rate of go and li using dhlf-gskewed with a step value of 16K compared to the misprediction range of gskewed with all possible fixed history lengths. The PHT is flushed every 70K conditional branches to account for the effect of context switches.

References

[1] Po-Yung Chang, Eric Hao, Tse-Yu Yeh, and Yale Patt. Branch classification: a new mechanism for improving branch predictor performance. In 27th International Symposium on Microarchitecture, pages 22-31, Nov. 1994.

[2] Alan Eustace and Amitabh Srivastava. ATOM: A flexible interface for building high performance program analysis tools. In Proceedings of the Winter 1995 USENIX Conference, pages 303-314, Jan. 1995.


[3] Marius Evers, Po-Yung Chang, and Yale N. Patt. Using hybrid branch predictors to improve branch prediction accuracy in the presence of context switches. In 23rd Annual International Symposium on Computer Architecture, pages 3-11, May 1996.

[4] Linley Gwennap. Digital 21264 sets new standard. Microprocessor Report, 10(14), Oct. 1996.

[5] Chih-Chieh Lee, I-Cheng K. Chen, and Trevor N. Mudge. The Bi-Mode branch predictor. In 30th Annual International Symposium on Microarchitecture, Dec. 1997.

[6] Scott McFarling. Combining branch predictors. Technical Note TN-36, Western Research Laboratory, DEC, June 1993.

[7] Pierre Michaud, Andre Seznec, and Richard Uhlig. Trading conflict and capacity aliasing in conditional branch predictors. In 24th Annual International Symposium on Computer Architecture, pages 292-303, June 1997.

[8] Shien-Tai Pan, Kimming So, and Joseph T. Rahmeh. Improving the accuracy of dynamic branch prediction using branch correlation. In 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 76-84, Oct. 1992.

[9] Stuart Sechrest, Chih-Chieh Lee, and Trevor Mudge. Correlation and aliasing in dynamic branch predictors. In 23rd Annual International Symposium on Computer Architecture, pages 22-32, May 1996.

[10] James E. Smith. A study of branch prediction strategies. In 8th Annual International Symposium on Computer Architecture, pages 135-148, May 1981.

[11] Eric Sprangle, Robert S. Chappell, Mitchel Alsup, and Yale N. Patt. The agree predictor: A mechanism for reducing negative branch history interference. In 24th Annual International Symposium on Computer Architecture, June 1997.

[12] Maria-Dana Tarlescu, Kevin B. Theobald, and Guang R. Gao. Elastic history buffer: A low cost method to improve branch prediction accuracy. In Proceedings of the 1997 IEEE International Conference on Computer Design, pages 82-87, Oct. 1997.

[13] Kenneth C. Yeager. The MIPS R10000 superscalar microprocessor. IEEE Micro, 16(2):28-40, Apr. 1996.

[14] Tse-Yu Yeh and Yale N. Patt. Two-level adaptive training branch prediction. In 24th Annual International Symposium on Microarchitecture, pages 51-61, Nov. 1991.

[15] Tse-Yu Yeh and Yale N. Patt. Alternative implementations of two-level adaptive branch prediction. In 19th Annual International Symposium on Computer Architecture, pages 124-134, May 1992.

[16] Tse-Yu Yeh and Yale N. Patt. A comprehensive instruction fetch mechanism for a processor supporting speculative execution. In 25th Annual International Symposium on Microarchitecture, pages 129-139, Nov. 1992.

[17] Tse-Yu Yeh and Yale N. Patt. A comparison of dynamic branch predictors that use two levels of branch history. In 20th Annual International Symposium on Computer Architecture, pages 257-266, May 1993.


Appendix A: Reverse gshare, reducing the negative effect of dynamically changing the history length

The extra misses introduced when the history length is changed dynamically can be reduced by almost 50% if the xor between history and PC bits is done differently. The branch history register (BHR) is a shift register that holds the outcomes of the last few branches. The performance of the gshare predictor is not affected by whether the BHR is shifted left or right; in [14] it is presented as a left-shifting register and it has been depicted that way in almost all subsequent studies. Figure 14a compares the performance of gshare for go as a function of the history length when 10, 13 and 16 bits of PC are used. The gshare curve corresponds to a left-shifting BHR, whereas the reverse curve is obtained by shifting the BHR to the right. As expected, no significant differences were found between gshare and reverse gshare.


Figure 14: (a) Misprediction rates for reverse gshare and gshare predictors on go, varying the number of index bits. (b) The plot of local mispredictions for the two cases.

However, the extra mispredictions introduced when the history length is changed dynamically can be halved if the BHR is shifted to the right, which we have called reverse gshare. If, while using gshare, the DHLF control decides to increase the history length by one (from b to b+1), the same branch PC with the same last b outcomes will most likely generate a different index and use a different entry of the PHT. If reverse gshare is used instead, then under the same assumptions the same PHT entry will be used on average 50% of the time(3), because the b last outcomes are xor-ed with the same PC bits and only the newly introduced history bit can change the final index value. This is further illustrated in Figure 14b, where the local misprediction rates for reverse gshare and gshare are plotted. Figure 15 illustrates the differences between gshare and reverse gshare when the history length is changed from 4 to 5 bits. The same applies when the history length is reduced by one.

(3) The exact value will depend on the percentage of dynamic taken branches.
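A software sketch of the difference (our code and layout assumptions, not the authors' hardware): with a right-shifting BHR the most recent outcomes keep their positions in the index, so growing the history only brings in one older bit at the low end.

    #include <stdint.h>

    typedef struct { uint32_t bhr; int n; } rev_gshare_t;   /* n = PHT index bits */

    static inline void rev_update_history(rev_gshare_t *p, int taken) {
        /* right shift: the newest outcome enters at the top index position */
        p->bhr = (p->bhr >> 1) | ((uint32_t)(taken & 1) << (p->n - 1));
    }

    static inline uint32_t rev_index(const rev_gshare_t *p, uint32_t pc, int hist_len) {
        uint32_t pc_bits  = (pc >> 2) & ((1u << p->n) - 1);
        uint32_t all_ones = (1u << p->n) - 1;
        uint32_t keep     = (hist_len == 0) ? 0
                          : (all_ones & ~((1u << (p->n - hist_len)) - 1)); /* top hist_len bits */
        /* the history bits already in use keep their positions when hist_len changes */
        return pc_bits ^ (p->bhr & keep);
    }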


Figure 15: Effect on index generation when the history length is changed from 4 to 5 bits. gshare: shifting the BHR to the left. reverse gshare: shifting the BHR to the right.

Appendix B: Detailed results

Table 2 shows the maximum, minimum and mean misprediction rates obtained when gshare was run while varying the history length statically. Table 3 shows the misprediction rates when the step size is varied for dhlf-gshare, without context switches.


Benchmark            Index 10   Index 12   Index 14   Index 16
compress   Max        0.1407     0.1407     0.1407     0.1407
           Min        0.0919     0.0874     0.0818     0.0774
           Mean       0.1128     0.1058     0.1017     0.0985
gcc        Max        0.1889     0.1153     0.1059     0.1043
           Min        0.1350     0.0984     0.0736     0.0566
           Mean       0.1575     0.1085     0.0821     0.0683
go         Max        0.3085     0.2714     0.2165     0.2101
           Min        0.2262     0.1950     0.1654     0.1306
           Mean       0.2597     0.2223     0.1863     0.1536
ijpeg      Max        0.1418     0.1283     0.1283     0.1283
           Min        0.1250     0.1125     0.1096     0.1031
           Mean       0.1281     0.1211     0.1176     0.1150
li         Max        0.1297     0.1295     0.1294     0.1294
           Min        0.0616     0.0414     0.0366     0.0336
           Mean       0.0831     0.0676     0.0611     0.0571
m88ksim    Max        0.0853     0.0708     0.0697     0.0724
           Min        0.0582     0.0488     0.0365     0.0330
           Mean       0.0730     0.0545     0.0460     0.0435
perl       Max        0.0782     0.0668     0.0614     0.0614
           Min        0.0257     0.0059     0.0003     0.0003
           Mean       0.0559     0.0352     0.0234     0.0199
vortex     Max        0.1249     0.0606     0.0327     0.0312
           Min        0.0606     0.0301     0.0194     0.0130
           Mean       0.0863     0.0427     0.0247     0.0176

Table 2: Range of misprediction rates for all SPECint95 codes (no context switches) when run with gshare using fixed, static history lengths.


Benchmark   Step    Index 10   Index 12   Index 14   Index 16
compress    4K      0.10149    0.08856    0.07999    0.07582
            8K      0.10141    0.08860    0.08014    0.07672
            16K     0.10134    0.08868    0.08019    0.07698
            32K     0.10142    0.08885    0.08045    0.07725
            64K     0.10153    0.08918    0.08098    0.07792
            128K    0.10184    0.08985    0.08203    0.07932
gcc         4K      0.13710    0.10412    0.08531    0.08061
            8K      0.13599    0.10143    0.08416    0.06934
            16K     0.13628    0.10225    0.08122    0.07204
            32K     0.13703    0.10331    0.08195    0.06897
            64K     0.13611    0.09985    0.08014    0.06586
            128K    0.13903    0.10133    0.07792    0.06210
go          4K      0.22849    0.20356    0.18075    0.15746
            8K      0.22767    0.19838    0.17521    0.15543
            16K     0.22822    0.19746    0.17004    0.14791
            32K     0.22954    0.19797    0.17108    0.14666
            64K     0.23085    0.19829    0.17239    0.14064
            128K    0.23141    0.19953    0.17459    0.14402
ijpeg       4K      0.12505    0.11408    0.11149    0.10701
            8K      0.12535    0.11599    0.11060    0.10707
            16K     0.12655    0.11483    0.11121    0.10453
            32K     0.12682    0.11350    0.11376    0.10515
            64K     0.12742    0.11412    0.11232    0.10943
            128K    0.12653    0.11472    0.11462    0.11137
li          4K      0.06288    0.04579    0.03787    0.03692
            8K      0.05964    0.04458    0.03800    0.03654
            16K     0.06167    0.04505    0.03726    0.03546
            32K     0.06198    0.04482    0.03839    0.03654
            64K     0.06168    0.04326    0.03746    0.03588
            128K    0.06261    0.04524    0.03914    0.03642
m88ksim     4K      0.05960    0.05319    0.04130    0.03545
            8K      0.05960    0.04889    0.04579    0.04475
            16K     0.05968    0.04748    0.04071    0.03421
            32K     0.05976    0.04642    0.03671    0.03112
            64K     0.06002    0.04650    0.03672    0.03091
            128K    0.06064    0.04650    0.03700    0.03028
perl        4K      0.05356    0.00611    0.00886    0.00567
            8K      0.05359    0.02256    0.00309    0.00563
            16K     0.05617    0.02260    0.00886    0.00044
            32K     0.05356    0.00613    0.00859    0.00049
            64K     0.05068    0.02276    0.00060    0.00058
            128K    0.02606    0.00652    0.00081    0.00081
vortex      4K      0.06356    0.03514    0.02514    0.01929
            8K      0.06247    0.03397    0.02097    0.01642
            16K     0.06256    0.03379    0.02256    0.01464
            32K     0.06231    0.03195    0.02172    0.01446
            64K     0.06135    0.03272    0.02067    0.01427
            128K    0.06136    0.03451    0.02144    0.01686

Table 3: Misprediction rates of dhlf-gshare for various values of step (no context switches).


Figure 16: Misprediction rate of all SPECint95 codes using dhlf-gshare with step sizes of 16K, 32K and 64K compared to the misprediction range of gshare with all possible fixed history lengths. The PHT is flushed every 70K conditional branches to account for the effect of context switches.

Figure 16 shows the effect of varying the step size on misprediction rates for all benchmarks; the PHT is flushed every 70K conditional branches to account for the effect of context switches. Figure 17 shows the effect of statically varying the history length on the misprediction rate for all SPECint95 benchmarks using a gskew predictor indexed with 9, 11, 13 and 15 bits. Figure 18 shows the misprediction rate of all SPECint95 codes using dhlf-gskew with a step size of 16K compared to the misprediction range of gskew with all possible static, fixed history lengths (no context switches). Figure 19 shows the misprediction rate of all SPECint95 codes using dhlf-gskew with a step size of 16K and context switches simulated every 70K branches, compared to the misprediction range of gskew with all possible fixed history lengths under the same conditions.


Figure 17: Effect of history length on the misprediction rate for all SPECint95 benchmarks using a gskew predictor indexed with 9, 11, 13 and 15 bits.


Figure 18: Misprediction rate of all SPECint95 codes using dhlf-gskew with a step size of 16K compared to the misprediction range of gskew with all possible fixed history lengths (no context switches).


Figure 19: Misprediction rate of all SPECint95 codes using dhlf-gskew with a step size of 16K and context switches every 70K branches, compared to the misprediction range of gskew with all possible fixed history lengths under the same conditions.
