Jan 24, 2007 - Similar requesting rate for each master is assumed. â¡ What if the requesting rates are not similar? â« e.g., 3 masters with the same tickets ...
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses Bu-Ching Lin, Geeng-Wei Lee, Juinn-Dar Huang, Jing-Yang Jou Department of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan
2007/01/24
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Outline Introduction Previous Works
fixed priority time division multiple access (TDMA) Lottery RT_lottery
Proposed Arbitration Architecture Experimental Results Conclusions
2
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Introduction(1/2) Shared bus is widely used in current SoC designs master – initiate communication transactions slave – respond to transactions initiated by masters arbiter – manage the usage of bus
3
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Introduction(2/2) Requirements in different applications complete transactions of all requests before the corresponding deadlines in real-time applications take at least a fixed fraction of total bandwidth in multimedia applications
Difficult to satisfy both real-time and bandwidth requirements simultaneously an innovative arbitration algorithm is required
4
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Previous Works Existing arbitration algorithms
fixed priority time division multiple access (TDMA) Lottery RT_lottery
5
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Fixed Priority Among the requesting masters, the one with the highest priority gets granted Pros simple, low hardware cost and easy to implement
Cons starvation problem – the masters with lower priority hardly get the service lack of control over real-time and bandwidth requirements 6
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
TDMA (1/2) Execution time is divided into time slots which are statically assigned to masters 2nd-level of arbitration is usually adopted to alleviate the wasted time slots
7
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
TDMA (2/2) Pros deterministic worst-case response latency reserved bandwidth for each master
Cons difficult to design time slot sequences in an unpredictable system more slots Î more bandwidth and shorter latency what if a master with LOW bandwidth requirement but needs SHORT response latency?
8
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Lottery Arbiter grants a master stochastically from contending requests i.e., a weighted random mechanism
Tickets (weight)
Request Map
1
M1
2
M2
3
M3
T[ 0 ] = M1 T[ 1 ] = M3
Y
T[ 2 ] = M3 N Y
Rand[ 0, 8 ) = 5
Lottery Manager
T[ 3 ] = M3 T[ 4 ] = M4 T[ 5 ] = M4
Grant M4
T[ 6 ] = M4 4
M4
Y
T[ 7 ] = M4 T[ 8 ] = XX T[ 9 ] = XX
1 + 3 + 4 =8 9
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Why Lottery is not suitable in SoC ? Similar requesting rate for each master is assumed What if the requesting rates are not similar? e.g., 3 masters with the same tickets assignment, one with the requesting
rate varies from 80% to 40%, the rate of the other 2 is fixed at 40% similar requesting rate
Requesting Rate of M1
80% 70% 60% 50% 40% diverse requesting rate
Ö ticket ratio ≠ bandwidth ratio Ö weight tuning is required
M1
49% 46% 42% 38% 33%
M2
24% 26% 28% 30% 33%
M3
24% 26% 28% 30% 33%
10
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Latency and Bandwidth in Lottery Response latency and bandwidth allocation both controlled by the number of tickets e.g., 3 masters have similar traffic behaviors 50 45
5
40 bandwidth (%)
average latency (cycles)
6
4 3 2
35 30 25 20 15 10
1
5
0
0
1:1:1
2:1:1
3:1:1
4:1:1
ticket ratio
more tickets
5:1:1 10:1:1
1:1:1
2:1:1
3:1:1
4:1:1
5:1:1
10:1:1
ticket ratio
Ö shorter response latency Ö more bandwidth allocation 11
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Summary of Lottery Pros good control over bandwidth allocation in network switching applications fair average response latency
Cons no hard real-time consideration no independent controllability over response latency and bandwidth allocation
12
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
RT_lottery A 2-level arbitration algorithm dealing with real-time and bandwidth requirements simultaneously The proposed architecture 1st level – real-time handler handles the hard real-time requirements 2nd level – Lottery with tuned weight reserves the bandwidth allocation for each master
13
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Real-Time Handler Similar to earliest deadline first scheduling (EDF) the request with earliest deadline and below the warning line gets granted
Deadline the time limit for a master to complete a request missing the deadline is regarded as the real-time violation
Warning line the worst case of scheduling the contending requests
14
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Weight Tuning A ticket redistribution mechanism to meet the required bandwidth by simulation M2 M3
M1
M4
M5
Initially, Td=Tm/2 If any master in Smore or Smet -> Sless
Td = Td / 2
Tm’=Tm-Td
Tl’=Tl+Td
Tm M3
Mmost
M1
M2
M5
Smore
Sless
Tl Mleast
M2
Smore
M5
Sless
M4
M4
Smeet
Smeet 15
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Fail Case of Weight Tuning Fail to meet bandwidth requirements due to diverse masters’ requesting rate e.g., the requesting rate of each master is 80%, 30%, 30%,
respectively and each of them requires at least 30% bandwidth
over-allocated bandwidth even with few tickets Ticket Assignment of M1, M2 and M3 100:100:100 60:120:120 44:128:128 34:133:133 28:136:136 16:142:142 8:146:146 2:149:149 M1 M2 M3
60 19 19
58 20 20
57 21 21
56 21 21
Ö weight tuning is not a panacea !
55 21 22
55 22 22
54 22 22
53 23 23 16
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Summary of Previous Works Pros fixed priority
simplicity area efficiency
Cons no real-time consideration no means for bandwidth control
TDMA
deterministic worst-case latency no hard real-time guarantee reserved bandwidth allocation no precise bandwidth control
Lottery
reserved bandwidth allocation fair average latency
no real-time consideration no precise bandwidth control
hard real-time guarantee
limitation of Weight Tuning
RT_lottery
17
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
RB_lottery Architecture 3-level arbitration algorithm real-time handler – handles the hard real-time requirements Lottery with tuned weight – reserves the bandwidth allocation for each master bandwidth regulator –provides fine-grained control over bandwidth allocation
18
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
An Example of RB_lottery Bandwidth regulator monitors the bus traffic record the transactions of each master temporarily block the requests from masters that have already got the required bandwidth in a period Tickets (weight)
1
3 4
Block signals Grant
M1 M2 M3 M4
Y N Y Y
1
Real-Time Handler
2
Request map
0 3 4
1 1
T[ 0 ] = M1
0
1
3
0
0
Bandwidth Regulator
T[ 1 ] = M3 T[ 2 ] = M3
1
Lottery Manager
Rand[ 0, 4 ) = 2
T[ 3 ] = M3 T[ 4 ] = XX
Grant M3
T[ 5 ] = XX T[ 6 ] = XX T[ 7 ] = XX T[ 8 ] = XX T[ 9 ] = XX
19
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
The Implementation Observation window (w) — the execution time is divided into windows of size w cycles block the requests of over-served masters temporarily
Bandwidth register — the allocated bandwidth in the current window required bandwidth = 30% allocated bandwidth
30%
w
30%
w
30%
w
30%
w
execution time 20
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
RB_lottery Algorithm Flow Start
6
1
Proceed to the next window?
Yes
Release all the blocked masters
No
1. 2.
2
Real-Time Handler
3. 3
Block the request of current granted master
5
No
Yes Meet requirement?
Any master gets granted?
4
No
Yes
Record the transaction amounts of the current granted master
Lottery Manager
4. 5. 6.
check whether a new window starts grant the most urgent master stochastically grant an unblocked master record the transaction cycles check the allocated bandwidth reset all the blocked signals and proceed to the next window 21
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Experimental Environment Transaction level model in SystemC different arbitration algorithms are used in the experiments, such as fixed priority, lottery, TDMA + Lottery, RT_lottery, RB_lottery
22
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
3 Types of Masters Example : beat = 4, interval = 15, Rcycles = 10
D type (D for Dependency) request
grant
finish
2
5
9
request
24
cycle
D_R type (D for Dependency, R for Real-time) request
grant
finish
2
5
9
request
grant
finish
2
5
9
request
24
12
cycle
ND_R type (ND for No Dependency, R for Real-time) request
12
17
cycle 23
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Experiment Setup Behavior of masters type
beat/prob.
interval/prob.
Master 1
D
8/50
16/50
6/10
7/20
8/40
9/20
10/10
Master 2
D
1/50
4/50
10/10
11/20
12/40
13/20
14/10
Master 3
D
8/50
16/50
6/10
7/20
8/40
9/20
10/10
Master 4
D
1/50
4/50
10/10
11/20
12/40
13/20
14/10
Master 5
D_R
8/50
16/50
10/10
11/20
12/40
13/20
14/10
Master 6
D_R
1/50
4/50
10/10
11/20
12/40
13/20
14/10
Master 7
ND_R
8/50
16/50
65/10
66/20
67/40
68/20
69/10
Master 8
ND_R
1/50
4/50
85/10
86/20
87/40
88/20
89/10
Heavy-Traffic
Light-Traffic
4 D type masters, 2 D_R type masters and 2 ND_R type masters in the simulation system Half of masters are heavy-traffic 24
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Performance Comparisons (1/2) Fail cases of different arbitration algorithms 100 random required-bandwidth combinations for each workload 102400 simulation cycles for each combination Workload (%)
Fixed Priority
Lottery
60 65 70 75 80 85 90 95
100 100 100 100 100 100 100 100
100 100 100 100 100 100 100 100
TDMA + Lottery 95 98 100 100 100 100 100 100
RT _lottery
RB _lottery
0 0 0 10 18 37 55 74
0 0 0 0 0 1 12 44 25
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Performance Comparisons (2/2) Hardware comparisons
Gate counts
Fixed Priority
Lottery
215
4296
TDMA + Lottery 4917
RT _lottery
RB _lottery
5134
5814
Summary real-time capability
bandwidth capability
fixed priority
no consideration
poor
Lottery
no consideration
good but weight tuning is required
TDMA+Lottery no guarantee RT_lottery
always hold
RB_lottery
always hold
good only in low loaded bus (workload < 60%) good but still fails in high loaded bus (workload < 75%) good even in extremely high loaded bus 26
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Observation Window Comparisons Fail cases in different size of observation window of RB_lottery
100 random required-bandwidth combinations for each workload 102400 simulation cycles for each combination the size of observation window ranges from 128 to 2048
4 11 25 37 42 57
1 1 11 25 31 44
0 0 4 10 24 33
0 0 2 7 20 32
0 0 0 7 14 28
40 30 20 95 93 91 89 87 85
10 0 20 48
512 1024 2048
10 24
256
51 2
128
50
25 6
85 87 89 91 93 95
The size of observation window
12 8
Workload (%)
fail pattern number(s)
60
observation window
workload(%) 27
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Conclusions RB_lottery is proposed to provide hard real-time guarantee fine-grained bandwidth control
The observation window in the bandwidth regulator the larger size of observation window, the better controllability over bandwidth requirements
28
A Precise Bandwidth Control Arbitration Algorithm for Hard Real-Time SoC Buses
Thank you!
29