CONGESTION MITIGATION USING FLEXIBLE ROUTER ARCHITECTURE FOR NETWORK-ON-CHIP
A THESIS
Submitted to the Graduate School of Electronics, Communication and Computing, Egypt-Japan University of Science and Technology (E-JUST)
In Partial Fulfillment of the Requirements for the Degree of Master of Science in Electronics and Communications Engineering
By Mostafa Said Sayed
February 2012
Congestion Mitigation Using Flexible Router Architecture for Network-on-Chip
Presented by Mostafa Said Sayed
For the Degree of Master of Science in Electronics and Communications Engineering

Supervision Committee:
Prof. Mohamed El-Sayed Ragab, Dean of School of Electronics, Communications and Computer, E-JUST, Egypt
Dr. Victor Goulart, Associate Professor, Kyushu University, Japan
Dr. Ahmed El-Mahdy, Assistant Professor, E-JUST, Egypt

Examination Committee:
Prof. Mohamed El-Sayed Ragab, Dean of School of Electronics, Communications and Computer, E-JUST, Egypt
Prof. Mohamed Rizk, Alexandria University, Egypt
Prof. Amin Shoukry, Chairperson, Computer Science Engineering Department, E-JUST, Egypt
Prof. Hossam Shalaby, Chairperson, Electronics and Communication Engineering Department, E-JUST, Egypt
Dr. Masoud El Ghoneimy, Associate Professor, Department of Electronics and Communications Engineering, E-JUST, Egypt

Vice President for Education and Academic Affairs: Prof. Ahmed Abo Ismail
SUMMARY
Congestion is one of the main problems in Networks-on-Chip (NoCs) and has been treated extensively in the literature. Some techniques increase the amount of buffering available (the buffer depth or the number of virtual channels); however, the additional buffers increase the power and area overhead. We introduce a new Flexible Router architecture that improves the performance of the overall network by using the same amount of buffering more efficiently, so there is no need to increase the size of the buffers or to add virtual channels. The idea is simple: if there is a request to a busy buffer, the router stores the incoming packet in any other suitable free buffer in the router. Although this flexibility improves NoC performance, it also introduces problems that could reduce the improvement. The first problem is deadlock, which we prevent by drawing an analogy between the buffers of the Flexible Router and the buffers of the standard Base Router operating under deadlock-free routing algorithms, in terms of the packet directions that each buffer can hold. Another kind of deadlock, which we call buffer-to-buffer (b2b) deadlock, can also occur. This deadlock is discussed in detail, and we show how to recover from it when it occurs. Flexibility also has a small side effect: some packets are received out of order. This side effect is also addressed and is found to be limited, because it occurs only under certain conditions. Finally, the area overhead relative to the standard Base Router is discussed and found to be reasonable.
ACKNOWLEDGEMENT
First of all, I am thankful to Almighty Allah, the Most Beneficent and the Most Merciful, whose blessings gave me the ability to finish my master's thesis. I am also very thankful to my parents for their encouragement during my difficult hours and for their prayers to Allah to help me. I would like to thank my supervisor, Professor Victor Goulart, who helped and guided me throughout my master's work. He was patient, kind in his advice, and always cared about my work. His favors cannot be forgotten. All thanks to Professor Mohamed Ragab, the dean of the school of electronics, communication and computer engineering. He treats us as a father would, always looks out for our benefit, and has solved many of the problems I faced. Last but not least, many thanks to my wife, who stood beside me through all my problems and always put my happiness before her own.
TABLE OF CONTENTS
SUMMARY
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
1. INTRODUCTION
1.1. Motivation
1.2. Structure of the thesis
2. BACKGROUND AND RELATED WORKS
2.1. The emergence of NoC communication architecture
2.2. Background topics
2.3. Related works
3. THE PROPOSED FLEXIBLE ROUTER ARCHITECTURE
3.1. Architecture components and Signals' descriptions
3.2. Basic operation of the Flexible Router
3.3. Deadlock problem
3.4. Buffer-to-buffer deadlock design problem
3.5. Out of order received packets side-effect
4. SIMULATION PLATFORM AND RESULTS
4.1. Simulation setup
4.2. Simulation results and analysis
4.3. Area overhead of the Flexible Router
4.4. Out of order received packets study
5. CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future work
REFERENCES
ARABIC COVER PAGE
ARABIC SUMMARY
LIST OF TABLES
Chapter 2. Table I: Signals' functions and their connections in the base router.
Chapter 3. Table I: Signals' functions and their connections in the Flexible router.
Chapter 4. Table I: Summary of the results.
Chapter 4. Table II: Effect of buffer-to-buffer problem on both XYFR and WFFR.
Chapter 4. Table III: Area comparison.
LIST OF FIGURES
Chapter 1. Figure 1: Example showing how congestion occurs.
Chapter 1. Figure 2: Increasing FIFO depth leads to decrease blocking.
Chapter 1. Figure 3: Example shows a situation that increasing FIFO depth is not sufficient to decrease congestion.
Chapter 1. Figure 4: VCs solution for congestion.
Chapter 1. Figure 5: The direct relation between router area and FIFO depth.
Chapter 1. Figure 6: FIFO area (highlighted) within router area.
Chapter 1. Figure 7: Flexible Router idea.
Chapter 2. Figure 1: The increasing number of components per chip every two years.
Chapter 2. Figure 2: Point-to-point communication architecture.
Chapter 2. Figure 3: Bus communication architecture.
Chapter 2. Figure 4: NoC communication architecture.
Chapter 2. Figure 5: Basic router model of NoC.
Chapter 2. Figure 6: Base Router architecture.
Chapter 2. Figure 7: Input port of the Base Router.
Chapter 2. Figure 8: Output port of the Base Router.
Chapter 2. Figure 9: Internal connection example between the East input and output ports.
Chapter 2. Figure 10: External connection of the Base Router.
Chapter 2. Figure 11: Deadlock between four routers.
Chapter 2. Figure 12: Possible turns in XY routing algorithm.
Chapter 2. Figure 13: Possible turns in WF routing algorithm.
Chapter 2. Figure 14: Possible traffic situations and the limitations of static organization of VCs.
Chapter 3. Figure 1: Flexible Router architecture.
Chapter 3. Figure 2: The East input port as an example of the input ports of the Flexible Router.
Chapter 3. Figure 3: Some Deadlock examples that might arise in full Flexible Router.
Chapter 3. Figure 4: Possible turns in XY routing algorithm.
Chapter 3. Figure 5: Possible packet directions in the Flexible Router.
Chapter 3. Figure 6: Possible packet directions in WFFR based on WF routing.
Chapter 3. Figure 7: B2b deadlock problem.
Chapter 3. Figure 8: Example of two successive packets sent from the same source and going to the same destination but stored in different FIFOs in some router along their path.
Chapter 4. Figure 1: Performance characteristics comparisons between XYFR, WFFR, and Base Routers.
Chapter 4. Figure 2: Lagging distance.
Chapter 4. Figure 3: Lagging distance at the edge injection rate in XYFR.
ABBREVIATIONS
AMBA: Advanced Microcontroller Bus Architecture
E: East
FF: Flip Flop
FFC: FIFO Flexibility Controller
FIFO: First-In-First-Out
HOL: Head of Line
HS: Hotspot traffic pattern
IPC: Intellectual Property Core
IP: Internet Protocol
ITRS: International Technology Roadmap for Semiconductors
L: Local
LUT: Look-up Table
N: North
NI: Network Interface
NN: Nearest-Neighbor traffic pattern
NoC: Network-on-Chip
OCP: Open Core Protocol
PE: Processor Element
pkt: packet
req: request
S: South
SAF: Store-and-Forward
SoC: System-on-Chip
UBS: Unified Buffer Structure
UNI: Uniform traffic pattern
VC: Virtual Channel
ViChaR: Virtual Channel Regulator
W: West
WF: West-First routing algorithm
WFFR: West-First-Based Flexible Router
XY: XY routing algorithm
XYFR: XY-Based Flexible Router
CHAPTER 1
INTRODUCTION
1.1. Motivation
Congestion is one of the main challenges in Network-on-Chip (NoC) design. It occurs when multiple requests contend for the same resource or, more generally, when the number of requests exceeds the number of available resources. In networks, this usually happens when multiple packets inside one router request the same output port, as shown in Fig. 1. In this figure, packets P1, P2, and P3 request the East output port of router R1. As there is only one such port, R1 chooses only one of these requests to use the port; the others are blocked, which increases the delay of the blocked packets. The situation becomes more problematic if the requested buffer is busy. For example, if the selected packet is P1, it will be blocked as well. Now all packets are blocked: one because the requested buffer is busy and the others because of contention. As observed, the blocking of one packet may result in further blocking, which increases the delay of all affected packets and may extend to packets in other routers as well (back-pressure [1]), causing much more congestion all over the network [2]. Congestion generally degrades NoC performance because it increases the overall average packet delay and decreases the average throughput of the processing elements (PEs), that is, the average number of packets received per cycle.
Fig. 1 Example showing how congestion occurs. (Figure labels: P1 is blocked by a busy buffer; P2 and P3 are affected by the blocking through back-pressure.)
Two basic solutions to this problem have been proposed: increasing the depth of the buffers used, or implementing virtual channels (VCs) in the router.
The first solution is presented in Fig. 2. The requested buffer can now hold more than one packet, so all packets can be transferred one after another (e.g. P1, then P2, then P3). Although all packets can be transferred in this case, in some situations this solution is not enough. For example (Fig. 3), suppose that the destination FIFO buffer of P1 is B1 of R1, while the destination FIFOs of P2 and P3 are B2 of R2 and B3 of R3, respectively. Suppose also that B1 is completely full while B2 and B3 are not. Packets P2 and P3 then cannot move to their destinations, because they are queued behind the blocked packet P1.
Fig. 2 Increasing FIFO depth leads to decrease blocking.
This situation can be solved by implementing virtual channels (VCs). Instead of increasing the FIFO depth as in the previous solution, we add three extra separate buffers, and the router then treats them as if they were four channels (a channel is the set of links between two neighboring routers that transfers a packet from one of them to the input buffer of the other). These four channels share the same links, so they are not complete physical channels (links plus buffer); hence the name virtual channels. Now P2 and P3 can move to their next hops in R2 and R3 even if P1 is blocked (Fig. 4).
Fig. 3 Example shows a situation that increasing FIFO depth is not sufficient to decrease congestion.
Fig. 4 VCs solution for congestion.
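The difference between the two solutions can be illustrated with a minimal sketch. It assumes a simplified model that is not part of the thesis: each packet is a (name, destination buffer) pair, a single FIFO releases only its head packet, and each virtual channel holds exactly one packet. The function names are hypothetical.

```python
from collections import deque

def drain_single_fifo(packets, full_destinations):
    """Single deep FIFO: only the head packet may leave, in order."""
    fifo = deque(packets)
    moved = []
    # Stop as soon as the head packet targets a full buffer (head-of-line blocking).
    while fifo and fifo[0][1] not in full_destinations:
        moved.append(fifo.popleft()[0])
    return moved

def drain_virtual_channels(packets, full_destinations):
    """One packet per virtual channel: every packet is checked independently."""
    return [name for name, dest in packets if dest not in full_destinations]

# Situation of Fig. 3: P1 is at the head and its destination B1 is full.
packets = [("P1", "B1"), ("P2", "B2"), ("P3", "B3")]
full = {"B1"}

print(drain_single_fifo(packets, full))       # []
print(drain_virtual_channels(packets, full))  # ['P2', 'P3']
```

With the single FIFO nothing moves, because P1 blocks the packets behind it, whereas with separate channels P2 and P3 proceed while P1 waits.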
The two main problems of these solutions are the area and power overhead: buffers occupy most of the router area [3,4] and consume a significant portion of the router power [5,6,7,8]. NoC routers are more resource-constrained than those of computer networks, particularly in control complexity and in the amount of memory used, and hence they should be designed carefully to avoid area or power overhead. In [3,4] it is observed that the area of the router increases linearly with buffer size. As shown in Fig. 5, if the FIFO depth increases from 2 to 3, the area of the router increases by 30% [4]. Fig. 6 also shows how the buffers consume a large portion of the NoC area. Concerning power consumption, FIFO buffers alone consume about one fifth of the total power consumed within the router [5]. ITRS¹ predicts that future generations of high-performance VLSI designs will operate in the 10-20 GHz range, with communication between cores at rates of Gbit/s [9]. This requires router designs to work within a tight power budget.
Fig. 5 The direct relation between router area and FIFO depth (reprinted from [4]).
Fig. 6 FIFO area (highlighted) within router area (reprinted from [4]).
Although the impact of buffers on area and power is large, the resulting improvement in performance is smaller, because the buffers are not used efficiently. In [2] it was stated that for some topologies the efficiency of buffer usage is very low: 20% for the tree topology and only 10% for the ring topology.
¹ ITRS: the International Technology Roadmap for Semiconductors is an international body for guiding the semiconductor industry.
Our proposed Flexible Router architecture avoids increasing FIFO depths or adding VCs; instead, it uses the available buffers of the router efficiently to increase the overall performance. As shown in Fig. 7, if there is a request to a busy buffer, the Flexible Router stores the incoming packet in any other suitable free buffer instead of waiting for the busy buffer to become free. By applying this idea, we improve the performance of the NoC by improving the efficiency with which the available buffers are used. The idea is explained in more detail in the following chapters.
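The basic idea can be sketched as follows. This is only an illustration under stated assumptions, not the router's actual control logic: the function name `select_buffer`, the dictionary of free slots, and the `allowed` list (standing in for whatever makes a buffer "suitable", which is defined in later chapters) are all hypothetical.

```python
def select_buffer(requested, free_slots, allowed):
    """Pick the FIFO that stores an incoming packet.

    requested  -- name of the FIFO the packet asked for
    free_slots -- dict mapping FIFO name to its number of free slots
    allowed    -- FIFO names this packet may use (hypothetical 'suitable' set)
    Returns the chosen FIFO name, or None if the packet must wait.
    """
    # Normal case: the requested buffer has room.
    if free_slots.get(requested, 0) > 0:
        return requested
    # Flexible case: fall back to any other suitable buffer with a free slot.
    for name in allowed:
        if name != requested and free_slots.get(name, 0) > 0:
            return name
    return None

free = {"E": 0, "W": 2, "N": 0, "S": 1}
print(select_buffer("E", free, ["E", "N", "S"]))  # S
```

In this example the East FIFO is busy, North is also full, and West is not in the allowed set, so the packet is stored in the South FIFO instead of being blocked.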
Fig. 7 Flexible Router idea. (Figure labels: a request to a busy buffer is served by using another free buffer in the Flexible Router.)
1.2. Structure of the thesis
This thesis is organized as follows. Chapter 2 covers the background and related works: some NoC background and definitions are explained, together with the most closely related works we have studied. In chapter 3, the proposed Flexible Router idea is explained together with the main architecture of the Flexible Router in its two versions. The problems that arise from flexibility, such as deadlock, and how to prevent them are then explained in detail, together with the main side effects of flexibility. In chapter 4, the Flexible Router in its two versions is compared with the Base Router in terms of delay and throughput under four traffic patterns: Hotspot, Uniform, Nearest-Neighbor, and Transpose II. The results are then analyzed. The area overhead is also addressed in this chapter, together with a study of the out-of-order packets side effect. Finally, chapter 5 states the conclusion and future work.
CHAPTER 2
BACKGROUND AND RELATED WORKS
2.1. The emergence of NoC communication architecture
Moore stated in [10] that the number of components that can be packed on a single chip doubles every two years (Fig. 1). This trend holds because technology keeps shrinking the size of transistors, so the number of transistors that can be packed on a single chip continues to increase, and therefore more components can be integrated on a single chip. This advancement allows a complete system to be integrated on a single chip (System-on-Chip, SoC). An SoC often consists of a heterogeneous mix of components that includes one or more processing cores. The SoC industry is now moving from single-core to multi-core and eventually to many-core architectures, containing tens to hundreds of cores on a single chip [11]. This high degree of integration pushes designers to find suitable communication architectures that ensure high system throughput, lightweight communication, low power consumption, and modularity.
Fig. 1 The increasing number of components per chip every two years (reprinted from [10]). Axes: year (horizontal) and log2 of the number of components per integrated function (vertical).
2.1.1. Common communication architectures
Many different architecture approaches have been proposed to meet the challenges of complex SoC designs [12]. Some of the more common SoC architecture approaches and their respective trade-offs can be categorized as follows:
Point-to-point communication architecture
This architecture, shown in Fig. 2, is optimal in terms of bandwidth, latency, and power consumption. However, it is not modular: if the application changes, it has to be redesigned. Another important disadvantage is that the number of links grows rapidly as the number of cores increases (quadratically when every pair of cores is directly connected), so the area overhead becomes large and routing all these links becomes harder. This kind of communication architecture is only viable for small numbers of cores; as the number of cores grows, more design time and effort are needed and, as a result, productivity decreases.
Fig. 2 Point-to-point communication architecture (IPC stands for Intellectual Property Core).
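To give a feeling for how quickly the wiring grows, the following worked figure assumes the extreme case in which every pair of the n cores is joined by one dedicated bidirectional link. This full-connectivity assumption is ours; a real point-to-point design may connect only the pairs that actually communicate.

```latex
L(n) = \binom{n}{2} = \frac{n(n-1)}{2},
\qquad L(4) = 6, \quad L(16) = 120, \quad L(64) = 2016 .
```

Under this assumption, growing from 16 to 64 cores multiplies the number of links by roughly 17, which illustrates why the approach is viable only for small systems.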
Bus-based architecture
In the bus-based communication architecture shown in Fig. 3, all communication between cores takes place over one shared bus. From the design point of view, the bus architecture lends itself to a modular design approach, so design time decreases significantly and flexibility increases. On the other hand, buses have many constraints in terms of clock synchronization, energy consumption, and scalability. Clock synchronization is very difficult in deep-submicron technologies. As the process technology improves, wire delay becomes significant. Bus-based communication structures in SoCs that are larger than 10 x 10 mm and operate at several hundreds of megahertz have tight timing constraints and slow wires, and they require tight clock skew control [13]. Regarding energy consumption, in deep-submicron technology the energy consumption of a wire increases quickly with its length [14,15]. Also, as more units are connected to the bus, the power used per communication event grows, because more attached units lead to a higher capacitive load. In some cases, the energy consumed by communication has reached 40%-50% of the total energy consumption [16,17]. Scalability is the most important problem in buses, because the use of a single shared bus places a limit on the achievable system throughput and can significantly increase the worst-case latency (the number of cycles taken by a packet or flit to traverse the network from the source to the destination). With multiple cores connected, the probability of blocking (the probability that the bus is in use) becomes higher and the average time needed to take control of the bus increases. Therefore, the bus becomes an even more severe bottleneck as the system increases in size [18].
Fig. 3 Bus communication architecture.
Network-on-Chip Interconnect
A new structured communication architecture called Network-on-Chip, shown in Fig. 4, has emerged as a solution for SoC designs. The basic concept is to communicate across the chip in the same way that messages are transmitted over the Internet: a packet-switching network is placed on the chip, and messages are sent back and forth between blocks [19]. The full complexity of the Internet is not needed on the chip, because the on-chip network is simpler (short distances and a smaller number of nodes). On a chip, each IPC block is simply assigned an IP address, and the packets find their way through the network from source to destination through the routers. NoC has several advantages over point-to-point and bus architectures:
Modularity: Any IP block can be modified or reused with few modifications to the network.
Scalability: As the system becomes larger and more complex, only more routers and links need to be added. This does not degrade the overall system performance, because the newly added routers and links increase the number of available paths in the communication network. Hence, NoC keeps its high performance as the system becomes larger [2].
Mapping solutions from computer networks: Some problems that occur in computer networks also occur in NoCs. Since a large number of solutions to these problems have already been proposed for computer networks, they can be mapped to NoCs with some simplification, owing to the simplicity of NoCs [19].
Fig. 4 NoC communication architecture. (Figure labels: Processing Element, Router, Network interface.)
2.1.2. NoC components
The NoC communication architecture consists of two main components: routers (sometimes called switches) and network interfaces (NIs).
Network interfaces
As the bus is the dominant communication architecture in today's SoCs, most IPCs are designed to match some well-known bus access protocol such as the Open Core Protocol (OCP) or the Advanced Microcontroller Bus Architecture (AMBA). Consequently, when such IPs are reused with NoCs, there must be an interface that converts from the bus protocol to the network protocol; this interface is called the Network Interface, as shown in Fig. 4.
The NI is therefore responsible for converting the blocks of data originating from the IPs connected to it into data packets before transmitting them into the NoC, and for converting the data packets arriving from the NoC back into their original form so that they match the bus protocol used.
Routers
Routers are the basic components of NoC architectures. NoC routers are responsible for moving packets through the network toward their destinations. Traditionally, a NoC router consists of input buffers (in some cases both input and output buffers) to store the incoming packets and a routing function to direct those packets to the appropriate output ports, from which they are transferred to the next router on the way to their destinations. In addition, some arbitration logic must be used to choose only one packet when multiple packets request the same output port at the same time. Usually, each router is connected to one PE through the ejection and injection channels, as shown in the router model of Fig. 5.
Fig. 5 Basic router model of NoC. (Figure labels: injection channel, ejection channel, input channels, output channels, LC blocks, crossbar switch, routing and arbitration.)
2.1.3. Base Router architecture
To understand the operation of the Flexible Router, we first explain the architecture and operation of the Base Router. As shown in Fig. 6, it consists of five input ports and five output ports connected together through an intermediate crossbar. Each input and output port is associated with a specific direction: East (E), West (W), North (N), South (S), and Local (L). The Local input and output ports are connected to the network interface, which is connected to the processing element.
Fig. 6 Base Router architecture. (Figure labels: East, West, North, South, and Local input and output ports, each with Flow_ctrl and Data signals, connected through a 5x5 crossbar.)
The structure and functionality of the input and output ports are explained as follows:
Input port
The input port consists of three main components (Fig. 7):
Input controller: Responsible for the communication with the upstream router, the communication inside the input port itself with the FIFO buffer and the routing logic, and the internal communication inside the router with the output ports.
FIFO buffer: Stores the incoming packets from the upstream router.
Routing logic: Applies the routing algorithm to the packet at the head of the FIFO to determine a suitable output port according to the destination address within the packet.
Output port
As shown in Fig. 8, the output port also consists of three main components:
Arbiter: Receives all incoming requests for the output port and then chooses and grants one of them according to a pre-specified policy. Many arbitration policies are described in detail in [20]; we chose Round-Robin because of its simplicity and fairness.
Output controller: Handles the communication with the downstream router.
MUX: Receives packets from the different input ports at its inputs and, according to the selection signal from the arbiter, transfers one of them to its output.
Fig. 7 Input port of the Base Router.
Fig. 8 Output port of the Base Router.
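The Round-Robin policy chosen for the arbiter can be sketched in a few lines. This is a behavioral illustration only, written under our own assumptions: the class name `RoundRobinArbiter`, the port order, and the rule that priority moves to the port after the last granted one are a common textbook formulation and are not taken from the router's hardware description.

```python
class RoundRobinArbiter:
    """Grants one requesting input port per call, rotating the priority."""

    def __init__(self, ports=("E", "W", "N", "S", "L")):
        self.ports = list(ports)
        self.next_index = 0  # port that has the highest priority next time

    def grant(self, requests):
        """Return the granted port among `requests` (a set), or None."""
        n = len(self.ports)
        for offset in range(n):
            i = (self.next_index + offset) % n
            port = self.ports[i]
            if port in requests:
                # The port after the winner gets the highest priority next.
                self.next_index = (i + 1) % n
                return port
        return None

arbiter = RoundRobinArbiter()
print([arbiter.grant({"E", "N"}) for _ in range(3)])  # ['E', 'N', 'E']
```

When East and North keep requesting the same output port, the grants alternate between them, which is the fairness property that motivates the choice.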
Signals' descriptions
Table I explains the function and the connections of each signal in Figures 6, 7, and 8. The internal connections are also presented in Fig. 9, and the external connections between two adjacent routers are presented in Fig. 10.

TABLE I. SIGNALS' FUNCTIONS AND THEIR CONNECTIONS IN THE BASE ROUTER
Signal name | Function | Connections
Flow_ctrl | Represents the communication signals between routers (req_US, grant_US, req_DS, grant_DS). | Between any two neighboring routers.
Data_E,W,N,S,L | Represent the data packets (or data flits if wormhole switching is used) between routers. | Data_E,W,N,S are between two neighboring routers, while Data_L is between the router and the PE connected to it.
req_US | Request signal originating from the upstream router to the downstream router, requesting the transfer of a packet. | This signal is the same as req_DS originating from the output port (i.e. they are connected together).
grant_US | Grant signal originating from the downstream router to the upstream router in response to the request sent from the upstream router. | This signal is the same as the input signal grant_DS in the output port (i.e. they are connected together).
pkt_US | Data packet originating from the upstream router. | This signal is the same as the output signal pkt_DS in the output port (i.e. they are connected together).
req_int | A five-link signal; each link is connected to one of the five output ports (E, W, N, S, L). According to the routing logic, one of these links is chosen to send a request signal to the output port connected to this link. | If this signal originates from the East port, the five links are connected to req_int_E in all output ports of the router; the same holds for all other input ports.
grant_int | The response signal to req_int. It is also a five-link signal; each link is connected to one of the five output ports (E, W, N, S, L). | If this signal originates from the East port, the five links are connected to grant_int_E in all output ports of the router; the same holds for all other input ports.
pkt_int | Carries the head packet of the FIFO and is connected to all output ports. | If this signal originates from the East port, it is connected to pkt_int_E in all output ports.
req_int_E,W,N,S,L | Request signals from the East, West, North, South, and Local input ports, respectively. | As discussed previously, each of these signals is connected to one link of req_int in the input port from which it originates.
grant_int_E,W,N,S,L | Grant signals from any output port to the East, West, North, South, and Local input ports, respectively. | Each of these signals is connected to one link of grant_int in the input port from which the request originates.
pkt_int_E,W,N,S,L | Packets originating from the East, West, North, South, and Local input ports, respectively. | pkt_int_E is connected to pkt_int in the East port, and likewise for the rest.
req_DS | The request signal to the downstream router. | As discussed previously, this signal is connected to req_US in the downstream router (they are the same).
grant_DS | The grant signal from the downstream router. | This signal is connected to grant_US in the downstream router (they are the same).
pkt_DS | The data packet to the downstream router. | This signal is connected to pkt_US in the downstream router.
Fig. 9 Internal connection example between the East input and output ports. (Figure labels: link 1 of req_int and gnt_int of the East input port connects to req_int_E and gnt_int_E of the East output port, and pkt_int connects to pkt_int_E; the other request and grant links and the same packet go to the other output ports.)
Fig. 10 External connection of the Base Router. (Figure labels: req_DS, gnt_DS, and pkt_DS of the output port of the upstream router connect to req_US, gnt_US, and pkt_US of the input port of the downstream router.)
Basic operation of the Base Router
When a request arrives from one of the neighboring upstream routers, the input controller checks the FIFO buffer for available space. If the FIFO is full, the input controller waits until at least one slot (one slot holds one packet) becomes free. The controller then raises the grant signal to the upstream router, announcing that the packet has been stored in the FIFO. Once received, the packet waits until it reaches the head of the FIFO; at that moment the routing logic checks the destination address field of the packet to determine the appropriate output port. The input controller then issues a request to that output port and waits to be granted. The output port may have requests from several input ports at the same time, so its arbiter selects one of them according to the arbitration logic used and triggers the output controller to begin communication with the downstream router. The output controller in turn issues a request to the downstream router and waits to be granted. Upon receiving the grant signal, the arbiter raises the grant signal to the selected input port, which can then check the head of its FIFO again and repeat the previous steps to transfer another packet.
Congestion problem
As discussed, if a request arrives at an input port whose FIFO is full, the request must wait until one or more slots become free in the FIFO; during this waiting period the request is blocked. As discussed in chapter 1, such blocking may cause further blocking in the network (back-pressure), and the back-pressure in turn causes congestion, which degrades the overall NoC performance by increasing the average delay and decreasing the throughput of the NoC.
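The request/grant behavior and the resulting back-pressure can be illustrated with a small behavioral sketch. It is deliberately simplified and rests on our own assumptions: the class `InputPort` and its method names are hypothetical, and the routing logic, the arbiter, and cycle-level timing are left out so that only the FIFO-full blocking remains visible.

```python
from collections import deque

class InputPort:
    """Behavioral model of one input port with a FIFO of fixed depth."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()

    def request(self, packet):
        """Upstream request: grant and store the packet only if a slot is free."""
        if len(self.fifo) >= self.depth:
            return False  # no grant, the request stays blocked
        self.fifo.append(packet)
        return True

    def forward_head(self, downstream):
        """Offer the head packet downstream; it leaves only when granted."""
        if self.fifo and downstream.request(self.fifo[0]):
            return self.fifo.popleft()
        return None

r1, r2 = InputPort(depth=2), InputPort(depth=1)
r2.request("P0")                    # the downstream FIFO is now full
r1.request("P1"); r1.request("P2")  # the upstream FIFO fills up as well
print(r1.forward_head(r2))          # None: P1 is blocked by the full FIFO
print(r1.request("P3"))             # False: the blocking propagates upstream
r2.fifo.popleft()                   # one slot becomes free downstream
print(r1.forward_head(r2))          # P1
```

The second print shows back-pressure: a full FIFO in one router ends up blocking a new request at the router before it.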
2.2. Background topics
2.2.1. XY oblivious routing algorithm
The XY routing algorithm is one of the simplest and most common routing algorithms. It works as follows:
    Xdef = Xdest - Xcur
    Ydef = Ydest - Ycur
    if (Xdef > 0) { send a request to the East output port }
    else if (Xdef < 0) { send a request to the West output port }
    else if (Ydef > 0) { send a request to the North output port }
    else if (Ydef < 0) { send a request to the South output port }
    else { send a request to the Local port }
    // Xdest is the X coordinate of the destination router
    // Xcur is the X coordinate of the current router
As shown, the packet always travels in the X direction until Xdef becomes zero, then travels in the Y direction until Ydef becomes zero, and is finally directed to the Local port of the destination router, that is, to the PE. This algorithm does not take the state of the network into consideration when making decisions, and hence it is an oblivious routing algorithm.
2.2.2. Deadlock
Deadlock generally occurs when two or more actions wait for each other in a cyclic dependency, so that none of them can take place. Fig. 11 shows a practical deadlock situation that could occur in a NoC under fully adaptive routing algorithms. In this example, packet P1 occupies buffer B1 and is heading East, but it cannot move to buffer B2 because B2 is occupied by packet P2. The same situation holds for (P2,B3), (P3,B4), and (P4,B1); therefore a complete dependency cycle is formed and none of the packets can move.
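The XY routing rule of Section 2.2.1 translates directly into runnable code. The sketch below assumes a 2D mesh in which East is the direction of increasing X and North the direction of increasing Y; the function names `xy_route` and `xy_path` and the coordinate tuples are illustrative choices of ours.

```python
# Coordinate change produced by leaving through each output port (assumed axes).
STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def xy_route(cur, dest):
    """Return the output port requested at router `cur` for destination `dest`."""
    x_def = dest[0] - cur[0]
    y_def = dest[1] - cur[1]
    if x_def > 0:
        return "E"
    if x_def < 0:
        return "W"
    if y_def > 0:
        return "N"
    if y_def < 0:
        return "S"
    return "L"  # the packet has arrived: deliver it to the local PE

def xy_path(src, dest):
    """List the ports requested hop by hop, ending with the Local port."""
    cur, ports = src, []
    while True:
        port = xy_route(cur, dest)
        ports.append(port)
        if port == "L":
            return ports
        dx, dy = STEP[port]
        cur = (cur[0] + dx, cur[1] + dy)

print(xy_path((0, 0), (2, 1)))  # ['E', 'E', 'N', 'L']
print(xy_path((2, 2), (0, 3)))  # ['W', 'W', 'N', 'L']
```

Both paths finish all X movement before any Y movement, and the decision uses only the coordinates, never the network state, which is what makes the algorithm oblivious.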
Fig. 11 Deadlock between four routers.
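The cyclic dependency of Fig. 11 can be made concrete with a small wait-for map. The following is only an illustrative sketch, not part of the thesis design: the function name find_cycle and the dictionary representation are assumptions, and the model assumes each buffer waits on at most one other buffer.

```python
def find_cycle(waits_for):
    """Return one cyclic chain of buffers from a wait-for map, or None.

    waits_for maps a buffer to the buffer its head packet needs next
    (hypothetical model: each buffer waits on at most one other buffer).
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain until it ends or revisits a buffer.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node == start:
            return seen          # the chain closed on its starting buffer
    return None

# Situation of Fig. 11: P1 in B1 needs B2, P2 in B2 needs B3, and so on.
fig11 = {"B1": "B2", "B2": "B3", "B3": "B4", "B4": "B1"}
# Same chain, but B4 is not waiting on anything, so the cycle is broken.
no_deadlock = {"B1": "B2", "B2": "B3", "B3": "B4"}

print(find_cycle(fig11))
print(find_cycle(no_deadlock))
```

Removing any single dependency from the map breaks the cycle, which is the idea that the turn restrictions of the next section rely on.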
2.2.3. Turn model
To prevent deadlock, some restrictions must be imposed so that complete dependency cycles between routers cannot form. This solution is based on the turn model [21]. Basically, the turn model is used to determine whether the routing algorithm under consideration is deadlock free or not. For example, to apply this model to XY routing, all we need to do is determine all the turns that can occur in XY routing. The possible turns in XY routing are from East to North or South (turns 1 and 8 of Fig. 12 respectively) and from West to North or South (turns 6 and 3 of Fig. 12 respectively).
Fig. 12 Possible turns in XY routing algorithm.
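The four turns listed above can be checked by brute force. The sketch below is an illustration under stated assumptions: it routes a packet between every pair of nodes of a small mesh with the XY rule of section 2.2.1 and records each turn as an (input port, output port) pair, assuming that a packet sent through the East output arrives at the West input port of the next router. The names xy_route and turns_used are hypothetical.

```python
def xy_route(cur, dest):
    """Output port chosen by XY routing at router cur for destination dest."""
    x_def = dest[0] - cur[0]
    y_def = dest[1] - cur[1]
    if x_def > 0:
        return "E"
    if x_def < 0:
        return "W"
    if y_def > 0:
        return "N"
    if y_def < 0:
        return "S"
    return "L"

STEP = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}

def turns_used(size):
    """Collect every (input port, output port) turn made on a size x size mesh."""
    turns = set()
    nodes = [(x, y) for x in range(size) for y in range(size)]
    for src in nodes:
        for dest in nodes:
            cur, in_port = src, "L"      # packets are injected at the Local port
            while True:
                out = xy_route(cur, dest)
                if out == "L":
                    break                # delivered to the destination PE
                # Going straight means leaving opposite to the input port.
                if in_port != "L" and out != OPPOSITE[in_port]:
                    turns.add((in_port, out))
                cur = (cur[0] + STEP[out][0], cur[1] + STEP[out][1])
                in_port = OPPOSITE[out]  # assumed port naming convention
    return turns

print(sorted(turns_used(5)))
```

Only the turns from the East or West input port to the North or South output port appear, so four of the eight possible turns are never taken.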
Now it is clear that two turns are forbidden in the counterclockwise direction (turns 5 and 7 of Fig. 12) and another two turns in the clockwise direction (turns 2 and 4 of Fig. 12); hence complete cycles cannot form in either direction, which makes XY routing a deadlock-free algorithm.
2.2.4. Partially adaptive routing algorithms
As mentioned in the previous section, some turns must be forbidden to avoid deadlock. In XY routing two turns are forbidden in the counterclockwise direction and two in the clockwise direction. One possible improvement is to prohibit only one turn in each direction instead of two. Such a change increases the flexibility of the routing algorithm and lets it adapt to the network state. Each choice of prohibited turns yields a different routing algorithm; all such combinations are given in [21]. One of these routing algorithms is called West-First (WF) routing. As shown in Fig. 13, in this algorithm turn 2 (clockwise) and turn 7 (counterclockwise) are prohibited, and accordingly the algorithm is as follows:
    Xdef = Xdest - Xcur
    Ydef = Ydest - Ycur
    if (Xdef < 0) { send a request to the West output port }
    else if (Xdef > 0 and Ydef > 0) {
        if (West buffer of the East downstream router is free)
            { send a request to the East output port }
        else if (South buffer of the North downstream router is free)
            { send a request to the North output port }
        else
            { send a request to the East or North output port }
    }
    else if (Xdef > 0 and Ydef < 0) {
        if (West buffer of the East downstream router is free)
            { send a request to the East output port }
        else if (North buffer of the South downstream router is free)
            { send a request to the South output port }
        else
            { send a request to the East or South output port }
    }
    else if (Xdef > 0) { send a request to the East output port }  // Ydef = 0
    else if (Ydef > 0) { send a request to the North output port }
    else if (Ydef < 0) { send a request to the South output port }
    else               { send a request to the Local port }
    // Xdest, Ydest are the X and Y coordinates of the destination router
    // Xcur, Ycur are the X and Y coordinates of the current router
As stated in this algorithm, if the destination lies to the West, North-West or South-West, the packet must go West regardless of the state of the network, so there is no other choice; hence the name "West-First".
Fig. 13 Possible turns in WF routing algorithm.
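The West-First rule can be sketched compactly by first listing the output ports the algorithm permits and then choosing among them by downstream buffer state. This is a minimal sketch, not the thesis implementation: the function names and the is_free callback are hypothetical, and falling back to the first permitted port when no downstream buffer is free is an assumption (the pseudocode allows either port in that case).

```python
def west_first_candidates(cur, dest):
    """Output ports that West-First routing may use, in order of preference."""
    x_def = dest[0] - cur[0]
    y_def = dest[1] - cur[1]
    if x_def < 0:
        return ["W"]              # any westward hop must be taken first
    ports = []
    if x_def > 0:
        ports.append("E")
    if y_def > 0:
        ports.append("N")
    elif y_def < 0:
        ports.append("S")
    return ports or ["L"]         # no offset left: deliver to the Local port

def west_first_route(cur, dest, is_free):
    """Pick the first permitted port whose downstream buffer is free.

    is_free(port) is a hypothetical callback reporting whether the buffer
    behind that output port has a free slot. If none is free, the first
    permitted port is requested (an assumption of this sketch).
    """
    candidates = west_first_candidates(cur, dest)
    for port in candidates:
        if is_free(port):
            return port
    return candidates[0]

# Destination to the North-East while only the North downstream buffer is free.
print(west_first_route((2, 2), (4, 4), lambda port: port == "N"))
```

A destination with any westward offset yields a single candidate, which is why no adaptivity is available in that case.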
2.3. Related works
The performance, power consumption and silicon area of a NoC architecture depend mainly on the NoC routers, and especially on the router buffers. Thus, the design of efficient, high-performance routers is critical to the success of the NoC approach [22].
As discussed in chapter 1, router input buffers consume a significant portion of the silicon area, so it is important to reduce the number of buffers as well as their sizes. However, the performance of the NoC increases with the number of buffers used per input port (VCs) and with their sizes [4]. This trade-off between performance and area has been studied in [4,23], where higher performance is sought by increasing the amount of buffering resources only at the bottlenecks, i.e. the input ports with higher congestion. Both [4,23] present static algorithms that identify the bottlenecks and then increase the buffer size [4] or the number of VCs [23] at these bottlenecks. In [4] an offline algorithm determines the blocking probability of each input port buffer in the network, chooses the buffer with the highest blocking probability, and increases its length by one flit or one packet (for wormhole or Store-and-Forward switching respectively). The whole operation is then repeated with the new configuration data (because one buffer has changed, the configuration data may change as well) until the maximum allowed number of extra buffers has been added. In [23] a similar offline algorithm is presented, but it increases the number of VCs rather than the buffer depth, and the criterion used to select where to add VCs is also different: it is the bandwidth utilization of each input port. The bandwidth utilization of an input port i is determined by the following formula:
\[ BW_{usage} = \frac{\sum_{k \in \{L,N,S,E,W\}} \lambda_{ik}}{BW_i} \]
Where λik is the average flow rate from input port i to output port k (note that λii = 0 for minimal routing algorithms), and BWi is the expected or average bandwidth available to input port i. This ratio indicates how heavily each link uses its bandwidth: if it is ≤ 1, the required communication bandwidth is less than or equal to the effective bandwidth and no extra VC is needed; if it is > 1, the link is a communication bottleneck and more VCs are needed to match the communication demand through it. The algorithm selects the link with the highest BWusage and assigns it an additional VC. The algorithm iterates until the maximum VC budget is reached.
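As a worked illustration of the ratio, the sketch below evaluates BWusage for two input ports and picks the one a VC-allocation step would consider first. All flow rates and bandwidths are made-up values for illustration only; they are not taken from [23] or from the thesis experiments, and the function name bw_usage is hypothetical.

```python
def bw_usage(flow_rates, bandwidth):
    """Total flow leaving input port i divided by its available bandwidth."""
    return sum(flow_rates.values()) / bandwidth

# Hypothetical average flow rates (packets/cycle) from an input port to each
# output port, paired with the bandwidth available to that input port.
ports = {
    "East":  ({"L": 0.10, "N": 0.25, "S": 0.25, "E": 0.0, "W": 0.5}, 1.0),
    "North": ({"L": 0.10, "N": 0.0, "S": 0.30, "E": 0.0, "W": 0.0}, 1.0),
}

usage = {name: bw_usage(rates, bw) for name, (rates, bw) in ports.items()}
bottleneck = max(usage, key=usage.get)   # port with the highest BW usage

for name, value in usage.items():
    status = "bottleneck, needs more VCs" if value > 1 else "sufficient"
    print(name, round(value, 2), status)
print("next VC goes to:", bottleneck)
```

With these assumed numbers the East port exceeds a ratio of 1 and would receive the next VC, while the North port would be left unchanged.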
Although both [4,23] achieve better performance with a large optimization of the NoC buffers (sizes or numbers), both use static algorithms that optimize for a specific traffic pattern or application. If the application changes, the locations of the bottlenecks may change as well and the whole procedure must be repeated, whereas it is desirable to have a general-purpose architecture that gives higher performance whatever the application is. In [24] this problem is studied in a different way: instead of adding more VCs or increasing the depth of some buffers at the bottlenecks, the authors try to use the available buffering more efficiently. They observed that the performance of the network is affected not only by the amount of buffering but also by the way the buffers are organized. For example, if the traffic load is low, a small number of VCs suffices, and if each VC is deep, the VCs may be underutilized or used inefficiently. For higher loads in wormhole switching, however, increasing the number of VCs is more effective than increasing the depth of each VC because of head-of-line (HOL) blocking, which occurs when the head flit of a packet is blocked at some router, so that all subsequent flits of the packets traversing the same path are blocked as well, causing a train of blocked buffers in the network. According to this observation, a fixed VC organization will either be underutilized or will underperform under certain traffic conditions. Fig. 14 highlights these observations and the weakness of a fixed, static organization of VCs. These observations led the authors to propose a new router architecture called the Virtual Channel Regulator (ViChaR). ViChaR uses the concept of a unified buffer structure (UBS) [25], which enables each input port to allocate the available VCs dynamically according to the incoming traffic, so that the port appears to have a variable number of VCs.
Therefore, as the traffic increases the router has more VCs with less depth, and vice versa. Although this kind of flexibility increases the overall performance of the system, the implementation of VCs still results in an area overhead and possibly a power overhead, as discussed in chapter 1, in addition to the extra logic needed to allocate them dynamically as ViChaR proposes. Finally, as stated, this architecture is viable only if VCs are implemented; otherwise it is not applicable.
Fig. 14 Possible traffic situations and the limitations of static organization of VCs: (a) light traffic, many/shallow VCs; (b) heavy traffic, many/shallow VCs; (c) light traffic, few/deep VCs; (d) heavy traffic, few/deep VCs.
CHAPTER 3 THE PROPOSED FLEXIBLE ROUTER ARCHITECTURE
The goal of this work is to design a new router architecture that improves performance while avoiding the problems of the related works discussed previously. We propose a router architecture that benefits from all the router buffers, whether or not VCs are present and whatever the application is, so that the input buffers that already exist are used in an efficient and flexible way. If a request is blocked because the requested buffer at some input port is busy, the Flexible Router tries to allocate any suitable free buffer at another input port of the router to store the incoming packet. The contention is thus resolved without waiting for the originally requested input buffer to become free, which avoids the main limitation of the Base Router and is expected to give better performance. Although the Flexible Router seems similar to the ViChaR router, there are two main differences between them. First, the ViChaR idea is applicable only if VCs are implemented, while the Flexible Router idea can be applied whether there are VCs or just a single FIFO buffer per input port. Second, in ViChaR contention can be resolved using only the VCs of the requested input port, while the Flexible Router can use all the free buffers or VCs of the router, not only those of the requested input port.
3.1. Architecture components and signals' descriptions
As shown in Fig. 1, the design is similar to the Base Router except for some functionality and modules added to the input ports (the modified modules are shown with dashed frames and the added signals with dashed arrows). There are again five input and five output ports, but the input ports are now different.
3.1.1. Input port
Fig. 2 shows the East input port as an example. It consists of three basic modules:
FIFO Flexibility Controller (FFC): This newly added module is responsible for finding any suitable free FIFO in the router in order to store the incoming packet. It is also responsible for communicating with the output ports to transfer the received packets to their downstream routers according to the routing algorithm used.
FIFO buffer: The FIFO here can receive packets not only from the directly connected upstream router, as in the Base Router, but also from the other input ports.
Routing logic: Applies the routing algorithm to determine the packet direction in order to choose the appropriate output port.
Fig. 1 Flexible Router architecture.
Fig. 2 The East input port as an example of the input ports of the Flexible Router.
3.1.2. Output port
It is exactly the same as the output port of the Base Router shown in Fig. 8.
3.1.3. Signals' descriptions
Table I explains only the signals newly added to the Flexible Router.
TABLE I. SIGNALS' FUNCTIONS IN THE FLEXIBLE ROUTER
Signal name | Function | Connections
req_FFCE_FIFOW,N,S | Request signals sent to FIFOW, FIFON, and FIFOS (where FIFOE,W,N,S are the East, West, North, and South FIFOs respectively) in case FIFOE has no free slots to store the incoming packet. | Connected to FIFOW, FIFON, and FIFOS respectively.
grant_FFCE_FIFOW,N,S | The grant signals sent from FIFOW, FIFON, and FIFOS respectively in response to req_FFCE_FIFOW,N,S. | Originated from FIFOW, FIFON, and FIFOS respectively.
pkt_E,W,N,S | Data packets. | Originated from the East, West, North, and South neighboring upstream routers respectively.
req_FFCW,N,S_FIFOE | Request signals from all other FFCs (West, North, South) of the router. | Originated from FFCW, FFCN, and FFCS respectively.
grant_FFCW,N,S_FIFOE | Grant signals to all other FFCs (West, North, South) of the router. | Connected to FFCW, FFCN, and FFCS respectively.
3.2. Basic operation of the Flexible Router
Suppose that the East neighboring upstream router initiates a request (req_US) to the East input port of the router under consideration. The FFCE (the FFC of the East input port) first searches FIFOE for at least one free slot; if there is one, the FFCE grants the request (grant_US) and the packet is stored in FIFOE. If there are no free slots in FIFOE, the FFCE searches for a free slot in all the other FIFOs of the router (FIFOW, FIFON, FIFOS), and also keeps FIFOE in the search, because although it has no free slots at this time it may get one while the search is in progress. The FFCE then initiates requests (req_FFCE_FIFOW,N,S) to those FIFOs and waits to be granted by one of them. Once it finds a free slot, it grants the upstream request and the packet (pkt_E) is transferred from the East upstream router. After that, the routing logic checks the destination address field in the packet to determine the appropriate output port. The FFC then initiates a request, sends the packet (pkt_int_E in Fig. 2) to that output port, and waits to be granted. The output port may hold requests from several input ports at the same time, so its arbiter selects one of them according to the arbitration logic used and triggers the output controller to initiate a request to the downstream router and wait to be granted. Upon receiving the grant signal, the packet is transferred to the downstream router and the selected input port can process another packet, so the operation is repeated. The difference between the Flexible and the Base Routers in dealing with contention is now clear. The Base Router waits until the requested FIFO has at least one free slot, even if other FIFOs in the router have free slots, while the Flexible Router searches directly for free slots in the other FIFOs to resolve the contention.
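The difference in how the two routers treat a full FIFO can be sketched as follows. This is a simplified software model written for illustration, not the hardware design: the Fifo class and accept_packet function are hypothetical names, the request and grant handshake is collapsed into a single function call, and the storage restrictions introduced in section 3.3 are deliberately ignored here.

```python
from collections import deque

class Fifo:
    """Bounded FIFO holding whole packets (one slot per packet)."""
    def __init__(self, depth):
        self.depth = depth
        self.slots = deque()

    def has_free_slot(self):
        return len(self.slots) < self.depth

def accept_packet(fifos, requested, packet):
    """Store packet in the requested FIFO, or in another free one if it is full.

    Returns the name of the FIFO used, or None if every FIFO is full and the
    request has to keep waiting. The search order over the other FIFOs is an
    assumption of this sketch.
    """
    order = [requested] + [name for name in fifos if name != requested]
    for name in order:
        if fifos[name].has_free_slot():
            fifos[name].slots.append(packet)
            return name
    return None

# Four input FIFOs with a single slot each, to force contention quickly.
fifos = {name: Fifo(depth=1) for name in ("E", "W", "N", "S")}
first = accept_packet(fifos, "E", "p1")    # East FIFO has room
second = accept_packet(fifos, "E", "p2")   # East FIFO is now full
print(first, second)
```

In the Base Router the second request would stay blocked until the East FIFO drains; in this model it is served by another FIFO instead.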
3.3. Deadlock problem
3.3.1. Problem description
During the implementation of this idea, the main problem was deadlock. The router was first designed as a fully Flexible Router without any restrictions on where to store incoming packets: when a request arrived at any input port, the packet could be stored in any buffer (East, West, North, South). Although this led to very high performance, it was susceptible to deadlock. The simplest deadlock situation involves just two routers. For example, as shown in Fig. 3(a), if all the packets stored in R1 are going East and at the same moment all the packets in R2 are going West, deadlock occurs. Deadlock can also occur between four routers, as shown in Fig. 3(b). This figure shows the usual deadlock situation explained in section 1.6; here there are two dead cycles, one in the clockwise and one in the counterclockwise direction.
Fig. 3 Some deadlock examples that might arise in the fully Flexible Router: (a) deadlock between two routers, (b) deadlock between four routers.
3.3.2. Deadlock solution in the Flexible Router under XY routing
To avoid deadlock, some restrictions must be placed on where the router can store incoming packets. These restrictions are based on the direction of the incoming packets. The idea is as follows: we make an analogy between the buffers of the Base Router and the buffers of the Flexible Router in terms of which packet directions each buffer can hold in the Base Router working under the XY routing algorithm. For example, the East buffer of the Base Router can contain packets directed to North, South, West or Local; according to the analogy, the East buffer of the Flexible Router is designed to accept packets with the same directions (North, South, West or Local), and the same concept is applied to the rest of the buffers. According to this analogy the following restrictions were implemented in the Flexible Router:
North buffer: Cannot contain packets going to North, East, or West.
South buffer: Cannot contain packets going to South, East, or West.
East buffer: Cannot contain packets going to East.
West buffer: Cannot contain packets going to West.
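The restrictions above amount to a small lookup table that the FFC could consult before requesting a FIFO. The sketch below is an assumed software rendering of that table: the name ALLOWED, the function candidate_buffers and its search order are illustrative choices, not the thesis hardware.

```python
# Output directions each input buffer may hold in the XY-based Flexible
# Router, i.e. the complement of the restrictions listed above.
ALLOWED = {
    "N": {"L", "S"},
    "S": {"L", "N"},
    "E": {"L", "N", "S", "W"},
    "W": {"L", "N", "S", "E"},
}

def candidate_buffers(requested, direction, free):
    """Buffers that may store a packet heading to direction.

    requested is the buffer of the input port the packet arrived at, and
    free is the set of buffers that currently have a free slot. The
    requested buffer is tried first; the rest follow in table order
    (an assumption of this sketch).
    """
    order = [requested] + [b for b in ALLOWED if b != requested]
    return [b for b in order if direction in ALLOWED[b] and b in free]

# A packet arrives at the East port while the East buffer is full.
print(candidate_buffers("E", "N", {"N", "S", "W"}))   # heading North
print(candidate_buffers("E", "W", {"N", "S", "W"}))   # heading West
```

A packet heading West has no alternative to the East buffer, so flexibility helps only packets whose direction is accepted by more than one buffer.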
This solution is based on the turn model [21] described in chapter 2: some turns are prohibited in order to prevent complete cycles and hence deadlock. As discussed in chapter 2, the turns that can occur in a Base Router working under XY routing are from East to North or South (turns 1 and 8 of Fig. 4 respectively) and from West to North or South (turns 6 and 3 of Fig. 4 respectively).
Fig. 4 Possible turns in XY routing algorithm.
Now it is clear that two turns are forbidden in the counterclockwise direction (turns 5 and 7 of Fig. 4) and another two turns in the clockwise direction (turns 2 and 4 of Fig. 4); hence complete cycles cannot form in either direction, which makes the Base Router working under XY routing a deadlock-free router.
Theorem: The Flexible Router working under XY routing, with the same buffer restrictions as the Base Router working under XY routing, is a deadlock-free router.
Proof: We apply the turn model to the Flexible Router working under XY routing following the same steps, i.e. we enumerate all the turns that can occur in it. From the analogy with the Base Router under XY routing, the packet directions that each buffer of the Flexible Router can store are as follows (Fig. 5):
North buffer: Can contain packets directed to Local or South. Packets directed to Local have reached their destination and are absorbed directly by the local port. For packets directed to South this is not a turn because it is a vertical movement; therefore packets in the North buffer cannot make turns.
South buffer: Can contain packets directed to Local or North. The same reasoning as for the North buffer applies, and hence no turns originate from this buffer.
East buffer: Can contain packets directed to Local, North, South or West. Here Local and West do not make turns, because Local packets are absorbed and the West direction is just a horizontal movement, but the North and South directions represent two turns: turns 1 and 8 respectively in Fig. 4.
West buffer: Can contain packets directed to Local, North, South or East. By the same reasoning as for the East buffer, this buffer contributes turns 6 and 3 in Fig. 4 for the North and South directions respectively.
According to the above discussion, the Flexible Router avoids four turns: turns 2 and 4 in the clockwise direction and turns 5 and 7 in the counterclockwise direction in Fig. 4. Therefore the Flexible Router is a deadlock-free router under XY routing.
Fig. 5 Possible packet directions in the Flexible Router: North buffer L,S; South buffer L,N; East buffer L,N,S,W; West buffer L,N,S,E.
3.3.3. XY and WF based Flexible Routers
The Flexible Router discussed in the previous sections is not completely flexible, i.e. not every free slot can receive an incoming packet. As these restrictions are based on the analogy with the Base Router working under XY routing, we call this router the XY-based Flexible Router (XYFR). The flexibility of the Flexible Router can still be improved. We therefore propose another, more flexible version based on the analogy with the Base Router working under the WF routing algorithm discussed in chapter 2; we call this router the WF-based Flexible Router (WFFR). WFFR has the same restrictions as the WF routing algorithm, as shown in Fig. 6(a,b).
Fig. 6 Possible packet directions in WFFR based on WF routing: (a) possible packet directions in WFFR (North buffer L,S,E; South buffer L,N,E; East buffer L,N,S,W; West buffer L,N,S,E); (b) possible turns in WF routing.
As shown, one more direction, East, is added to the North and South buffers. If an incoming packet is directed to East, it can now be stored in the North, South or West buffer, not only in the West buffer as in XYFR, and this adds more flexibility to the WFFR. Using the same steps shown in the previous section, we can prove that WFFR is also a deadlock-free router under XY routing.
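The step from buffer contents to turns, used in the proof for XYFR and again for WFFR, can be expressed as a short computation. In this sketch a turn is any (buffer, direction) pair that changes axis; the tables repeat the directions of Fig. 5 and Fig. 6(a), while the function and variable names are assumptions made for illustration.

```python
# Packet directions each buffer may hold, as given in Fig. 5 and Fig. 6(a).
XYFR = {"N": {"L", "S"}, "S": {"L", "N"},
        "E": {"L", "N", "S", "W"}, "W": {"L", "N", "S", "E"}}
WFFR = {"N": {"L", "S", "E"}, "S": {"L", "N", "E"},
        "E": {"L", "N", "S", "W"}, "W": {"L", "N", "S", "E"}}

HORIZONTAL = {"E", "W"}

def turns(allowed):
    """(buffer, direction) pairs that switch between the X and Y axes."""
    result = set()
    for buf, directions in allowed.items():
        for d in directions - {"L"}:          # Local packets are absorbed
            if (buf in HORIZONTAL) != (d in HORIZONTAL):
                result.add((buf, d))
    return result

print(len(turns(XYFR)), sorted(turns(XYFR)))
print(len(turns(WFFR)), sorted(turns(WFFR) - turns(XYFR)))
```

XYFR yields four turns and WFFR six; the two extra turns come from the North and South buffers holding East-bound packets, and the two turns that remain excluded are those into the West direction.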
3.4. Buffer-to-buffer deadlock design problem
Another kind of deadlock can occur. We call it buffer-to-buffer (b2b) deadlock to differentiate it from the one discussed previously. To explain the problem, consider the example in Fig. 7(a), where R1 and R2 are two neighboring routers. Suppose that packet P1 in FIFO B1 of R1 is requesting R2 and FFC2 chooses B2 for it. At the same time packet P2 in FIFO B2 of R2 is requesting R1 and FFC1 happens to choose B1 for it. Both requests now enter a waiting loop that is broken only when a grant arrives. According to our design, the FIFO responds to requests in a sequential order, as in the following pseudocode:
    if (Req1 & not_full) { store the pkt and check full or not; }
    if (Req2 & not_full) { store the pkt and check full or not; }
    if (Req3 & not_full) { store the pkt and check full or not; }
    if (Req4 & not_full) { store the pkt and check full or not; }
Fig. 7 B2b deadlock problem: (a) the request from R1 to R2 (P1 in B1) and the request from R2 to R1 (P2 in B2); (b) both requests wait indefinitely to be granted because the free slot becomes busy with another request while they wait for the grant.
Suppose now that both B1 and B2 have just one free slot, that there are other requests to B1 and B2, and that the requests from P1 and P2 have lower order than those other requests. B1 and B2 will then serve the other requests and both will become full, so neither the P1 request nor the P2 request will be granted, and each will depend on the other exactly as in a normal deadlock situation. This is clear in Fig. 7(b): to store P1 in B2, P2 must leave B2, and for P2 to leave, P1 must leave B1 for a free slot in B2, so a deadlock dependency occurs. To recover from this problem, the FFC checks the status of the FIFO while waiting for the grant; if the FIFO becomes full, the FFC breaks the loop and searches for a new FIFO, as in the following pseudocode:
    while (grant == 0) {
        if (FIFO becomes full) {
            cancel the request
            break the loop
        }
    }
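The two pieces of pseudocode above, the fixed-order FIFO response and the cancel-on-full recovery, can be combined in a small cycle-level sketch. It is a software abstraction under stated assumptions: the function names and the (granted, full) trace format are hypothetical, and one loop iteration stands for one clock cycle.

```python
def serve_requests(free_slots, requests):
    """Grant requests in fixed priority order while free slots remain.

    requests is ordered from Req1 (served first) to the last request.
    Returns the granted requests and whether the FIFO ended up full.
    """
    granted = []
    for req in requests:
        if free_slots > 0:
            granted.append(req)
            free_slots -= 1
    return granted, free_slots == 0

def wait_for_grant(fifo_states):
    """Follow one pending request cycle by cycle.

    fifo_states holds one (granted, full) pair per cycle for the FIFO the
    FFC is waiting on. Returns the outcome and the number of cycles spent.
    """
    cycles = 0
    for granted, full in fifo_states:
        cycles += 1
        if granted:
            return "granted", cycles
        if full:
            return "cancelled", cycles   # break the loop, search another FIFO
    return "pending", cycles

# B2 has one free slot; another port's request is ordered before P1's.
granted, full = serve_requests(1, ["other", "P1"])
print(granted, full)
# P1's FFC sees no grant in cycle 1, then sees the FIFO full in cycle 2.
print(wait_for_grant([(False, False), (False, True)]))
```

The cycles counted before a cancellation are spent without storing the packet, which corresponds to the lost cycles discussed with the simulation results.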
3.5. Out of order received packets side-effect
The main side effect of the proposed idea, besides the extra logic in the input ports, is that PEs can receive packets out of order. In the Base Router, when a request arrives at an input port there is only one place to store the packet, the FIFO buffer of that input port. Packets originating from the same source and directed to the same destination therefore use the same FIFO buffers along their path and arrive in order. In the Flexible Router the situation is different: if the FIFO buffer of the input port is full, the router can store the packet in another FIFO, so in some situations the router can contain two or more packets going from the same source to the same destination in different FIFOs. Under some conditions these packets can leave the router in a different order from the one in which they were received, and they can therefore reach their destination out of order.
For example, as shown in Fig. 8, consider two packets, P1 and P2, generated in this order by the same source and going to the same destination. Due to flexibility, these two packets can be in the same router at the same time but in different FIFOs. Suppose that when P1 reaches the Flexible Router the East FIFO has just one free slot, so P1 is stored directly, but as a tail packet, and the East FIFO becomes full. When P2 requests the East FIFO, which is now full, the router searches for another FIFO in which to store P2. Suppose that the North FIFO is completely free at this time, so the router stores P2 in it as a head packet, and hence P2 leaves the router before P1. This side effect is limited and was found to be small, as explained with the simulation results in chapter 4.
Fig. 8 Example of two successive packets sent from the same source and going to the same destination but stored in different FIFOs in some router along their path (the head packet may go out early, the tail packet may go out late).
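The reordering scenario of Fig. 8 can be replayed with two queues. The sketch below assumes a FIFO depth of 5 packets (the size used later in the simulations), an East FIFO that already holds four older packets, and an idealized router in which every FIFO forwards its head packet each cycle; the placeholder packet names a to d and the store function are hypothetical.

```python
from collections import deque

DEPTH = 5
east = deque(["a", "b", "c", "d"])   # East FIFO with a single free slot
north = deque()                      # North FIFO, completely free

def store(packet):
    """Prefer the East FIFO; fall back to the North FIFO when East is full."""
    target = east if len(east) < DEPTH else north
    target.append(packet)

store("P1")   # takes the last East slot, behind four older packets
store("P2")   # East is full, so P2 becomes the head of the North FIFO

# Idealized drain: each cycle, every non-empty FIFO forwards its head packet.
departures = []
while east or north:
    for fifo in (east, north):
        if fifo:
            departures.append(fifo.popleft())

print(departures)
```

P2 leaves in the first cycle while P1 has to wait behind the packets ahead of it, so the two packets leave the router in reversed order.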
CHAPTER 4 SIMULATION PLATFORM AND RESULTS
4.1. Simulation setup
The objective is to observe how the performance parameters change with the injection rate. The performance parameters we use are the average delay and the throughput. The average delay is calculated by adding the delays of all packets and dividing the result by the total number of packets. The throughput is the average number of packets received per cycle per PE (packets/cycle/PE); we calculate it for each PE, add the values, and divide by the number of PEs. We obtained our simulation results from a 5x5 mesh topology modeled in SystemC. The FIFO size was 5 packets, except for the local FIFOs, which were implemented with a large size (assumed to be infinite) to facilitate the calculation of the injection rate for both the Base and Flexible Routers. The data link width between any two routers is equal to one packet, so a complete packet can be transferred in one cycle. We simulate under the XY routing algorithm, the Store-and-Forward (SAF) switching technique (in SAF switching, a complete packet is stored and then forwarded by the router), and four different traffic patterns: Hotspot (HS), Uniform (UNI), Transpose-II (T2), and Nearest-Neighbor (NN). These four patterns are as follows:
Hotspot: A larger portion of the traffic is directed to the hotspot node and the rest of the traffic is equally distributed between all other nodes [26]; in our experiments 90% of the packets are directed to the hotspot node.
Uniform: All the traffic is equally distributed between all nodes.
Transpose-II: The node at coordinates (i,j) sends to the node at (X-i-1,Y-j-1), where X and Y are the numbers of nodes along the x and y directions respectively.
Nearest-Neighbor: Each node sends only to its nearest (adjacent) nodes.
Each router is connected to a PE; this PE just injects packets (typically 1000 packets/PE) to other PEs according to the traffic pattern used, and receives packets from other PEs.
The time between two successive transmitted packets is drawn from a uniform distribution, i.e. it can take a small or a large value without any preference.
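The traffic patterns and the two performance metrics described above are simple enough to write down directly. The following sketch only illustrates the definitions and is not the SystemC simulator: the function names are hypothetical, and the handling of the hotspot node's own traffic (it sends uniformly to the other nodes) is an assumption.

```python
import random

X = Y = 5   # 5x5 mesh, as in the simulation setup

def transpose2(i, j):
    """Transpose-II: node (i, j) sends to node (X-i-1, Y-j-1)."""
    return (X - i - 1, Y - j - 1)

def hotspot(src, hot, rng, share=0.9):
    """Hotspot: a share of the packets go to hot, the rest uniformly elsewhere."""
    if src != hot and rng.random() < share:
        return hot
    others = [(x, y) for x in range(X) for y in range(Y)
              if (x, y) not in (src, hot)]
    return rng.choice(others)

def average_delay(delays):
    """Sum of all packet delays divided by the number of packets."""
    return sum(delays) / len(delays)

def throughput(received_per_pe, cycles):
    """Packets received per cycle, computed per PE and averaged over the PEs."""
    return sum(r / cycles for r in received_per_pe) / len(received_per_pe)

rng = random.Random(0)
print(transpose2(0, 0), hotspot((0, 0), (2, 2), rng))
print(average_delay([10, 20, 30]), throughput([100, 200], 1000))
```

The throughput helper follows the per-PE averaging described in the text, so its unit is packets/cycle/PE.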
4.2. Simulation results and analysis Average delay and throughput comparisons between the XYFR, WFFR and the Base Routers under HS, UNI, NN, T2 traffic patterns are shown in Fig. 1(a,b,c,d,e,f,g,h)
Average delay (Cycles)
respectively. The summary of results is in table I as a comparison to the Base Router. 3000 2500 2000 1500 1000 500 0
XYFR WFFR Base
0
0.002
0.004
0.006
0.008
0.01
0.012
Packets Injection rate (Packets/cycle/PE) (a) Delay under HS Throughput (Packets/cycle/PE)
0.04 0.03 0.02
XYFR WFFR Base
0.01 0 0
0.002
0.004
0.006
0.008
0.01
0.012
Packets Injection rate (Packets/cycle/PE) (b) Throughput under HS Average delay (Cycles)
5000 4000 3000
XYFR WFFR Base
2000
1000 0 0
0.02 0.04 0.06 0.08
0.1
0.12 0.14 0.16 0.18
Packets Injection rate (Packets/cycle/PE) (c) Delay under UNI Throughput (Packets/cycle/PE)
0.03 0.03 0.02
0.02
XYFR WFFR Base
0.01 0.01 0.00 0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
Packets Injection rate (Packets/cycle/PE) (d) Throughput under UNI
32
0.16
0.18
Average delay (Cycles)
3000 2500 2000 1500 1000 500 0
XYFR WFFR Base
0
0.02 0.04 0.06 0.08
0.1
0.12 0.14 0.16 0.18
0.2
Throughput (Packets/cycle/PE)
Packets Injection rate (Packets/cycle/PE) (e) Delay under NN 0.07 0.06 0.05 0.04 0.03 0.02 0.01 0
XYFR WFFR Base
0
0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Packets Injection rate (Packets/cycle/PE) (f) Throughput under NN
Average delay (Cycles)
4000 3000 XYFR WFFR Base
2000 1000 0 0
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 Packets Injection rate (Packets/cycle/PE) (g) Delay under T2
Throughput (Packets/cycle/PE)
0.025 0.020 0.015
XYFR WFFR Base
0.010 0.005 0.000 0
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 Packets Injection rate (Packets/cycle/PE) (h) Throughput under T2
Fig. 1 Performance characteristics comparisons between XYFR, WFFR, and Base Routers, (a,b) delay and throughput respectively for HS, (c,d) delay and throughput respectively for UNI, (e,f) delay and throughput respectively for NN, (g,h) delay and throughput respectively for T2.
33
TABLE I. SUMMARY OF THE RESULTS.
Traffic | XYFR | WFFR
HS | Better | Better
UNI | Better but close | Worse but close
NN | Better but very close | Better but very close
T2 | Worse but close | Worse
For low injection rates both Base and Flexible Routers have nearly the same average delay and throughput. This is because at low rates the number of packets injected into the network per cycle is small, therefore the number of contentions will be small as well, and therefore the flexibility advantage will not be used a lot. Both XYFR and WFFR do not give high performance under some traffics. There are two reasons to comment on these results: There is a communication overhead in the FFC of the Flexible Router because the upstream router does not search for the free buffers by itself but using FFC, so the upstream router first initiates the request to the FFC, then FFC checks for the available FIFOs and waits to be granted by one of them. Upon receiving the grant, the FFC will grant the upstream router and the packet will be transferred. The b2b deadlock recovery consumes a significant amount of cycles with no value. Also it increases the average delay of the packets because it increases congestion as it may affect other packets waiting the packet under this problem to be transferred. The effect of this problem under different traffics is shown in table IV. The table shows the number of lost cycles due to this problem, the lost cycles is the cycles lost during waiting the FIFO and at the end the FIFO becomes full without granting the request so we lose some cycles while the packet still does not stored yet. This experiment is done under the highest injection rate for all traffics used because at this rate the number of packets is high and so the number of blocking will be high also. As stated in the table, the effect of this problem on WFFR is more than XYFR (except for NN) because the WFFR is more flexible than XYFR so the probability that this problem to occur is much higher in WFFR than XYFR. We can notice the analogy between the results in the table and the characteristics (delay and throughput) we got in Fig. 1. 
For example, for T2 traffic this problem occurred in WFFR, which increases congestion, while it did not occur in XYFR (although XYFR is susceptible to it). This explains Fig. 1(g,h), which shows that XYFR considerably outperforms WFFR. The same reasoning applies to the rest of the figures. For NN, the problem does not occur in WFFR but does occur in XYFR; however, its effect was not large, because in NN traffic long chains of blocked packets spanning many routers cannot form, as each node sends only to its neighbors.
TABLE II.
EFFECT OF BUFFER-TO-BUFFER PROBLEM ON BOTH XYFR AND WFFR
Traffic   Flexible   Total number of lost cycles due to    Ratio of lost cycles
pattern   Router     buffer-to-buffer deadlock recovery    (WF/XY)
Uniform   WFFR       51907                                 1.52
          XYFR       34157
HotSpot   WFFR       235115                                0.826
          XYFR       284537
T2        WFFR       158759                                Inf.
          XYFR       0
NN        WFFR       0                                     0
          XYFR       34157
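The ratio column can be recomputed from the lost-cycle counts. The following is a minimal sketch using the values from the table; the function name and the convention of mapping a zero XYFR count to infinity are assumptions made here for illustration, chosen to match the "Inf." entry.

```python
import math

# Lost cycles due to b2b deadlock recovery, copied from the table.
lost_cycles = {
    "Uniform": {"WFFR": 51907, "XYFR": 34157},
    "HotSpot": {"WFFR": 235115, "XYFR": 284537},
    "T2": {"WFFR": 158759, "XYFR": 0},
    "NN": {"WFFR": 0, "XYFR": 34157},
}

def wf_xy_ratio(wf, xy):
    """Return WF/XY. A zero XYFR count with a nonzero WFFR count is
    reported as infinity (assumed convention, matching 'Inf.')."""
    if xy == 0:
        return math.inf if wf > 0 else math.nan
    return wf / xy

ratios = {t: wf_xy_ratio(v["WFFR"], v["XYFR"]) for t, v in lost_cycles.items()}

for traffic, ratio in ratios.items():
    print(f"{traffic:8s} {ratio:.3f}")
```

A ratio above 1 means WFFR lost more cycles than XYFR for that traffic pattern; a ratio below 1 means the opposite.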
The highest improvement was under HS traffic. Both XYFR and WFFR saturate at higher injection rates than the Base Router: the saturation rate increases by 11.4% for both. The saturation rate is the injection rate at which the delay begins to increase (and the throughput to decrease) exponentially.
4.3. Area overhead of the Flexible Router
Xilinx ISE 13.2 was used to perform the area estimation. Both the Base and Flexible Routers were implemented on a Virtex-5 FPGA; the target device was xc5vfx70t-1ff1136. As shown in table V, the area overhead was reasonable: only a 17.8% increase in Look-Up Tables (LUTs) and an 11.7% increase in Flip-Flops (FFs).
TABLE III. AREA COMPARISON
FPGA resources   Number of resources used   Percentage increase
                 Base      Flexible
LUTs             506       596              17.8%
FFs              308       344              11.7%
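The percentage increases follow directly from the resource counts. A minimal sketch of the arithmetic, using only the numbers in the area comparison table (the helper name is chosen here for illustration):

```python
# (Base, Flexible) resource counts from the area comparison table.
used = {"LUTs": (506, 596), "FFs": (308, 344)}

def percent_increase(base, flexible):
    """Relative increase of the Flexible Router over the Base Router, in percent."""
    return 100.0 * (flexible - base) / base

overhead = {name: percent_increase(b, f) for name, (b, f) in used.items()}

for name, pct in overhead.items():
    print(f"{name}: {pct:.1f}%")
```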
4.4. Out of order received packets study
We studied this side effect to determine how much it affects the operation of the Flexible Router. We use the lagging distance as the criterion for judging it. Lagging distance is the difference between a packet's correct position in the received order and the position at which it actually arrived. Fig. 2 shows each lagging distance and how many packets arrived with that lagging. We ran this experiment at the highest possible injection rate, because at this rate the largest lagging is most likely to appear. We use the Hotspot traffic pattern because in Hotspot traffic the probability that two successive packets from the same source go to the same destination is high. We adjusted the traffic so that the hotspot node receives 90% of the total sent packets and the remaining packets are equally distributed among all other nodes.
[Figure: bar chart. x-axis: lagging distance (1 to 11); y-axis: number of out of order packets. Bar values for lagging distances 1 to 11: 1074, 506, 150, 56, 24, 23, 6, 6, 0, 2, 1.]
Fig. 2 Lagging distance (the number of packets received out of order for each lagging is written at the top of each column).
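One way to compute a lagging-distance histogram like the one in Fig. 2 is sketched below. This is an illustration under stated assumptions, not the measurement code used in the thesis: packets of one source-destination flow are assumed to carry consecutive sequence numbers starting at 0, and a packet is counted as out of order when it arrives at a later position than its sequence number, with the lagging distance taken as that difference.

```python
from collections import Counter

def lagging_histogram(arrival_order):
    """Map lagging distance -> number of packets that arrived with that lag.

    arrival_order lists the sequence numbers of one flow in the order they
    were received (assumed: consecutive sequence numbers starting at 0).
    """
    hist = Counter()
    for position, seq in enumerate(arrival_order):
        lag = position - seq  # positive when the packet arrived late
        if lag > 0:
            hist[lag] += 1
    return dict(hist)

# Packet 1 is overtaken by packet 2, so it arrives one position late.
print(lagging_histogram([0, 2, 1, 3]))
```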
As we see, only one packet arrives with the maximum lagging (11), while the majority (1074) of the unordered packets arrive with the minimum lagging (1). We also found that just 7.4% (1848 out of 25000) of the total number of received packets are out of order. It is worth noting that this side effect does not occur frequently and needs four conditions to occur:
1. The packets must be from the same source and going to the same destination.
2. The injection rate of the packets is high.
3. The later-transmitted packet (packet 2 in the example of the previous section) is stored in a more advanced FIFO position (closer to the head packet of the FIFO) than the earlier-transmitted packet (packet 1) in some router along their path to the destination.
4. If they are in the same FIFO position, unordering can occur if the arbiter chooses the later-transmitted packet before the earlier-transmitted one, according to the arbitration scheme used.
Besides these four conditions, some situations are less susceptible to unordering of packets than others. For example, in XYFR, if the destination has the same Y coordinate as the source, packets traverse horizontally toward the destination, and the probability of unordering becomes very small because it can occur only at the destination node. The reason is that when the FFC receives a request and finds that the destination is to the West or East, it has just one option for each direction: the East FIFO buffer for the West direction and the West FIFO buffer for the East direction, as stated in XYFR. Because of that, the later-transmitted packet can be stored only after the earlier-transmitted one and always in the same FIFO (East or West according to the direction). When they reach the destination node, they can be stored in different FIFOs, and hence the unordering side effect could occur.
To show this side effect at lower injection rates, we repeated the previous experiment at a lower rate. We chose the saturation rate for this experiment. As shown in Fig. 3, the number of unordered packets decreases considerably. There are now only 413 unordered packets, which is about 22% of the unordered packets at the maximum rate (1848) and just 1.65% of the total number of sent packets (25000). The maximum lagging is now 3, which is very small compared to its value of 11 at the maximum injection rate.
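The totals and percentages quoted above can be cross-checked against the bar values printed in Fig. 2 and Fig. 3. The sketch below assumes the bar values are listed in order of increasing lagging distance; the variable names are chosen here for illustration.

```python
# Bar values as printed above the columns of Fig. 2 and Fig. 3,
# assumed to be in order of increasing lagging distance.
fig2_max_rate = [1074, 506, 150, 56, 24, 23, 6, 6, 0, 2, 1]
fig3_saturation_rate = [402, 10, 1]
TOTAL_PACKETS = 25000

unordered_max = sum(fig2_max_rate)          # out of order packets at the maximum rate
unordered_sat = sum(fig3_saturation_rate)   # out of order packets at the saturation rate

share_max = 100.0 * unordered_max / TOTAL_PACKETS
share_sat = 100.0 * unordered_sat / TOTAL_PACKETS
sat_vs_max = 100.0 * unordered_sat / unordered_max

print(unordered_max, unordered_sat)
print(f"{share_max:.2f}% {share_sat:.2f}% {sat_vs_max:.1f}%")
```

The two sums come to 1848 and 413, which agrees with the totals stated in the text.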
[Figure: bar chart. x-axis: lagging distance (1 to 3); y-axis: number of out of order packets. Bar values for lagging distances 1 to 3: 402, 10, 1.]
Fig. 3 Lagging distance at the edge injection rate in XYFR.
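The argument in Section 4.4 that horizontally travelling packets in XYFR stay in order until the destination can be illustrated with a small model. This is a hedged sketch, not the router's actual FFC logic: only the two candidate entries (East FIFO for West-bound packets, West FIFO for East-bound packets) come from the text, while the function names, the dictionary layout, and the FIFO depth of 4 are assumptions made for illustration.

```python
from collections import deque

# Only these two entries come from the text: in XYFR a packet heading West
# may use only the East FIFO, and a packet heading East only the West FIFO.
XYFR_HORIZONTAL_CANDIDATES = {"West": ["East"], "East": ["West"]}

def select_fifo(direction, fifos, depth, candidates=XYFR_HORIZONTAL_CANDIDATES):
    """Return the name of the first non-full candidate FIFO, or None if blocked."""
    for name in candidates[direction]:
        if len(fifos[name]) < depth:
            return name
    return None

def store(packet, direction, fifos, depth):
    """Store the packet in the selected FIFO; return its name (None if blocked)."""
    name = select_fifo(direction, fifos, depth)
    if name is not None:
        fifos[name].append(packet)
    return name

# Two successive East-bound packets (hypothetical depth of 4).
fifos = {"East": deque(), "West": deque()}
first = store("packet 1", "East", fifos, depth=4)
second = store("packet 2", "East", fifos, depth=4)
print(first, second, list(fifos["West"]))
```

Because each horizontal direction has a single candidate FIFO, the later packet is always queued behind the earlier one, so no reordering can arise along the horizontal path in this model.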
CHAPTER 5
CONCLUSION AND FUTURE WORK
5.1. Conclusion
Congestion is a challenging problem in networks, especially in NoC. Usually congestion is mitigated by increasing FIFO depth or by implementing VCs, but both add area and power overhead. A significant portion of router area and power consumption is due to buffers, so there is a trade-off between performance and both area and power overheads. This trade-off has been studied in [4,23,24]. Both [4] and [23] introduce application-dependent algorithms that identify the locations of bottlenecks in the network using two different criteria, increase the FIFO depth or the number of VCs at these bottlenecks, and repeat the algorithm until the buffering budget is exhausted. Both approaches show better performance with a large optimization in the amount of added buffers, but if the application changes, the location of the bottlenecks could change and the whole procedure must be redone. In [24] a flexible architecture called ViChaR, which uses the available VCs at the requested input port, is introduced. This architecture lets the router adapt to the input traffic and change the number of VCs with it; if the traffic increases there will be an equal number of VCs with less depth, and vice versa, so with the same amount of buffering the router has a variable number of VCs. Although this architecture increases performance, it still relies on VCs, which introduce area and power overhead that we want to avoid.
We have introduced a new Flexible Router architecture that can mitigate network congestion without introducing extra buffers or VCs. The flexibility idea is simple: if a request is blocked because the requested FIFO is full, the Flexible Router searches for a free buffer other than the requested busy buffer to store the incoming packet.
The flexibility leads to some problems: deadlock, b2b deadlock, and out of order received packets. We prevent deadlock by making an analogy between the Flexible Router and the Base Router working under XY routing, in terms of which packet directions each buffer can hold. The b2b problem is another form of deadlock, and we designed the Flexible Router to recover from it when it occurs. The out of order side effect was studied under two different injection rates and found to be small and limited.
We have proposed two versions of the Flexible Router. The first, XYFR, is based on the analogy with the Base Router working under XY routing; the second, WFFR, is based on the analogy with the Base Router working under WF routing.
Both XYFR and WFFR show higher performance than the Base Router under HS and NN traffic patterns. However, both show lower performance under T2, while for UNI only XYFR shows higher performance. Two reasons explain why there was not much improvement under NN and UNI, and why performance was even worse than the Base Router under T2:
1. The communication overhead due to the FFC. Communication between the upstream router and the FIFOs of the downstream router goes through the FFC; with direct access to these FIFOs this overhead could vanish.
2. The b2b problem causes some congestion, because when it occurs the packet involved spends cycles doing no useful work, and this may delay other packets behind it.
Finally, XYFR always outperforms WFFR although WFFR is more flexible. The reason is that the greater flexibility of WFFR makes it more susceptible to b2b deadlock, so more cycles are lost and congestion builds up in the network, degrading router performance.
5.2. Future work
We can summarize the future work in the following points:
1. We intend to eliminate or reduce the b2b problem using a new FIFO design that decreases the dependency between buffers.
2. We intend to improve the output port controller of the Flexible Router so that it can directly access the downstream router buffers, eliminating the FFC overhead.
3. We will look for a suitable way to reduce or even eliminate out of order packets.
4. As both versions of the Flexible Router are deadlock-free routers under XY routing, we will add a small modification to both routers and present them as deadlock-free routers that can work under a fully adaptive routing algorithm.
5. We also plan to increase the flexibility of the Flexible Router to the maximum, so that there are no restrictions on where the router can store incoming packets. To deal with deadlock, we will implement a deadlock recovery scheme to recover from deadlock when it occurs.
REFERENCES
[1] Zhonghai Lu, “Design and Analysis of On-Chip Communication for Network-on-Chip Platforms,” PhD Dissertation, School of Information and Communication Technology (ICT), KTH.
[2] L. Benini and G. De Micheli, “Networks on Chips: Technology and Tools,” Morgan Kaufmann, 2006.
[3] I. Saastamoinen, M. Alho, and J. Nurmi, “Buffer implementation for Proteo network-on-chip,” in Proceedings of the International Symposium on Circuits and Systems (ISCAS'03), vol. 2, pp. II-113 - II-116, 2003.
[4] H. Jingcao and R. Marculescu, “Application-specific buffer space allocation for networks-on-chip router design,” in Proceedings of the IEEE/ACM International Conference on Computer Aided Design (ICCAD), pp. 354-361, 2004.
[5] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar, “A 5-GHz Mesh Interconnect for a Teraflops Processor,” IEEE Micro, vol. 27, pp. 51-61, Sept.-Oct. 2007.
[6] H. Wang, X. Zhu, L. Peh, and S. Malik, “Orion: A Power-Performance Simulator for Interconnection Networks,” in Proceedings of the 35th IEEE/ACM Annual International Symposium on Microarchitecture (MICRO-35), pp. 294-305, Nov. 2002.
[7] N. Banerjee, P. Vellanki, and K. Chatha, “A Power and Performance Model for Network-on-chip Architectures,” in Proceedings of Design, Automation and Test in Europe Conference and Exhibition (DATE), vol. 2, pp. 1250-1255, Feb. 2004.
[8] X. Chen and L. Peh, “Leakage Power Modeling and Optimization in Interconnection Networks,” in Proceedings of the 2003 International Symposium on Low Power Electronics and Design (ISLPED '03), pp. 90-95, Aug. 2003.
[9] International Technology Roadmap for Semiconductors, http://www.itrs.net
[10] G. Moore, “Cramming More Components on to Integrated Circuits,” Electronics, vol. 38, no. 8, pp. 114 ff, April 1965.
[11] J. D. Owens and W. J. Dally, “Research Challenges for on-Chip Interconnection Networks,” IEEE Micro, vol. 27, pp. 96-108, Sept.-Oct. 2007.
[12] T. Bjerregaard and S. Mahadevan, “A survey of research and practices of network-on-chip,” ACM Computing Surveys (CSUR), vol. 38, no. 1, pp. 1-es, 2006.
[13] A. Ivanov and G. De Micheli, “The Network-on-Chip Paradigm in Practice and Research,” IEEE Design & Test of Computers, vol. 22, pp. 399-403, Sept.-Oct. 2005.
[14] M. Horowitz, R. Ho, and K. Mai, “The future of wires,” in Proceedings of the IEEE, vol. 89, pp. 490-504, Apr. 2001.
[15] K. Lahiri, “On-Chip Communication: System-Level Architectures and Design Methodologies,” PhD Dissertation, University of California, 2003.
[16] R. Mehra, L. M. Guerra, and J. M. Rabaey, “A partitioning scheme for optimizing interconnect power,” IEEE Journal of Solid-State Circuits, vol. 32, pp. 433-443, Mar. 1997.
[17] D. Sylvester and K. Keutzer, “A global wiring paradigm for deep submicron design,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, pp. 242-252, Feb. 2000.
[18] C. R. Hilton, “A Flexible Circuit-Switched Communication Network for FPGA Based SoC Design,” PhD Dissertation, Faculty of Brigham Young University, 2005.
[19] R. Saleh, “An Approach That Will NoC Your SoCs Off,” IEEE Design & Test of Computers, vol. 22, pp. 488, Sept.-Oct. 2005.
[20] W. J. Dally and B. Towles, “Principles and practices of interconnection networks,” Morgan Kaufmann, 2004.
[21] C. J. Glass and L. M. Ni, “The turn model for adaptive routing,” in Proceedings of the 15th Annual International Symposium on Computer Architecture, pp. 278-287, 1992.
[22] J. Hu and R. Marculescu, “DyAD: smart routing for networks-on-chip,” in Proceedings of the Design Automation Conference (DAC), pp. 260-263, July 2004.
[23] T. C. Huang, U. Y. Ogras, and R. Marculescu, “Virtual channels planning for networks-on-chip,” 8th International Symposium on Quality Electronic Design (ISQED'07), pp. 879-884, March 2007.
[24] C. A. Nicopoulos et al., “ViChaR: A dynamic virtual channel regulator for network-on-chip routers,” 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06), pp. 333-346, Dec. 2006.
[25] Y. Tamir and G. L. Frazier, “High-performance multiqueue buffers for VLSI communication switches,” in Proceedings of the 15th Annual International Symposium on Computer Architecture (ISCA), pp. 343-354, 1988.
[26] M. Rezazad and H. Sarbazi-azad, “The Effect of Virtual Channel Organization on the Performance of Interconnection Networks,” in Proceedings of 19th IEEE International Symposium on Parallel and Distributed Processing, pp. 8, April 2005.
Congestion Mitigation Using the Flexible Router for Networks-on-Chip
A thesis submitted to the Graduate School, Faculty of Engineering, Egypt-Japan University of Science and Technology, in partial fulfillment of the requirements for the degree of Master of Engineering Science in Electronics and Communications Engineering
Presented by Mostafa Said Sayed Abdelrahim
February 2012
Supervision Committee (Signature):
Prof. Mohamed El-Sayed Ragab, Egypt-Japan University of Science and Technology
Dr. Victor Goulart, Kyushu University, Japan
Dr. Ahmed El-Mahdy, Egypt-Japan University of Science and Technology
Examination Committee (Approved):
Prof. Mohamed El-Sayed Ragab, Dean of the School of Electronics, Communications and Computer Engineering, Egypt-Japan University of Science and Technology
Prof. Mohamed Rizk, Professor at the Faculty of Engineering, Alexandria University
Prof. Amin Shoukry, Chairperson of the Computer Science Engineering Department, Egypt-Japan University of Science and Technology
Prof. Hossam Shalaby, Chairperson of the Electronics and Communications Engineering Department, Egypt-Japan University of Science and Technology
Dr. Masoud El Ghoneimy, Associate Professor, Department of Electronics and Communications Engineering, Egypt-Japan University of Science and Technology
Vice President for Education and Academic Affairs: Prof. Ahmed Abdel-Moneim Abo Ismail
SUMMARY
The congestion problem is one of the most important problems and topics related to networks-on-chip, and it has been treated extensively in previous research. Some of the techniques previously used to address this problem increase the amount of buffering available inside the router (either the depth or the number of buffers); however, this approach was found to increase the area and the power consumed by the router. In this work we present a new design for a flexible router that can improve network performance using the same amount of available buffering, but in a flexible and efficient way. For this reason there is no need to increase the depth of the available buffers or to increase their number by adding virtual channels. The idea is simple: if there is a request to store a packet in a buffer that is completely full, the flexible router searches internally for a suitable buffer that is not full and stores the incoming packet in it. Although this flexibility can improve performance, some problems may arise from it that reduce the gained improvement. The first problem we faced is network deadlock, but we were able to prevent it by making an analogy between the buffers inside the flexible router and those of the original router operating under two kinds of deadlock-free routing algorithms, in terms of the packet directions each buffer can store. Another kind of network deadlock, which we called buffer-to-buffer deadlock, can also occur. This kind of deadlock is explained in detail, along with how the router recovers from it if it occurs. Flexibility may also lead to a small side effect, namely receiving packets out of order. This side effect is also explained in detail, and we show that it is limited and small because it needs certain conditions to occur. Finally, a discussion of the area increase resulting from the added flexibility is presented, and the increase is found to be reasonable and not large.