CT30A7001 Concurrent and Parallel Computing
Exercise 5, answers
Assignment 1
• Maximum degree of concurrency: the largest number of tasks that can execute concurrently.
• Critical path length: the sum of the weights of the nodes along the critical path.
• Maximum available speedup: number of tasks / critical path length.
• Minimum processes to achieve maximum speedup: can the problem be solved optimally with fewer processes than the maximum degree of concurrency?
• Maximum speedup if the number of processes is limited: number of tasks / number of steps needed to complete.
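The definitions above can be sketched in code. The example below is only an illustration under stated assumptions: the graph is a hypothetical binary reduction tree with 8 leaves and unit task weights (the actual graphs A to D are not reproduced here), and the helper names are made up for this sketch. The concurrency figure is the width of the widest as-soon-as-possible level, which is exact for this tree but in general only a lower bound on the maximum degree of concurrency.

```python
from collections import defaultdict
from fractions import Fraction


def binary_reduction_tree(leaves=8):
    """Hypothetical example graph: {task: set of tasks it depends on}.

    Tasks 0..leaves-1 are leaves; every later task combines two earlier ones.
    """
    deps = {t: set() for t in range(leaves)}
    level = list(range(leaves))
    next_id = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            deps[next_id] = {level[i], level[i + 1]}
            parents.append(next_id)
            next_id += 1
        level = parents
    return deps


def earliest_finish(deps, weight):
    """Earliest finish time of every task, assuming unlimited processes."""
    finish = {}

    def visit(task):
        if task not in finish:
            start = max((visit(d) for d in deps[task]), default=0)
            finish[task] = start + weight[task]
        return finish[task]

    for task in deps:
        visit(task)
    return finish


def metrics(deps, weight):
    finish = earliest_finish(deps, weight)
    critical_path = max(finish.values())
    # Count tasks per as-soon-as-possible level (meaningful for unit weights).
    width = defaultdict(int)
    for f in finish.values():
        width[f] += 1
    return {
        "concurrency": max(width.values()),
        "critical_path": critical_path,
        "max_speedup": Fraction(sum(weight.values()), critical_path),
    }


deps = binary_reduction_tree(8)
weight = {t: 1 for t in deps}  # assumption: every task has weight 1
result = metrics(deps, weight)
print(result)
```

For this assumed 15-task tree the sketch reports a concurrency of 8, a critical path length of 4, and a maximum speedup of 15/4.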
                                 A      B      C      D
Maximum degree of concurrency    8      8      8      8
Critical path length             4      4      7      8
Maximum speedup                  15/4   15/4   14/7   15/8
Minimum processes                8      8      3      2
Maximum limited speedup
  Limited to 2 processes         15/8   15/8   14/8   15/8
  Limited to 4 processes         15/5   15/5   14/7   15/8
  Limited to 8 processes         15/4   15/4   14/7   15/8
Table 1: Metrics for task graphs
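The limited-process rows follow the rule "number of tasks / number of steps". A minimal sketch of how the step count could be obtained is shown below. It assumes a hypothetical binary reduction tree with 8 leaves and unit-time tasks (not necessarily any of graphs A to D) and uses a simple greedy schedule, which is not guaranteed to be optimal for arbitrary graphs. The function names are illustrative only.

```python
from fractions import Fraction


def binary_reduction_tree(leaves=8):
    """Hypothetical example graph: {task: set of tasks it depends on}."""
    deps = {t: set() for t in range(leaves)}
    level = list(range(leaves))
    next_id = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            deps[next_id] = {level[i], level[i + 1]}
            parents.append(next_id)
            next_id += 1
        level = parents
    return deps


def steps_needed(deps, processes):
    """Greedy schedule of unit-time tasks.

    Each step runs at most `processes` tasks whose dependencies are all done.
    """
    done = set()
    steps = 0
    while len(done) < len(deps):
        ready = [t for t in deps if t not in done and deps[t] <= done]
        done.update(ready[:processes])
        steps += 1
    return steps


deps = binary_reduction_tree(8)
steps = {p: steps_needed(deps, p) for p in (2, 4, 8)}
for p, n in steps.items():
    # Speedup = number of tasks / number of steps (Fraction reduces 15/5 to 3).
    print(p, "processes:", n, "steps, speedup", Fraction(len(deps), n))
```

For this assumed tree the greedy schedule needs 8, 5 and 4 steps with 2, 4 and 8 processes, giving speedups of 15/8, 15/5 and 15/4.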
Assignment 2
The task graph is shown in Figure 1.
Figure 1: Task graph for LU factorization
In both cases the problem can be solved in seven steps. With four processes this is simple, as the maximum degree of concurrency is four. With three processes, you need to look at the dependencies between the tasks and decide how to split the work between steps. Tasks 3 and 7 can each be postponed by one step and the goal is still reached in seven steps. The exact orders are shown in Table 2.
Step    4 processes    3 processes
1       1              1
2       5, 2, 4, 3     5, 2, 4
3       8, 6, 7, 9     8, 6, 3
4       10             10, 7, 9
5       12, 11         12, 11
6       13             13
7       14             14
Table 2: Execution orders with 3 and 4 processes
Assignment 3
$$S = \frac{W}{T_p} = \frac{W}{W_s + \frac{W - W_s}{p}}$$
As $p$ increases, $\frac{W - W_s}{p}$ approaches zero. But no matter how large $p$ is, $S$ cannot exceed $\frac{W}{W_s}$.
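The bound can be checked numerically. The sketch below evaluates S = W / (W_s + (W - W_s)/p) for growing p; the values W = 100 and W_s = 10 are made up for illustration and are not taken from the exercise.

```python
def speedup(total_work, serial_work, processes):
    """S = W / T_p with T_p = W_s + (W - W_s) / p."""
    parallel_time = serial_work + (total_work - serial_work) / processes
    return total_work / parallel_time


W, Ws = 100.0, 10.0  # hypothetical total work and serial part
bound = W / Ws       # the limit S approaches as p grows

values = [speedup(W, Ws, p) for p in (1, 2, 4, 16, 256, 4096)]
for p, s in zip((1, 2, 4, 16, 256, 4096), values):
    print(f"p = {p:5d}  S = {s:.3f}  (bound {bound:.1f})")
```

With these assumed numbers the speedup starts at 1 for a single process, grows with every added process, and stays below the bound W / W_s = 10.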
Assignment 4
• a) 12 steps are needed (the sequential version, which always searches the leftmost subtree first, opens 12 nodes).
• b) Only 5 steps are needed, so S = 12/5: the root node is opened by one task, and afterwards the two tasks operate on the identical subtrees starting from the second level.
The speedup is greater than 2 because the parallel algorithm performs less work: the sequential version opens 12 nodes, while the parallel version opens only 9.
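The counting argument can be reproduced on a small example. The search tree from the exercise is not reproduced here, so the sketch below assumes a hypothetical tree that is consistent with the numbers above: a root with two identical 7-node subtrees, and a goal that is the fourth node visited in the right subtree. Node names and function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical tree: root with two identical complete binary subtrees.
CHILDREN = {
    "root": ["L", "R"],
    "L": ["L1", "L2"], "L1": ["L11", "L12"], "L2": ["L21", "L22"],
    "R": ["R1", "R2"], "R1": ["R11", "R12"], "R2": ["R21", "R22"],
}
GOAL = "R12"  # assumed position of the solution


def preorder(node):
    """Yield nodes in leftmost-first depth-first order."""
    yield node
    for child in CHILDREN.get(node, []):
        yield from preorder(child)


def sequential_opened(root, goal):
    """Number of nodes opened by a sequential leftmost-first search."""
    for count, node in enumerate(preorder(root), start=1):
        if node == goal:
            return count
    raise ValueError("goal not in tree")


def parallel_search(root, goal):
    """Return (steps, nodes opened) when one task searches each subtree."""
    steps, opened = 1, 1  # a single task opens the root
    if root == goal:
        return steps, opened
    searches = [preorder(child) for child in CHILDREN[root]]
    while searches:
        steps += 1
        found = False
        still_running = []
        for search in searches:  # every task opens one node per step
            node = next(search, None)
            if node is None:
                continue
            opened += 1
            found = found or node == goal
            still_running.append(search)
        if found:
            return steps, opened
        searches = still_running
    raise ValueError("goal not in tree")


seq = sequential_opened("root", GOAL)
par_steps, par_opened = parallel_search("root", GOAL)
print(seq, par_steps, par_opened, Fraction(seq, par_steps))
```

On this assumed tree the sequential search opens 12 nodes, while the two-task search finishes in 5 steps after opening 9 nodes, which gives the speedup 12/5.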