Experimental Evaluation of Parallelism in Real Time Execution

1 Md. Rashedul Islam, 2 Aloke Kumar Shah, 3 Md. Rofiqul Islam

1,2 Assistant Professor, Dept. of Computer Science & Engineering, University of Asia Pacific, Dhaka, Bangladesh
3 School of Business & Informatics, Högskolan i Borås, Borås, Sweden
Abstract

Parallel programming is a great challenge of modern computing. The main goal of parallel programming is to improve the performance of computer applications. A well-structured parallel application can achieve better performance than sequential execution on existing and upcoming parallel computer architectures. This paper describes an experimental evaluation of parallel application performance with thread-safe data structures in real time execution. The performance issues described in this paper help in building efficient parallel applications. Before presenting the experimental evaluation, the paper reviews several methodologies relevant to parallel programming. The evaluation has been carried out through experimental results and performance measurements. Knowledge of proper partitioning, oversubscription, proper workload balancing, and ways of sharing memory helps in building more efficient parallel applications.
Keywords: Parallel Processing, Data Parallelism, Task Parallelism, Decomposition, Multithreading, Parallelism Performance.
1. Introduction

In serial computers all programs execute sequentially, and the performance of a serial computer is limited. Manufacturers have gradually tried to improve the performance of serial computers, and nowadays parallel architectures such as multi-core computers are readily available. At the same time, workloads are increasing day by day: people are trying to solve ever more complex problems using computers and computer programs. Historically, parallel computing has been used to model difficult scientific and engineering problems found in many areas of the real world. Today, parallel computing is also used in the commercial sector to provide better services, save time, and speed up the processing of large amounts of data in sophisticated ways, for example in databases, data mining, medical imaging and diagnosis, oil exploration, weather forecasting, web-based business services, web search engines, financial and economic modeling, advanced graphics and virtual reality (particularly in the entertainment industry), and networked video and multimedia technologies.

Parallel processing is a way of dividing a large, complex task among multiple processing units which operate simultaneously to achieve a common goal. It is a strategy for completing complex tasks quickly, and the expected result is faster completion compared with sequential processing. In the early days a computer program was serial, that is, a set of sequential instructions. The main purpose of parallelism is to decrease execution time by making maximum use of CPU resources. In sequential processing one instruction executes at a time and all other instructions execute one after another. In parallel processing, a large job is decomposed into parts, and the parts are distributed to different processors that carry out the smaller tasks simultaneously; this increases efficiency, speed, and performance. Parallel execution can be used on a single machine or across several machines, to execute independent tasks in parallel, to overlap the execution of dependent tasks, to complete a job in less time, and to increase the usage of CPUs. The parallel computer is becoming a high performance application platform. However, it is generally recognized that designing applications for parallel computers is hard [1][2][3].
In this paper we use the resources of parallel architectures to evaluate the performance of parallel applications built with different parallel constructs of the .NET Framework 4.0.
2. Parallel Program Design Methodology

In general, programs are written for sequential execution. In sequential programming one instruction executes at a time on a single processor; such programs have been written for serial computers, and their processing speed depends on how fast the hardware can process the data. Parallel programming, on the other hand, aims to overcome the speed limits of sequential programming. For a complex task it decomposes the algorithm or the data into different parts, distributes those parts over multiple processors or computers, and performs them simultaneously. Parallel programming is difficult, and many software engineers shy away from it even though modern general-purpose processors ship with more and more cores by default; as a result those processors are often not utilized properly. Several difficulties of parallel programming are listed below:
- Managing complex data dependencies is non-trivial
- Managing data access conflicts without full access control
- Managing execution across many cores is demanding
- Lack of parallel libraries for general use
- Interacting with hardware
- Legacy software is often used as the entry point to parallelization
2.1 Steps for Creating a Parallel Program

Constructing a sequential program is generally not very complex, but making a parallel program involves many factors and dependencies. In a sequential program there is no extra effort for partitioning the task or data; in a parallel program the whole work is partitioned into several subparts, different processors are responsible for executing one or more subparts, and the different parts execute concurrently. Three concepts are important for creating a parallel program: tasks, processes, and processors [26]. A task is the smallest unit of work done by the program; the whole work can be divided into several tasks, and different tasks can execute concurrently. A process is an abstract entity that performs a subset of the tasks, and several cooperating processes can exist in a parallel program. Finally, every process executes its tasks on a physical processor. Making a parallel program involves four steps [26]; figure 1 illustrates them.
Figure 1. Steps of creating a Parallel program [26]
2.2 Threading

A thread is the smallest unit of processing. A multithreaded application contains several threads, and the operating system is mainly responsible for scheduling them. From the implementation point of view threads and processes differ: a thread exists inside a process, multiple threads can exist within one process, and they can share its resources. In a multithreaded application all threads are managed by a thread scheduler, which handles CPU time allocation and the waiting and blocking states of the different threads. Multithreading on a single processor is achieved by time-division multiplexing: it appears parallel, but in reality the processor switches between threads. On a multi-core system, multithreading [24] is implemented with genuine concurrency [5], and different threads can run in parallel on different cores, with each core running one or two particular threads. By default, explicitly created threads are foreground threads. Foreground threads can prevent the current application from terminating; the CLR shuts an application down only after all foreground threads have terminated. Background threads are expendable paths of execution that can be ignored at any point in time: if all foreground threads have terminated, any and all background threads are automatically killed when the application domain unloads [6]. Figure 2 shows how worker threads act under the main thread.
Figure 2. Threading concept in application (from [5])
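The paper does not list thread-creation code; the following minimal C# sketch (names and values are illustrative, not taken from the paper) shows an explicitly created thread and the IsBackground property that distinguishes foreground from background threads.

```csharp
using System;
using System.Threading;

class ThreadDemo
{
    static void Main()
    {
        // Explicitly created threads are foreground threads by default:
        // the application does not terminate until they finish.
        Thread worker = new Thread(() =>
        {
            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine("Worker step {0}", i);
                Thread.Sleep(100);
            }
        });

        // Marking the thread as a background thread would let the CLR
        // kill it automatically once all foreground threads have ended.
        worker.IsBackground = false;   // set to true to observe the difference
        worker.Start();

        Console.WriteLine("Main thread continues while the worker runs.");
        worker.Join();                 // wait for the worker before exiting
    }
}
```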
2.3 Data Parallelism

When a large collection of data values must be operated on by the same set of operations, the work can be parallelized with multiple threads, each thread performing the same operations on a subset of the data. This approach is called data parallelism [6].
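A minimal sketch of data parallelism with .NET 4's Parallel.For, assuming a hypothetical numeric array; each iteration applies the same operation to a different element, and the runtime spreads the iterations over the available cores.

```csharp
using System;
using System.Threading.Tasks;

class DataParallelDemo
{
    static void Main()
    {
        double[] input = new double[1000000];     // hypothetical data set
        double[] output = new double[input.Length];

        // Same operation on every element; iterations run on worker threads.
        Parallel.For(0, input.Length, i =>
        {
            output[i] = Math.Sin(input[i]) + Math.Cos(input[i]);
        });
    }
}
```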
2.4 Task Parallelism

In task parallelism [6], also called functional parallelism, it is the tasks rather than the data that are partitioned: the work is split into different activities, and these activities are distributed over different threads or computing nodes.
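A minimal sketch of task parallelism using Task.Factory.StartNew from .NET 4; the two worker methods are hypothetical stand-ins for independent pieces of work, not code from the paper.

```csharp
using System;
using System.Threading.Tasks;

class TaskParallelDemo
{
    static void Main()
    {
        // Two unrelated pieces of work run as separate tasks, so different
        // cores can execute them in parallel.
        Task<long> sumTask = Task.Factory.StartNew(() => ComputeSum());
        Task<int> countTask = Task.Factory.StartNew(() => CountMatches());

        Task.WaitAll(sumTask, countTask);
        Console.WriteLine("Sum = {0}, Matches = {1}", sumTask.Result, countTask.Result);
    }

    static long ComputeSum()
    {
        long s = 0;
        for (int i = 0; i < 10000000; i++) s += i;
        return s;
    }

    static int CountMatches()
    {
        int c = 0;
        for (int i = 0; i < 10000000; i++) if (i % 7 == 0) c++;
        return c;
    }
}
```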
2.5 Parallel Programming Support in the .NET Framework 4.0

The .NET Framework 4.0 provides a large collection of APIs that address the common needs of parallel programming and helps developers build parallel applications for solving large, complex problems on parallel architectures such as multi-core and many-core machines. Its new features support the key parallel patterns [7]. The following sections and experiments use these patterns of parallel programming in the .NET Framework 4.0 (especially in C#) and the APIs that support the development of parallel applications.
3. Performance Factors of Parallel Application

3.1 Amdahl's Law

Amdahl's law is an essential rule for establishing the theoretical maximum speed-up of a parallel program: it places a strict limit on the speed-up that can be obtained by using multiple processors [16]. The law states that the small portion of the program which cannot be parallelized limits the overall speed-up available from parallelization [4]. Amdahl's law can be expressed by the following equation, where f is the fraction of time spent on serial operations and p is the number of processors [17]:
$$\text{Speedup} = \frac{1}{f + \dfrac{1 - f}{p}} \qquad (1)$$

From this equation we can see that as the number of processors increases, the term (1 − f)/p approaches 0, resulting in:

$$\lim_{p \to \infty} \text{Speedup} = \frac{1}{f} \qquad (2)$$
From the speed-up equation we can see that the maximum potential speed-up of a parallel program depends on how much of the program can be parallelized.
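As an illustrative calculation (these numbers are not from the paper's experiments): with a serial fraction f = 0.1 and p = 8 processors, equation (1) gives a speed-up of 1/(0.1 + 0.9/8) ≈ 4.7, and by equation (2) no number of processors can push the speed-up beyond 1/0.1 = 10.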
3.2 Load Balancing

"Load balancing is defined as the allocation of the work of a single application to processors at run-time so that the execution time of the application is minimized" [18]. Load balancing minimizes the execution time of a program by assigning to each processor a share of the work proportional to its performance. There are two major categories of load balancing: static and dynamic [19].
3.3 Granularity

"The granularity of a parallel program is the average computation cost of a sequential unit of computation in the program" [20]. The granularity of a parallel program is also known as the ratio between computation and communication among the processors working on the same problem. Beyond the computation itself, additional execution cost is imposed by the creation and management of parallel threads; for example, creating a thread requires operations such as stack allocation. Achieving good parallel performance requires maintaining enough work throughout the computation to keep the available processors busy and to allow computation and communication to overlap. The efficiency of different strategies for creating parallel threads has to account for the overhead of task creation, communication on both ends, and cleaning up completed threads. A coarse-grained computation performs better on a machine with high communication latency.
3.4 Data Dependency

A data dependency exists when the same storage location is used by multiple tasks in a parallel application. Data dependencies matter because they are one of the primary inhibitors of parallelism: the result of a computation can change when different tasks read and write the same location. Understanding data dependencies is therefore essential when designing parallel programs, especially since loops are probably the most common target of parallelization efforts.
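A small illustrative sketch (not from the paper) contrasting a loop-carried dependency, which forces sequential execution, with an independent loop that can safely use Parallel.For.

```csharp
using System;
using System.Threading.Tasks;

class DependencyDemo
{
    static void Main()
    {
        int n = 1000;
        double[] a = new double[n];

        // Loop-carried dependency: iteration i reads the value written by
        // iteration i-1, so these iterations must run in order.
        a[0] = 1.0;
        for (int i = 1; i < n; i++)
            a[i] = a[i - 1] * 1.01;

        // Independent iterations: each one writes only its own element,
        // so the loop can be parallelized safely.
        double[] b = new double[n];
        Parallel.For(0, n, i => { b[i] = Math.Sqrt(i); });
    }
}
```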
3.5 Static and Dynamic Partitioning

Partitioning is another important performance factor. Whether static or dynamic partitioning [23] is appropriate depends on the problem to be solved. If the workload per iteration is approximately the same, static partitioning is an effective and efficient way to partition the data set [7]; if the workload per iteration varies, dynamic partitioning is better. Because of the nature of the work and improper partitioning, execution can otherwise fall into load imbalance, where some parts finish quickly and other parts take much longer. If the total workload is small and the program automatically creates many partitions or threads, the synchronization between threads becomes a larger overhead for the CPU than the actual work; if the total workload is very large, the synchronization cost is negligible.
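As an illustration (not code from the paper), the following sketch contrasts the default dynamic chunking of Parallel.For with explicit static-style range partitioning via Partitioner.Create.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PartitionDemo
{
    static void Main()
    {
        const int n = 1000000;

        // Dynamic partitioning (the default): iterations are handed out in
        // small chunks at run time, which helps when per-item cost varies.
        Parallel.For(0, n, i => DoWork(i));

        // Static-style range partitioning: the index space is split into
        // fixed ranges up front, reducing scheduling overhead when every
        // iteration costs roughly the same.
        var ranges = Partitioner.Create(0, n);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                DoWork(i);
        });
    }

    // Stand-in for the real per-item work.
    static double DoWork(int i)
    {
        return Math.Sqrt(i);
    }
}
```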
3.6 Deadlock

The performance of a parallel application can also be harmed by deadlock. Deadlock is a situation in which two or more processes are unable to continue because each is waiting for the other to do something.
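A deliberately broken sketch (illustrative only, not from the paper) of the classic two-lock deadlock: each thread holds one lock and waits forever for the other.

```csharp
using System.Threading;

class DeadlockDemo
{
    static readonly object lockA = new object();
    static readonly object lockB = new object();

    static void Main()
    {
        // Thread 1 takes lockA, then waits for lockB ...
        var t1 = new Thread(() => { lock (lockA) { Thread.Sleep(100); lock (lockB) { } } });
        // ... while Thread 2 takes lockB, then waits for lockA.
        var t2 = new Thread(() => { lock (lockB) { Thread.Sleep(100); lock (lockA) { } } });

        t1.Start(); t2.Start();
        t1.Join(); t2.Join();   // neither Join returns: a classic deadlock
    }
}
```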
3.7 Communication Patterns

In some cases, increasing the number of processors decreases the execution time attributable to computation but increases the execution time attributable to communication. The required communication time depends on a given system's communication parameters and bandwidth. Generally, as the number of processors executing one task increases, the communication frequency tends to increase while the message size decreases, because more processors share the same data set and each processor handles a smaller fraction of the total data. "In many applications, the fraction of communication time is considerably larger than the computation time, so it is not viable to completely overlap computation with communication. As a result, communication time will continue to be a bottleneck in the performance of these applications" [22].
4. Experimental Result Evaluation & Performance

Performance is the main concern of a parallel program [10]. On a parallel architecture and a parallel application platform we expect better performance than with traditional sequential programming, but the achievable performance depends on how much parallel hardware is available. The performance of a parallel application [14][15] also varies with the parallel pattern used and the situation in which it is used. For our experiments we built several parallel applications and executed all of the programs on two machines with different multi-core configurations: one computer has a ((2*6)*2) = 24 core CPU with 8 GB of internal memory, and the other has a Core 2 Duo CPU with 3.5 GB of internal memory. We tested the programs in different ways and with different data ranges, using the different parallel extensions of .NET 4 in the C# language. We obtained a variety of results: some showed better performance and some exposed problems. The following experiments evaluate the performance.
4.1 Experiment 1: Sequential vs. Static Multithreading

In our first and most basic example we obtained a clear performance gain from parallel execution. In this program we calculate the simple equation
S = Cos(x) + 1/Cos(x) + Sin(x) + 1/Sin(x). We calculated the equation in two ways: one using a normal for() loop that iterates 100,000,000 times, and one using multiple threads. In the multithreaded approach we divided the whole task among a number of threads equal to the number of CPU cores and assigned each thread to a core. We ran the program in two multi-core environments. First we ran it on a computer with a Core 2 Duo CPU: the sequential version took 53,828 milliseconds, while the multithreaded version took only 29,952 milliseconds. Figure 3 shows the CPU usage of the Core 2 Duo machine for sequential and parallel execution.
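The paper does not list the source code; the following is a minimal sketch of how the static multithreaded version could look, assuming the loop index is used as x and the thread count equals the core count.

```csharp
using System;
using System.Threading;

class StaticMultithreadDemo
{
    const long Iterations = 100000000;

    static void Main()
    {
        int cores = Environment.ProcessorCount;
        long chunk = Iterations / cores;
        var threads = new Thread[cores];

        // Static partitioning: each thread gets a fixed, equal slice of the
        // iteration range.
        for (int t = 0; t < cores; t++)
        {
            long start = t * chunk;
            long end = (t == cores - 1) ? Iterations : start + chunk;
            threads[t] = new Thread(() => Compute(start, end));
            threads[t].Start();
        }
        foreach (var th in threads) th.Join();
    }

    static void Compute(long start, long end)
    {
        double s = 0;
        for (long x = start; x < end; x++)
            s += Math.Cos(x) + 1 / Math.Cos(x) + Math.Sin(x) + 1 / Math.Sin(x);
    }
}
```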
Figure 3. CPU usage with a Core 2 Duo CPU for (a) sequential execution and (b) parallel execution

From this experiment we can see that in the sequential run the two cores were not fully used, whereas in the parallel run both cores were used throughout the execution; the sequential approach also took more time. We then tested the same program on the machine with 24 cores and obtained still better performance: the total execution time was 5,244 milliseconds and all cores worked simultaneously. This achieved performance addresses the first part of our main research question. Figure 4 compares the execution time (in milliseconds) of sequential execution, multithreading on 2 cores, and multithreading on 24 cores.

Figure 4. Performance between sequential and multithreaded execution

There are some drawbacks in the previous approach, so we introduced another approach in which all threads are drawn from a pool, the ThreadPool. This gave a significant performance improvement in both environments; figure 5 shows the performance difference for both.
Figure 5. Performance difference between normal multithreading and the ThreadPool approach on the two computers
4.2 Experiment 2: Sequential vs. Dynamic Multithreading

This experiment examines data parallelism using dynamic multithreading with the Parallel.For pattern. In this example we calculate another mathematical expression, R = 1/Cos(p)*tan(x, y) and R = R + Cos(q)*1/Cos(q); this calculation is performed 20,000 times, and the whole block of work is executed 100 times. With Parallel.For the 100 iterations are carried out by multiple threads assigned to different cores: in sequential execution every iteration runs in order, whereas with Parallel.For the iterations of different threads execute in an arbitrary order. With Parallel.For we also used two different approaches: the thread pool via ThreadPool.QueueUserWorkItem, and tasks via Task.Factory. We executed the program in both environments and obtained the results in table 1.

Table 1. Execution time in milliseconds for sequential execution and Parallel.For with ThreadPool and Task.Factory

  Machine     Sequential    ThreadPool    Task.Factory
  2 cores     13011         6723          6624
  24 cores    12825         1737          1672
Figure 6 plots the data of table 1; it is clear that the parallel versions greatly outperform the sequential one.

Figure 6. Performance comparison of dynamic partitioning (execution time in milliseconds for sequential, Parallel.For with ThreadPool, and Parallel.For with Task on 2 and 24 cores)
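The exact expression used in this experiment is ambiguous in the text, so the following Parallel.For sketch uses a stand-in computation; it only illustrates the structure of the dynamic approach (100 outer iterations, each performing 20,000 inner operations).

```csharp
using System;
using System.Threading.Tasks;

class ParallelForDemo
{
    static void Main()
    {
        // 100 outer iterations, each doing a block of floating-point work;
        // Parallel.For dynamically assigns the iterations to worker threads
        // drawn from the thread pool.
        Parallel.For(0, 100, i =>
        {
            double r = 0;
            for (int j = 1; j <= 20000; j++)
            {
                double q = i + j * 0.001;
                // Stand-in for the paper's equation.
                r += 1 / Math.Cos(q) * Math.Atan2(i, j)
                     + Math.Cos(q) * (1 / Math.Cos(q));
            }
        });
    }
}
```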
4.3 Experiment 3: Sequential vs. Task Parallelism

The previous experiments dealt with data parallelism; this one illustrates task parallelism. In task parallelism it is the whole task, not the data, that is divided into parts, and each part is executed by a different thread on a different core. After assigning each
part to a thread, the operating system is responsible for scheduling the threads for execution. Parallel.Invoke is the appropriate API for task parallelism. In the following example the program reads the words from a large text file and performs three tasks on it: (a) finding the longest word, (b) finding the most common word, and (c) counting the occurrences of a specific word. In our experiment we performed these three tasks in the traditional sequential way and with Parallel.Invoke, and task parallelism again performed better than sequential execution: the sequential run took 191 milliseconds while Parallel.Invoke took 129 milliseconds.
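A minimal sketch of the Parallel.Invoke structure described above; the file name, the word-splitting rules, and the searched word ("the") are assumptions, not details from the paper.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

class TextAnalysisDemo
{
    static void Main()
    {
        // "input.txt" is a placeholder path, not the file used in the paper.
        string[] words = File.ReadAllText("input.txt")
                             .Split(new[] { ' ', '\n', '\r', '\t' },
                                    StringSplitOptions.RemoveEmptyEntries);

        // The three independent analyses run as parallel tasks.
        Parallel.Invoke(
            () => Console.WriteLine("Longest word: " +
                      words.OrderByDescending(w => w.Length).First()),
            () => Console.WriteLine("Most common word: " +
                      words.GroupBy(w => w).OrderByDescending(g => g.Count()).First().Key),
            () => Console.WriteLine("Occurrences of 'the': " +
                      words.Count(w => w.Equals("the", StringComparison.OrdinalIgnoreCase))));
    }
}
```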
4.4 Experiment 4: Parallelism Using Multiple Threads

Here we return to data parallelism, using different numbers of threads and partitioning the data range between them to find prime numbers, and measuring the performance obtained from different amounts of thread resources. This experiment uses PLINQ to find the prime numbers in the range 0 to 2,000,000. We computed the primes with a simple sequential algorithm and with multiple threads, executing sequentially and with 2, 4, 6, 8, ... threads, and ran the application in both environments mentioned above.

Figure 7. Parallelism performance with several threads on the 2-core and 24-core machines

Figure 7 shows the performance of parallelism with different numbers of threads. Two threads already give a clear improvement over sequential execution in both environments. On the computer with 24 cores, performance keeps improving as the number of threads increases. On the computer with 2 cores, however, we obtained better performance only up to 4 threads; adding more threads did not improve performance, because oversubscribing the CPU resources is counterproductive and the extra swapping between threads becomes an overhead for the CPU.
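A minimal PLINQ sketch of the prime search, assuming simple trial division; WithDegreeOfParallelism stands in for the varying thread counts (2, 4, 6, 8, ...) used in the experiment.

```csharp
using System;
using System.Linq;

class PlinqPrimeDemo
{
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    static void Main()
    {
        // Cap the number of concurrently executing tasks; vary this value
        // to reproduce the measurement series.
        int threads = 4;
        int count = Enumerable.Range(0, 2000000)
                              .AsParallel()
                              .WithDegreeOfParallelism(threads)
                              .Count(IsPrime);
        Console.WriteLine("Primes found: " + count);
    }
}
```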
4.5 Experiment 5: PLINQ Performance with Different Data Ranges

Here again we calculate prime numbers, this time for different ranges of data. In the previous example we calculated the primes from 0 to 2,000,000 using different numbers of threads; now we use the full CPU resources and calculate the primes over ranges of different sizes, such as 10, 100, 500, and so on up to 150,000. We executed a sequential algorithm and a parallel version using PLINQ. Running this program on the 24-core machine gave different execution times for sequential and parallel
execution, shown in figure 8. The x-axis represents the number range in which we search for primes and the y-axis the time required for the serial and parallel runs at each range.
Figure 8. PLINQ performance with different data ranges (execution time in milliseconds)

  Range      10   100   500   1000   5000   7500   10000   20000   50000   100000   150000
  Serial      0     0     0      0     10     21      36     132     732     2773     6022
  Parallel    3     0     1      1     13     12      12      18      84      249      514
From these results we can see that for small data ranges the parallel execution performs worse: the total work is so small that it cannot exploit the parallel power, and the oversubscription and thread-synchronization overhead exceed the cost of the work itself. For large data sets, however, PLINQ shows its strength: for ranges such as 50,000, 100,000 or 150,000 PLINQ needs little additional time while the sequential execution takes far longer.
4.6 Experiment 6: Linear vs. Real Speed-up

Here we extend Experiment 1: the program evaluates the same simple mathematical equation 100,000,000 times, executed with multiple threads (using different numbers of cores) and also with the ThreadPool. We recorded the results for different numbers of cores in both approaches; they are shown in figures 9 and 10.
Figure 9. Static multithreading using different numbers of cores (execution time in milliseconds)

  Cores          2      4      6      8      10     12     14     16     18     20     22     24
  Multithread  32036  17133  11912  10021   8987   7919   7247   6527   6549   5904   5502   5244

Figure 10. ThreadPool example using different numbers of cores (execution time in milliseconds)

  Cores          2      4      6      8      10     12     14     16     18     20     22     24
  ThreadPool   32039  17108  11916  10325   8760   7888   7094   6330   5704   5303   4850   4514

From these measurements of both approaches we calculated the execution speed-up. The speed-up for each number of cores is computed relative to the execution time of the sequential execution of the same equation. Speed-up is defined by the following formula [24]:
$$S_p = \frac{T_1}{T_p} \qquad (3)$$

where p is the number of processors, T1 is the execution time of the sequential algorithm, and Tp is the execution time of the parallel algorithm with p processors. Here the execution time of the sequential algorithm is T1 = 63,268 milliseconds, so for the multithreaded approach on 2 cores the speed-up is Sp = T1/Tp = 63268/32036 = 1.9749. Theoretically the speed-up for 2 cores should be 2, and the speed-up should increase linearly with the number of cores, but in a real implementation it does not stay linear. Figure 11 shows the speed-up curves for the linear ideal and the two approaches. Together with the preceding experiments, these results address our research questions.
Figure 11. Execution speed-up comparison (speed-up of the two approaches against the linear ideal)

  Cores           2       4       6       8       10      12      14      16      18      20      22      24
  ThreadPool   1.9747  3.6982  5.3095  6.1277  7.2224  8.0208  8.9185  9.9949  11.092  11.931  13.045  14.016
  Multithread  1.9749  3.6928  5.3113  6.3135  7.0399  7.9894  8.7302  9.6933  9.6607  10.716  11.499  12.065
5. Discussion

According to our experiments, parallelism in many cases shows a performance improvement over traditional sequential execution, but in some cases a parallel loop runs slower than its sequential counterpart. Parallel execution involves a great deal of complexity that simply does not exist in sequential execution, and different approaches perform better in different situations. Alongside the good results there were also some negative experiences; from those we highlight the following issues that should be kept in mind when trying to improve the performance of a parallel application.
5.1 Inappropriate Use of Threading

Threads consume CPU resources: creating, destroying, and scheduling threads all have a cost. Because of inappropriate implementation, multithreading therefore sometimes fails to deliver better performance. If an application involves heavy I/O waiting, many worker threads cannot actually work simultaneously.
5.2 Oversubscription of Threads

Oversubscribing threads relative to the available CPU resources can slow execution down. The benefit of parallelization depends on the number of processors in the computer: for example, if you have 2 cores capable of handling 4 threads simultaneously and the program creates more than 4 threads, the switching and synchronization between threads become an overhead for the CPU, which must spend time on them instead of on useful work. It is better to create a fixed degree of parallelism for which the target system has enough processors, as sketched below.
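A minimal sketch (not from the paper) of capping parallelism with ParallelOptions.MaxDegreeOfParallelism so that the number of workers matches the hardware.

```csharp
using System;
using System.Threading.Tasks;

class DegreeOfParallelismDemo
{
    static void Main()
    {
        // Cap concurrently running iterations at the number of hardware
        // threads, so the scheduler is not flooded with more workers than
        // the CPU can actually execute.
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        Parallel.For(0, 1000, options, i =>
        {
            double x = Math.Sqrt(i);   // stand-in for the real work item
        });
    }
}
```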
5.3 Unbalanced Workload Distribution

In many cases an unbalanced workload distribution slows execution down; this mainly happens with static work partitioning. If a program creates three threads in which the first thread gets a 2-second job, the second a 3-second job, and the third a 20-second job, then the first and second threads finish quickly while the main program still has to wait a long time for the third to complete.
5.4 Low Workload

If the system has plenty of parallel capacity but the total amount of data or work is small, each CPU has little to do. For a large workload the time spent synchronizing threads is negligible, but for a small amount of data the synchronization cost can be expensive relative to the main execution, so performance may actually drop.
5.5 Proper CPU Use and Operating System Dependency

Finally, an application running on an operating system cannot use 100% of the CPU resources. The operating system is responsible for fairly scheduling all the threads currently in the system; when our program creates threads for execution there may be other threads already queued by the operating system, so it is practically impossible to keep the CPU 100% engaged. Section 4 presented several experiments, and from them we have seen what parallel applications actually achieve: with some exceptions, the real benefits of a parallel application over a sequential program are proper CPU utilization and shorter execution time, in short a speed-up of the work. Theoretically the speed-up of parallel execution should be linear, but in practice it is not; because of the issues discussed above and other minor
issues we don’t get the speed-up as linear. In experiment-9 we have discussed regarding this matter.
6. Conclusion

Parallel computing is a vast topic for present and next-generation computation. There are many complex computational problems in front of us, so it is a great challenge to build efficient parallel applications for multiprocessor environments and to solve those complex problems efficiently. Performance is the main issue for parallel applications: knowledge of proper partitioning, avoiding oversubscription, proper workload balancing, proper ways of sharing memory, and so on makes a parallel application perform well. In this paper we have tried to measure the performance of parallel applications under different parallelism criteria and, on the basis of our experimental results, we have discussed some important performance issues. We therefore believe this paper will be helpful to academia as well as to professionals, both for further research and for building large parallel applications in practice, and that it can serve as a foundation for further work on the complex problems ahead.
7. References

[1] N. Carriero and D. Gelernter, "How to Write Parallel Programs: A Guide to the Perplexed", Yale University, Department of Computer Science, New Haven, Connecticut, 1988.
[2] K. M. Chandy and S. Taylor, "An Introduction to Parallel Programming", Jones and Bartlett Publishers, Inc., Boston, 1992.
[3] J. Darlington and H. W. To, "Building Parallel Applications without Programming", Department of Computing, Imperial College, United Kingdom, in Abstract Machine Models, Leeds, 1993.
[4] Azali Bin Saudi, "Parallel Computing", Universiti Malaysia Sabah, April 2008.
[5] Joseph Albahari and Ben Albahari, "C# 4.0 in a Nutshell", Fourth Edition, Chapters 21 and 22, O'Reilly Media, Inc., January 26, 2010.
[6] Andrew Troelsen, "Pro C# 2010 and the .NET 4 Platform", Fifth Edition, Chapter 19, Apress, USA, 2010.
[7] Stephen Toub, "Patterns of Parallel Programming", Microsoft Corporation, July 1, 2010.
[8] Y.-K. Chen, X. Tian, S. Ge, and M. Girkar, "Towards efficient multi-level threading of H.264 encoder on Intel hyper-threading architectures", Proceedings of the 18th International Parallel and Distributed Processing Symposium, April 2004.
[9] D. Lea, "Concurrent Programming in Java: Design Principles and Patterns", Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1996.
[10] S. MacDonald, "From Patterns to Frameworks to Parallel Programs", PhD thesis, Department of Computing Science, University of Alberta, November 2001. Available at www.cs.ualberta.ca/~systems.
[11] S. Siu, M. D. Simone, D. Goswami, and A. Singh, "Design patterns for parallel programming", 1996.
[12] Michael J. Flynn, "Some Computer Organizations and their Effectiveness", IEEE Transactions on Computers, 21(9):948-960, 1972.
[13] C. U. Smith and L. G. Williams, "Software Performance Engineering: A Case Study Including Performance Comparison with Design Alternatives", IEEE Transactions on Software Engineering, Vol. 19, No. 7, 1993.
[14] C. M. Pancake, "Is Parallelism for You?", Oregon State University; originally published in Computational Science and Engineering, Vol. 3, No. 2, 1996.
[15] C. M. Pancake and D. Bergmark, "Do Parallel Languages Respond to the Needs of Scientific Programmers?", Computer Magazine, IEEE Computer Society, 1990.
[16] Vivek Sarkar, "Introduction to Parallel Computing", Department of Computer Science, Rice University.
[17] Hahn Kim, Julia Mullen and Jeremy Kepner, "Introduction to Parallel Programming and pMatlab v2.0", MIT Lincoln Laboratory, Lexington, MA 02420.
[18] Shahzad Malik, "Dynamic Load Balancing in a Network of Workstations", Research Report, November 29, 2000.
[19] Eric Aubanel, "Resource-Aware Load Balancing of Parallel Applications", Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada, February 2008.
[20] Hans Wolfgang Loidl, "Granularity in large scale parallel functional programming", University of Glasgow, March 1998.
[21] Daniel A. Reed, Allen D. Malony, and Bradley D. McCredie, "Parallel Discrete Event Simulation Using Shared Memory", IEEE.
[22] JunSeong Kim and David J. Lilja, "Characterization of Communication Patterns in Message-Passing Parallel Scientific Application Programs", October 1997.
[23] David S. Wise and Joshua Walgenbach, "Static and Dynamic Partitioning of Pointers as Links and Threads", Technical Report 437, Computer Science Department, Indiana University, Bloomington, Indiana, 1996.
[24] Herbert Schildt, "C# 4.0: The Complete Reference", Chapters 23 and 24, The McGraw-Hill Companies, 2010.
[25] Wikipedia, "Speedup", http://en.wikipedia.org/wiki/Speedup, accessed 18-12-2010.
[26] David E. Culler, Jaswinder Pal Singh with Anoop Gupta, "Parallel Computer Architecture: A Hardware/Software Approach", Morgan Kaufmann Publishers, Inc., 1999.
[27] Mark N. K. Saunders and Adrian Thornhill, "Organisational justice, trust and the management of change: An exploration", Personnel Review, Vol. 32, Iss. 3, pp. 360-375, 2003.