The Spanish Parallel Programming Contest and its use as an educational resource
Francisco Almeida∗, Javier Cuenca†, Ricardo Fernández-Pascual†, Domingo Giménez‡, Juan Alejandro Palomino Benito§
∗ Departamento de Estadística, Investigación Operativa y Computación, University of La Laguna.
† Departamento de Ingeniería y Tecnología de Computadores, University of Murcia.
‡ Departamento de Informática y Sistemas, University of Murcia.
§ Centro de Supercomputación, Fundación Parque Científico, Murcia.
Abstract—The first Spanish Parallel Programming Contest was organized in September 2011 within the Spanish Jornadas de Paralelismo. The aim of the contest is to disseminate parallel computing among Computer Science students. The website and the material generated for the contest can be used for educational purposes. This paper describes the organization of the contest and summarizes some training activities in which the material of the contest is being used or could be used.
Keywords-Programming contests; Online judge; Parallel programming; Parallel algorithms
I. INTRODUCTION
Programming contests are useful for increasing students' interest in programming and for enhancing their programming abilities. A number of competitions devoted to parallel computing have emerged in recent years, e.g. the Student Cluster Competition [1] and the Marathon of Parallel Programming [2]. The first Spanish Parallel Programming Contest [3] was organized in September 2011 in the Jornadas de Paralelismo. This paper describes the organization of the contest and some ways in which the contest and the material generated can be used for training.
The paper is organized as follows. Section II details the organization of the contest. Section III comments on the material generated that can be used for education. Section IV describes some training experiences. Future perspectives are discussed in Section V.
II. THE SPANISH PARALLEL PROGRAMMING CONTEST
Participants compete in teams of three students and a teacher, who acts as coach. The teams must solve a number of programming problems in a limited time. A sequential solution of each problem is provided, and the teams develop parallel solutions with the aim of reducing the execution time. The programs are written in C, and OpenMP and MPI can be used to develop shared-memory, message-passing or hybrid versions. Four nodes (32 cores) of the cluster Arabí of the Supercomputing Centre (SCC) of the Scientific Park Foundation of Murcia are used to prepare and evaluate the solutions, for a warm-up session and for the contest itself. A job queue system ensures that only one program runs at any given moment.
The tool Mooshak [4] is installed in a virtual machine from which the solutions generated by the teams are sent to the queue. Mooshak has been modified for the contest: it is used in conjunction with the subcluster, and a classification based on speed-ups has been added. The speed-up is $S_p = t_s/t_p$, where $t_s$ is the time obtained with the sequential solution and $t_p$ the execution time of the program submitted by the contestants. The mark assigned to a correct solution is $\max\{S_p - 1, 0\}$. The teams can submit up to ten solutions to each problem without penalization. Each additional submission costs one point, so the mark for a problem is $\max\{\max\{S_p\} - 1 - \max\{s - 10, 0\}, 0\}$, where $s$ is the number of submissions and $\max\{S_p\}$ the maximum speed-up achieved. For a problem on which some team has a mark higher than 15, the marks of all teams are linearly scaled so that the maximum mark is 15.
For each problem, a brief description, an example input, an execution scheme and the sequential solution are provided. The execution scheme cannot be modified. It performs the I/O, imposes a limit on the execution time and writes the solution to a file, which is compared with the output of the sequential program. The scheme is compiled and linked with a file sec.c, which contains the function to be parallelized. The resulting executable is run through MPI, with only one MPI process and one OpenMP thread. The file sec.c has a heading of the form:
/*
CPP_NUM_CORES = 1
CPP_PROCESSES_PER_NODE 1
CPP_PROBLEM=mm
*/
indicating the number of cores to use, the number of MPI processes to run on each node, and the name of the problem. The teams modify this function to make it parallel and submit the file with the new function and the modified heading. The number of nodes reserved is $\texttt{NUM\_NODES} = \lfloor (\texttt{CPP\_NUM\_CORES} - 1)/8 \rfloor + 1$, and the number of MPI processes is $\texttt{NUM\_PRO} = \texttt{NUM\_NODES} \cdot \texttt{CPP\_PROCESSES\_PER\_NODE}$.
The teams can participate in situ or online, and there are two classifications: one for the teams participating in situ and one for all participants. Eight teams from six universities participated in the first edition, four in situ and four online. Before the contest, a warm-up session was held to check the modifications made to Mooshak and the correct response of the system, and to allow the participants to become familiar with the mechanics of the contest. That session comprised two simple problems: mergesort and matrix multiplication. For the matrix multiplication, OpenMP, MPI and MPI+OpenMP programs were provided, so that the teams could gain experience of the behaviour of the system with different types of parallelism.
III. RESOURCES FOR EDUCATIONAL PURPOSES
Materials of different types have been generated during the organization of the contest and afterwards. We enumerate them and comment on their educational applications:
• The subcluster: A subcluster of the SCC with 4 nodes of 8 cores each is used in the contest. It is also being used for short periods in courses, in collaboration with the SCC.
• The modifications to Mooshak: Mooshak had to be adapted to work with a cluster. The tools developed can be used for contests or practicals, either directly or modified for the type of tests to be performed and the systems to be used.
• The problems: The problems generated are available on the web. They can be used in parallel programming courses, and the solutions produced by the students can be compared with those from the contest and on the web. The problems range from very simple ones used in warm-up sessions to more complex ones used in the contest, which are designed to cover different algorithmic schemes, computational complexities and degrees of programming and parallelisation difficulty. Five problems were proposed:
A Multiplication of matrices with rectangular holes: Two square matrices with rectangles of zeros are multiplied. The sequential solution uses the structure of the zeros to accelerate the computation but does not optimize memory access.
In a multiplication AB, matrix B is accessed by columns, which can be improved simply by transposing B.
B Life game with variable neighborhood: This is a game of life in which the neighborhood varies between generations, the neighbors being the positions at a given Manhattan distance. The sequential program follows an iterative scheme, and parallelisation is achieved inside each iteration. The computational cost and the memory access in each iteration are of order $O(n^2)$, which makes it difficult to obtain efficient parallel versions.
C Obtain values in given positions after sorting: An array of integers is given, together with a set of positions. The problem is to obtain the values in these positions when the values are sorted. The sequential solution sorts the positions and obtains the values in those positions by applying a partition scheme recursively. The quicksort pivoting strategy is applied to obtain the element in the middle position.
D Multiply four dense square matrices: Three conventional, naive matrix multiplications are performed. The memory access can be optimized as indicated for problem A. Two of the multiplications can be performed in parallel, which allows better use of the cluster.
E Knapsack problem with affinities: There are a number of knapsacks, each with a certain capacity, and a set of objects, each with a certain weight, with affinities between the objects. The objective is to obtain the assignment of objects to knapsacks with the highest total affinity. The sequential solution follows a backtracking scheme. One possibility is to parallelize the backtracking algorithm by generating a set of subproblems and assigning them to different processes or threads.
The problems are modifications of well-known problems, follow typical algorithmic schemes and can be parallelized with the parallel schemes explained in parallel programming textbooks. The goal is to achieve a high speed-up, and for that it is necessary not only to solve the problems but also to optimize the sequential program and to adapt the parallel program to the computational system. Roughly speaking, the ease of parallelisation of each problem can be estimated by multiplying the speed-ups expected from sequential optimization, from the use of multiple nodes with message-passing and from the use of all the cores in a node; for problem A, for instance, 3 × 3 × 6 = 54. The maximum estimated speed-ups are shown in Table I.

Problem   Sequential   Message-passing   Shared-memory   Maximum speed-up
A         3            3                 6               54
B         1            1.5               6               9
C         1.2          1.5               4               7.2
D         4            3.5               7               98
E         2            3.5               6               42

Table I. Estimated maximum speed-ups achievable, and the maximum gains expected from sequential, message-passing and shared-memory optimization.
• Programs: Some programs solving the problems are included on the website. A table of records is maintained, and solutions can be submitted to be tested for a new record.
• Explanations: The solutions submitted for the record table are reviewed and tested before being declared a new record. For each record a new entry is included, with the speed-up obtained, the code, and an explanation of the improvement achieved.
• The contest: The contest itself is an educational resource. The participants can put their knowledge of parallel computing into practice, working in a real environment, collaborating with colleagues and searching for efficient solutions to the problems.
• The website: An up-to-date website with all the information on the contest is available [3].
• CUDA stuff: A CUDA competition is planned for the second edition, so material similar to that generated for MPI+OpenMP programming will be produced.
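The adapted Mooshak performs two computations for each submission, as described in Section II. It reserves nodes according to the sec.c heading, and it turns measured speed-ups into marks. Both can be sketched in a few lines of C. This is a minimal illustration, not the contest's actual code. The function names are hypothetical. We also assume that "linearly scaled" means multiplying every mark by 15 divided by the best mark.

```c
#include <stddef.h>

/* Hypothetical helpers illustrating the rules of Section II. */

/* Nodes of 8 cores reserved for a submission:
   NUM_NODES = floor((CPP_NUM_CORES - 1) / 8) + 1. */
int num_nodes(int cpp_num_cores)
{
    /* integer division rounds down for CPP_NUM_CORES >= 1 */
    return (cpp_num_cores - 1) / 8 + 1;
}

/* MPI processes launched: NUM_PRO = NUM_NODES * CPP_PROCESSES_PER_NODE. */
int num_processes(int cpp_num_cores, int cpp_processes_per_node)
{
    return num_nodes(cpp_num_cores) * cpp_processes_per_node;
}

/* Speed-up of one run: S_p = t_s / t_p. */
double speedup(double t_seq, double t_par)
{
    return t_seq / t_par;
}

/* Mark of a team on one problem: best speed-up minus 1, minus one point
   for every submission beyond the tenth, never below zero. */
double problem_mark(double best_speedup, int submissions)
{
    int extra = submissions > 10 ? submissions - 10 : 0;
    double mark = best_speedup - 1.0 - extra;
    return mark > 0.0 ? mark : 0.0;
}

/* If some team has more than 15 points on a problem, scale every team's
   mark on that problem proportionally so that the best becomes 15
   (assumed reading of "linearly scaled"). */
void scale_marks(double *marks, size_t n_teams)
{
    double best = 0.0;
    for (size_t i = 0; i < n_teams; i++)
        if (marks[i] > best)
            best = marks[i];
    if (best > 15.0)
        for (size_t i = 0; i < n_teams; i++)
            marks[i] *= 15.0 / best;
}
```

For example, a team whose best speed-up is 4 after 12 submissions receives 4 − 1 − 2 = 1 point. A heading with CPP_NUM_CORES = 32 reserves all four nodes of the subcluster.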
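Problem A suggests improving memory access by transposing B. Problem D notes that two of its three products are independent. Both ideas can be illustrated together. This sketch multiplies dense row-major n × n matrices of doubles, with n ≥ 1 assumed. It ignores the rectangular holes that the provided sequential code for problem A exploits. For problem D it assumes the product is computed as (AB)(CD). The function names are hypothetical and are not those of the contest's sec.c.

```c
#include <stdlib.h>

/* Problem A idea: C = A * B (n x n, row-major, n >= 1), reading B through
   its transpose so the inner loop walks both operands contiguously.
   Returns 0 on success, -1 if the temporary cannot be allocated. */
int matmul_transposed(int n, const double *a, const double *b, double *c)
{
    double *bt = malloc((size_t)n * n * sizeof *bt);
    if (bt == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            bt[(size_t)j * n + i] = b[(size_t)i * n + j];

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[(size_t)i * n + k] * bt[(size_t)j * n + k];
            c[(size_t)i * n + j] = sum;
        }
    free(bt);
    return 0;
}

/* Problem D idea: R = (A * B) * (C * D). The products A*B and C*D are
   independent, so they run in two OpenMP sections before the last one. */
int multiply_four(int n, const double *a, const double *b,
                  const double *c, const double *d, double *r)
{
    size_t bytes = (size_t)n * n * sizeof(double);
    double *ab = malloc(bytes);
    double *cd = malloc(bytes);
    int status = -1;

    if (ab != NULL && cd != NULL) {
        int s1 = 0, s2 = 0;
        #pragma omp parallel sections
        {
            #pragma omp section
            s1 = matmul_transposed(n, a, b, ab);
            #pragma omp section
            s2 = matmul_transposed(n, c, d, cd);
        }
        if (s1 == 0 && s2 == 0)
            status = matmul_transposed(n, ab, cd, r);
    }
    free(ab);
    free(cd);
    return status;
}
```

If nested parallelism is not enabled, the parallel loop inside each section runs only on the thread executing that section, so the two concurrent products use two threads. To exploit the cluster as the problem description suggests, each product could instead be computed by a different group of MPI processes. Compiled without OpenMP support, the pragmas are ignored and the code runs sequentially.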
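For problem B, parallelisation happens inside each iteration. That scheme can be sketched for a single generation. The rules of the contest's game are not given in this paper, so the sketch makes several assumptions. It applies Conway's rule: a dead cell with exactly 3 live neighbors is born, and a live cell with 2 or 3 survives. Neighbors are the cells at Manhattan distance exactly d, and positions outside the grid count as dead. All names are hypothetical.

```c
#include <stdlib.h>

/* Live cells at Manhattan distance exactly d (d >= 1) from (i, j);
   positions outside the grid count as dead (assumption). */
static int count_neighbours(int rows, int cols, const unsigned char *g,
                            int i, int j, int d)
{
    int count = 0;
    for (int di = -d; di <= d; di++) {
        int ii = i + di;
        int dj = d - abs(di);          /* column offset completing distance d */
        if (ii < 0 || ii >= rows)
            continue;
        if (j - dj >= 0)
            count += g[ii * cols + (j - dj)];
        if (dj != 0 && j + dj < cols)  /* dj == 0 is a single cell */
            count += g[ii * cols + (j + dj)];
    }
    return count;
}

/* One generation with neighborhood distance d. Each new cell depends only
   on the old grid, so the rows are shared among OpenMP threads inside the
   iteration. Hypothetical rule: Conway's B3/S23 on this neighborhood. */
void life_step(int rows, int cols, const unsigned char *old,
               unsigned char *next, int d)
{
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++) {
            int n = count_neighbours(rows, cols, old, i, j, d);
            int alive = old[i * cols + j];
            next[i * cols + j] = (unsigned char)(n == 3 || (alive && n == 2));
        }
}
```

A driver would alternate two buffers and change d between generations. How d varies in the actual contest problem is not specified here. Every generation reads and writes the whole grid, which is why the per-iteration work is hard to parallelise efficiently.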
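The sequential strategy described for problem C sorts the positions and then partitions recursively. It can be sketched as a multiple-selection variant of quickselect. This is an illustration under assumptions, not the contest's sequential solution. The pivot is taken from the middle of each subarray. Positions are 0-based and assumed valid. The function names are hypothetical.

```c
#include <stdlib.h>

static void swap_int(int *x, int *y)
{
    int t = *x;
    *x = *y;
    *y = t;
}

/* Lomuto partition of v[lo..hi] around its middle element; returns the
   pivot's final index p, with v[lo..p-1] < v[p] <= v[p+1..hi]. */
static int partition(int *v, int lo, int hi)
{
    swap_int(&v[lo + (hi - lo) / 2], &v[hi]);
    int pivot = v[hi], store = lo;
    for (int k = lo; k < hi; k++)
        if (v[k] < pivot)
            swap_int(&v[k], &v[store++]);
    swap_int(&v[store], &v[hi]);
    return store;
}

/* Fill out[plo..phi] with the values at sorted positions pos[plo..phi]
   (ascending, all inside [lo, hi]) of the subarray v[lo..hi]. */
static void multiselect(int *v, int lo, int hi, const int *pos,
                        int plo, int phi, int *out)
{
    if (plo > phi)
        return;                        /* no position requested here */
    if (lo == hi) {
        for (int k = plo; k <= phi; k++)
            out[k] = v[lo];
        return;
    }
    int p = partition(v, lo, hi);
    int left_end = plo;                /* first requested position >= p */
    while (left_end <= phi && pos[left_end] < p)
        left_end++;
    int right_start = left_end;        /* first requested position > p */
    while (right_start <= phi && pos[right_start] == p)
        out[right_start++] = v[p];
    multiselect(v, lo, p - 1, pos, plo, left_end - 1, out);
    multiselect(v, p + 1, hi, pos, right_start, phi, out);
}

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts pos[0..m-1] in place (0-based, each in [0, n)) and stores in
   out[k] the value that would occupy position pos[k] if v were sorted.
   v is partially reordered. */
void values_at_positions(int *v, int n, int *pos, int m, int *out)
{
    if (n <= 0 || m <= 0)
        return;
    qsort(pos, (size_t)m, sizeof *pos, cmp_int);
    multiselect(v, 0, n - 1, pos, 0, m - 1, out);
}
```

Each partition step fixes the final position of its pivot, and only the subarrays that still contain requested positions are processed further. The two recursive calls touch disjoint parts of v and out. They are therefore natural candidates for OpenMP tasks in a parallel version.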
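Problem E mentions a subproblem-generation strategy, sketched below. The exact formulation used in the contest is not given in this paper, so the sketch rests on four assumptions:
- each object is either placed in one knapsack or left out;
- the total affinity is the sum of the symmetric affinities of every pair of objects sharing a knapsack;
- affinities are integers;
- sizes are bounded by fixed maxima.

The type and function names are hypothetical. The subproblems are all assignments of the first depth objects. They are distributed among OpenMP threads and combined with a max reduction.

```c
#include <limits.h>

#define MAX_OBJ 32
#define MAX_KS 8

typedef struct {
    int n, k;                        /* number of objects and knapsacks */
    int weight[MAX_OBJ];
    int capacity[MAX_KS];
    int aff[MAX_OBJ][MAX_OBJ];       /* symmetric pairwise affinities */
} knapsack_problem;

/* Affinity gained by putting object i into knapsack b, given where[0..i-1]
   (-1 means "left out"). */
static int gain(const knapsack_problem *p, const int *where, int i, int b)
{
    int g = 0;
    for (int j = 0; j < i; j++)
        if (where[j] == b)
            g += p->aff[i][j];
    return g;
}

/* Sequential backtracking over objects i..n-1; updates *best. */
static void search(const knapsack_problem *p, int *where, int *load,
                   int i, int value, int *best)
{
    if (i == p->n) {
        if (value > *best)
            *best = value;
        return;
    }
    where[i] = -1;                                /* leave object i out */
    search(p, where, load, i + 1, value, best);
    for (int b = 0; b < p->k; b++) {
        if (load[b] + p->weight[i] > p->capacity[b])
            continue;                             /* does not fit */
        where[i] = b;
        load[b] += p->weight[i];
        search(p, where, load, i + 1, value + gain(p, where, i, b), best);
        load[b] -= p->weight[i];
    }
}

/* Subproblems = all assignments of the first `depth` objects; they are
   solved independently by OpenMP threads and combined with a max reduction. */
int best_affinity(const knapsack_problem *p, int depth)
{
    if (depth > p->n)
        depth = p->n;
    long n_sub = 1;
    for (int i = 0; i < depth; i++)
        n_sub *= p->k + 1;

    int best = 0;                      /* leaving every object out scores 0 */
    #pragma omp parallel for schedule(dynamic) reduction(max : best)
    for (long s = 0; s < n_sub; s++) {
        int where[MAX_OBJ], load[MAX_KS] = {0};
        int value = 0, feasible = 1;
        long code = s;
        for (int i = 0; i < depth; i++) {
            int b = (int)(code % (p->k + 1)) - 1;   /* digit -> knapsack or -1 */
            code /= p->k + 1;
            if (b >= 0) {
                if (load[b] + p->weight[i] > p->capacity[b]) {
                    feasible = 0;      /* this prefix overflows a knapsack */
                    break;
                }
                value += gain(p, where, i, b);
                load[b] += p->weight[i];
            }
            where[i] = b;
        }
        if (feasible) {
            int local = INT_MIN;
            search(p, where, load, depth, value, &local);
            if (local > best)
                best = local;
        }
    }
    return best;
}
```

A dynamic schedule is used because subproblems can differ greatly in size once infeasible prefixes are discarded. A competitive solution would also add bounds to prune the search, which this sketch omits.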
IV. TRAINING EXPERIENCES
Some examples of the use of the generated material for educational purposes are described below. Table II shows which of the resources mentioned above are used in each experience.
A. The contest
The whole contest, with the warm-up session and the four hours of competition, is an experience in which students with previous knowledge of parallel programming can participate and thereby enrich their knowledge of this field. The contestants can continue working on the problems after the contest has finished. The team with the best solution for each problem is asked to prepare a short explanation, which is reviewed by three members of the organizing committee. This gives students who are not familiar with the classical publication process a first experience of it.
B. Initiation to parallel computing
Some of the resources are being used in a project of early initiation to parallel computing at the University of Murcia, within the Early Adopters program [5]. The cluster used in the contest and the Mooshak tool are used for some practicals, with the basic problems of the warm-up session and some additional basic problems.
C. Parallel programming courses
Some resources are being used in higher-level courses that have a practical orientation and in which OpenMP and MPI are an important part. The problems of the contest are used, and the codes generated are tested for new records. A project involving six Spanish universities began in the academic year 2011-2012. In the first semester, courses at the universities of La Laguna and Murcia took part:
• University of Murcia: The problems are used for the practical work in the course Algorithms and Parallel Programming. All the students have previously taken courses on Concurrent Programming and Parallel Architectures. The course covers general concepts of parallel programming, systems and paradigms, parallel programming environments and methodology, and parallel algorithmic schemes. A first practical introduces students to OpenMP and MPI. In laboratory sessions, the use of the two parallel programming environments is shown on the systems of the contest, and the students must develop versions of the mergesort algorithm. In a subsequent practical, each student solves two of the five problems: one of the two matrix multiplication problems and one of the other problems. For each problem, OpenMP, MPI and MPI+OpenMP solutions are developed, the algorithms are analysed theoretically and the programs are studied experimentally on two systems (a cluster with 5 nodes and a shared-memory system with four hexa-core processors). The codes are run on the cluster of the contest, and the speed-up is compared with that in the record table.
• University of La Laguna: The problems of the contest are used in the laboratories of the introductory Parallel Programming course of the Computer Science degree, at the end of the semester, as part of the assessment process. During the semester, the students are trained in programming tools (MPI, OpenMP and mixed programming) on different parallel systems that combine shared-memory and message-passing architectures. Basic parallel algorithmic techniques are also part of the course content. The laboratories are based on small samples and test problems. At the end of the semester, the students are required to solve some of the problems of the contest as individual projects. The educational aim is to assess the skills and knowledge acquired during the course, with the extra motivation of improving the records of the contest. Two problems were selected according to their degree of difficulty: first problem A was assigned, and then problem D. The deadline for uploading solutions depends on the difficulty of the problem.
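The mergesort versions that students develop in the first practical at the University of Murcia can take many forms. The following is one shared-memory possibility, sketched as an assumption rather than the course's reference solution. It runs the two recursive calls as OpenMP tasks above a size cutoff. TASK_CUTOFF and the function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define TASK_CUTOFF 1000   /* hypothetical: below this size, no new tasks */

/* Merge sorted v[lo..mid) and v[mid..hi) using tmp, then copy back. */
static void merge(int *v, int *tmp, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        tmp[k++] = (v[j] < v[i]) ? v[j++] : v[i++];   /* ties: left first */
    while (i < mid)
        tmp[k++] = v[i++];
    while (j < hi)
        tmp[k++] = v[j++];
    memcpy(v + lo, tmp + lo, (hi - lo) * sizeof *v);
}

/* Sort v[lo..hi); large halves are sorted as independent OpenMP tasks
   working on disjoint ranges of v and tmp. */
static void msort(int *v, int *tmp, size_t lo, size_t hi)
{
    if (hi - lo < 2)
        return;
    size_t mid = lo + (hi - lo) / 2;
    if (hi - lo > TASK_CUTOFF) {
        #pragma omp task
        msort(v, tmp, lo, mid);
        #pragma omp task
        msort(v, tmp, mid, hi);
        #pragma omp taskwait
    } else {
        msort(v, tmp, lo, mid);
        msort(v, tmp, mid, hi);
    }
    merge(v, tmp, lo, mid, hi);
}

/* Returns 0 on success, -1 if the auxiliary buffer cannot be allocated. */
int parallel_mergesort(int *v, size_t n)
{
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    #pragma omp parallel
    #pragma omp single
    msort(v, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Below the cutoff the recursion continues sequentially. This avoids creating tasks too small to pay for their overhead, and the best cutoff depends on the system. An MPI version would instead distribute blocks among processes and merge the sorted blocks.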
D. Master level
The resources can be used in Master's courses in a similar way to the parallel programming courses. During the first semester of 2011-2012 they were used in a course on Parallel Programming and High Performance Computing at the University of Murcia. An initial practical is devoted to optimizing the three matrix multiplications of problem D, and the performance obtained is compared with that of a multithreaded BLAS implementation. The combination of OpenMP and BLAS parallelism is also considered. Next, problem A is used in a practical in which the different types of parallelism and matrix computation techniques are applied to obtain efficient implementations.
E. Extracurricular courses
Owing to the increasing popularity of parallel computing, extracurricular courses are being organized by the Scientific Computing and Parallel Programming group of the University of Murcia in collaboration with the SCC. The courses are for researchers and professors of Murcia and neighbouring universities and for specialists working in
resource Subcluster Modif. Mooshak Problems Programs Explanations Contest Website CUDA stuff
Finished and on-going experiences Contest Initiation Paral. Prog. Master X X X X X X X X X X X X X
Planned and possible experiences Extra-curriculum CUDA Courses Autonomous X X X X X X X X X X X X X
X X X
X
X X
X X
X X
Table II. Utilization of the educational resources in different training activities.
business and industry. The courses are very successful, with approximately 20 participants per year, in roughly equal numbers of professors, researchers, students and industry participants. The resources of the contest are used in the 2012 edition of the course: the modifications introduced in Mooshak for the cluster are discussed, and experiments are carried out with some basic problems.
F. CUDA courses
A CUDA competition is planned for the 2012 contest. In the second semester of 2011-2012 there is a course on Multicore Architectures Programming in the Computing Engineering Degree at the University of Murcia, and this course will be used to prepare the competition. The problems of the first edition will be used in practicals, which will consist partly of developing CUDA solutions.
G. Other courses
The use of the material generated in the contest in courses at different levels has been discussed above. Some of the materials can also be used in other, similar courses. The modifications to Mooshak can be used for other contests and practicals in which a cluster is used, and Mooshak can be further modified to adapt it to the particularities of the computational system in use. Some of the problems of the contest can be used for practicals, and the programs and explanations included in the record table can be used as examples.
H. Autonomous study
The information on the website can be used for autonomous study by people with some basic training in parallel computing. The problems can be used for practice, and solutions can be submitted to be tested for the record table. The website is dynamic, and new problems and programs will be added.
V. CONCLUSIONS AND PERSPECTIVES
The paper has presented the structure of the Spanish Parallel Programming Contest. In the first edition 8 teams participated, each with three students. Online participation is allowed in order to increase the number of participants. Our aim is to generate materials that are useful for training. The experience of using these resources in courses at different levels has been summarized, and some possible uses in other activities have been discussed. Some changes will be introduced in the second edition of the contest: organization in English, a CUDA competition, and new modifications to Mooshak.
A different organization would be needed to promote the contest considerably and to improve its use for educational purposes. The organizers are lecturers at different Spanish universities and staff of the SCC, who organize the contest and manage the website with no support other than their interest in parallel computing and the free time they dedicate to it. To improve the contest, make it more visible and enrich its resources, it would be necessary to devote staff and computational systems to it and to obtain the support of some organization or company.
ACKNOWLEDGMENTS
Funded in part by the Spanish MCYT (TIN200806570-C04-02 and TIN2011-24598) and by the Fundación Séneca, Consejería de Educación de la Región de Murcia (08763/PI/08). The authors acknowledge the computer resources and assistance provided by the Supercomputing Centre of Fundación Parque Científico of Murcia.
REFERENCES
[1] Student Cluster Competition, http://sc10.supercomputing.org/?pg=studentcluster.html.
[2] Marathon of Parallel Programming, http://regulus.pcs.usp.br/marathon/current/index.html.
[3] Spanish Parallel Programming Contests, http://cpp.fpcmur.es.
[4] Mooshak, system for managing programming contests on the web, http://mooshak.dcc.fc.up.pt/~zp/mooshak/.
[5] Website of the Early Adopter project at the University of Murcia, http://www.um.es/earlyadopters/.