ISSN 0361-7688, Programming and Computer Software, 2012, Vol. 38, No. 4, pp. 189–200. © Pleiades Publishing, Ltd., 2012. Original Russian Text © A.I. Ilyushin, M.A. Olenin, S.A. Vasil’ev, 2012, published in Programmirovanie, 2012, Vol. 38, No. 4.

Solution of Parallel Computation Problem Based on “Space–Time” Concept

A. I. Ilyushin, M. A. Olenin, and S. A. Vasil’ev

Faculty of Computational Mechanics, Department of Mechanics and Mathematics, Moscow State University, Moscow, 119992 Russia
Keldysh Institute for Applied Mathematics, Russian Academy of Sciences, Miusskaya pl. 4, Moscow, 125047 Russia
email: [email protected]
Received January 16, 2012

Abstract: Construction of distributed systems by composition of program objects is considered. It is proposed to define the topology of links between the objects by describing a “neighborhood” of each object in the form of a list of “formal neighbors.” Synchronization of the evolution of an object and its neighbors is described in terms of the “local time” of the object and its neighborhood. Results of the solution of real problems on a supercomputer are presented. They demonstrate that it is possible to reduce the labor input required for the creation of distributed software systems to that of local programming.

Keywords: Composition of program objects, object link topology, formal and actual neighbors, synchronization of object evolution, local time of an object and its neighborhood.

DOI: 10.1134/S0361768812040020

1. INTRODUCTION

It is commonly recognized that programming multiprocessor (multicore) computing systems is a labor-consuming task that can be performed only by highly skilled programmers. This point is supported, in particular, by the following quotation from an interview with the computing pioneer John Hennessy, President of Stanford University [1]: “… when we start talking about parallelism and ease of use of truly parallel computers, we are talking about a problem that is as hard as any that computer science has faced. I would be panicked if I were in industry.”

Let us try to find out what the problem is. Imagine a program system consisting of a great number of interacting programs distributed among a group of processors. In this case, unlike the case of one sequential program running on one processor, two basic problems arise:

(1) The problem of establishing links between remote programs and managing these links.
(2) The problem of partial ordering of actions performed by these programs and synchronization of their interaction.

It is for solving these problems in a “manual” mode that outstanding intelligence and significant labor input are required [2]. It turns out that the concurrent case can be reduced to the sequential one, in the sense of the requirements on the programmer and the programming process, by introducing fundamentally new program mechanisms based on the “space–time” concepts.

The point is that any system consists of parts with a particular set of links between them. This means that almost all applied programmers program roughly the same actions in order to control links between the parts of a program system. A similar situation existed, for example, with the calculation of arithmetic expressions before FORTRAN came into existence: each programmer coded arithmetic expressions in assembler. A natural way out of such a situation is to “transfer” all routinely repeated actions into the system part.

Such a transfer can be done only on the basis of a formal definition of some concept that describes the class of actions to be represented in algorithmic form. For sets in which relations between the elements are identified (in other words, for each element, its neighborhood is defined as the subset of elements related to it), such a concept is that of a topological space. Ordered actions carried out in different parts of a system evolving in parallel are adequately described, from the point of view of the authors, in terms of the concept of time. It is these two concepts that underlie the development of a complex of software tools aimed at reducing the labor input of programming concurrent processes to that of sequential programming.

The practical basis for the presented development was the solution of real problems of mechanics on modern supercomputers. In particular, several computational models in the field of gas dynamics were calculated. However, the authors believe that the proposed programming tools are based on concepts that are common to all applied systems.

2. BASIC IDEAS OF SOLUTION

2.1. Decomposition/Composition of Program System

We start from the authors’ view of the general scheme of development of a distributed program system. When designing any program system, it is natural to begin with the definition of its overall characteristics. When a distributed system is designed, it is required to decompose it and to identify the parts that will presumably run on separate processors. For example, for computational models of physical systems, such decomposition is usually performed at the level of a discrete model [3]. This is followed by programming of the identified parts.

It is assumed that, at the decomposition stage, the interface of each part with the other parts is defined in the form of a set of functions performed by the given part and sets of functions that are performed by other parts and used by the given one. After this, each part can be programmed independently of the other parts of the system.

Finally, the parts are assembled into the target system. The basic goal of this work is to develop system software that allows one to efficiently program the assembly of independently created subsystems into the desired distributed system.

2.2. Object-Oriented Approach

In this study, we rely on an ordinary object-oriented programming model augmented by certain new features. The program system is built from a set of interacting objects in the framework of some object-oriented programming environment. We use a scheme in which several processes may run simultaneously in one object and one process may pass through many objects, which are possibly located on different processors.
It is assumed that the decomposition of the system into objects is either performed at the design stage, when complex heterogeneous systems are created, or done automatically. Examples of systems that can be decomposed automatically are computational models of gas dynamics for regions that are relatively simple in terms of their geometry and the processes occurring in them.

For each object, an interface is defined as a set of callable functions. An object can interact with another object only by calling operations from the interface of that object.

In the course of computation, the set of objects composing the model is mapped onto the set of processors. The typical case is considered to be that where one (possibly virtual) processor is allocated to one object.

2.3. Formal/Actual Neighbors

In order to combine a set of objects into a system, it is necessary to specify relations between them. For this purpose, we use the concept of a “neighborhood.” An object B is considered to be a neighbor of an object A if the evolution of object A requires its interaction with B. It is worth noting that the neighborhood relation may be either unilateral or bilateral (the objects interact with one another). The set of all neighbors of a particular object determines the “external world” with which this object interacts.

Recall that a certain interface is defined for each object and that interaction proceeds only through calls of functions from this interface. Therefore, when programming a particular class, it is sufficient for the programmer to have a list of “formal neighbors” with their interfaces in order to describe interaction with other objects. “Actual” neighbors, whose classes implement the interfaces from the descriptions of formal neighbors, appear only at run time.

Thus, when programming each object, the applied programmer may rely only on local considerations about the “external world” of this object, represented by the list of formal neighbors.

2.4. Topological Spaces of Objects and Organization of Links between the Objects

A distributed system is built from parts that interact with each other. In the program implementation, relations between the parts are specified by links. In traditional programming, all links for each part of the system are defined manually. This is a very cumbersome task if the system consists of tens of thousands of parts with a complicated topology of relations between them. In the object-oriented case, this amounts to programming the assignment of references to neighboring objects to local variables of the object.
In addition to the difficulty of manual assignment of references, there arises a system problem associated with the correction of links when objects move from one computational node to another. Such moves are required for balancing the load of a multiprocessor computing system (MCS) by swapping inactive or low-priority objects in and out (virtual object memory is an analogue of paging for traditional virtual memory). With manual link management, this problem is too hard to solve. In our approach, the use of the list of formal neighbors and the transfer of the task of forming links to actual neighbors from the applied programmer to the system part allow us to solve this problem easily.

For simple and clear specification of links between the objects, we introduce the concept of a “space of objects.” From the point of view of a mathematician, the space of objects is an ordinary topological space.


The automation consists in providing the user with a simple interface for working with the link topology, while link creation and keeping the links up to date are handled by mechanisms built into the proposed system.

In our case, the topology is defined by introducing a metric space and a nearness function. Each object is associated with a point in the metric space, and the nearness function determines whether there is a link between a pair of points. The applied programmer specifies the coordinates of each object in the metric space either statically, before the computation starts, or dynamically, in the course of the computation.

For example, for an arbitrary graph, it suffices to assign integers to the vertices, which yields a placement in a one-dimensional metric space. To define the nearness function, it is convenient to use a table of pairs of neighboring vertices (edges). For any pair of vertex numbers, this function answers the question of whether the pair is present in the table.

Another example is the specification of a multidimensional lattice. A multidimensional space with integer coordinates is selected as the space. For each point, the set of possible neighbors is easily identified: these are the points that differ from the considered point by 1 in exactly one coordinate. The nearness function answers the question of whether a point belongs to this set.

2.5. Synchronization of Calls Based on Local Object Time

Synchronization of computation is a complex problem in multithreaded programming. To solve this problem, the authors suggest using a mechanism based on the “TIME” concept.

In almost all fields of human activity, time is used for ordering actions. In particular, time is used as a means of synchronization in numerous systems for modeling digital devices and discrete control systems.
In our case, “TIME” is a marking of the sequence of actions in the considered object by numbers (in other words, a marking of the sequential states of the object). Synchronization is achieved through a mechanism that allows a pair of neighboring objects to interact only when their local times are identical.

In the framework of such an approach, the programmer does not need to watch the neighboring objects and synchronize computations in them. It is sufficient just to increase the value of the variable “TIME” locally in each object, and the computations are synchronized automatically by the described system.

Table

    Number of processors    Time, s    Efficiency, %
    1                       1377.28    100
    2                        689.39     99
    4                        344.81     99
    8                        172.62     99
    16                        87.48     98
    32                        50.72     84
    64                        31.86     68

3. APPLICATION SYSTEM PROGRAMMING TOOLS

Let us consider particular programming tools for applied distributed systems that are based on the above-discussed ideas. It is assumed that computation is carried out in an almost arbitrary existing object-oriented programming environment augmented by some additional classes and an additional run-time support library. In this work, all example programs are written in the PYTHON language in the context of the corresponding programming system. This new environment with additional means for creating distributed programs will be referred to as the OST (Object–Space–Time) system.

In the program fragments presented below as examples, the following notation is used: italic type indicates that the given name is abstract and should be replaced by the programmer with a particular one in a particular applied program; bold type means that the given name is fixed in the OST environment and should be the same in any applied program; ordinary type is used if the given name is fixed in the environment of the given programming language.

3.1. Determination of Link Topology in Application System

When the OST system is used, links between objects in an application system are specified by defining the topology of some auxiliary space. Mathematical terms are used here because the concepts of a neighborhood of an object and of system connectivity are defined in exactly the same way as the corresponding concepts in mathematics, and the program tools for defining links between objects can be viewed as a particular mechanism for defining topological spaces. As a result, it is convenient to use the well-known set of topological concepts. Let us consider particular mechanisms for working with topologies of sets of objects in the proposed system.

3.1.1. Topology definition. In this work, a topology in some space is introduced by defining a neighborhood for each point in the space as a set of neighboring points, or, simply, neighbors. For objects composing the distributed system being created, the concept of a neighborhood corresponds to the concept of the “external environment” of the object.

In the OST system, the topology is defined by means of a class in which either a nearness function, which checks whether two points are neighbors, or a neighborhood description function, which specifies the set of points belonging to the neighborhood of the given point, is defined. Such separation makes it possible to easily describe different topologies. For example, the neighborhood description function is well suited to the description of regular structures, such as integer-valued lattices. Conversely, the nearness function is suitable for the description of irregular structures, for example, in the case of interaction through the gravitational field in celestial mechanics.

3.1.1.1. Defining object neighborhood by means of nearness function. Given two points A and B, the nearness function determines whether point B belongs to the neighborhood of point A. The neighborhood of a fixed point A itself can be constructed by the OST system, for example, by enumerating all possible values of B and checking them by means of the nearness function.

We will illustrate the definition of a topology with the example of a graph with numbered vertices. A graph of links that has no regular structure can be described only by means of a table that stores a list of neighboring vertices for each vertex of the graph. The nearness function in this case checks whether there is an edge between a pair of vertices in this table, which is referred to as self.edges in the example below.

In order to be able to store information describing, for example, graph edges when defining the topology, the nearness function is not implemented as a single function but is placed into the topology description class.
    class applied_topology(ost.Topology.Abstract):
        # Table specifying graph edges (class attribute, read as self.edges).
        # For each vertex p1, self.edges[p1] stores the list of
        # vertices p2 that are connected to p1 by an edge.
        edges = {
            p1: [p1_1, …, p1_k1],
            p2: [p2_1, …, p2_k2],
            …
            pM: [pM_1, …, pM_kM]
        }

        # Nearness function: checks whether vertex p2 is present
        # in the list of endpoints of the edges leaving p1.
        def proximity(self, p1, p2):
            # The operator "in" returns a bool: whether element p2
            # occurs in the list self.edges[p1].
            return p2 in self.edges[p1]

3.1.1.2. Defining object neighborhood by means of neighborhood description function. Given a point, the neighborhood description function returns the list of coordinates of all points that belong to its neighborhood. Let us consider the application of this function to the same example of specification of the link graph.

    class applied_topology(ost.Topology.Abstract):
        # Table specifying graph edges (class attribute, read as self.edges).
        # For each vertex p1, self.edges[p1] stores the list of
        # vertices p2 that are connected to p1 by an edge.
        edges = {
            p1: [p1_1, …, p1_k1],
            p2: [p2_1, …, p2_k2],
            …
            pM: [pM_1, …, pM_kM]
        }

        # For vertex p, the neighborhood description function returns
        # the list of all vertices to which p is connected.
        def neighborhood(self, p):
            return self.edges[p]

The decision on which of the two functions to use for defining the topology of a particular distributed system is made by the applied programmer. In some cases, it is convenient to use the nearness function. Consider, for example, the two-dimensional plane and neighborhoods in the form of circles.

    from math import sqrt

    class applied_topology(ost.Topology.Abstract):
        # Function for calculating the distance between points
        def distance(self, p1, p2):
            return sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)

        # Nearness function: the distance between the points is less
        # than radius (a value chosen by the applied programmer).
        def proximity(self, p1, p2):
            return self.distance(p1, p2) < radius

In other situations, the use of the neighborhood description function turns out to be more convenient and clear. For example, in the case of a two-dimensional integer lattice, the set of points belonging to a neighborhood can be described as follows:

    class applied_topology(ost.Topology.Abstract):
        def neighborhood(self, p):
            return [
                [p[0] + 1, p[1]],  # point on the right
                [p[0] - 1, p[1]],  # point on the left
                [p[0], p[1] + 1],  # point on the top
                [p[0], p[1] - 1]   # point on the bottom
            ]

It should be noted that, when the nearness function is used in the OST system for determining the neighborhood of a particular point, the check by means of this function has to be carried out for all points. In the case of a static topology, this is not critical.
However, if the topology of links varies in the course of computation, it is recommended to use the neighborhood description function in order to reduce computation time.

The OST system includes standard topology classes, in which the most frequently encountered topology types are implemented by means of the above-described mechanisms. In the case of a nonstandard topology, the programmer may create his or her own topology class and use it in the application systems being created.

3.1.2. Local and global topologies. In this section, we discuss how to use the already available topology classes for specifying links between objects of the application system when developing a model. When creating a model in the OST system, the programmer may define a topology class either as a class that is common to all objects in the system or as a class for objects with a specific neighborhood topology. When the OST system starts (and during the run, in the case of dynamic modification of links), it automatically determines and sets all the required links between the objects.

Thus, instances of topology classes in the OST system can be used in two ways: as global ones (specifying the topology for all objects) and as local ones (specifying the neighborhood topology for one object).

An object of global topology is specified in the initialization program of the application system as the parameter topology of the model being created. As an example, we present a part of a program where a two-dimensional integer lattice is defined by means of the standard class ost.Topology.Mesh, which defines the topology of integer lattices.

    # Definition of global topology
    objInit.topology = ost.Topology.Mesh(dimension = 2)

In the OST system, if the global topology is not defined, the standard topology is used, in which the neighborhood of each object is empty. In this case, all links are to be defined by local topologies.
The object of local topology is specified upon object creation, for example, in the initialization program of the application system, as the parameter topology of the object being created. Let us give an example of neighborhood definition by means of the standard class ost.Topology.Neighborhood, in which a point neighborhood is described in the form of a constant set of coordinates of neighboring points.

    # Definition of local topology
    object.topology = ost.Topology.Neighborhood([[x1, y1], …, [xN, yN]])

In this example, a neighborhood of the object named object is defined, which consists of the points [x1, y1], …, [xN, yN].

To illustrate the use of local and global topologies, we give an example of a model creation program.

    # Definition of global topology
    objInit.topology = ost.Topology.Mesh(dimension = 2)

    # Object creation
    object1 = objInit.createObject()
    # Placement of the object at the point with the coordinates [0, 0]
    objInit.set(object1, [0, 0])

    object2 = objInit.createObject()
    objInit.set(object2, [0, 1])

    object3 = objInit.createObject()
    objInit.set(object3, [1, 1])

    object4 = objInit.createObject()
    # Definition of local topology
    object4.topology = ost.Topology.Neighborhood([[0, 0]])
    objInit.set(object4, [2, 2])

In this example, the global topology defines links between the pairs of objects with the coordinates [0, 0], [0, 1] and [0, 1], [1, 1]. The local topology defines a unilateral link from the object with the coordinates [2, 2] to the object at [0, 0]. Other topology classes, including user-defined ones, are used in the same way as in the above example.

3.1.3. Formal and actual neighbors. In this section, we consider the mechanism of interaction of objects of an application system with their neighborhoods. When creating a model by means of the OST system, the user describes some set of object types. A type is determined by an application class based only on local considerations related to objects of the described type. In particular, locality here means that the programmer describes the internal functioning of an object of the given type in a completely traditional way, and the neighborhood is defined as a list of formal neighbors.

The concept of a “list of formal neighbors” may be viewed as a generalization of the concept of a list of formal parameters of a subprogram. The elements of this list are object–stubs with given interfaces, from which one can call operations in the same way as in ordinary object-oriented programming.

In the course of the program run, the OST system replaces the object–stubs with links to actual neighbors, i.e., objects that fall into the neighborhood of the given object instance.

To illustrate this, we give an example of a call of the function fun from the neighboring object with number i in the list of neighbors.
    # Call of function fun of the neighbor with number i
    self.neighbors[i].link.fun()

To improve the readability of a program, one can use synonyms for the neighborhood directions in a particular topology. For example, in the case of a two-dimensional integer lattice, the possible neighbors are specified unambiguously: the left, right, top, and bottom neighbors. Below is a similar example of a function call from the left neighboring object:

    # Call of function fun from the left neighbor
    self.left.fun()
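The replacement of formal neighbors by actual ones, together with the interplay of global and local topologies from Section 3.1.2, can be sketched in plain Python. This is only an illustrative model, not the OST implementation: the names MeshTopology, FixedNeighborhood, Cell, and bind_neighbors are hypothetical stand-ins introduced here, and the placement reproduces the four-object model creation example given earlier.

```python
class MeshTopology:
    """Hypothetical stand-in for an integer-lattice topology."""

    def __init__(self, dimension):
        self.dimension = dimension

    def neighborhood(self, p):
        # Points that differ from p by 1 in exactly one coordinate.
        points = []
        for axis in range(self.dimension):
            for step in (-1, 1):
                q = list(p)
                q[axis] += step
                points.append(tuple(q))
        return points


class FixedNeighborhood:
    """Hypothetical stand-in for a constant, per-object neighborhood."""

    def __init__(self, points):
        self.points = [tuple(p) for p in points]

    def neighborhood(self, p):
        return list(self.points)


class Cell:
    """Toy application object; neighbors is empty until binding."""

    def __init__(self, name, topology=None):
        self.name = name
        self.topology = topology  # local topology, if any
        self.neighbors = []       # actual neighbors, filled in at "run time"

    def fun(self):
        return "called " + self.name


def bind_neighbors(placement, global_topology):
    """Replace each object's neighbor list with the actual objects found
    at the coordinates named by its local (or else the global) topology."""
    for point, obj in placement.items():
        topology = obj.topology if obj.topology is not None else global_topology
        obj.neighbors = [
            placement[q] for q in topology.neighborhood(point) if q in placement
        ]


o1, o2, o3 = Cell("o1"), Cell("o2"), Cell("o3")
o4 = Cell("o4", topology=FixedNeighborhood([[0, 0]]))
placement = {(0, 0): o1, (0, 1): o2, (1, 1): o3, (2, 2): o4}
bind_neighbors(placement, MeshTopology(dimension=2))

links = {obj.name: [n.name for n in obj.neighbors] for obj in placement.values()}
print(links)
```

In this sketch the object at [2, 2] sees the object at [0, 0] as a neighbor, but not the other way round, which mirrors the unilateral link described for the local topology.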


To illustrate the use of topologies, we present examples of classes of objects of application systems.

    # Example of an object class illustrating work with the list of neighbors.
    class applied_object(ost.Object.Abstract):
        # In this example, the technical part of the class
        # that is not related to the use of neighbors is omitted.

        # An interface function:
        # outputs the incoming message onto the screen
        def fun(self, message):
            print("Call function with", message)

        # Function performing the calculation
        def run(self):
            # Loop over algorithm iterations
            # (the loop condition is application specific)
            while <condition>:
                # iterate over all neighbors
                for neighbor in self.topology.neighbors:
                    # call function fun from each neighbor
                    neighbor.fun()

    # Class of objects working via neighbor synonyms
    class applied_object(ost.Object.Abstract):
        # In this example, the technical part of the class
        # that is not related to the use of neighbors is omitted.

        # An interface function:
        # outputs the incoming message onto the screen
        def fun(self, message):
            print("Call function with", message)

        # Function performing the calculation
        def run(self):
            # Loop over algorithm iterations
            # (the loop condition is application specific)
            while <condition>:
                self.left.fun()

For user-defined topology classes, the OST system has mechanisms for synonym specification.

3.1.4. Examples of standard topologies. In this section, we consider the standard topology classes available in the OST system.

The integer lattice class ost.Topology.Mesh.

This topology is characterized by the simplicity of the definition of a neighborhood in a space of arbitrary dimension. The neighborhood of a point is simply the set of points that differ from the point under consideration by 1 in exactly one coordinate. When this class is used, one parameter, the space dimensionality, is specified in the constructor. For small dimensions (1, 2, or 3), synonyms of the “neighborhood directions” are provided. An example of use:

    # Topology definition
    <object>.topology = ost.Topology.Mesh(dimension = <dimension>)
    # Available synonyms of neighbors depending on dimensionality:
    # dimension = 1: left, right
    # dimension = 2: left, right, up, down
    # dimension = 3: left, right, up, down, front, behind

The ring class ost.Topology.Ring.

This topology is a ring-like graph consisting of N points numbered from 1 through N. The neighborhood of a point consists of the left and right neighbors. The ring topology is convenient in problems involving cyclic transfer of data between the parts of the application system. For example, such a topology is used in concurrent multiplication of matrices, when strips of columns are transferred through a ring of processors in which strips of rows are stored. When this class is used, the number of points in the ring is specified in the constructor. An example of use:

    # Topology definition
    <object>.topology = ost.Topology.Ring(N = <number of points in the ring>)
    # Available synonyms of neighbors are left and right.

The class of unstructured lattice ost.Topology.Graph.

By means of this topology, an arbitrary graph is specified. The neighborhood of a vertex includes the vertices connected to it by an edge. When the class is used in practice, the user may choose among several interfaces of the constructor.

Topology definition by means of a table that describes the neighborhood of each vertex:

    edges = {
        P1: [P1_neighbor_1, …, P1_neighbor_N1],
        …
        Pm: [Pm_neighbor_1, …, Pm_neighbor_Nm]}

In this example, P1, …, Pm are the vertices of the graph, and the list [Pi_neighbor_1, …, Pi_neighbor_Ni] specifies the set of vertices belonging to the neighborhood of the ith vertex.

    # Topology definition by means of the connectivity table
    <object>.topology = ost.Topology.Graph(edges = edges)

A topology can also be defined by means of external graph tools, for example, by means of the widely used Metis system:

    # Topology definition by means of the Metis system
    <object>.topology = ost.Topology.Graph(metis_datafile = "datafile.dat")

The fixed neighborhood class ost.Topology.Neighborhood.


By means of this topology class, a fixed neighborhood of an arbitrary point in the space is described, consisting of a fixed number of points. In the OST system, the fixed neighborhood class is used when the local topology of a particular object is to be specialized. In the constructor, a list of coordinates of the points belonging to the neighborhood is specified.

    # Local topology definition
    object.topology = ost.Topology.Neighborhood([P1, …, Pk])
    # Here, Pi = [Pi_x1, …, Pi_xN] is the set of coordinates of the ith neighbor.

3.2. Marking of Actions in an Application System by Time

It is assumed that each object of the considered distributed system evolves in the course of the run, passing through a sequence of states. The programmer is supposed to mark each state by a particular value of the “local object time,” which is stored in a special variable.

The local object time may correspond to physical time for a process modeled inside an object or may be an artificial time introduced just for ordering the actions performed. For example, it may be the ordinal number of an iteration in an algorithm implemented in the object.

Synchronization of computations between the objects is implemented by the OST system based on a single rule: interaction is permitted if the local times of the objects are identical.

In the course of their evolution, objects issue requests for the advancement of their local times. The OST system processes these requests and, if needed, suspends calculations in order to satisfy the synchronization rule.

In the OST system, this rule is formulated as follows: for the local time of a particular object to be advanced to a certain value, all neighbors of this object must have issued requests to the OST system to advance their local times to the same value.

Such a rule, first, guarantees that an operation can never be called from an object whose local time is greater than that of the calling object.
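The advancement rule just stated can be tried out on a small single-process model. The sketch below is an illustration under stated assumptions, not the OST mechanism: the class TimeCoordinator, its method request_advance, and the three-object chain are hypothetical names introduced for this example, and requests are processed sequentially rather than concurrently.

```python
class TimeCoordinator:
    """Toy model of the local-time advancement rule (illustrative only)."""

    def __init__(self, neighbors):
        # neighbors: dict mapping an object name to the names of its neighbors
        self.neighbors = neighbors
        self.local_time = {name: 0 for name in neighbors}
        # every time value each object has ever asked to advance to
        self.asked = {name: set() for name in neighbors}
        # requests that have been recorded but not yet granted
        self.pending = {}

    def request_advance(self, name, target):
        """Record a request; return the names whose requests are granted now."""
        self.asked[name].add(target)
        self.pending[name] = target
        # A pending request is granted once every neighbor of the object
        # has asked to advance to the same value.
        granted = [
            obj for obj in sorted(self.pending)
            if all(self.pending[obj] in self.asked[n] for n in self.neighbors[obj])
        ]
        for obj in granted:
            self.local_time[obj] = self.pending.pop(obj)
        return granted


# Chain of three objects: a and c are not adjacent.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
coordinator = TimeCoordinator(chain)
steps = [
    coordinator.request_advance("a", 1),  # b has not asked yet: nothing granted
    coordinator.request_advance("b", 1),  # a's only neighbor has asked: a advances
    coordinator.request_advance("c", 1),  # now b and c can advance as well
]
print(steps, coordinator.local_time)
```

In this chain, object a advances as soon as its single neighbor b has asked for the same value, while b itself has to wait for c.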
Second, sufficient freedom is ensured for concurrent computations in nonadjacent objects, since the local times of a pair of nonadjacent objects are permitted to differ by the sum of the time increments of all intermediate objects located on the path of links between them.

3.2.1. Synchronization of calls between objects. One object interacts with another through a service object–reference. If the local times of the calling and called objects are not equal, the OST system postpones the call until these times coincide.

The object–reference to an “actual” neighbor in the OST system is a special communication object, which performs a local call (if the calling and called objects are in the same address space) or a remote call (if the calling and called objects are located on different processors) while observing the above-formulated synchronization conditions.

3.2.2. Methods to keep synchronization conditions. Currently, there exist many different time-synchronization algorithms [4].

One extreme variant is to weaken the call permission conditions, reducing them to the requirement that the time of the calling object is not less than the time of the called object. In this case, many parallel computations are permitted; however, a situation may arise where the time in a called object has advanced as a result of one call and then a call comes from another object whose local time is less than the time of the object being called. In this case, a rollback (or, possibly, a cascade rollback) of the system to an earlier state is required.

Another extreme variant is an algorithm that globally orders all local times and permits a call only if it is marked by the global minimal time. In this case, there is no parallelism in the computation.

Generally speaking, it is assumed that there is a collection of algorithms, among which the user selects the one that is optimal for a particular applied problem.

In our opinion, in the majority of cases, the optimal algorithm is one without rollbacks, an intermediate variant between the two extreme variants described above. A synchronization rule for such an algorithm is presented in Section 3.2.

3.3. Object File

For efficient work with the OST system, the user needs tools for creating and storing application systems. For this purpose, the authors suggest using an object file, a universal container for model storage.

3.3.1. Model creation. To create a model, the user writes a model creation program. In the course of operation of this program, an object file is created into which the set of objects of the application system is placed. The objects in this file are stored in a “serialized” form.
While an applied problem runs, the “serialized” objects are swapped from the file into the operating memory, “deserialized,” and “calculated.” The objects composing the model can, of course, also be created and deleted dynamically in the course of the main computation.

After all objects of the model have been placed into the object file, all additional components required for the functioning of the application system are placed into it as well. These components include data (for example, initial data and boundary conditions), source codes, libraries, etc.

The result of the program’s operation is a single object file that contains everything required for running and subsequent functioning of the model on the MCS.
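As a rough illustration of the store-and-restore cycle described above, the following sketch writes the state of a few objects into one file and reads it back with Python's standard pickle module. The OST object-file format is not specified in this paper, so the class Cell, the functions save_model, load_model, and demo, and the file layout are hypothetical stand-ins and not the OST interface.

```python
import os
import pickle
import tempfile


class Cell(object):
    """Hypothetical applied object: one part of a model with its own local time."""

    def __init__(self, index, value, time=0.0):
        self.index = index
        self.value = value
        self.time = time

    def step(self, dt):
        # "Calculate" the object: change its state and advance its local time.
        self.value *= 2
        self.time += dt


def save_model(path, objects):
    # Store the state of every object in serialized form in a single file.
    states = [dict(vars(obj)) for obj in objects]
    with open(path, "wb") as f:
        pickle.dump(states, f)


def load_model(path):
    # Read the states back into memory and rebuild ("deserialize") the objects.
    with open(path, "rb") as f:
        states = pickle.load(f)
    return [Cell(**state) for state in states]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "modelname.mod")
    save_model(path, [Cell(i, i + 1) for i in range(3)])
    cells = load_model(path)
    for cell in cells:
        cell.step(0.5)
    save_model(path, cells)  # the file now holds the advanced state
    return [(c.index, c.value, c.time) for c in load_model(path)]
```

Because the file always holds the latest saved state, a computation in this sketch could be resumed by calling load_model again, which is the property the checkpoint mechanism relies on.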


An “almost” executable example presenting all stages of the creation of a parallel program system is given in Appendix 2.

3.3.2. Model deployment on MCS. The OST system relies on the principle of universality of work with object files on different computing systems. For the user, this means that the model can be deployed from the object file both on a home PC and on an MCS without modifying the model or the object file.

When the model is prepared for computation, the OST system automatically distributes the objects from the object file among the available computing resources and establishes links according to the given topologies. If the use of computer resources needs to be optimized, the OST system may launch a swapping mechanism to move some objects between processors or to swap them out to the object file.

Note that this scheme of operation allows the mechanism of checkpoints and restarts to be implemented in a natural way. At any time instant, the object file stores the current state of the model objects and, possibly, several generations of these states. Therefore, the computation can be resumed from the place where it was stopped for one reason or another, even on another MCS.

4. PRACTICAL RESULTS

With the help of the OST system, a number of test problems and real problems from the field of gas dynamics were programmed and solved. One of the test problems, multiplication of matrices, clearly demonstrated computation speedup with the growth of the number of parallel parts. This problem was solved by an algorithm based on band partitioning of matrices [5]. Test calculations demonstrated almost linear speedup as the number of processors in the MCS increased (see Appendix 1, Example 1).

One of the practical problems was the implementation of a parallel version of the program complex M2DGD for solving two-dimensional problems of gas dynamics [6]. Specifically, the cone flow problem was solved.
For this problem, calculations were carried out with different numbers of parallel parts. The results obtained with the parallel version of the program complex completely coincided with the reference results obtained in the one-thread computation. The efficiency of parallelization was compared with that of the parallel version of M2DGD implemented with the help of MPI. For identical parameters of the problem, almost identical run times were obtained for both implementations. The comparison showed that using the OST system to automate the construction of links and synchronization does not reduce the efficiency of the resulting computational model compared to the model in which the same problems were solved manually (see Appendix 1, Example 2).
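The band partitioning used in the matrix multiplication test can be illustrated with a small sketch. It assumes that matrix A is cut into horizontal bands of rows, that each band is multiplied by the whole of B independently, and that the partial results are stacked in order; the exact scheme of [5] may differ. The names split_bands, multiply_band, and band_matmul are hypothetical, and a thread pool merely stands in for the processors of an MCS.

```python
from concurrent.futures import ThreadPoolExecutor


def split_bands(n_rows, parts):
    """Return (start, stop) row ranges cutting n_rows into `parts` nearly equal bands."""
    base, extra = divmod(n_rows, parts)
    bands, start = [], 0
    for k in range(parts):
        stop = start + base + (1 if k < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def multiply_band(a_band, b):
    """Multiply one band of rows of A by the whole matrix B."""
    cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_band]


def band_matmul(a, b, parts):
    """Compute A * B by multiplying the row bands of A independently."""
    bands = split_bands(len(a), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        blocks = pool.map(lambda r: multiply_band(a[r[0]:r[1]], b), bands)
        c = []
        for block in blocks:  # map returns the blocks in band order
            c.extend(block)
    return c
```

The bands share no intermediate results, which is why this decomposition parallelizes well. In CPython, threads are not expected to speed up this pure-Python arithmetic, so the sketch shows only the decomposition and not the measured speedup.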

In the authors’ opinion, the results of the test and practical calculations demonstrated the effectiveness and convenience of work with the OST system.

5. CONCLUSIONS

In our opinion, the main result of this work is that we managed to reduce the complexity of creating distributed parallel programs to that of programming local systems.

In the proposed program tools, the concepts of “space–time” are formalized. Such formalization continues the general tendency in the development of programming means, which consists in the gradual formalization of general concepts from application fields. Examples are complex data structures designed for mapping complex structures from application fields onto them. The concept of a process reflects the concept of evolution in the application field; the concept of an object reflects an applied object; and so on. Formalization of the most general concepts allows us to move into the operating environment those actions that the applied programmer had to repeat many times in each program.

When creating parallel program models for any physical fields by any method, the following three basic problems are to be solved.

1. How to formally describe the parts of a physical region that evolve and interact in parallel?
2. How to describe the topology of links between the interacting parts of a parallel system? For regions of complex structure mapped onto tens or even hundreds of thousands of processor cores, this is a very hard task.
3. How to synchronize the parallel run of program (computational) objects that evolve in parallel?

The first problem has been solved successfully, for example, in object-oriented program systems by describing a part of the physical field in the form of a program object in the sense of one of the object-oriented languages (C++, Java, Python). Currently, the two other problems are almost always solved by the applied programmers, who repeatedly solve almost one and the same problem using low-level language means.
The basic idea of the implemented method for solving these two problems consists in the formalization of the “space–time” concepts, which are quite natural for physical applications, in the form of appropriate software. The concept of time has long been used in programming, but only in very narrow fields, such as modeling of discrete devices and real-time control systems. It has not so far been used as a universal mechanism of general-purpose synchronization to replace the currently known mechanisms for synchronization of parallel actions.

The concepts of “space” and “object neighborhood” in the form of the “list of formal neighbors of an object” (which is a fundamental generalization of the concept “list of formal parameters of a subroutine”) have not been used previously. These concepts simplify the description of the link topology and are used substantially in synchronization through the introduction of “local time” in the object neighborhood. The physical meaning of local time synchronization is that, when perturbations propagate at a finite rate, “remote” parts of the physical region can be calculated concurrently for different time instants, since the calculated changes affect only some neighborhood of the object.

In conclusion, we would like to emphasize that the problems related to the specification of links between parts of a distributed system and to the synchronization of the evolution of these parts have to be solved, and are solved, in one way or another in any system of distributed calculations. A distinctive feature of this work is that the solution of these problems is given in a general form, on the basis of concepts that reflect only those features that are common to all the particular cases considered and do not rely on features that are inherent in some cases and lacking in others. There is an opinion that applying a general approach to a wide class of particular cases is often inefficient. This is true only when an attempt is made to use universally a tool that is specific to a narrow class of cases. If the generalization is made correctly, however, its application to a particular case is efficient.
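The idea of local time in a neighborhood can be made concrete with a toy simulation. The sketch below assumes one possible rule without rollbacks: an object may advance its local time only if none of its neighbors lags behind it. This rule is an assumption chosen for illustration and is not claimed to be the exact rule of the OST system; the names can_advance, chain_neighbors, and simulate are hypothetical.

```python
def can_advance(times, neighbors, i):
    """Assumed rule: object i may advance only if no neighbor lags behind it."""
    return all(times[i] <= times[j] for j in neighbors[i])


def chain_neighbors(n):
    """Neighbor lists for n objects linked in a line: 0 - 1 - ... - (n - 1)."""
    return [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]


def simulate(n, moves, dt=1):
    """Advance local times one object at a time; return the final local times."""
    times = [0] * n
    neighbors = chain_neighbors(n)
    for _ in range(moves):
        # Pick the leftmost object allowed to advance; the object with the
        # smallest local time is always allowed, so a candidate always exists.
        i = next(k for k in range(n) if can_advance(times, neighbors, k))
        times[i] += dt
        # Adjacent objects never drift apart by more than one increment.
        assert all(abs(times[a] - times[b]) <= dt
                   for a in range(n) for b in neighbors[a])
    return times
```

For example, simulate(4, 6) returns [3, 2, 1, 0]: adjacent objects differ by one increment, while the two end objects differ by three, the sum of the increments along the path between them. Under this assumed rule, remote objects are thus calculated for different time instants and no rollback is needed.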

APPENDIX 1. Calculation results for two examples.

Calculations were carried out on the MCS rsc4.kiam.ru consisting of 64 nodes with two processors each.

Example 1. Matrix multiplication. Floating-point random matrices of size 1024 × 1024 were multiplied. The matrices were partitioned into different numbers of parts in accordance with the band partitioning algorithm [5]. The calculation results are presented in the table.

Example 2. Calculation of cone flow. Two different parallel implementations of the program complex M2DGD, one with automated link construction and synchronization (OST) and the other with manual link construction and synchronization (MPI), were compared (see figure).

APPENDIX 2. A sequence of actions on creation of a parallel program system.

Let us consider the sequence of steps to be performed by an applied programmer to create an application system designed for running in the OST environment.

Step 1. First, the programmer selects the topology classes to be used in the system for specifying links between its parts evolving in parallel. The programmer either selects among the classes of the most frequently used topologies that are already available in the OST system or describes a new topology class. Let us give an example of a topology class defined by means of the neighborhood description function.
    # Class with the given neighborhood description function for the ring topology
    class applied_ring_topology(ost.Topology.Abstract):
        # Class constructor
        def __init__(self, N):
            self.N = N

        # Neighborhood description function for a point in the space
        def neighborhood(self, p):
            # self is an analogue of this in C++
            # p is a set (array) of the coordinates of the considered point
            # In variable p_left, the set of coordinates of the left neighbor is placed
            if p[0] != 0:
                # If p[0] is not 0, then the neighbor is less by 1
                p_left = [p[0] - 1]
            else:
                # otherwise, closure through the ring
                p_left = [self.N - 1]
            # In variable p_right, the set of coordinates of the right neighbor is placed
            if p[0] != self.N - 1:
                # If p[0] is not equal to N - 1, the neighbor is greater by 1
                p_right = [p[0] + 1]
            else:
                # otherwise, closure through the ring
                p_right = [0]
            # Return neighborhood description
            # Synonym "left" corresponds to the left neighbor
            # Synonym "right" corresponds to the right neighbor
            return {"left": p_left, "right": p_right}

Figure. Run time (sec) versus the number of CPUs (from 1 to 100) for the two parallel implementations of M2DGD compared in Appendix 1, Example 2.

Step 2. The next step in the creation of an application program is the description of the classes of applied objects that will be used to create the parts of the distributed application system. Recall that the description of such a class uses the concept of the neighborhood specified by a particular topology class.

    # General structure of the applied object class
    class applied_object(ost.Object.Abstract):
        # Applied object class
        # The class includes functions composing the object interface
        def fun_1(self, …):
            # Content of the function
        # …
        def fun_N(self, …):
            # Content of the function

        # Function defining an interface of the described object with its environment
        # in the form of a list of formal neighbors.
        # When invoked, creates synonyms and interfaces for each neighbor
        def init_topology(self):
            # Define interface and synonym for the first formal neighbor
            self.init_neighbor(i1, Class_interface_i1, "synonym_i1")
            # Define interface and synonym for the second formal neighbor
            self.init_neighbor(i2, Class_interface_i2, "synonym_i2")
            …
            # Define interface and synonym for the Kth formal neighbor
            self.init_neighbor(iK, Class_interface_iK, "synonym_iK")

        # As well as the function that launches calculations
        # Function performing calculations
        def run(self):
            # self is an analogue of this in C++
            # Usually in run, a loop over the algorithm iterations is used


            for iteration in range(0, M):
                # To address the neighbors, a list (array) can be used:
                # self.topology.neighbors
                # Call function fun_j of the ith neighbor
                self.topology.neighbors[i].link.fun_j()
                # The use is simplified if there are synonyms
                # Call function fun_j of the left neighbor
                self.left.fun_j()
                # For synchronization, advancements in time are used
                # In variable self.time, the current object time is stored
                # To advance by step time_step, a query to the OST monitor is sent
                # The call returns only after the advancement
                self.setXYZT(self.time + time_step)
                # The objects may change their positions in the space
                # Obtain the actual array of coordinates of the current object
                coord = self.topology.get_coordinates()
                # The OST monitor modifies the coordinates
                # only together with the time advancement
                self.setXYZT(self.time + time_step, coord)
            # When the calculations are complete, report termination
            self.setFinish()

Step 3. The last step in the creation of an application system is the creation of an initialization program. This program creates an object file, into which the programmer places everything that is required for the functioning of the model in the parallel environment. In the program, the applied objects are created one by one on the basis of the classes described in Step 2 and are combined into a single model with the help of the topology classes described in Step 1. In addition to the set of applied objects, the object file is augmented, if necessary, by files with source codes, libraries, and initial data.
    # Example of initialization program
    # Start model initialization
    # Create initialization object
    # to be stored in the file modelname.mod
    obj_init = ost.Core.Init("modelname.mod")
    # Define global topology class
    # In the given case, this is a ring of ten elements
    obj_init.topology = applied_ring_topology(N=10)
    # This is followed by a loop in which model objects are created
    # In the given example, ten objects are created for the ring
    for index in range(0, 10):
        # Creation of an object of type applied_object
        app_object = obj_init.create_object(applied_object)
        # Further, the object is filled with necessary data
        # If required, a local topology of the object neighborhood can be introduced
        # In the given example, we describe a neighborhood consisting of elements P1, …, Pk
        app_object.topology = ost.topology.Neighborhood([P1, …, Pk])
        # Place the object at a point of the object space
        # The point has coordinate index
        obj_init.topology.set(app_object, index)
    # Addition of the file with the program code of the object
    # By means of similar constructs, other data can be added as well
    obj_init.addSorceFile("applied_objects.py")
    # Storing of the model in a file
    obj_init.save()

Summary. Running the initialization program creates an object file, which can be placed into the OST environment for parallel execution on an MCS.
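Since the listings of this appendix depend on the OST runtime, the ring neighborhood logic of Step 1 can be checked in isolation with a stand-alone sketch. The class RingTopology and the function check_ring below are hypothetical and do not derive from any OST base class; modular arithmetic is used in place of the explicit branches, which is an equivalent way of closing the ring.

```python
class RingTopology(object):
    """Stand-alone version of the ring neighborhood function (no OST base class)."""

    def __init__(self, n):
        self.n = n

    def neighborhood(self, p):
        # p is a list holding the single ring coordinate of the considered point.
        left = [(p[0] - 1) % self.n]   # wraps from 0 to n - 1
        right = [(p[0] + 1) % self.n]  # wraps from n - 1 to 0
        return {"left": left, "right": right}


def check_ring(n):
    """Return True if every 'right' link is mirrored by a 'left' link."""
    ring = RingTopology(n)
    for i in range(n):
        right = ring.neighborhood([i])["right"]
        if ring.neighborhood(right)["left"] != [i]:
            return False
    return True
```

A consistency check of this kind is a cheap way to catch an asymmetric link description before the objects are distributed among processors.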


REFERENCES

1. O’Hanlon, C., A Conversation with John Hennessy and David Patterson, Queue, 2006, vol. 4, no. 10, pp. 14–22.
2. Asanovic, K., Bodik, R., Demmel, J., Keaveny, T., Keutzer, K., Kubiatowicz, J., Morgan, N., Patterson, D., Sen, K., Wawrzynek, J., Wessel, D., and Yelick, K., A View of the Parallel Computing Landscape, Commun. ACM, 2009, vol. 52, no. 10, pp. 56–67.
3. Ilyushin, A.I., Kolmakov, A.A., and Menshov, I.S., Construction of a Parallel Computational Model through Composition of Computational Objects, Mat. Modelirovanie, 2011, vol. 23, no. 7, pp. 97–113.
4. Chetlur, M. and Wilsey, P.A., Experimental Computing Laboratory, Dept. of ECECS, PO Box 210030, Cincinnati, OH 45221-0030.
5. http://math.csu.ru/~rusear/DipKurs/ParMetUmnMatr.html.
6. Menshov, I. and Nakamura, Y., Hybrid Explicit–Implicit, Unconditionally Stable Scheme for Unsteady Compressible Flows, AIAA J., 2004, vol. 42, no. 3, pp. 551–559.
7. ost.kiam.ru.
