Scheduling and Packing In The Constraint Language cc(FD)


Pascal Van Hentenryck Department of Computer Science Brown University Providence, Rhode Island 02912 CS-92-43

September 1992

Scheduling and Packing In The Constraint Language cc(FD) Pascal Van Hentenryck

Brown University, Box 1910, Providence, RI 02912 (USA) Email: [email protected]

Abstract

Constraint Logic Programming (CLP), and its generalization in the cc framework, define a class of declarative constraint languages combining nondeterministic goal-directed programming with constraint techniques over an arbitrary domain. CLP languages are particularly attractive for combinatorial search problems as they offer a short development time and a reasonable efficiency. In this paper, we present the application of cc(FD), a CLP language using consistency techniques over finite domains, to two applications: the perfect square problem and a Digital Signal Processing (DSP) problem. The perfect square problem amounts to placing squares of different sizes in a master square in an exact manner (i.e. no empty space remains). We show a natural and very short cc(FD) program solving the problem for 21 and 24 squares in a couple of seconds. The DSP application amounts to scheduling tasks on a multiprocessor in the presence of precedence, capacity, and delay constraints. The complexity of the problem is exacerbated by the non-uniform communication delays coming from the architecture, which combines pipeline and master-slave processing. We present a short cc(FD) program which compares very well with a special-purpose branch and bound algorithm. Both applications show the versatility of languages such as cc(FD) for solving discrete combinatorial search problems.

1 Introduction

Constraint Logic Programming (CLP) is a new class of declarative programming languages combining nondeterminism and constraint solving. The fundamental idea behind these languages, to use constraint solving instead of unification as the kernel operation of the language, was elegantly captured in the CLP scheme [12]. The CLP scheme can be instantiated to produce a specific language by defining a constraint system (i.e. defining a set of primitive constraints and providing a constraint solver for the constraints). For instance, CHIP contains constraint systems over finite domains [23], Booleans [2] and rational numbers [11, 26], while Prolog III [6] is endowed with constraint systems over Booleans, rational numbers, and lists, while CLP( C) is generated dynamically. The search is completed when the search space has been explored implicitly.

Example 9 Reconsider for instance the PERT problem discussed previously. A minimal solution instantiating all variables can be obtained by the following program:

solvePert(ResStartDates) :-
    stateConstraint(StartDates,EndDate),
    minof(labeling(StartDates),EndDate,ResStartDates).

labeling([]).
labeling([F|T]) :-
    indomain(F),
    labeling(T).

The key idea here is to state all constraints in a deterministic way and to enclose the nondeterministic part (i.e. the predicate making choices) in the meta-level predicate for optimization. The labeling predicate in the above program simply instantiates all variables using the indomain predicate. The indomain predicate is a nondeterministic predicate trying all possible values for the variable. Note that, in the above program, cc(FD) finds the optimal solution without any backtracking. For disjunctive scheduling, the labeling generally also contains a procedure to choose the ordering between the tasks [10, 23].
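The interplay between the nondeterministic labeling and the minof meta-predicate can be sketched in Python. This is an illustrative sketch only: it enumerates all assignments exhaustively, whereas cc(FD) combines constraint propagation with branch and bound; all names here are hypothetical.

```python
# Sketch of labeling/indomain and the minof meta-predicate (illustrative only):
# `labeling` enumerates complete assignments, like indomain/1 applied to each
# variable in turn; `minof` keeps the assignment minimizing the objective.
from itertools import product

def labeling(domains):
    """Enumerate every complete assignment of values to variables."""
    return product(*domains)

def minof(domains, objective):
    """Return the assignment minimizing `objective` over all labelings."""
    return min(labeling(domains), key=objective)

# Toy PERT-like instance: three tasks with start-date domains 0..2 and fixed
# durations; the objective is the latest completion time (the end date).
domains = [range(3), range(3), range(3)]
durations = [2, 1, 3]
best = minof(domains, lambda starts: max(s + d for s, d in zip(starts, durations)))
```

In cc(FD), propagation shrinks the domains before any choice is made, which is why the PERT example above needs no backtracking; the sketch pays the full enumeration cost instead.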

3 The Perfect Square Problem

3.1 Problem Statement

The problem amounts to packing a number of squares, all of different sizes, into a larger square, called the master square, in such a way that the squares do not overlap and leave no empty space. Placing 21 squares in a master square is of particular interest since 21 is the smallest number of squares that can fit in a square in this way. This problem and its variations have attracted much attention in the CLP community. Colmerauer [6] presents a program to fill a rectangle with squares of different sizes using linear equations, inequalities, and disequations over rational numbers. Aggoun and Beldiceanu [1] present a finite domain program which can solve large instances (e.g. 21 and 24 squares) of the perfect square problem when the sizes of the squares are given. The program uses a specialized cumulative constraint and exploits the link between cumulative constraints and packing in two dimensions. We propose a simple cc(FD) program implementing a natural problem formulation which can solve large instances (e.g. 21 and 24 squares) in about 30 seconds on a SUN 4/60.

3.2 Problem Solution

The main idea behind the program consists of creating the squares (i.e. creating variables for their coordinates), expressing the non-overlapping constraints, and assigning values to the coordinates in a nondeterministic way. Moreover, redundant capacity constraints are generated to speed up the computation by pruning the search space early.

3.2.1 Problem Data

The program is generic and receives as data the size of the master square and the sizes of the squares. For instance, the data for 21 squares is as follows:

sizeMaster(112).
sizeSquares([50,42,37,35,33,29,27,25,24,19,18,17,16,15,11,9,8,7,6,4,2]).

3.2.2 Program Variables

Each square i is associated with two variables Xi and Yi representing the coordinates of the bottom-left corner of the square. Each of these variables ranges between 1 and S - Si + 1, where S is the size of the master square and Si is the size of square i. The following procedure describes the creation of the two lists of variables as well as the list of the sizes.

generateSquares(Xs,Ys,Sizes,Size) :-
    sizeMaster(Size),
    sizeSquares(Sizes),
    generateCoordinates(Xs,Ys,Sizes,Size).

generateCoordinates([],[],[],_).
generateCoordinates([X|Xs],[Y|Ys],[S|Ss],Size) :-
    Sdate := Size - S + 1,
    X ∈ 1..Sdate,
    Y ∈ 1..Sdate,
    generateCoordinates(Xs,Ys,Ss,Size).

3.2.3 The Non-Overlapping Constraint

The non-overlapping constraint between two squares (X1,Y1,S1) and (X2,Y2,S2), where (X1,Y1) and (X2,Y2) are the positions of the squares and S1 and S2 are their respective sizes, can be expressed using the cardinality constraint:

nooverlap(X1,Y1,S1,X2,Y2,S2) :-
    #(1,[X1 + S1 ≤ X2, X2 + S2 ≤ X1, Y1 + S1 ≤ Y2, Y2 + S2 ≤ Y1],2).

The cardinality constraint simply expresses that the first square is on the left, on the right, below, or above the second square. It makes sure that, as soon as three possibilities are ruled out (i.e. they are not consistent with the constraint store), the last option is automatically enforced. As mentioned previously, the definition simply generalizes the traditional disjunctive constraints to two dimensions.
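For a ground assignment, the effect of this cardinality constraint can be checked with a small Python predicate. This is an illustrative sketch only, not the propagating constraint itself; it merely tests that between one and two of the four placement relations hold.

```python
def nooverlap(x1, y1, s1, x2, y2, s2):
    """Check #(1, [...], 2): at least one of the four placement relations holds."""
    relations = [
        x1 + s1 <= x2,  # square 1 entirely to the left of square 2
        x2 + s2 <= x1,  # square 1 entirely to the right of square 2
        y1 + s1 <= y2,  # square 1 entirely below square 2
        y2 + s2 <= y1,  # square 1 entirely above square 2
    ]
    # At most two relations (e.g. left AND below) can hold simultaneously,
    # since left/right and below/above are mutually exclusive pairs.
    return 1 <= sum(relations) <= 2
```

The real constraint does more than checking: as soon as three relations become inconsistent with the constraint store, the fourth is enforced on the still-unbound coordinate variables.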

3.2.4 The Capacity Constraints

A traditional technique to improve efficiency in combinatorial search problems amounts to exploiting properties of all solutions by adding redundant or surrogate constraints. In the perfect square problem, the sizes of all squares containing a point with a given x-coordinate (resp. y-coordinate) must sum to S, the size of the master square, since no empty space is allowed. These surrogate capacity constraints can be stated using cardinality and linear equations. For a given position P, the idea is to associate with each square i a Boolean variable Bi (i.e. a 0-1 domain variable) that is true iff square i contains a point with x-coordinate (resp. y-coordinate) P. The Boolean variable is obtained using the cardinality constraint:

#(Bi,[Xi ≤ P & P ≤ Xi + Si - 1],Bi).

The surrogate constraint can now be stated as a simple linear equation:

B1 * S1 + ... + Bn * Sn = Size.

The program to generate a surrogate constraint is as follows:

capacity(Position,Coordinates,Sizes,Size) :-
    accumulate(Coordinates,Sizes,Position,Summation),
    Summation = Size.

accumulate([],[],_,0).
accumulate([C|Cs],[S|Ss],P,B*S + Summation) :-
    B ∈ 0..1,
    #(B,[C ≤ P & P ≤ C + S - 1],B),
    accumulate(Cs,Ss,P,Summation).
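For ground coordinates, the surrogate equation for one position P can be checked in Python. This is an illustrative sketch: in cc(FD) the Bi are 0-1 domain variables linked to the coordinates by cardinality constraints, not values computed after the fact.

```python
def capacity_holds(coords, sizes, p, master_size):
    """Check the surrogate equation B1*S1 + ... + Bn*Sn = Size at position p."""
    # b_i = 1 iff square i covers position p, i.e. C_i <= p <= C_i + S_i - 1.
    bs = [1 if c <= p <= c + s - 1 else 0 for c, s in zip(coords, sizes)]
    # The sizes of the covering squares must sum to the master size exactly.
    return sum(b * s for b, s in zip(bs, sizes)) == master_size
```

In the constraint version, the linear equation prunes the coordinate domains long before all squares are placed, which is the whole point of stating it redundantly.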

3.2.5 The Labeling Procedure

The generation of positions for the squares requires giving values to the coordinates of all squares. We use the idea of [1] for the labeling, exploiting the fact that no empty space is allowed. At each step, the program identifies the smallest possible coordinate and selects a square to be placed at this position. On backtracking, another square is selected for the same position. The labeling is as follows:

labeling([]).
labeling([Sq|Sqs]) :-
    minlist([Sq|Sqs],Min),
    selectSquare([Sq|Sqs],Min,Rest),
    labeling(Rest).

selectSquare([Sq|Sqs],Min,Sqs) :-
    Sq = Min.
selectSquare([Sq|Sqs],Min,[Sq|Rest]) :-
    Sq > Min,
    selectSquare(Sqs,Min,Rest).

The first goal in the labeling finds the smallest position for the remaining squares while the second goal chooses a square to assign to the position. Since no empty space is allowed, such a square must exist.
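The choice strategy can be mimicked in Python as a generator over (name, domain) pairs. This is a sketch without constraint propagation, so domains are not pruned between choices as they are in cc(FD); it only illustrates "fix some variable to the globally smallest available value, trying candidates on backtracking".

```python
def label(squares):
    """squares: list of (name, domain) pairs; yield assignment dicts giving
    some variable the globally smallest value, trying candidates in turn."""
    if not squares:
        yield {}
        return
    m = min(min(dom) for _, dom in squares)      # minlist: smallest position
    for i, (name, dom) in enumerate(squares):    # selectSquare: try each square
        if m in dom:                             # Sq = Min can succeed
            rest = squares[:i] + squares[i + 1:]
            for tail in label(rest):
                yield {name: m, **tail}
```

Here backtracking is implicit in the generator: exhausting one candidate square for a position moves on to the next, exactly as the second selectSquare clause does.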

3.2.6 The Overall Program

Figure 3 shows the overall program for the perfect square problem. As should be clear, the program is rather small. It packs 21 or 24 squares in a master square in about 30 seconds on a SUN 4/60, illustrating the expressiveness and efficiency of the language.

4 The Digital Signal Processing Application

4.1 Problem Statement

The availability of programmable digital processors at low cost has raised much interest in DSP and its applications such as speech, image processing, and noise and echo cancellation. Moreover, the design of multiprocessor DSP systems has emerged as one of the most promising directions. Recently, Chinneck et al. [4, 15] have proposed a new approach to the design of DSP applications, which couples an extensible architecture with an automatic task-to-processor scheduling method. This approach is suitable for many complex DSP applications as it combines some of the advantages of specialized systems (often efficient but not flexible) with those of general-purpose systems (often flexible but much less efficient). In addition, it provides a real-time simulation tool for systems that will eventually be implemented on a VLSI chip. In the rest of this section, we review the key ideas behind the architecture and introduce the main combinatorial search problems. The presentation is based on [4, 15], where complete information on the approach can be found.

4.1.1 The Architecture

The extensible architecture is depicted in Figure 4 from [15]. It captures two common paradigms of DSP applications: pipelined processing and master-slave processing. The master (processor 1) has a direct communication link with all processors in the pipeline, whereas processor i in the pipeline can read from its predecessor (processor i - 1) and write to its successor (processor i + 1). Some redundant links are included in this architecture to simplify the programming of the applications. The processors are synchronized by a clock corresponding to the sampling rate of the incoming signal (or data) of the system. The clock cycle is divided into a read part, during which a processor copies its input buffers into local memory, and a write part, during which the processor is allowed to write into its output buffers. As a consequence, all tasks must be performed during the clock cycle and data to be transferred must be available at the clock edge.

packSquares(Xs,Ys) :-
    generateSquares(Xs,Ys,Sizes,Size),
    stateNoOverlap(Xs,Ys,Sizes),
    stateCapacity(Xs,Sizes,Size),
    stateCapacity(Ys,Sizes,Size),
    labeling(Xs),
    labeling(Ys).

generateSquares(Xs,Ys,Sizes,Size) :-
    sizeMaster(Size),
    sizeSquares(Sizes),
    generateCoordinates(Xs,Ys,Sizes,Size).

generateCoordinates([],[],[],_).
generateCoordinates([X|Xs],[Y|Ys],[S|Ss],Size) :-
    Sdate := Size - S + 1,
    X ∈ 1..Sdate,
    Y ∈ 1..Sdate,
    generateCoordinates(Xs,Ys,Ss,Size).

stateNoOverlap([],[],[]).
stateNoOverlap([X|Xs],[Y|Ys],[S|Ss]) :-
    stateNoOverlap(X,Y,S,Xs,Ys,Ss),
    stateNoOverlap(Xs,Ys,Ss).

stateNoOverlap(X,Y,S,[],[],[]).
stateNoOverlap(X,Y,S,[X1|Xs],[Y1|Ys],[S1|Ss]) :-
    nooverlap(X,Y,S,X1,Y1,S1),
    stateNoOverlap(X,Y,S,Xs,Ys,Ss).

nooverlap(X1,Y1,S1,X2,Y2,S2) :-
    #(1,[X1 + S1 ≤ X2, X2 + S2 ≤ X1, Y1 + S1 ≤ Y2, Y2 + S2 ≤ Y1],2).

stateCapacity(Cs,Sizes,Size) :- stateCapacity(1,Size,Cs,Sizes).

stateCapacity(Pos,Size,Cs,Sizes) :- Pos > Size.
stateCapacity(Pos,Size,Cs,Sizes) :-
    capacity(Pos,Cs,Sizes,Size),
    Pos1 := Pos + 1,
    stateCapacity(Pos1,Size,Cs,Sizes).

capacity(Position,Coordinates,Sizes,Size) :-
    accumulate(Coordinates,Sizes,Position,Summation),
    Summation = Size.

accumulate([],[],_,0).
accumulate([C|Cs],[S|Ss],P,B*S + Summation) :-
    B ∈ 0..1,
    #(B,[C ≤ P & P ≤ C + S - 1],B),
    accumulate(Cs,Ss,P,Summation).

labeling([]).
labeling([Sq|Sqs]) :-
    minlist([Sq|Sqs],Min),
    selectSquare([Sq|Sqs],Min,Rest),
    labeling(Rest).

selectSquare([Sq|Sqs],Min,Sqs) :- Sq = Min.
selectSquare([Sq|Sqs],Min,[Sq|Rest]) :-
    Sq > Min,
    selectSquare(Sqs,Min,Rest).

Figure 3: The Perfect Square Program

Figure 4: The Extensible Architecture

Each processor in the architecture is identical, has its own local memory, and is characterized by its limits on the data memory, the code memory, and the processing time in a synchronization cycle. The main features of the architecture are:

1. Extensibility: more processors can be added to accommodate the need for more processing power.

2. Bounded Communication Delay: the communication delay between two processors is at most 2.

In the following, we assume that the DSP application is deterministic with respect to the data stream and that the task-to-processor scheduling can be performed once in the lifetime of the system for fixed systems, or once for each configuration for reconfigurable systems.

4.1.2 The Task Graph

The DSP application is described by a task graph where each node represents a DSP function and an arc denotes a data flow path between two functions. Each task is characterized by its demand in processing time (e.g. 80 µs), in data memory (e.g. 50 bytes) and in code memory (e.g. 10 bytes). The data flow paths express precedence constraints between the tasks. The communication delay between two successive tasks depends on which processors the tasks are assigned to. If the two tasks are assigned to the same processor, the communication delay is zero. If the two tasks are assigned to adjacent processors (i.e. i for the origin task and i + 1 for the destination task), the communication delay is one clock cycle, i.e. the destination task can proceed at the next clock cycle. Otherwise the communication delay is two clock cycles, since the data transfer must go through the master.

In the following, we assume that the task graph includes two fictitious tasks: an input task and an output task. Both tasks are assigned to the master processor and have no resource requirements. The total delay of the application is the time in clock cycles between the input and output tasks.

4.1.3 The Combinatorial Search Problems

There are two orthogonal problems that need to be solved:

1. minimizing the number of processors for a task graph;

2. minimizing the total delay for a given number of processors.

The first problem is a generalization of the well-known bin-packing problem, for which there exist very good approximation algorithms (e.g. first-fit decreasing). Our cc(FD) program uses such an algorithm to obtain a solution to the first problem. The second problem (i.e. finding the minimum total delay given the number of processors obtained from the first problem) is more difficult and is the topic of the remaining part of this section.
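First-fit decreasing, the approximation algorithm mentioned above, is easy to state. Here is a minimal Python sketch for the simplest case of a single capacity dimension (the actual problem has three resource limits per processor: time, data memory, and code memory).

```python
def first_fit_decreasing(demands, capacity):
    """Greedily pack demands, largest first, into the first bin that fits;
    return the number of bins (processors) used."""
    bins = []  # bins[i] = load already assigned to processor i
    for d in sorted(demands, reverse=True):
        for i, load in enumerate(bins):
            if load + d <= capacity:
                bins[i] += d  # fits on an existing processor
                break
        else:
            bins.append(d)    # no existing processor fits: open a new one
    return len(bins)
```

Sorting in decreasing order is what gives the heuristic its good worst-case guarantee compared with plain first-fit.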

4.2 Problem Solution

4.2.1 Overview of the Program

The program is once again a typical finite domain program. The key idea is to create the problem variables (i.e. those variables on which the constraints will be imposed), to state the constraints, and to generate values for the variables with a labeling procedure. Since it is an optimization problem, the labeling procedure, together with an objective function, is enclosed inside a meta-predicate for finding the optimal solution. The main predicate of the program is as follows:

scheduleDsp(ResTasks) :-
    createTasks(Tasks,Nbprocs),
    statePrecedence(Tasks),
    stateCapacity(Tasks,Nbprocs),
    stateDelay(Tasks),
    stateSurrogate(Tasks),
    findEndTask(Tasks,EndTask),
    getCycle(EndTask,EndCycle),
    minof(labeling(Tasks),EndCycle,ResTasks).

The first goal creates the tasks Tasks, the next four goals state the constraints, including some "semantically" redundant constraints, the next two goals retrieve the variable expressing the total delay (i.e. the cycle of the end task), and the last goal is the meta-predicate for optimization, receiving the labeling procedure and the variable EndCycle as objective function. In the rest of this section, the above steps are explained in detail.

4.2.2 Problem Variables

Each task i in the DSP application is associated with two variables:

- Pi: the processor to which task i is assigned;
- Si: the starting cycle of task i.

In the following, we refer to the first type of variables as processor variables and to the second type as cycle variables. The key problem to be solved is the assignment of processors to the tasks. Once this assignment is completed, it is a simple matter to find a solution to the overall problem. Indeed, the problem is then reduced to a simple PERT problem, which is solved automatically through constraint propagation as mentioned previously. A solution can then be obtained by assigning the earliest starting cycle to each task. Our cc(FD) program creates a term task(Number,Name,P,S,Time,Memory,Code) for each task, where Number is the number of the task, Name its name, P its processor, S its starting cycle, and Time, Memory, Code its demands in computing time, data memory, and code memory. All components of the term are ground (i.e. fully instantiated) except P, which is constrained to range between 1 and the number of processors available, and S, which is required to take a value between 0 and the maximum delay (which can be computed easily by assuming the worst communication delays).

4.3 Precedence Constraints

As mentioned previously, the task graph implies a number of precedence relationships between the tasks. The precedence constraints only affect the cycle variables and are stated in the standard way.

precedence(Task1,Task2) :-
    getCycle(Task1,C1),
    getCycle(Task2,C2),
    C1 ≤ C2.

4.4 Capacity Constraints

Since the capacity of each processor is limited, it is necessary to make sure that the tasks assigned to a processor do not exceed its resources. The capacity constraints for the DSP application are closely related to the capacity constraints of the perfect square problem but use inequalities instead of equations. The cc(FD) program associates a Boolean variable with each task and each processor. The Boolean variable is true iff the task is assigned to the processor and is obtained through a cardinality formula

#(Bi,[ Pi = P ],Bi)

where Bi is the Boolean variable associated with task i, Pi is the processor variable of task i, and P is a processor number. The inequalities can now be stated as follows:

Ti1 * B1 + ... + Tin * Bn ≤ MaxTime,
Me1 * B1 + ... + Men * Bn ≤ MaxMem,
Ce1 * B1 + ... + Cen * Bn ≤ MaxCode,

where B1,...,Bn are the Boolean variables associated with all tasks for processor P and Tik, Mek, Cek are the demands of task k in time, data memory, and code memory. The program for generating a capacity constraint is as follows:

capacity(Tasks,Proc,MaxTime,MaxMemory,MaxCode) :-
    collectCapacity(Tasks,Proc,Time,Memory,Code),
    Time ≤ MaxTime,
    Memory ≤ MaxMemory,
    Code ≤ MaxCode.

collectCapacity([],Proc,0,0,0).
collectCapacity([Task|Tasks],Proc,T * B + Ts, M * B + Ms, C * B + Cs) :-
    getTime(Task,T),
    getMemory(Task,M),
    getCode(Task,C),
    getProc(Task,P),
    B ∈ 0..1,
    #(B,[ P = Proc ],B),
    collectCapacity(Tasks,Proc,Ts,Ms,Cs).

The predicate collectCapacity collects linear terms for the data memory, the code memory, and the processing time. Those terms are then used in the predicate capacity to state the capacity constraints. Collecting the linear terms is performed in a way similar to the capacity constraints of the perfect square problem.

4.4.1 The Delay Constraints

The total communication delay depends on the delay between every two successive tasks in the task graph. The cc(FD) program generates delay constraints between the cycle variables of the tasks. If C1, C2 are cycle variables of two successive tasks, the idea is to generate a constraint C2



C1 + Delay

where Delay is the communication delay between the two tasks. The communication delay can be expressed as a constraint involving the processor variables P1,P2 of the tasks using cardinality formulas: delay(P1,P2,S1,S2) :Delay 0..2, Delay = 0 P1 = P2, Delay = 1 P1 P2 & (P1 = 1 C2 C1 + Delay.

2



, ,

6=

_

P2 = 1

18

_

P2 = P1 + 1),

The rst goal expresses that the delay is atmost 2, the second goal states that the delay is zero whenever the tasks are allocated to the same processor, while the third goal imposes the delay to be one whenever the tasks are not allocated to the same processor and either one of them is assigned to the master processor or the second task is allocated to a processor on the right of the processor of the rst task. The third case can be omitted since the rst two cases are expressed as equivalences and the delay is at most 2. Note that the distance constraints preclude the need for the precedence constraints which can now be omitted. Finally note the simplicity of the implementation which follows closely the de nition.
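For ground processor assignments, the delay defined by these equivalences reduces to a simple function. The Python sketch below is illustrative only: the constraint version also propagates in the other direction, from known delays back to the processor variables.

```python
MASTER = 1  # processor 1 is the master

def comm_delay(p1, p2):
    """Communication delay in clock cycles between two successive tasks
    assigned to processors p1 (origin) and p2 (destination)."""
    if p1 == p2:
        return 0          # same processor: the data is already local
    if p1 == MASTER or p2 == MASTER or p2 == p1 + 1:
        return 1          # direct link: master <-> anyone, or pipeline i -> i+1
    return 2              # otherwise the transfer is routed through the master
```

Note the asymmetry: data flowing backwards in the pipeline (p2 < p1, neither being the master) costs two cycles, exactly as the architecture description prescribes.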

4.4.2 Surrogate Constraints

The cycle variable of the end task provides an optimistic evaluation of the minimum delay of the DSP problem. Indeed, this variable is constrained by all the delay constraints and its minimum value provides the optimistic evaluation. It is however possible and desirable to improve this evaluation by stating some properties of the solutions. This helps prune the search space during the search for the optimal solution. The key idea is to exploit the resource constraints jointly with the precedence constraints. Consider, for instance, a precedence path t1, t2, ..., tn and a resource r (e.g. computing time, data memory, code memory). If the tasks t1, ..., ti (1 ≤ i ≤ n) have resource requirements for resource r exceeding the capacity of a processor, it follows that the delay between tasks t1 and ti is at least 1. A constraint Ci > C1, where Ci, C1 are the cycle variables of tasks ti and t1, can then be generated without removing any of the solutions. Our cc(FD) program systematically generates all such surrogate constraints for all resources. Note that it is sufficient to consider, for each task, the smallest subpath exceeding the capacity of the resources.
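The generation of these surrogate constraints can be sketched in Python for a single resource along one precedence path. The representation (a list of per-task demands) is hypothetical; the cc(FD) program does this for all paths and all three resources.

```python
def surrogate_pairs(demands, capacity):
    """demands[k] = requirement of task t_k along a precedence path.
    For each start index j, find the smallest subpath t_j..t_i whose total
    demand exceeds the processor capacity; t_j and t_i then cannot share a
    processor, so C_i > C_j can be added without losing any solution."""
    pairs = []
    for j in range(len(demands)):
        total = 0
        for i in range(j, len(demands)):
            total += demands[i]
            if total > capacity:
                pairs.append((j, i))
                break  # the smallest such subpath for this start suffices
    return pairs
```

Longer subpaths from the same start are implied by the smallest one together with the precedence constraints, which is why stopping at the first overflow suffices.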

4.4.3 The Labeling Procedure

We now consider the labeling procedure for the DSP application. As mentioned previously, the key issue is the assignment of a processor to each task. It is a simple matter to complete the solution by assigning the earliest values to the cycle variables. Hence we focus on the first part of the labeling. As is typical with constraint satisfaction problems, two decisions need to be made:

1. which variable should be instantiated first;

2. which value should be given to the selected variable.

For the first decision, two strategies have been included in the cc(FD) program:

- a depth-first labeling;
- a breadth-first labeling.

The depth-first labeling proceeds by assigning processor variables in a depth-first traversal of the task graph starting with the start task, while the breadth-first labeling uses a breadth-first traversal. The motivation behind the depth-first strategy is to obtain quickly an accurate evaluation of the total delay. The motivation behind the breadth-first strategy is to exploit the locality of the tasks and to schedule closely related tasks together. Both strategies have been systematically evaluated and experimental results are discussed in the next section. As far as the second decision is concerned, the cc(FD) program uses the following heuristics: if p is the value of the previously assigned variable,

- first try p;
- next try p + 1;
- next try the master processor;
- finally try all other values.

The heuristics is motivated by the desire to minimize the total communication delay. The first choice guarantees a communication delay of zero between the last two tasks, the second and third choices ensure a communication delay of 1, while the last choice imposes a communication delay of 2. The implementation of this heuristics is based on the following procedure:

assign(P,Previous) :-
    P = Previous.
assign(P,Previous) :-
    P = Previous + 1.
assign(P,Previous) :-
    P ≠ Previous, P = 1.
assign(P,Previous) :-
    P ≠ Previous, P ≠ Previous + 1, P ≠ 1,
    indomain(P).
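The value-ordering heuristics amounts to reordering the domain of a processor variable before enumeration. An illustrative Python sketch:

```python
def value_order(domain, previous):
    """Order candidate processors for a task: the previous task's processor
    first, then the next one in the pipeline, then the master (processor 1),
    then all remaining values, mirroring the four assign/2 clauses."""
    preferred = []
    for p in (previous, previous + 1, 1):
        if p in domain and p not in preferred:  # skip absent values and duplicates
            preferred.append(p)
    rest = sorted(p for p in domain if p not in preferred)
    return preferred + rest
```

A labeling that enumerates values in this order tries the zero-delay assignment first, then the two one-delay assignments, before falling back to two-delay candidates.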

4.4.4 Exploiting Symmetries

In general, DSP applications have many symmetrical solutions and it would be of great advantage if the program could exploit them in order to prune the search space. Consider for instance the computation step where the program tries to assign a processor to a task. If there are several processors unused at this computation step, it would be convenient to restrict attention to only one of them. It turns out [4] that this idea is only applicable under restricted circumstances, namely when all tasks which are on a parallel pipeline with the task being considered have already been assigned. To evaluate the possible gain of symmetries in DSP applications, our program also includes an incomplete labeling procedure that systematically performs the above pruning at the risk of missing the optimal solution. The labeling procedure needs to keep track of the used processors and is based on the following procedure:

assign(P,Previous,MaxUsed,MaxUsed) :-
    P = Previous.
assign(P,Previous,MaxUsed,P) :-
    P = MaxUsed + 1.
assign(P,Previous,MaxUsed,MaxUsed) :-
    P ≠ Previous, P = 1.
assign(P,Previous,MaxUsed,MaxUsed) :-
    P ≠ Previous, P ≠ 1, P ≤ MaxUsed,
    indomain(P).

In this procedure, the third argument is the maximum number of the used processors and the fourth argument is the new maximum number. The last clause only considers the previously used processors; hence only one new processor can be used (i.e. in the second clause).

4.5 Experimental Results

In this section, we summarize the experimental results obtained by the cc(FD) program on actual DSP applications and compare them with a specialized branch and bound algorithm (written in C). First note that the overall cc(FD) program is about 4 pages long. This obviously comes from the fact that the search tree management and the constraint propagation are automatically taken care of by the programming language. The programming task consisted of stating the constraints and coming up with a suitable labeling procedure. Note also the adequacy of the cardinality operator to express sophisticated constraints such as the capacity and delay constraints. Table 1 presents a description of actual DSP applications taken from [15]. The first column gives the name of the application, the second column the size of the task graph, the third column the number of processors available, and the last column the topology of the application. The first problems are small, the last ones being already much larger. For instance, the potential search space of RDAD40 is 8^25, which is 2^75. Table 2 presents the results of the complete cc(FD) algorithm. The second column gives the total delay of the optimal solution, the third column the times of the cc(FD) program with the depth-first labeling strategy, the fourth column the times of the cc(FD) program with the breadth-first labeling strategy, and the fifth column the times of the specialized branch and bound of [15]. All times are given in seconds for a SUN 4/60. On these actual applications, the cc(FD) program behaves much better in general with the depth-first strategy than with the breadth-first strategy. In particular, it is about 14, 2230, and 59 times faster on RDAD04, RDAD08, and RDAD40. The cc(FD) program also compares well with the specific branch

Problem   Size  Processors  Topology
RDAD01      9       3       Pipeline
RDAD02      9       3       Pipeline
RDAD03      6       5       Architecture-like
RDAD04     19       6       Parallel Pipelines
RDAD05     12       4       Pipeline
RDAD06     16       5       Parallel Pipelines
RDAD07     12       4       Parallel Pipelines
RDAD08     15       5       Merging Tasks
RDAD09      9       6       Many Generators
RDAD10     15       5       Parallel Pipelines
RDAD20     13       5       Architecture-like
RDAD40     25       8       Parallel Pipelines
RDAD41     25       8       Parallel Pipelines

Table 1: Description of Actual DSP Applications

Problem  Total Delay  cc(FD): DF  cc(FD): BF  Specialized BB
RDAD01        3           0.78        0.31          0.016
RDAD02        3           0.72        0.32          0.016
RDAD03        5           0.22        0.11          0.000
RDAD04        3           1.66       24.14         51.700
RDAD05        3           1.12        5.29          0.016
RDAD06        3           2.68       18.31          6.300
RDAD07        2           0.56        5.19          0.050
RDAD08        3           3.23     7202.00        963.13
RDAD09        2           0.18        0.20          0.016
RDAD10        4           3.90        1.80          0.033
RDAD20        3           0.99        0.78          0.016
RDAD40        5          54.40      3216.7         ?????
RDAD41        4           4.24        3.16          0.100

Table 2: Results on Actual DSP Applications

Problem  Delay-DF  Time-DF  Delay-BF  Time-BF
RDAD01      3        0.78      3        0.26
RDAD02      3        0.74      3        0.27
RDAD03      5        0.18      5        0.11
RDAD04      3        1.25      5        4.61
RDAD05      3        0.92      5        1.62
RDAD06      3        2.12      3        5.88
RDAD07      2        0.56      2        1.83
RDAD08      3        1.86      3      359.69
RDAD09      2        0.16      3        0.15
RDAD10      4        3.03      4        1.13
RDAD20      3        0.79      3        0.52
RDAD40      5       16.55      5      150.16
RDAD41      4        3.45      4        1.46

Table 3: Results With Symmetries on Actual DSP Applications

and bound algorithm. In particular, the DF program is 31 and 298 times faster on RDAD04 and RDAD08. The specialized branch and bound algorithm could not find a solution for RDAD40 but is faster in general on the problems requiring less computing time. Table 3 presents the results of the incomplete cc(FD) algorithm, i.e. the algorithm systematically exploiting symmetries. Delay-DF and Time-DF give the best total delay found and the computation time for the cc(FD) program with the depth-first search strategy, while Delay-BF and Time-BF give the total delay and the computation time for the cc(FD) program with the breadth-first search strategy. The first point to notice is that, on these problems, the optimal solution is always found by the depth-first program. The computation time can also be drastically reduced, as illustrated by the problems RDAD40 and RDAD08. To compare the results with the specialized branch and bound, we give in Table 4 the results with the same exploitation of the symmetries and, in addition, a dynamic clustering of tasks. The dynamic clustering groups together tasks on a pipeline and thus substantially reduces the search space. The results indicate that the cc(FD) program, even without dynamic task clustering, still performs well. It is faster on RDAD08 and less than 10 times slower on RDAD40, where many task clusterings can take place. These results indicate clearly that a language like cc(FD) can be competitive for industrial applications while providing a short development time.

5 Conclusion

In this paper, we have described the application of cc(FD) to two applications: the perfect square problem and a digital signal processing application. cc(FD) is a declarative constraint logic programming language over finite domains offering a short development time and a competitive efficiency for many discrete combinatorial search problems. The cc(FD) program for the perfect square problem is about a page long, conceptually simple, and packs 21 or 24 squares in about 30 seconds on a SUN 4/60. The cc(FD) program for the DSP application is about 4 pages long and is competitive in efficiency with a specialized branch

Problem  Delay-SBB  Time-SBB
RDAD01      3         0.016
RDAD02      3         0.000
RDAD03      5         0.000
RDAD04      3         0.000
RDAD05      3         0.016
RDAD06      3         0.033
RDAD07      2         0.000
RDAD08      3        29.067
RDAD09      2         0.000
RDAD10      5         0.000
RDAD20      3         0.000
RDAD40      5         1.733
RDAD41      4         0.033

Table 4: Results of the Specialized Branch and Bound with Symmetries and Clustering

and bound algorithms developed for the task. Both programs make use of advanced features of cc(FD) (e.g. the cardinality operator) and exploit the integration of constraint solving in a declarative language, e.g. by using sophisticated choice procedures and generating surrogate constraints. Both applications illustrate the potential bene t of constraint languages such as cc(FD).

Acknowledgements

Michel Van Caneghem and Alain Colmerauer suggested the perfect square application and the approach taken in the cc(FD) program, and pointed out the reference [1]. Gerald Karam provided the DSP application and gave us the benchmarks to test our programs. We would like to thank them, as well as Yves Deville and Vijay Saraswat for their work on the design of cc(FD). This research was partly supported by the National Science Foundation under grant number CCR-9108032 and the Office of Naval Research under grant N00014-91-J-4052, ARPA order 8225.

References

[1] A. Aggoun and N. Beldiceanu. Extending CHIP To Solve Complex Scheduling and Packing Problems. In Journées Francophones de Programmation Logique, Lille, France, 1992.

[2] W. Buttner and H. Simonis. Embedding Boolean Expressions into Logic Programming. Journal of Symbolic Computation, 4:191-205, October 1987.

[3] J. Carlier and E. Pinson. Une Méthode Arborescente pour Optimiser la Durée d'un Job-Shop. Technical Report ISSN 0294-2755, I.M.A., 1986.

[4] J.W. Chinneck, R.A. Goubran, G.M. Karam, and M. Lavoie. A Design Approach For Real-Time Multiprocessor DSP Applications. Report sce-90-05, Carleton University, Ottawa, Canada, February 1990.

[5] N. Christofides. Graph Theory: An Algorithmic Approach. Academic Press, New York, 1975.

[6] A. Colmerauer. An Introduction to Prolog III. CACM, 33(7):69-90, 1990.

[7] T. Dean and M. Boddy. An Analysis of Time-dependent Planning. In Proceedings of the Seventh National Conference On Artificial Intelligence, pages 49-54, Minneapolis, Minnesota, August 1988.

[8] M. Dincbas, H. Simonis, and P. Van Hentenryck. Solving a Cutting-Stock Problem in Constraint Logic Programming. In Fifth International Conference on Logic Programming, Seattle, WA, August 1988.

[9] M. Dincbas, H. Simonis, and P. Van Hentenryck. Solving the Car Sequencing Problem in Constraint Logic Programming. In European Conference on Artificial Intelligence (ECAI-88), Munich, W. Germany, August 1988.

[10] M. Dincbas, H. Simonis, and P. Van Hentenryck. Solving Large Combinatorial Problems in Logic Programming. Journal of Logic Programming, 8(1-2):75-93, 1990.

[11] T. Graf. Extending Constraint Handling in Logic Programming to Rational Arithmetic. Internal Report, ECRC, Munich, September 1987.

[12] J. Jaffar and J-L. Lassez. Constraint Logic Programming. In POPL-87, Munich, FRG, January 1987.

[13] J. Jaffar and S. Michaylov. Methodology and Implementation of a CLP System. In Fourth International Conference on Logic Programming, Melbourne, Australia, May 1987.

[14] M. Kubale and D. Jackowski. A Generalized Implicit Enumeration Algorithm for Graph Coloring. CACM, 28(4):412-418, 1985.

[15] M. Lavoie. Task Assignment In A DSP Multiprocessor Environment. Master's Thesis, Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, August 1990.

[16] A.K. Mackworth. Consistency in Networks of Relations. Artificial Intelligence, 8(1):99-118, 1977.

[17] M.J. Maher. Logic Semantics for a Class of Committed-Choice Programs. In Fourth International Conference on Logic Programming, pages 858-876, Melbourne, Australia, May 1987.

[18] R. Mohr and T.C. Henderson. Arc and Path Consistency Revisited. Artificial Intelligence, 28:225-233, 1986.

[19] V.A. Saraswat. Concurrent Constraint Programming Languages. PhD thesis, Carnegie-Mellon University, 1989.

[20] V.A. Saraswat and M. Rinard. Concurrent Constraint Programming. In Proceedings of the Seventeenth ACM Symposium on Principles of Programming Languages, San Francisco, CA, January 1990.

[21] V.A. Saraswat, M. Rinard, and P. Panangaden. Semantic Foundations of Concurrent Constraint Programming. In Proceedings of the Eighteenth ACM Symposium on Principles of Programming Languages, Orlando, FL, January 1991.

[22] P. Van Hentenryck. A Logic Language for Combinatorial Optimization. Annals of Operations Research, 21:247-274, 1989.

[23] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. Logic Programming Series, The MIT Press, Cambridge, MA, 1989.

[24] P. Van Hentenryck and Y. Deville. The Cardinality Operator: A New Logical Connective and its Application to Constraint Logic Programming. In Eighth International Conference on Logic Programming (ICLP-91), Paris, France, June 1991.

[25] P. Van Hentenryck, Y. Deville, and C.M. Teng. A Generic Arc Consistency Algorithm and its Specializations. Artificial Intelligence, 1992. Accepted for publication.

[26] P. Van Hentenryck and T. Graf. Standard Forms for Rational Linear Arithmetics in Constraint Logic Programming. Annals of Mathematics and Artificial Intelligence, 1992. To appear. A preliminary version was presented at The International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, FL, January 1990.

[27] P. Van Hentenryck, H. Simonis, and M. Dincbas. Constraint Satisfaction Using Constraint Logic Programming. Artificial Intelligence, 1992. (Special Issue on Constraint-Based Reasoning), accepted for publication.

