Object-Oriented Cosynthesis of Distributed Embedded Systems WAYNE WOLF Princeton University
This article describes a new hardware-software cosynthesis algorithm that takes advantage of the structure inherent in an object-oriented specification. The algorithm creates a distributed system implementation with arbitrary topology, using the object-oriented structure to partition functionality in addition to scheduling and allocating processes. Process partitioning is an especially important optimization for such systems because the specification will not, in general, take into account the process structure required for efficient execution on the distributed engine. The object-oriented specification naturally provides both coarse-grained and fine-grained partitions of the system. Our algorithm uses that multilevel structure to guide synthesis. Experimental results show that our algorithm takes advantage of the object-oriented specification to quickly converge on high-quality implementations.
Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems - microprocessor/microcomputer applications, real-time systems
General Terms: Design
Additional Key Words and Phrases: Distributed embedded systems, hardware-software codesign, object-oriented co-synthesis
1. INTRODUCTION
This article describes a new algorithm for the architectural cosynthesis of distributed embedded systems. Our algorithm takes advantage of an object-oriented specification to partition the system’s functionality among the distributed CPUs, thus synthesizing a software architecture for the implementation, while simultaneously synthesizing a hardware architecture on which the software will be executed. Unlike many cosynthesis algorithms, which are aimed at an architectural template such as a 1 CPU-n ASIC system, our algorithm can synthesize a distributed system of arbitrary
A preliminary version of this work was presented at CHDL ’95. This research was supported in part by a grant from the National Science Foundation under grant MIP-9424110. Author’s address: Department of Electrical Engineering, Princeton University, Princeton, NJ 08544; email: [email protected].
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1996 ACM 1084-4309/96/0700–0301 $03.50 ACM Transactions on Design Automation of Electronic Systems, Vol. 1, No. 3, July 1996, Pages 301–314.
topology while simultaneously synthesizing the software architecture of the code which the engine will execute. Effective partitioning is especially important to obtain an optimized hardware-software architecture for the system that both minimizes cost and meets performance constraints; our algorithm takes advantage of an object-oriented specification to partition the software to optimize the hardware. An embedded computing system consists of a hardware engine that executes application software. The computing elements of the hardware engine are processing elements (PEs), a term we use to refer to any of a variety of general-purpose CPUs, DSPs, floating-point units, ASICs, and the like. The software is often specified as a set of communicating processes. Co-design includes four major steps: partitioning the specification into the processes that appear in the implementation; allocating processes to PEs; scheduling the execution of PEs; and mapping generic processes and PEs into detailed software and hardware components [Wolf 1994]. If, for example, the design goal is to minimize implementation cost while satisfying performance goals, the hardware engine must be designed with enough performance to make sure that the system meets its deadlines and its soft performance goals, while minimizing the engine’s cost. Process partitioning is critical to the generation of efficient software architectures during co-synthesis—improper partitioning of functionality can cause both excessive communication and inefficient CPU utilization. (Although many researchers have used the term hardware-software partitioning to describe their cosynthesis algorithms for 1 CPU-n ASIC systems, we prefer to use the term allocation for the process of determining what type of PE will be used to implement a function and use partitioning for the division of a complex function into a network of communicating processes.) 
A process network specification which is easy to describe and maintain may not be the most efficient process architecture for the implementation. Changing how the system is partitioned into processes changes the parallelism that is exposed for optimization during system architecture design. Cosynthesis from an object-oriented specification is a key to efficient partitioning. An object-oriented specification describes the system functionality in terms of a collection of communicating objects, and each object is refined into data and procedures. Our cosynthesis algorithm takes advantage of this two-level hierarchy of specification to repartition the application software during the design of the hardware engine. The next section surveys previous work in hardware-software cosynthesis and object-oriented specification. Section 3 describes the model of our problem. Section 4 describes our cosynthesis algorithm, and Section 5 describes the results of experiments with the algorithm.

2. PREVIOUS WORK
2.1 Hardware-Software Cosynthesis and Partitioning
Previous work has studied related problems, but this article is the first to describe an algorithm that partitions as well as allocates/schedules/maps
application software while simultaneously synthesizing a distributed engine. The distributed systems and real-time communities have studied scheduling [Ramamritham et al. 1990], allocation [Chu and Tan 1987; Dasarathy and Feridun 1984], and partitioning [Huang 1985] for distributed systems. However, virtually all of that work has considered these problems separately or at most pairwise; all of that work also assumes that the distributed system’s topology is given. Several cosynthesis algorithms have studied various aspects of the design of engines with general topologies. In the SOS algorithm, Prakash and Parker [1992] solved the allocation, scheduling, and engine topology problems simultaneously using mixed integer linear programming (MILP) techniques. MILP algorithms, although powerful, require very long run times; Prakash and Parker reported execution times of hours for their examples. Barros et al. [1994] developed an algorithm for implementing UNITY language programs into systems with multiple processors and ASICs. Their algorithm uses a multistage clustering algorithm to allocate functions to hardware and software components but does not repartition the process specification. Wolf [1996] developed a heuristic algorithm for Prakash and Parker’s formulation of the distributed co-synthesis problem. Yen and Wolf [1995a,b] developed algorithms for distributed co-synthesis that could handle multiple tasks in a task graph. Architectural partitioning algorithms model the design as a marked graph and partition the graph into several smaller subgraphs to optimize performance and interconnect cost. Partitioning algorithms rely primarily on the structure of the graph during optimization. APARTY [Lagnese and Thomas 1991] is an architectural partitioner dedicated mainly to hardware designs thanks to its emphasis on a large number of relatively small operators; it uses a hierarchical clustering algorithm. PARTIF [Ismail et al. 
1994] is an interactive partitioning tool based on the SOLAR design representation of communicating processes. Hardware-software partitioning algorithms implement a system from a canonical architecture, usually a programmable CPU and a custom ASIC communicating over a bus. Two strategies are to move operations from hardware to software to minimize cost, as does the algorithm of Gupta and De Micheli [1993], or to move operations from software to hardware to satisfy performance goals, as does the algorithm of Ernst et al. [1993]. Vahid et al. [1994] use a binary search algorithm to allocate functions in a CPU-ASIC system. Synthesis of distributed systems with arbitrary topologies requires new optimization functions. Hardware-software partitioning algorithms, because they work from an architectural template, concentrate on the synthesis of a custom ASIC to accelerate time-critical functions. These research efforts have developed effective methods for performance estimation of custom ASICs during co-synthesis. The work reported in this article concentrates on the distributed hardware/software topology. As a result, we have assumed that ASIC performance is predefined in a library of component descriptions. The estimation techniques developed for hardware-software partitioning are
Fig. 1. Objects with variables and methods.
directly applicable to object-oriented co-synthesis. However, extending the cosynthesis algorithms themselves to distributed engines (that is, determining which operations should be placed in ASICs versus CPUs in systems with arbitrary topologies) is a topic for future work.

2.2 System Specification
System design is generally divided into several distinct phases [Pressman 1992]: requirements analysis creates a natural language description of the desired system through interviews with the customer; specification yields a more formal description of the system which satisfies the requirements; design refines the specification to create a software and hardware architecture; and maintenance fixes and improves the system once it has been deployed. (The requirements and specifications phases are often called system analysis.) System requirements are generally divided into functional and nonfunctional: a functional requirement specifies an input-output relation, whereas a nonfunctional requirement may include performance, reliability, power consumption, and the like. Our algorithm synthesizes architectural designs for hardware and software from a specification containing functions plus nonfunctional performance requirements.
Object-oriented specification and design are used for general program design, and object-oriented specification is widely used as a modeling technique for real-time systems. In a traditional structured specification, data structures and procedures are developed relatively separately; as a consequence, both tend to become fairly large-grained. Figure 1 shows an intuitive model for an object: the object is defined as a collection of variables; the procedures that operate on those data are called methods. A method may or may not return a value and it may or may not change the values of variables in the object. The type definition of an object is traditionally called a class; it defines both the object’s data and methods.
We believe that object-oriented specifications are especially well-suited for hardware-software codesign because they provide a wealth of structuring information about the design: methods tend to implement finer-grained
operations than is typical in structured procedural systems; and the grouping of methods and variables into objects provides a hierarchical structure ranging from fine-grained methods to coarser-grained objects. There are several prominent object-oriented methodologies for real-time system specification and design. Shlaer-Mellor analysis [Shlaer and Mellor 1992] models an object as a data structure plus a set of state machines that operate on those data. The behavior of an object can be described as an ASM-like state transition graph. Rumbaugh et al. [1991] developed an object-oriented methodology for system analysis and program design. Their methodology places more emphasis on the relationship between the object-oriented requirements model and the object-oriented program design than does Shlaer-Mellor analysis. ROOM [Selic et al. 1994] was developed for real-time systems such as telephone switching systems. It uses a variation of Statecharts to describe the functions of methods and places great emphasis on the role of inheritance in design abstraction.
3. PROBLEM FORMULATION
Our goal is to define the functional and nonfunctional elements of the system specification and the model for the system implementation; however, we must first define an object-oriented specification using a simple definition. A class consists of variable templates and method templates:
$$\mathcal{C} = \langle \mathcal{V}, \mathcal{M} \rangle \qquad (3.1)$$
where $\mathcal{V}$ is a set of variable templates and $\mathcal{M}$ is a set of method templates. An object $o$ has a class, a set of variables $V = \{v_i\}$, and a set of methods $M = \{m_i\}$. If two objects have the same class, they have isomorphic variable and method sets, but their variables may not have the same values. A variable $v$ has a name $n$ and a state $s$:
$$v = \langle n, s \rangle \qquad (3.2)$$
We use the definition of a class as a shorthand for the definition of a set of objects. The class of an object $o$ (the most derived class when inheritance is taken into account) is referred to as $C(o)$. Two objects $o_1$ and $o_2$ have identical variables and methods if they are of the same class, though the variables of $o_1$ and $o_2$ will not, in general, have the same states. We do not need to directly consider class inheritance during cosynthesis: cosynthesis uses the variable/method structure of objects, but not the class hierarchy itself. Our synthesis algorithm works on sets of variables and methods for each class derived by flattening the class hierarchy structure. The state $S_o$ of the object $o$ consists of the Cartesian product of the variable states
$$S_o = s_1 \times s_2 \times \cdots . \qquad (3.3)$$
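The definitions in Eqs. (3.1) through (3.3) can be illustrated with a small data-structure sketch. This is not the paper's implementation; the names `ClassTemplate`, `Obj`, and `state_space`, and the valve example, are hypothetical and chosen only to mirror the class/object/variable vocabulary above.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass(frozen=True)
class ClassTemplate:
    """A class as in Eq. (3.1): variable templates plus method templates."""
    name: str
    variable_templates: tuple  # names of the variables every instance carries
    method_templates: tuple    # names of the methods every instance carries


@dataclass
class Obj:
    """An object: a class plus its own variable states (name/state pairs, Eq. 3.2)."""
    name: str
    cls: ClassTemplate
    state: dict = field(default_factory=dict)  # variable name -> current state

    def __post_init__(self):
        # Objects of the same class have isomorphic variable sets, so every
        # template variable gets an entry even if no state was supplied.
        for var in self.cls.variable_templates:
            self.state.setdefault(var, None)

    def object_state(self):
        """One element of S_o: the tuple of current variable states (Eq. 3.3)."""
        return tuple(self.state[v] for v in self.cls.variable_templates)


def state_space(domains):
    """Enumerate s_1 x s_2 x ... for finite per-variable state domains."""
    return list(product(*domains))


# Hypothetical example: two objects of one class with different states.
valve = ClassTemplate("Valve", ("position", "fault"), ("open", "close"))
v1 = Obj("v1", valve, {"position": "open", "fault": False})
v2 = Obj("v2", valve, {"position": "closed", "fault": False})
```

Here `v1` and `v2` share a class, and therefore a variable and method structure, while their states differ, which is the situation the text describes for $o_1$ and $o_2$.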
Fig. 2. Dataflow between methods.
The variable set of a method m is the set of variables that the method either reads or writes; class encapsulation ensures that the variable set of a method in an object consists entirely of variables in that object. The read/write variable sets of a method are the variables read/written by the method. The variable sets of methods in an object may have nonempty intersections.
An object-oriented system executes by methods calling other methods as illustrated in Figure 2; execution of a method on an object is called a message. We model the flow of data between methods during execution using a method dataflow graph. A node in the graph represents a method and its variable set. (If we have two objects of the same class, the methods in those two objects are represented by distinct nodes.) A directed edge in the graph flows from a calling method to a called method.
We can now define the cosynthesis problem itself. Our formulation is related to formulations used for distributed system scheduling, allocation, partitioning, and mapping. We use a method dataflow graph to formulate the functional element of the system behavior. The method dataflow graph corresponds to the task graph specification used in distributed system synthesis [Chu and Tan 1987] and some other co-synthesis algorithms [Prakash and Parker 1992; Yen and Wolf 1995a,b; Wolf 1996]. We also need nonfunctional information, namely, the rate (the inverse of the execution period) at which the process graph must be executed. The execution rate of the method dataflow graph specifies the performance required of the system’s implementation. The method dataflow graph must be in the form of a DAG so that the execution time of the graph can be computed and compared to the required rate. The method dataflow graph corresponds to the process graphs used in distributed system synthesis; in our case, each method is a separate process. Each method is labeled with its variable set.
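A method dataflow graph of this kind might be represented as follows. This is a minimal sketch under stated assumptions: the class name `MethodDataflowGraph`, its methods, and the sensor/controller/valve example are hypothetical, and the DAG check uses Kahn's topological sort, which the paper does not specify.

```python
from collections import defaultdict, deque


class MethodDataflowGraph:
    """Nodes are (object, method) pairs labeled with read/write variable sets."""

    def __init__(self):
        self.reads = {}                  # node -> variables the method reads
        self.writes = {}                 # node -> variables the method writes
        self.callees = defaultdict(set)  # calling node -> called nodes

    def add_method(self, obj, method, reads=(), writes=()):
        node = (obj, method)  # the same method in two objects gives two nodes
        self.reads[node] = set(reads)
        self.writes[node] = set(writes)
        return node

    def add_call(self, caller, callee):
        if caller not in self.reads or callee not in self.reads:
            raise KeyError("both methods must be added before the call edge")
        self.callees[caller].add(callee)  # edge from caller to called method

    def variable_set(self, node):
        """Variables the method either reads or writes."""
        return self.reads[node] | self.writes[node]

    def topological_order(self):
        """Kahn's algorithm; raises ValueError if the graph is not a DAG."""
        indegree = {n: 0 for n in self.reads}
        for caller in self.callees:
            for callee in self.callees[caller]:
                indegree[callee] += 1
        ready = deque(sorted(n for n, d in indegree.items() if d == 0))
        order = []
        while ready:
            node = ready.popleft()
            order.append(node)
            for callee in sorted(self.callees.get(node, ())):
                indegree[callee] -= 1
                if indegree[callee] == 0:
                    ready.append(callee)
        if len(order) != len(indegree):
            raise ValueError("method dataflow graph must be a DAG")
        return order


# Hypothetical three-method chain.
g = MethodDataflowGraph()
sense = g.add_method("sensor1", "sample", writes={"reading"})
ctl = g.add_method("controller", "update", reads={"setpoint"}, writes={"output"})
act = g.add_method("valve1", "drive", reads={"position"}, writes={"position"})
g.add_call(sense, ctl)
g.add_call(ctl, act)
order = g.topological_order()
```

Rejecting cyclic graphs up front matches the requirement that the graph be a DAG so that its execution time can be computed and compared with the required rate.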
Our algorithm requires a technology description of the available components: PEs, I/O devices, and communication channels. A technology database describes the properties of the hardware components and the implementations of the methods. The technology database includes a list of all the available PE types $\{p_1, p_2, \ldots\}$ and communication channel types $\{c_1, c_2, \ldots\}$. A communication channel connects to a PE or a device through a port. The database also gives the manufacturing cost of each p and c (the manufacturing cost of a communication channel may be zero if its circuitry is included in a PE), the set of channel types that can be implemented by each PE type, the channel’s throughput, and the maximum number of ports that can be connected to the channel. For each method m, the database specifies T(m, p), the execution time of that method on a PE of type p. (Note that we assume that the variables of an object can be implemented on any PE type, although this assumption is not essential for cosynthesis.) The implementation is specified as a graph and a pair of allocation functions. The hardware engine graph has nodes that represent the PEs and edges that represent communication channels. Both the nodes and edges are annotated with the type of component used (8051, I2C bus, etc.). The method allocation maps methods to PEs (implicitly allocating the variables in that method’s variable set to the PE as well); the communication allocation allocates edges in the method dataflow graph to communication channels. Synthesis also produces schedules for method execution and intermethod communication.
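The technology database and the implementation description could be laid out along the following lines. This is only an illustrative sketch: the class names, field names, and all numeric costs, throughputs, and execution times are hypothetical, and only the kinds of information stored (costs, supported channel types, throughput, port limits, and T(m, p)) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PEType:
    name: str
    cost: float               # manufacturing cost
    channel_types: frozenset  # names of channel types this PE can implement


@dataclass(frozen=True)
class ChannelType:
    name: str
    cost: float        # may be zero if the circuitry is included in a PE
    throughput: float  # data units per unit time
    max_ports: int     # maximum number of ports on the channel


class TechnologyDB:
    def __init__(self, pe_types, channel_types, exec_time):
        self.pe_types = {p.name: p for p in pe_types}
        self.channel_types = {c.name: c for c in channel_types}
        self.exec_time = exec_time  # (method, PE type name) -> T(m, p)

    def T(self, method, pe_type):
        """Execution time of a method on a PE of the given type."""
        return self.exec_time[(method, pe_type)]


def implementation_cost(db, pe_instances, channel_instances):
    """Sum component costs; both arguments map instance name -> type name."""
    pe_cost = sum(db.pe_types[t].cost for t in pe_instances.values())
    channel_cost = sum(db.channel_types[t].cost for t in channel_instances.values())
    return pe_cost + channel_cost


# Hypothetical database and engine (all numbers invented for illustration).
db = TechnologyDB(
    pe_types=[
        PEType("8051", 5.0, frozenset({"I2C"})),
        PEType("dsp", 20.0, frozenset({"I2C", "parallel"})),
    ],
    channel_types=[
        ChannelType("I2C", 0.0, 100.0, 8),
        ChannelType("parallel", 3.0, 1000.0, 2),
    ],
    exec_time={("filter", "8051"): 9.0, ("filter", "dsp"): 1.5},
)
engine_pes = {"pe1": "8051", "pe2": "dsp"}   # hardware engine graph nodes
engine_channels = {"bus1": "I2C"}            # hardware engine graph edges
method_allocation = {"filter": "pe2"}        # method -> PE instance
total = implementation_cost(db, engine_pes, engine_channels)
```

Keeping the allocation as a plain mapping from methods to PE instances makes it cheap to copy and modify, which matters when many candidate reallocations are tried.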
4. OBJECT-ORIENTED COSYNTHESIS WITH PARTITIONING
Object-oriented specification is an important aid to automatic partitioning because the specification is naturally described at two levels of granularity. The gross system architecture is described as a network of objects that send messages between themselves to implement tasks. Each object is, in turn, described as a collection of data variables and methods (procedures which operate on the object’s data). The object-level system architecture provides an organization for gross optimization of the system. The method-level decomposition makes it possible to split and recombine objects. We use method-level decomposition to repartition both code and data: speeding up a critical operation may require moving both the code for the operation and the data on which it operates. However, we try to keep methods in an object together to the extent possible, since the designer’s specification of the objects presumably contains useful information about the affinity of data and code. Our algorithm does not split a method into several smaller sections of code, but it does split the variable set of an object across several PEs.
The major objective for cosynthesis is to satisfy the rate requirement, and the minor objective is to minimize the total implementation (PE, device, and channel) cost. In order to perform well on both objectives, the algorithm is designed to overallocate hardware to meet the system rate requirement, then iteratively reduce the system cost by reallocating methods and data to new PEs and substituting a cheaper PE for a more expensive one. Because PE cost usually dominates, the algorithm’s heuristic is to first
minimize total PE cost, then select PEs to use on-board devices wherever possible, and finally to minimize communication channel cost.
We previously developed a cosynthesis algorithm that worked from a standard, non-object-oriented process specification [Wolf 1996]. This object-oriented cosynthesis algorithm builds upon the heuristics originally developed for cosynthesis of specifications with large processes, but contains several important modifications to handle large sets of methods and to consider the effects of splitting objects across PEs. Cosynthesis occurs in five steps, which successively satisfy the rate constraint, then minimize PE cost, and finally minimize communication and device cost:
(1) Initial allocation and scheduling. Allocate processes to PEs such that all tasks are placed on PEs fast enough to ensure that all deadlines are met, keeping objects together as much as possible. Schedule the processes to determine process exclusivity and communication rates.
(2) Minimize PE cost. Reallocate processes to PEs to minimize PE cost, splitting objects where necessary.
(3) Minimize communication. Reallocate processes again to minimize inter-PE communication, taking into account traffic generated by splitting objects across PEs.
(4) Allocate channels. Allocate communication channels.
(5) Allocate devices. Allocate devices, either as on-chip devices or external devices on communication channels.
Decisions at each step are made suboptimally, but local search tries to ensure that good decisions are made. Because PE cost dominates the implementation cost, Step 2, the PE cost-minimization step, is the most important and also the step in which the object-oriented nature of the specification is most important. The following paragraphs describe each step in more detail.
Step 1 of our algorithm greedily allocates PEs to methods to ensure that the rate constraint will be met; this produces an expensive hardware engine whose cost must be reduced in subsequent steps.
Our initial allocation keeps objects together by assigning all methods and variables for an object to the same PE, but it assigns each object its own PE. The type of the PE assigned to an object is the fastest PE required for any method in the object’s class. This allocation is not guaranteed to produce an initial implementation that has a feasible schedule, but we have not found any problems with this scheme in practice. We use this initial allocation because assigning each method its own PE, which is guaranteed to provide a feasible schedule, generates too much hardware in the initial solution and makes it difficult to optimize the system design. Step 1 also finds a feasible schedule for the methods; since allocation decisions must be made based on the schedule, the system schedule must be recomputed whenever the allocation changes. Since the method dataflow graph is a DAG, we can compute the schedule given an allocation using a longest path algorithm. We also use this scheduling procedure to test feasibility during Steps 2 and 3.
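A longest-path schedule over a DAG can be sketched as below. This is an assumption-laden illustration rather than the scheduler used in the paper: the function names are hypothetical, execution times are taken as already fixed by the current allocation (that is, T(m, p) for the PE each method is on), and contention between methods that share a PE is ignored for brevity.

```python
from graphlib import TopologicalSorter


def longest_path_schedule(exec_time, edges, comm_time=None):
    """Earliest start time of each method and the overall makespan.

    exec_time: method -> execution time under the current allocation
    edges:     iterable of (caller, callee) pairs forming a DAG
    comm_time: optional (caller, callee) -> communication delay
    """
    comm_time = comm_time or {}
    preds = {m: set() for m in exec_time}
    for caller, callee in edges:
        preds[callee].add(caller)

    start = {}
    # static_order() yields every node after all of its predecessors.
    for m in TopologicalSorter(preds).static_order():
        start[m] = max(
            (start[p] + exec_time[p] + comm_time.get((p, m), 0.0) for p in preds[m]),
            default=0.0,
        )
    makespan = max((start[m] + exec_time[m] for m in start), default=0.0)
    return start, makespan


def is_feasible(exec_time, edges, period, comm_time=None):
    """The rate is met if the graph finishes within one execution period."""
    _, makespan = longest_path_schedule(exec_time, edges, comm_time)
    return makespan <= period


# Hypothetical four-method graph with invented execution times.
exec_time = {"sample": 2.0, "filter": 3.0, "log": 1.0, "drive": 4.0}
edges = [("sample", "filter"), ("sample", "log"), ("filter", "drive")]
start, makespan = longest_path_schedule(exec_time, edges)
```

Because the check is rerun every time the allocation changes, its cost multiplies across the optimization loop, which is consistent with the later observation that scheduling dominates synthesis time.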
Fig. 3. Details of Step 2 in object-oriented cosynthesis.
Step 2, which tries to reduce PE cost, is outlined in pseudo-C++ in Figure 3. The PE cost reduction step iteratively tries three tactics to reallocate methods and repartition objects to reduce PE cost. PE_replacement(), whose body is not shown, simply tries to replace an expensive PE with a cheaper PE, without disturbing the allocation. oo_pairwise_merge() tries to take advantage of wasted cycles by moving all the methods off a PE so that the PE can be eliminated. This procedure tries to move methods to eliminate hidden object data communication. oo_balance_load() tries to redistribute methods to better balance the system load. Step 2 repeatedly applies these three tactics in order until the hardware cost is no longer reduced; the system schedule must be recomputed in each iteration, using the scheduling procedure used to find the initial feasible solution. The two most important procedures in this optimization loop are described in more detail in the following.
oo_balance_load() tries to balance the computation load across the PEs by making use of the partitioning available in the object-oriented specification. The specification’s decomposition into objects contains much useful information about data partitioning and communication, so our heuristic is to preserve the specification’s partitioning as much as possible. Object partitioning can be changed during cosynthesis either to increase performance or reduce communication time and balance system load. As shown in
Fig. 4. Repartitioning objects to increase parallelism.
Figure 4, we may move a method to a different PE to increase the parallelism during execution. Because our algorithm first generates a schedule that meets the rate, this optimization is used to change the load balance within the system. By taking advantage of spare cycles on CPU 2, for example, we may be able to use a lower-cost PE for CPU 1. The usefulness of this optimization depends in part on the amount of data that must be copied from the original object when the method is moved.
Alternatively, oo_pairwise_merge() tries to reduce the implementation’s manufacturing cost by making use of the structure of the object-oriented specification. When we are trying to remove all methods from a PE by coalescing methods onto a single PE, we can take advantage of the opportunity to reduce data communication between methods. As shown in Figure 5, when we have a choice of several places to move a method, we prefer to move it to a new PE on which other methods from the same object reside. When these methods share data, this optimization reduces the communication traffic, making it easier to find a feasible schedule and perhaps allowing later synthesis steps to use a cheaper communication channel.
When we move a method from one PE to another, we may have to copy data used by that method which is also needed by another method on the original PE. When we repartition objects, we must take into account the extra data communication required. This object data communication is a hidden communication cost: it does not appear in the specification’s dataflow. The communication required to synchronize the data used by methods on two different PEs is extra dataflow caused by repartitioning the objects. The optimizations shown in Figures 4 and 5 use estimates of communication time to determine the cost/gain of moving a method and its data from one PE to another.
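The overall shape of Step 2 and the merge-target preference can be sketched as follows. Figure 3 is not reproduced in this text, so this is a reconstruction from the prose only: `minimize_pe_cost`, `choose_merge_target`, and the toy tactic are hypothetical stand-ins, not the bodies of PE_replacement(), oo_pairwise_merge(), or oo_balance_load().

```python
def minimize_pe_cost(design, tactics, cost, feasible):
    """Apply the tactics in order, repeatedly, until no pass lowers the cost.

    Each tactic maps a design to a candidate design (or None). A candidate is
    kept only if it is still schedulable and strictly cheaper.
    """
    best, best_cost = design, cost(design)
    improved = True
    while improved:
        improved = False
        for tactic in tactics:
            candidate = tactic(best)
            if candidate is None:
                continue
            candidate_cost = cost(candidate)
            # feasible() stands in for recomputing the system schedule.
            if feasible(candidate) and candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                improved = True
    return best


def choose_merge_target(method, candidate_pes, allocation, owner):
    """Prefer the PE that already holds the most methods of the same object."""
    def siblings_on(pe):
        return sum(
            1 for m, p in allocation.items()
            if p == pe and m != method and owner[m] == owner[method]
        )
    # sorted() makes ties deterministic: the first name in order wins.
    return max(sorted(candidate_pes), key=siblings_on)


# Toy example (invented): a design maps PE instance -> PE type.
PE_COST = {"fast": 20, "slow": 5}

def toy_replace(design):
    """Downgrade the first 'fast' PE, mimicking a replacement tactic."""
    for pe in sorted(design):
        if design[pe] == "fast":
            return {**design, pe: "slow"}
    return None

result = minimize_pe_cost(
    {"pe1": "fast", "pe2": "fast"},
    tactics=[toy_replace],
    cost=lambda d: sum(PE_COST[t] for t in d.values()),
    feasible=lambda d: "fast" in d.values(),  # toy rule: one fast PE is needed
)

allocation = {"valve.open": "cpu2", "valve.close": "cpu1", "pump.start": "cpu3"}
owner = {"valve.open": "valve", "valve.close": "valve", "pump.start": "pump"}
target = choose_merge_target("valve.close", ["cpu2", "cpu3"], allocation, owner)
```

In the toy run the second downgrade is rejected because it would make the design infeasible, so the loop stops with one fast and one slow PE. A fuller version of `choose_merge_target` would also weigh the estimated hidden communication time for data left behind on the original PE.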
Step 3, which tries to minimize communication cost, performs a more detailed analysis of the data communication required to keep the object data consistent. The first two steps have assumed that unlimited communication bandwidth was available to implement both the specification’s communication and hidden object data communication. Step 3 takes into account the
Fig. 5. Reducing communication costs.
bandwidth limitations of the channels that are supported by the PEs currently in the design. It first computes a feasible schedule for both the specified and hidden communication. It then selects for each communication path between PEs the cheapest channel that can meet the required rate; in the case of buses, if a bus has already been allocated, can be connected to the PE, and has sufficient spare bandwidth, then the PE can be added. The last two steps of the algorithm are straightforward: communication channels are allocated to implement all communication between processes and devices. The type of channel used for each link is determined by the types of ports supported by each PE and device and port cost; devices are allocated around the engine to ensure that each process has a communication channel to the devices it reads and writes. Device cost is minimized by using on-chip devices when possible.
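The channel selection rule described for Step 3 can be illustrated with a short sketch. The names `ChannelType`, `Bus`, `try_add`, and `cheapest_channel`, along with every cost and throughput figure, are hypothetical; only the rule itself (cheapest channel that meets the required rate, with reuse of an already allocated bus when the PE can connect and spare bandwidth remains) is taken from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ChannelType:
    name: str
    cost: float
    throughput: float
    max_ports: int


@dataclass
class Bus:
    """An already allocated bus with its connected PEs and committed bandwidth."""
    ctype: ChannelType
    ports: set = field(default_factory=set)
    used: float = 0.0

    def try_add(self, pe, pe_channels, rate):
        """Attach pe if it supports the bus type and ports and bandwidth allow."""
        ok = (
            self.ctype.name in pe_channels
            and (pe in self.ports or len(self.ports) < self.ctype.max_ports)
            and self.used + rate <= self.ctype.throughput
        )
        if ok:
            self.ports.add(pe)
            self.used += rate
        return ok


def cheapest_channel(rate, channel_types, channels_a, channels_b):
    """Cheapest channel type both endpoints support that meets the rate."""
    usable = [
        c for c in channel_types
        if c.name in channels_a and c.name in channels_b and c.throughput >= rate
    ]
    # Ties on cost are broken by name; None means no channel type qualifies.
    return min(usable, key=lambda c: (c.cost, c.name), default=None)


# Hypothetical channel library (numbers invented for illustration).
serial = ChannelType("serial", 1.0, 10.0, 2)
i2c = ChannelType("I2C", 2.0, 100.0, 3)
parallel = ChannelType("parallel", 8.0, 1000.0, 2)
library = [serial, i2c, parallel]
all_types = {"serial", "I2C", "parallel"}

choice = cheapest_channel(50.0, library, all_types, all_types)
bus = Bus(i2c, {"cpu1", "cpu2"}, 60.0)
joined = bus.try_add("cpu3", {"I2C"}, 30.0)
```

Trying an existing bus before allocating a new point-to-point channel is one way to realize the bus case mentioned above; whether the paper's implementation orders the checks this way is not stated.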
5. RESULTS
We implemented our algorithm in C++ using the NIH Class Library; the implementation consists of about 8,600 lines of code. We ran a series of experiments on an SGI Indigo workstation. We used as examples designs taken from software engineering books on object-oriented design: cfuge is the centrifuge of Calvez [1993]; dye is the dyeing machine of Selic et al. [1994]; juice is the juice plant of Shlaer and Mellor [1992]; train is the train control system of Booch [1989]. These are among the largest thoroughly described system specifications that we can find in the software specification literature. We used the object decomposition given by the authors and estimated CPU times on platforms based on the functional descriptions given by the authors.
Table I. Experimental Results
Our results are summarized in Table I. A comparison of the results with the specifications shows that in every case, the algorithm came up with an efficient implementation with a minimal or near-minimal hardware cost. When compared to the non-object-oriented cosynthesis algorithm we previously developed [Wolf 1996], our object-oriented cosynthesis algorithm created implementations with equal hardware costs, showing that the added complexity of the object-oriented specification did not confuse the optimization algorithm. The dye example required the most CPU time because it had the most methods. One significant contributor to CPU time is the scheduling feasibility check used in the inner loop of Step 2. This implementation included a simple and relatively inefficient scheduler; more efficient coding would lead to substantially shorter synthesis times. Even with the given implementation times, our algorithm executed quickly. ILP formulations of the distributed system cosynthesis problem, in contrast, require tens of thousands of CPU seconds even for small examples.
6. CONCLUSIONS
Many embedded systems are distributed systems, so it is important for cosynthesis algorithms to be able to generate systems with arbitrary hardware topologies as well as complex software architectures. When the compute engine includes multiple PEs, partitioning is very important because the system specification may not have properly exposed the parallelism required for an efficient implementation. Object-oriented specifications give us important hints on how to repartition the design during co-synthesis. Method decomposition provides natural cut points for decomposing objects into smaller pieces. We can use the partitioning information inherent in the object-oriented specification to modify the designer’s original partitioning for implementation; the partitioning into objects and methods allows us to find an efficient implementation while allowing the designer to work with the most natural organization for the specification. We believe that object-oriented specifications will find wide use in cosynthesis thanks to their intuitive presentation of powerful optimization alternatives.

REFERENCES
BARROS, E., ROSENSTIEL, W., AND XIONG, X. 1994. A method for partitioning UNITY language in hardware and software. In Proceedings, EuroDAC ’94, IEEE Computer Society Press, 220–225.
BOOCH, G. 1989. Object-Oriented Design. Addison-Wesley, Reading, MA.
CALVEZ, J. P. 1993. Embedded Real-Time Systems: A Specification and Design Methodology. Wiley, New York.
CHU, W. W. AND TAN, L. M.-T. 1987. Task allocation and precedence relations for distributed real-time systems. IEEE Trans. Comput. C-36, 6 (June), 667–679.
DASARATHY, B. AND FERIDUN, M. 1984. Task allocation problems in the synthesis of distributed real-time systems. In Proceedings, IEEE 1984 Real-Time Systems Symposium, IEEE, Piscataway, NJ, 135–144.
ERNST, R., HENKEL, J., AND BENNER, T. 1993. Hardware-software co-synthesis for microcontrollers. IEEE Des. Test Comput. 10, 4 (Dec.), 64–75.
GUPTA, R. K. AND DE MICHELI, G. 1993. Hardware-software cosynthesis for digital systems. IEEE Des. Test Comput. 10, 3 (Sept.), 29–41.
HUANG, J. P. 1985. Modeling of software partition for distributed real-time applications. IEEE Trans. Softw. Eng. SE-11, 10 (Oct.), 1113–1126.
ISMAIL, T. B., O’BRIEN, K., AND JERRAYA, A. 1994. Interactive system-level partitioning with PARTIF. In Proceedings, EDAC ’94, IEEE Computer Society Press, Los Alamitos, CA.
LAGNESE, E. D. AND THOMAS, D. E. 1991. Architectural partitioning of system level synthesis of integrated circuits. IEEE Trans. CAD/ICAS 10, 7 (July), 847–860.
LIU, C. L. AND LAYLAND, J. W. 1973. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM 20, 1 (Jan.), 46–61.
PRAKASH, S. AND PARKER, A. C. 1992. SOS: Synthesis of application-specific heterogeneous multiprocessor systems. J. Parallel Distrib. Comput. 16, 338–351.
PRESSMAN, R. S. 1992. Software Engineering: A Practitioner’s Approach, 3rd ed. McGraw Hill, New York.
RAMAMRITHAM, K., STANKOVIC, J. A., AND SHIAH, P.-F. 1990. Efficient scheduling algorithms for real-time multiprocessor systems. IEEE Trans. Parallel Distrib. Syst. 1, 2 (April), 184–194.
RUMBAUGH, J., BLAHA, M., PREMERLANI, W., EDDY, F., AND LORENSEN, W. 1991. Object-Oriented Modeling and Design. Prentice Hall, Englewood Cliffs, NJ.
SELIC, B., GULLEKSON, G., AND WARD, P. T. 1994. Real-Time Object-Oriented Modeling. Wiley, New York.
SHLAER, S. AND MELLOR, S. J. 1992. Object Lifecycles: Modeling the World in States. Yourdon Press, Englewood Cliffs, NJ.
VAHID, F., GONG, J., AND GAJSKI, D. D. 1994. A binary-constraint search algorithm for minimizing hardware during hardware/software partitioning. In Proceedings, EuroDAC ’94, IEEE Computer Society Press, Los Alamitos, CA, 214–219.
WOLF, W. 1994. Hardware-software co-design of embedded systems. Proc. IEEE 82, 7 (July), 967–989.
WOLF, W. 1996. An architectural co-synthesis for distributed, embedded computing systems. IEEE Trans. VLSI Syst., accepted for publication.
YEN, T.-Y. AND WOLF, W. 1995a. Sensitivity-driven co-synthesis of distributed embedded systems. In Proceedings, International Symposium on System Synthesis, IEEE Computer Society Press, Los Alamitos, CA.
YEN, T.-Y. AND WOLF, W. 1995b. Communication synthesis for distributed embedded systems. In Proceedings, ICCAD-95, IEEE Computer Society Press, Los Alamitos, CA, 288–294.
ZAVE, P. 1989. A compositional approach to multiparadigm programming. IEEE Softw. (Sept.), 15–25.

Received January 1996; accepted June 1996