Multicore Processing and ARTEMIS - An incentive to develop the European Multiprocessor research

Tiberiu Seceleanu (University of Turku, Turku Centre for Computer Science, Finland)
Hannu Tenhunen (University of Turku, Turku Centre for Computer Science, Finland)
Ahmed Jerraya (TIMA, France)
Axel Jantsch (Royal Institute of Technology, Sweden)
Georgi Kuzmanov (Delft University of Technology, The Netherlands)
Kees Goosens (NXP, The Netherlands)
Thierry Collette (CEA LIST, France)
Bernard Candaele (Thales, France)
Laila Gide (Thales, France)
Jan Madsen (Technical University of Denmark, Denmark)
Rudy Lauwereins (IMEC, Belgium)
Lars Svensson (Chalmers University, Sweden)
Luca Benini (Università di Bologna, Italy)
Ivan Ring Nielsen (Technoconsult, Denmark)
Marcello Coppola (STMicroelectronics, France)
Paolo Ienne (EPFL, Switzerland)
Alex Ramirez (Universitat Politècnica de Catalunya, Spain)
Ran Ginosar (Israel Institute of Technology, Israel)
Content
1. Introduction
  1.1. The “Multicore Processing and ARTEMIS” (A-MPSOC) networking session
  1.2. Goals of A-MPSOC
2. The A-MPSOC research framework
  2.1. Fundamental MPSOC research
  2.2. The framework
  2.3. Design by contract
3. Expected impact
References
Appendices
  Appendix A. State-of-the-art
    A.1. Hardware (technology) landscape
    A.2. Software landscape
    A.3. Application landscape
    A.4. Tools landscape
  Appendix B. Challenges for the future MPSOC designer
  Appendix C. Organization of MPSOC research
  Appendix D. Research details
  Appendix E. Actual research proposals
1. Introduction
Although multiprocessor architectures have been under development for a long time, the approach has mostly focused on multi-chip realizations. Clustering computers or microprocessors on the same board was the way to reconcile complex applications with performance requirements. Only recently have technological advances allowed this paradigm to shift towards on-chip distributed platforms, also known as multi-core or multiprocessor system-on-chip (MPSOC). Such a multiprocessor architecture may be defined as: on-chip clusters of heterogeneous functionality modules, cooperating in the implementation of multiple concurrent applications.
Architecturally, MPSOC combine characteristics of both distributed systems (DS) and systems-on-chip (SOC). However, addressing issues from either of these latter paradigms will not necessarily bring optimal benefits to MPSOC. For instance, in MPSOC, unlike in a “traditional” SOC view, concurrency at all levels plays a determining role, while problems such as power consumption, which can be addressed separately in the nodes of a DS, must be considered as a whole. Thus, distinct research and development issues must be defined for MPSOC, building on the indispensable experience in DS, SOC and other related domains.
Primarily motivated by market concerns, and also by the promise of the available billion-transistor technology, MPSOC is increasingly becoming the preferred target for embedded systems (ES) implementations. Furthermore, the possibility of fitting a huge number of different applications onto such a platform poses important challenges to the realization of such systems.
With the “application specific” approach left behind long ago, the “domain specific” solution will soon also become insufficient as an application mapping paradigm, due to the increasing interaction between separate areas of the same industrial field, or even between distinct industrial domains (such as photo-mobile phones and in-car / on-plane / on-train multimedia systems, to cite just a few of the most popular ones).
1.1. The “Multicore Processing and ARTEMIS” (A-MPSOC) networking session
The A-MPSOC event, part of the IST 2006 conference, aims to consolidate a concerted European effort to push MPSOC research issues onto the priority list of European actions. While changes are expected to improve this proposal, we present here our current options for reaching the above-mentioned goal.
MPSOC landscape. Currently, important hardware (HW) and software (SW) industrial actors worldwide, such as Intel [1,2,3], AMD [4], Philips [5], Microsoft [6], and others, are turning towards the MPSOC paradigm as a way to further improve design efficiency and performance, to meet low-cost requirements and, thus, to maintain or improve their positions on the global market. However, the multiprocessing design solutions for general-purpose processors do not cover, at least at the moment, the full spectrum of requirements for ES-MPSOC. In the future, even such general ES processors may have to address a larger scope of design requirements.
MPSOC and ARTEMIS. European academic and industrial partners noticed the need to strengthen embedded systems development in Europe. One of the viable responses was the creation of the embedded systems technology platform, ARTEMIS (Advanced Research & Technology for EMbedded Intelligence and Systems). One of the main ARTEMIS objectives is to “define a common strategic research agenda that will become a reference in its own domain and attract commitment of all stakeholders in the sector. ARTEMIS will help to create the necessary critical mass and co-ordinate research efforts and initiatives across Europe in order to establish and implement a coherent and integrated European research and development strategy for Embedded Systems” [7]. The ARTEMIS Strategic Research Agenda (SRA) is organized along three major topics of present ES design, and tries to set the roadmap towards its future. ES are approached from three perspectives: reference designs and architectures (RD&A), seamless connectivity & middleware (SC&M) and system design methods & tools (SDM&T). The issues of MPSOC are naturally contained within the ARTEMIS SRA. However, due to the larger scope of the SRA, which has to embrace many aspects of ES design and deployment, the specific challenges and requirements of the MPSOC domain are not addressed fully or specifically.
1.2. Goals of A-MPSOC
While MPSOC issues are addressed by the ARTEMIS SRA in a generic manner, our efforts concentrate on providing a refinement of the SRA in the area of MPSOC-ES. The ultimate goal is to build a research framework that actually renders MPSOC platforms transparent (platform abstraction), enabling the application and SW developers to efficiently and optimally plan MPSOC designs. A set of reference architectures will contribute greatly to supporting a large movement towards such an approach.
Page 1 @November 22, 2006
The suggested approach will naturally create the conditions for strong competition at platform and application levels, as it will become possible to easily implement more applications on a multitude of platforms.
The A-MPSOC event, endorsed by ARTEMIS, intends to extract a consensual view on promoting European MPSOC research at levels with full visibility. In this direction, the A-MPSOC goals also answer the challenges expressed by the European Commission’s perspective on information and communication technologies, in the areas of Embedded Systems design [8] and Computing Systems [9]. Thus, the roadmap / approach resulting from the A-MPSOC event will be forwarded to the ARTEMIS board for analysis and possible updates, and sent on to the European Commission as a contribution to the basis for future FP7 calls.
In the following sections, we develop the A-MPSOC perspective on the European MPSOC research field. We identify the current problems in the area and briefly analyze the scenarios for MPSOC development activities. The more detailed directions are described in a set of appendices, open for discussion and future contributions.
2. The A-MPSOC research framework
It is difficult to plan or foresee any major breakthroughs, from either methodological or technological perspectives, in the next 5 to 10 years. For instance, it is difficult to predict that a novel programming language will take over from the current rich family of C languages. However, it is not far-fetched to predict an evolution of current C-based programming paradigms, for instance towards MPSOC parallel processing. This can be achieved through step-by-step research advances and appropriate management of compiler development.
Additionally, under the conditions imposed by time-to-market, efficiency and correctness requirements, there will be a reduced set of MPSOC platforms. Differentiation will come from the quality of the IPs used, both as HW and as SW modules, and from the quality of their synergetic activities. The range of IPs will also specialize into larger (microprocessor cores) or smaller (video and audio codecs, communication controllers, etc.) devices. Portability issues must be addressed not by operating system (OS) procedures (the traditional solution), but by the middleware layers. Thus, an engineering / managerial approach may be well suited to handle the foreseen design and execution processes and the possibilities offered by foreseen technologies.
The research approach must take into account design aspects located at different levels of abstraction, from application specification to HW layers. Accordingly, the analysis of current practices is of extreme importance. This may be
viewed from several standpoints: system developers, SW designers, platform developers (HW designers). A feasible solution is an approach based on separation of concerns, which considers both design and run-time phases. A strategic separation can consider the application, software and hardware domains / layers. Additional, complementary attention must go to the development of adequate tools for MPSOC. The current challenges in each of these areas are briefly outlined below.
Application. Unified models that capture concurrency in all of its flavors will enable efficient and flexible mapping onto various platforms at later stages. Application “compatibility” must also be studied, as relative independence from the eventual resource management level is an ultimate goal. Accurate decomposition of applications into smaller tasks, and their subsequent concurrent composition at run-time, must ensure not only timely and correct functional operation, but also efficient operation in terms of platform parameters. Thus, application specifications must be able to convey design performance constraints such as required throughput, power, size, etc.
Software. Concurrent SW paradigms must evolve towards the multiprocessor approach, which differs from the current multi-thread option. Application code must be tuned to optimally fit the available resources. Compiler technologies must cover, apart from the traditional OS calls, links towards the firmware layers. Based on these connections, platform adaptability becomes possible, both statically, at design time, and dynamically, through SW re-compilation at run-time. OS procedures must also cover issues at lower abstraction layers, close to firmware, in order to enable features like dynamic reconfiguration, dynamic resource (re)placement (space scheduling) and self-repair, and to provide support for autonomous dynamic re-compilation.
Hardware.
One of the issues at the HW level is the provision of abstraction towards higher specification layers. This will enable transparency with regard to the way in which multi-processing is implemented on specific platforms, reuse of resources, and platform adaptation to different application requirements. Another issue is related to the degree, granularity and speed of dynamic reconfiguration. Thus, the platforms must give the layer immediately above (SW) standardized access to resources, in order to allow run-time configuration, assessment of performance, failure diagnosis and (self-)repair. Reference architectures at platform level, with accurate characterizations, will trigger increased activity at application layers, where a growing number of applications will be easier to implement on MPSOC.
Tools. The tools available for MPSOC are mostly “focused” tools based on more traditional architectures and systems. Processor developers like ARM and Intel keep improving their compilers and
development environments and adapt them to support multi-core architectures. However, the tools are still based on the same assumptions, such as that the input is a sequential C program. In contrast to what these incremental approaches offer, future heterogeneous MPSOC need very systematic support for concurrency, synchronization, communication, representation of global time, and consistency of complex, shared data structures. Moreover, system design will require solid tool support for functional verification and for performance and power analysis.
2.1. Fundamental MPSOC research
From the challenges described above, a set of fundamental MPSOC research priorities can be extracted. While the timing of results in these directions cannot be estimated, once available they will provide significant advances for either the design or the wide ES employment of MPSOC. These priorities / tracks can briefly be stated as follows.
1. Platform abstraction. Abstraction is essential in order to hide the complexity of concurrency, and to relieve the application and SW developer of the MPSOC platform intricacies in the search for the optimal specification.
2. Concurrency modeling. New models of concurrency are required in order to move from the multi-thread paradigm, useful for uniprocessor systems, towards multiprocessor approaches. Such models must span the whole system hierarchy, characterized by possibly different models of computation (MOC) at each level. Appropriate “lossless” MOC transformation rules are hence necessary, too.
3. Self-organization. This track will have to address issues originating in the system itself (self-awareness) or in its environment (context-awareness). Depending on context and stage, self-awareness may be identified with robustness, optimality or quality. Context-awareness characterizes the situations in which the system has to react either to user demands or to environment conditions. It may be interpreted as versatility, adaptability or responsiveness.
4. Benchmarking. One of the biggest challenges going forward will be to benchmark the system performance of MPSOC. Benchmark suites that consist of both stochastic micro-benchmarks and realistic application models can significantly drive and speed up the development and evaluation of MPSOC. In particular, a versatility stressmark will provide essential information for an appropriate pairing of platforms to applications and environments. Thus, one could hardly overestimate the value of benchmarking suites for analyzing and comparing tools and for directing new developments.
2.2. The framework
In Fig. 1, we illustrate one proposal for the definition of an MPSOC research framework addressing the above challenges. The figure also builds up a possible MPSOC design flow, at high levels of abstraction. More information on research topics is detailed in Appendices C and D.
Fig. 1. MPSOC operational / research framework.
Of interest here is the lower level interaction between the SW and HW domains, governed by a contract-based approach to design. The SW domain is split into two parts. One (“SW development frameworks” in Fig. 1) deals with the more “traditional” aspects of OS, compiler technologies, scheduling and programming techniques; the second (“Firmware+” in Fig. 1) comes closer to the HW subdomain.
The OS+ topic will try to accommodate procedures implementing autonomous dynamic reconfiguration, time and space scheduling with optimality in mind, and identification of failures and remedies; it is strongly connected to the HW platform through firmware modules. It also provides support for middleware, drivers for various MPSOC platforms and the Compiler+ features. OS+ realizations will also have the task of accommodating, by re-shaping, the existing application SW. OS+ is intended to be a standardized set of procedures, able to deal with specific firmware by means of drivers (DR in Fig. 1).
The Compiler+ topic deals with static and dynamic re-compilation of higher level SW modules, in situations where the HW platform is changed, or where failures require the run-time reconfiguration of the HW platform.
2.3. Design by contract
Information exchange between the abstraction layers, during the design process but also at run-time, is based on virtualization concepts, meant to render lower layers transparent to higher level design techniques. Briefly, this means that the underlying HW platform must be transparent to the SW developer (except at Firmware+ levels), while the SW architecture need not be among the concerns of the application developer. In order to accommodate the development of a rich set of platform-independent applications, through a variety of OS and programs, it also becomes necessary to build standardized exchange points (SEP), which allow information to cross the borders of the virtual abstraction layers. The exchange of information through SEPs is regulated by so-called contracts.
The contracts (Fig. 2) bear application/platform specific information and are meant for early assessment of the quality of the implementation in service, based on requirements at higher abstraction layers and features offered by modules situated at lower abstraction layers. Thus, even though there may be contracts binding only two neighboring layers, most often these are part of a larger contract, covering issues from application to platform, at design or run-time. Model of computation (MOC) translations; performance parameters such as throughput, power consumption and area; failure information; performance downgrade, etc., are a few of the design or execution terms that are included in a contract.
Contracts explicitly express the obligations of both contract partners concerning functionality and performance. This means that one partner can rely on the other partner fulfilling its obligations under all conditions and situations. This is an essential prerequisite for building very complex systems, because system designers can rely on each component complying with its obligations; it is not necessary to re-verify a component’s behavior and performance when new blocks and functions are added. A contract is conditional in the sense that one component can fulfill its obligations only if its environment and its neighboring components comply with their respective obligations as well. Design by contract is a very powerful way to build arbitrarily complex systems out of simpler components and sub-systems.
Next, “intelligence” embedded at the different levels (controllers) will have to communicate, through SEPs, such that aspects of the contracts are negotiated, again, either statically, at design time, or dynamically, during the execution of the system – in which case the contracts specify acceptable ranges of values for the included measures. The “intelligent” control is exercised through a mixture of firmware and SW modules, with exhaustive knowledge of the platform and with standardized connections towards the upper layers.
Fig. 2. Contracts content.
Breaches of contracts, such as an undeliverable quality of service, will be observed by OS / OS+ procedures, and appropriate measures will be taken.
The proposal conveyed by Figures 1 and 2 covers most of the future design challenges (Appendix B), projected on the MPSOC domain. The separation into the four technical domains localizes the concerns, and the foreseen virtualization and standardization of interactions, together with the contract-based approach, ease the way in which problems may be addressed.
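The contract mechanism of Section 2.3 (terms with acceptable ranges, negotiation between a higher and a lower layer, and detection of breaches) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the framework's definition: the `Range` type, the `negotiate` and `breaches` functions, and the term names `throughput_mbps` and `power_mw` are all hypothetical, and real contracts would also carry MOC translations and failure information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Acceptable interval for one contract term (inclusive bounds)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

    def intersect(self, other: "Range"):
        """Return the overlap of two ranges, or None if they are disjoint."""
        low, high = max(self.low, other.low), min(self.high, other.high)
        return Range(low, high) if low <= high else None


def negotiate(required: dict, offered: dict):
    """Agree on a contract: every required term must overlap the offer.

    Returns the agreed ranges, or None when some term cannot be met.
    """
    agreed = {}
    for term, wanted in required.items():
        overlap = wanted.intersect(offered[term]) if term in offered else None
        if overlap is None:
            return None
        agreed[term] = overlap
    return agreed


def breaches(contract: dict, measured: dict) -> list:
    """List the terms whose measured value is missing or out of range."""
    return [term for term, rng in contract.items()
            if term not in measured or not rng.contains(measured[term])]


# Hypothetical terms: the upper layer states requirements,
# the lower layer states what it can deliver.
required = {"throughput_mbps": Range(100, float("inf")),
            "power_mw": Range(0, 500)}
offered = {"throughput_mbps": Range(0, 150),
           "power_mw": Range(200, 800)}

contract = negotiate(required, offered)
ok_run = breaches(contract, {"throughput_mbps": 120, "power_mw": 450})
bad_run = breaches(contract, {"throughput_mbps": 90, "power_mw": 450})
print(contract, ok_run, bad_run)
```

In this sketch a run-time monitor (the role the text assigns to OS / OS+ procedures) would call `breaches` with measured values and react when the returned list is non-empty.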
3. Expected impact
Further, the benefits of the approach come in various shapes:
Platform independence and seamless portability. Possible due to the HW and SW abstractions, plus the Firmware+ layer.
Compatibility / continuation / adaptability. Legacy S(ingle)PSOC SW can easily find its way to an MPSOC implementation through new compiling techniques and adaptation middleware. New SW can be built to higher quality standards, while offering the possibility to target different HW platforms, or to deal with changes in platform functionality.
Optimal implementation. By providing the enhanced OS+ and Compiler+ features, information on platform features is advanced to higher abstraction layers. Relative and timely placement of resources will optimize the space and performance of the implementation. The contract-based design framework enables the specification and verification of design constraints at every abstraction level, as well as early and run-time estimation of various design requirements.
Faster product development. Enabled through standardization of exchange points and the Firmware+ layer. The same application (set) can be easily retargeted to a different MPSOC platform.
Designer proficiency. The framework will increase the quality of the work of both SW and HW designers, as their respective design concerns stay focused within their professional fields.
Design costs. Economic gains are expected, stemming from the increased reuse characteristic of the framework. IP-based design is taken to a higher level, as methodology-independent constructs can be interconnected and interact synergistically.
Competitiveness of European MPSOC. Enabled by the foreseen relaxation of design issues at high abstraction layers (application / SW). An increased number of applications will be implemented to high quality standards on a variety of MPSOC platforms.
References
[1] Accelerating Security Applications with Intel® Multicore Processors.
[2] D. Perlmutter. Intel® Centrino® Duo Mobile Technology: The Beginning of an Era of Mobile Multi-Core Computing. Intel® Technology Journal, Volume 10, Issue 02. May 15, 2006.
[3] Intel MultiProcessor Specification. 1997.
[4] AMD Athlon MP Processor Model 10 Data Sheet for Multiprocessor Platforms.
[5] S. Dutta, R. Jensen, A. Rieckmann. Viper: A Multiprocessor SOC for Advanced Set-Top Box and Digital TV Systems. IEEE Design and Test of Computers, September/October 2001, pp. 21-31.
[6] Multiprocessor Considerations for Kernel-Mode Drivers. Microsoft Corporation. 2004.
[7] ARTEMIS Strategic Research Agenda.
[8] Embedded Systems Design - Networked Embedded and Control Systems. Session at IST 2006, Helsinki, Finland.
[9] Computing Systems. Session at IST 2006, Helsinki, Finland.
Appendices
Appendix A. State-of-the-art
  A.1. Hardware (technology) landscape
  A.2. Software landscape
  A.3. Application landscape
  A.4. Tools landscape
Appendix B. Challenges for the future MPSOC designer
Appendix C. Organization of MPSOC research
Appendix D. Research details
Appendix E. Actual research proposals
Appendix A. State-of-the-art
A.1. Hardware (technology) landscape
Even though, in spite of many fears, technology scaling has kept pace with Moore’s law, this has not been matched by a corresponding advancement in other areas of system design. Several factors have hindered growth here, and these are well known from a rich existing literature. Very briefly, they can be summarized as follows.
- Transistor performance has scaled exponentially, but the delay through global interconnect has followed a much slower improvement trend. Therefore, architectures are now wire-delay-limited.
- The development of design tools (quality, usability, applicability, etc.) did not match the growth in the number of available transistors. As a consequence, the human factor is still important, as ever larger design teams and longer design cycles are required.
- The drive for ever-increasing performance has led to large growth in dynamic and static power dissipation. Power consumption and thermal dissipation have become some of the primary design concerns.
- As feature sizes scale down, manufacturing defects and parametric variations have become increasingly common. High parameter variability and transistor mismatch, increasing leakage and transistor wear-out are just some of the resulting issues.
- Novel technologies will, for a while at least, have to coexist with “old” CMOS technologies. Thus, technological mixtures have to be considered.
A.2. Software landscape
The main issue in this track of MPSOC design lies in the inherent parallelism offered by the underlying hardware platforms. Parallel programming is not a new idea; parallelism has long been the “next big thing” [1]. However, the performance of single / sequential CPU software appears to have leveled off. Now “REAL CONCURRENCY” is the word, with software having to extract benefits from the simultaneously operating hardware resources.
Concurrent programming originated in 1962 with the invention of channels, which are independent device controllers. Channels make it possible to have a CPU execute a new application program at the same time that I/O operations are being executed on behalf of other, suspended application programs. Hence, concurrent programming was initially of concern to operating systems designers.
The most common concurrency model nowadays is based on threads, semaphores, and mutual exclusion locks. However, these methods date back to the 1960s (e.g. Dijkstra). Patterns are an additional means to avoid the costly rediscovery and reinvention of proven concurrent software concepts and component solutions. Patterns are useful for documenting recurring micro-architectures, as abstractions of common object structures that expert developers apply to solve concurrent software problems. They, too, are mostly thread-based approaches to concurrency (thread-per-request, thread pool, thread-per-session, etc.).
Even so, deceptively simple rules (e.g. always grab locks in the same order) are impossible to apply in practice [2]. Indeed, “humans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code. Even careful people miss possible interleavings among even simple collections of partially ordered operations.” [3]. Thus, perhaps the main problem nowadays in planning concurrent programming lies in the fact that threads are nondeterministic.
Hence, the programmer’s job is to prune away the nondeterminism by imposing constraints on execution order (e.g., mutexes), such that the underlying sequential machine may proceed with the execution [2]. Moreover, history has shown that the benefits of this approach to multiprocessing are far from clear. Multi-threading is essentially a uniprocessor technique where only the minimum level of processor logic is duplicated to support additional hardware threads [4].
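As a small illustration of the lock-ordering rule mentioned above, the following sketch has two threads move amounts between two accounts in opposite directions, always acquiring the two locks in a fixed global order. The `Account` class, the `transfer` function and the ordering key `ident` are hypothetical names chosen for this example. The text's point stands: the rule is easy to follow in a toy with two locks, and much harder to enforce across a large code base.

```python
import threading


class Account:
    """Hypothetical shared resource protected by its own lock."""

    def __init__(self, ident: int, balance: int):
        self.ident = ident
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Update both accounts while holding both locks.

    The locks are always taken in increasing `ident` order, whatever the
    direction of the transfer, so two opposite transfers cannot each hold
    one lock while waiting for the other.
    """
    first, second = sorted((src, dst), key=lambda acc: acc.ident)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def worker(src: Account, dst: Account, repeats: int) -> None:
    for _ in range(repeats):
        transfer(src, dst, 1)


def run(repeats: int = 10_000):
    a, b = Account(1, 1000), Account(2, 1000)
    # Opposite directions: acquiring src.lock then dst.lock naively
    # could deadlock here.
    t1 = threading.Thread(target=worker, args=(a, b, repeats))
    t2 = threading.Thread(target=worker, args=(b, a, repeats))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


balances = run()
print(balances)  # (1000, 1000): equal transfers in both directions cancel out
```

The interleaving of the two threads still differs from run to run; only the final balances are deterministic, which is exactly the kind of constraint-by-constraint pruning of nondeterminism the text describes.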
In addition, it is unrealistic to expect that a single, or even only a few, standard MPSOC architectures will be available in the future. It is also possible that a variety of MPSOC operating systems will appear, each better adapted to a certain (group of) MPSOC. Therefore, it will be very important to consider middleware layers able to synchronize platform operations with the expected reactions / commands at the OS level. The effort at middleware layers should target composability support, scalability and minimal power consumption, while offering open interfaces to third parties for application development.
A.3. Application landscape
Adding to the inherent technological complexity of on-chip multiprocessor systems, new challenges arise from application development perspectives. Multiple applications, pertaining to various application domains, will be co-located within the borders of a single chip. The ARTEMIS SRA identifies four application domains: Industrial Systems, Nomadic Environments, Private Spaces and Public Infrastructure. They may further be refined into healthcare, home environment, transportation, communication, public information and education, security, etc. The introduction of domain clustering, as also expressed by the SRA, is intended to simplify the otherwise impossible task of finding solutions that fit the requirements of all the domains.
To evaluate platforms, architectures, methods and tools, real applications or relevant abstractions are required. However, it is inherently hard to use real application models to drive the development of tools, architectures and methodologies for complex MPSOC, because, by definition, a single MPSOC application will consist of a variety of different sub-applications, from baseband signal processing to multimedia stream processing to the handling of TCP/IP protocol stacks and fancy user interfaces.
Thus, the traditional platform-based design (Fig. 2 a)), while preserving its validity with respect to individual applications, must be considered in a larger picture. Specific design approaches ending in specific platform mappings must be merged into a single platform, able to satisfy the requirements of all the components (Fig. 2 b)). Such a platform, controlled by a reactive operating system that is aware of applications and hardware resources, will accommodate the concurrent execution of multiple applications in terms of resource availability and scheduling (time and space axes). In order to achieve this goal, a common modeling standard needs to be employed. The modeling should spread over the application, software and hardware dimensions.
Fig. 2. a) Traditional platform based design methodology. b) Possible new considerations in MPSOC design.
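The idea of scheduling several sub-applications along both the space axis (which core) and the time axis (when) can be made concrete with a minimal sketch. It uses plain greedy list scheduling, which is a stand-in chosen for brevity and not a method proposed in this document. It also assumes identical cores, independent tasks and known durations, whereas the platforms discussed here are heterogeneous. The task names echo the sub-applications listed above; the durations are invented.

```python
import heapq


def schedule(tasks, n_cores):
    """Greedy list scheduling: place each task on the core that frees up first.

    tasks: list of (name, duration) pairs, handled in the given order.
    Returns {name: (core, start, end)}; `core` is the space axis,
    `start` and `end` are the time axis.
    """
    # Min-heap of (time at which the core becomes free, core id).
    cores = [(0, core) for core in range(n_cores)]
    heapq.heapify(cores)
    placement = {}
    for name, duration in tasks:
        free_at, core = heapq.heappop(cores)
        placement[name] = (core, free_at, free_at + duration)
        heapq.heappush(cores, (free_at + duration, core))
    return placement


# Hypothetical durations in arbitrary time units.
tasks = [("baseband", 4), ("video", 3), ("tcpip", 2), ("ui", 1)]
plan = schedule(tasks, n_cores=2)
for name, (core, start, end) in plan.items():
    print(f"{name}: core {core}, t = {start}..{end}")
```

A resource-aware operating system of the kind described above would additionally have to weigh power, heterogeneity and contract terms, none of which this sketch models.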
A.4. Tools landscape
The tools available for MPSOC are mostly point tools that are based on more traditional architectures and systems. For instance, processor developers like ARM and Intel keep improving their compilers and development environments and adapt them to support multi-core architectures. However, this is a very evolutionary route: all improvements are incremental, and the tools are still based on the same assumptions, such as that the input is a sequential C program. In contrast to what these incremental approaches offer, future heterogeneous MPSOC need very systematic support for concurrency, synchronization, communication, representation of global time, and consistency of complex, shared data structures. Moreover, system design will require solid tool support for functional verification, and for performance and power analysis. Again, there are point tools that address these issues at the block and component level, but they do not generalize easily to the system level for complex, heterogeneous MPSOC.
Page 7 @November 22, 2006
In order to develop strong tools for MPSOC, the community still has to develop and agree on several basic concepts, techniques and standards. First attempts have been made by several research groups, but also notably by industry [5,6]. There, the first steps have been taken to express communication and synchronization in heterogeneous MPSOC in a standardized way that allows for efficient modeling and implementation. But more needs to be done to also cover more complex communication models and to provide efficient access to large, shared and distributed data structures such as lists, arrays, trees and hash tables. Furthermore, the performance and power modeling aspect has hardly been addressed at all. Some recent projects such as the FP6 project SPRINT (http://www.ecsi-association.org/sprint) are starting to develop concepts, notations and standards to express Quality of Service parameters. Once this is accomplished, system level performance and power analysis tools can be developed. But all these efforts are only a beginning and will require strong and focused research for several years.
In the absence of a large number of complete application scenarios, one important support for researchers and developers can come from benchmarks. One of the biggest challenges going forward will be benchmarking the system performance of multicore devices [7]. Benchmark suites that consist of both stochastic micro-benchmarks and realistic application models can drive and speed up the development and evaluation of MPSOC significantly. Their value for analyzing and comparing proposed tools and for directing new developments can hardly be overestimated. Consequently, several groups in Europe, the US and Canada have recently joined forces to organize the development of benchmarks for MPSOCs and Networks-on-Chip. This effort is supported by the OCP organization.
However, this is only a first step and a more systematic and broader approach that involves all the big European industries and research organizations has to be initiated, if industrial development and academic research shall be intimately tied together to benefit from and drive each other.
References
[1] H. Sutter. Software and the Concurrency Revolution. MPSOC 2006 presentation.
[2] E. A. Lee. The Future of Embedded Software. Artemis 3rd conference, Graz, 2006.
[3] H. Sutter and J. Larus. Software and the concurrency revolution. ACM Queue, 3(7), 2005.
[4] J. Goodacre. Tutorial: How to analyze your multiprocessing options. Embedded Systems Design Magazine, 05/09/06.
[5] P. Paulin, C. Pilkington, E. Bensoudane. StepNP: A System-Level Exploration Platform for Network Processors. IEEE Design & Test of Computers, vol. 19, no. 6, pp. 17-26, November-December 2002.
[6] P. van der Wolf, E. A. de Kock, T. Henriksson, W. Kruijtzer, G. Essink. Design and programming of embedded multiprocessors: an interface-centric approach. International Conference on HW/SW Codesign and System Synthesis, pp. 206-217, 2004.
[7] D. Bursky. Multicore solutions proliferating. EE Times, 09/04/2006.
Appendix B. Challenges for the future MPSOC designer
OVERALL CHALLENGE IN BRIEF: Create methods, tools and platforms for mapping multiple SW applications on one flexible yet energy efficient and scaling tolerant platform.
Among the large set of challenges to be met by tomorrow's ES / MPSOC designer, we mention the following [1,2,3, etc.]:
- abstraction - precise methods, aware of both the functionality and the physical platform upon which the software runs, are needed to direct the embedded software design process from flexible higher levels of abstraction, through mathematics based refinement steps, towards constraint satisfaction;
- benchmarking - benchmark suites that consist of both stochastic micro-benchmarks and realistic application models can drive and speed up the development and evaluation of MPSOC significantly;
- concurrent development frameworks - surpassing the sequential paradigm. Developing generic modeling and design methods that take into consideration the parallelism at the platform level and the dynamically reconfiguring architectures. Models must depend on languages and scalable algorithms for the control of evolvable, distributed and adaptable systems. They must help the designer to master complexity and temporal and spatial uncertainties, such as delays and bandwidth in communications, and resource availability. Design of multi-processor systems will require not only the now “traditional” HW-SW partitioning, but also “SW-SW” partitioning and co-design, i.e., assignment of SW tasks to various processor options;
- development environments - tools that transcend particular formats, enabling early estimations of design performance and virtual prototyping; the environment must be taken into consideration;
- dynamism and softness - the systems must adapt, at runtime, under the influence of user requirements; no methodology yet exists that enables designers to identify and select programmable features of a chip. Architectural innovation is needed to meet SOC opportunities and constraints and to support such dynamic behavior / architecture. At the same time, a rigorous division of responsibilities must be established between the design and the execution phases;
- interconnect architectures - arbitration, synchronization, routing and repeating schemes. Synergetic behavior of heterogeneous components is a must, to be achieved both through intelligent interfacing and through middleware development. Large scale integrability of IP blocks is necessary to meet time-to-market directives;
- memory - estimates indicate that 90% of die area will be memory, considerably impacting performance; this requires exploration and synthesis of memory and communication architectures early in the design flow; a sea of memory with islands of functional units, or the opposite?
- middleware - adaptability of the OS to different MPSOC platforms, as well as providing an interface to communication frameworks external to the MPSOC;
- multidisciplinary thinking - hardware, software, environment, usability, cultural frameworks;
- platformisation - the possibility to use in a sensible and constructive manner the “infinite” transistor availability to create new circuit architectures able to withstand large transistor parameter variability;
- predictability - the capability of making early decisions based on the expected performance of the final implementation. The new technological approaches (multi-gate transistors, single electron transistors, silicon nano-wires, etc.) raise obvious issues with respect to the expected behaviour of the system, at delivery and during its operation period;
- software architectures - MPSOC OS capable of dealing with true concurrency, hosting procedures for dynamic reconfiguration based on application requirements or environmental changes. Additional OS facilities to deal with self-organization and self-repair, helping to build autonomous devices;
- security - a critical issue, requiring new classes of software applications and technologies that are uniquely served by multi-core processors. An increasingly effective approach to providing additional platform security is to leverage the power of virtualization technology to segregate trusted applications from untrusted ones.
References
[1] Jan M. Rabaey. System-On-Chip-Challenges In The Deep-Sub-Micron Era. A case for the network-on-a-Chip. In “Interconnect-Centric Design for Advanced SoC and NoC”, J. Nurmi, H. Tenhunen, J. Isoaho, A. Jantsch, eds., Kluwer, 2004, pp. 5-26.
[2] M. Sgroi et al. Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design. The 38th Design Automation Conference, Las Vegas, June 2001, pp. 667-72.
[3] International Technology Roadmap For Semiconductors 2005 Edition. Design.
Appendix C. Organization of MPSOC research
The MPSOC research can be organized in two main categories:
Work areas
- Corresponding to longer term research, or to the “foundational research” as expressed by the Artemis SRA. The intent is to identify research paths and goals, reached at the end of several projects. Research for innovation is a large part of this category.
- Contains:
  • Challenges & Vision
  • Objectives (non-measurable results)
  • Goals (measurable results)
Content
- Corresponding to shorter term research periods. May be assimilated to calls for projects and the projects themselves. Addresses the actual work to be done in these periods.
- Finds partial answers to the issues identified as above.
- Contains:
  • Definition of objectives
  • Work approach
  • Key results or innovations
Fig. 3. The MPSOC research space. [Axes: Issues / Work Areas / Fundamental research; Content / “Projects” / Short term research. Labels: Project 1, Project 2, Project Calls, MPSOC Research space.]
A representation of the research space can be visualized as illustrated in Fig. 3. A typical project call, or a project by itself, should address (a small) part of the fundamental issues and provide several concrete results along the content axis. Research topics, be they fundamental or short-term, will most naturally cover several parts of the defined MPSOC research chapters. In the next paragraphs, we identify research subjects meant to improve the understanding and development of MPSOC projects.
Application.
- Design methodology. Models. A unified application - platform modeling environment will bring enormous advantages to the design process from the perspectives of time-to-market, correctness of design and efficiency. Adoption and adaptation of existing modeling paradigms is possible. Formal frameworks must be accommodated within these environments in order to ensure a correct design path. Methods are needed to identify and extract parallelism at high levels of abstraction. Models should provide means for the specification of a multitude of requirements, sourced either in the application or in the platform (HW / SW) layers. This should ensure simultaneous and possibly transparent top-down and bottom-up design flow activities. A partial or overall methodology should either provide means for tool integration or contain clear directions on tool specification.
- Benchmarks.
  • Stochastic micro-benchmarks
  • Realistic benchmarks that mimic industrial applications in important ways
Software platforms.
- OS for MPSOC. HW level concurrency-aware operating systems. Reactiveness to dynamic environmental changes.
- Dynamic reconfiguration controllers (OS-level). The tasks of multiple-function applications must be perceived not only on a time basis, but also on a resource, spatial basis: parts of a SW application may require new resources to be implemented on-the-fly on the area-restricted device.
- Fault tolerance / HW tolerance. Self-diagnosis procedures must detect and repair / replace defective or malfunctioning HW resources while the system maintains its operation capabilities.
- Middleware for OS / HW platform mismatches.
Hardware platforms.
- Platform modeling and characterization: high level capture of MPSOC platform characteristics: parallelism, communication infrastructure, communication protocols, timing. Implications on both software and tools tracks.
- Reconfiguration: both functional units and interconnect structures. Granularity and placement, libraries containing functional and spatial / geometrical information. Self-aware, self-repair, autonomous devices based on dynamic reconfiguration. Implications on both software and tools tracks.
- Metrics: developed to assess MPSOC platform suitability for a specific range of applications. Traditional metrics plus new ones: performance (time), area, power consumption, adaptability, QoS, etc. Requires correct representation in the tools track.
- Technology: reliability and fault tolerance (HW levels); heterogeneous technologies, that is, non-FET based logic and memory, and their possible integration with CMOS; development of novel FPGA(-like) technologies.
Tools.
- CAD tools for heterogeneous (mixed technologies) design
- Heterogeneous simulation environment: electrical, mechanical, optical, thermal
- Compilers and synthesis for concurrent, mixed HW/SW applications
- Standards for communication, synchronization, shared memory access with QoS support
- System level functional verification
- System level performance analysis
- System level power and energy analysis
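The “Metrics” item above asks for a way to assess how suitable a platform is for a range of applications. One possible reading, given here only as a hedged sketch, is a weighted score over normalized metrics. The metric names, the (best, worst) ranges, the weights and the platform figures below are all invented for illustration and do not come from any measurement.

```python
def normalize(value, best, worst):
    """Map a raw metric onto 0..1, where 1 means 'as good as best'.

    Works both for metrics where lower is better (best < worst) and for
    metrics where higher is better (best > worst). Values outside the
    range are clamped.
    """
    if best == worst:
        return 1.0
    score = (value - worst) / (best - worst)
    return max(0.0, min(1.0, score))


def suitability(platform, ranges, weights):
    """Weighted average of the normalized metrics named in `weights`."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[metric] * normalize(platform[metric], *ranges[metric])
        for metric in weights
    )
    return weighted / total_weight


# Hypothetical (best, worst) ranges for an application domain.
ranges = {
    "latency_ms": (1.0, 10.0),
    "area_mm2": (20.0, 100.0),
    "power_mw": (100.0, 1000.0),
}
# Hypothetical weights: this domain cares most about latency.
weights = {"latency_ms": 2.0, "area_mm2": 1.0, "power_mw": 1.0}
# Hypothetical characterization of one candidate platform.
platform_a = {"latency_ms": 4.0, "area_mm2": 60.0, "power_mw": 550.0}

score_a = suitability(platform_a, ranges, weights)
```

A single scalar hides trade-offs between metrics; less easily quantified properties such as adaptability or QoS would need an agreed measurement procedure before they could enter such a score.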
Appendix D. Research details
Based on the period in which the respective goals are reachable, research scenarios and priorities may be further categorized as “long-term”, that is, goals that may materialize after a period of 10 to 15 years, and “short-term”, that is, results that are expected to appear relatively fast, at the end of smaller scale projects, in about 4 to 5 years. We give in the following a short overview of what such research plans may be.
Long term research plans (~ 10-15 years):
LTRP1 - Build on demand / by remote construction: Unlike self-adaptability (and the like), and actually building on such concepts, “self-building” systems are delivered to the end user without preinstalled functionality. The only features provided are a very basic OS with procedures that implement, on request, a link to IP providers. The user will (transparently) download the required functionality, under the control of the OS. The latter is also responsible for placing the modules on the chip, building the interconnect network and managing the local execution. The identification of the correct and appropriate IPs is to be done on the “server” side. Here, the library IPs are verified against the restrictions provided by the user. The corresponding methodology rules are applied in order to guide the resident OS in building a well-performing system (over a range of measures). The system may be upgraded or changed whenever desired by the user or specified by the functionality. This topic will evolve in two directions:
a. Self assembling systems. Given a high level specification, a fabric of resources, and associated managers, the system self assembles, i.e. downloads functionality, compiles / interprets it, allocates resources, maps it, schedules / executes it, etc.
b. Infrastructure. The environment must provide the opportunities for enabling such self-building features. Similar to the current wireless surroundings, a certain “network coverage” must be provided for the system to interact with possible IP providers. Databases of components with thorough descriptions of their characteristics must also be available through such connections. Security aspects also become extremely important in such environments, and have to be covered by the network provider.
LTRP2 - Holistic (end-to-end) approaches for complex real-time reconfigurable embedded system design: New approaches must address a holistic, optimal and rapid design process for real-time reconfigurable embedded systems. The general objective is to develop methodologies and tool-chains supporting the entire embedded system design flow, from high-level algorithm definition to efficient modular implementation of a reconfigurable heterogeneous system (RHS). An RHS is considered as a combination of embedded processors, digital signal processors and reconfigurable hardware. Generally, the prospective design approaches are envisioned to include: (a) support for various formats in algorithm description and exploration (both diagrammatic and textual), (b) frameworks that allow novel algorithms for design space exploration, (c) system synthesis tools producing near-optimal implementations that best exploit the capability of each type of processing element. For instance, dynamic reconfigurability of hardware can be exploited to support function upgrade or adaptation to operating conditions. From the application point of view, the complexity of future embedded devices is becoming too great for monolithic processing platforms to be designed. This is where the approaches with reconfigurable heterogeneous systems become vital. Their goal is to provide future designers with powerful methods and tools to “best fit” applications in a selected reconfigurable heterogeneous platform.
Short term research plans (~ 5 years).
The following STRPs may be considered as potential project generators. They partially contribute to the realization of LTRP1, either from methodological or technological points of view.
STRP1 - Definition of execution model and environment (note 1): New MPSOC architectures will be based on a new execution model and execution environment. This work will focus on studying the evolution of environment systems and on participating in the definition of these new environment systems, each one depending on the application domain. For instance, some of them will need good computing performance, others low power consumption, real-time behavior, etc.
Note 1: also in the SRA, DM&T.
STRP2 - Specification languages at new levels (note 2): Based on the execution models, unified languages (metamodels) must be defined. In particular, they must express and extract parallelism in a standardized manner. This addresses aspects of both application(s) and platform modeling, in a unitary high level specification environment. Seamless portability is an important goal here, to be achieved by procedures that map applications to various MPSOC platforms. Those new languages will also include real time, value semantics, energy, QoS, reliability, etc. information related to components. This will allow the deterministic composition of elements and efficient validation & verification of applications. The objective here is to propose a new parallel programming paradigm for MPSOC that will support the variety of application classes that are foreseen. Metrics must be developed for evaluating degrees of parallelism, which should guide the MPSOC design. This will trigger / strengthen the development of specialized tools for automatic analysis and synthesis of MPSOC.
STRP3 - Reconfigurability and platformisation (notes 2, 3): To cope in a comprehensive manner with the fast evolution of application needs and with severe constraints in terms of time to market, power, cost, reliability, flexibility and reuse, a range of possible solutions can be foreseen, owing to the increasing possibilities offered by the incoming SoC era. Dynamic reconfigurability is the key word here. Emphasis is on configuring specialized platforms tailored for some given classes of applications, as well as on high speed and low power reconfiguration of some hardware functionalities. Granularity of reconfiguration and on-the-fly reconfiguration mechanisms are other essential aspects of reconfiguration for MPSOC. We have three options, in general, to look at system reconfiguration:
1. Node reconfigurability (NR). Some nodes are FPGA devices, implying:
   • Various functionalities over time
   • Static size
   • The communication infrastructure remains stable
2. System reconfigurability (SR). The system IS an FPGA device, implying:
   • Dynamic size
   • Requires not only scheduling in time, but also in space!
   • The communication infrastructure will change
3. Both
In any of the above situations, we need:
1. operating systems (STRP4) that, by predefined or acquired policies, are able to control the reconfiguration schedules, based on:
   • application-set requirements
   • the component properties as defined in STRP2
   • the available components from new downloads
2. improvements in the technological dimension (on-chip):
   • size of dynamically reconfigurable blocks
   • speed of reconfiguration
   • cost of reconfiguration in terms of hardware & software infrastructure
3. infrastructure:
   • network coverage (more than the current wireless, but something like this?), meant to offer the possibility for downloads
   • component databases available through the above networks
Thus, effective and dynamic exploitation of resources implies a coordinated tuning of design methodologies and target architectures throughout all the abstraction layers of the application design - see STRP2. Issues are to be found both at the research level and with the industries providing solutions for reconfigurable devices.
STRP4 - New operating systems for MPSOC (note 3): System software, including operating systems that implement the new programming paradigm defined in STRP1 and efficiently support the dynamic reconfiguration characteristics described above. Especially important for MPSOC targeting resource & energy efficient devices and for network enabled systems. Includes advances in modular approaches to operating systems, support for virtual machines within MPSOC, and real time support. In order to take into account at the same time the environment system and the architecture, hierarchical and distributed operating systems for MPSOC must be proposed, at least one managing the application environment and another one managing the complex MPSOC architecture.
Note 2: also in the SRA, RD&A.
Note 3: also in the SRA, SC&M.
STRP5 - Autonomy / self awareness (notes 2, 3): The key challenge in nano-scale and giga-complexity MPSOC is to build highly reliable, predictable and very complex systems out of partly stochastically behaving subcomponents (like communication networks) and devices (large variability in device characteristics and reliability), operating in unforeseen environments with strict performance or power consumption constraints. A dynamic and autonomous control system will be required as part of the platform infrastructure. This infrastructure should provide HW, SW and communication resources for testing mechanisms, power management, resource management, fault management, performance management etc. in a reconfigurable way, and should also provide hooks and mechanisms to extend and customize these features. Furthermore, HW & SW architectural mechanisms must be proposed in order to design architectures able to tolerate internal defects and to support “self healing computing”.
STRP6 - Mixed Nano/CMOS technologies (note 4): It is widely recognized that the MOS transistor will remain the main actor of microelectronics for years, and that it will continue to play its role in the context of CMOS technologies. The SIA prospects tell us that this will be true at least for the next decade. Hence, a key issue in enabling the use of any nano-device is the ability of this device to co-exist with CMOS.
In this sense, there is a need to incorporate the nano-devices into the regular design flow in a way compatible with the use of CMOS; otherwise their impact on the microelectronics industry will not be relevant at all. This is closely related to the need to offer a plug and play capacity to heterogeneous technologies, and might condition the emergence of new applications based on the nano-electronics platform.
STRP7 - Tools for MPSOC: Intrinsically connected to STRP1 and STRP2, and strongly related to all the other STRPs. “Tools” for ES are part of the Artemis SRA, although not focused specifically on MPSOC.
STRP7.1 - formats in algorithm description and exploration (note 1): develop / integrate methods and algorithms for the unification of multiple types of high-level algorithmic entries into a single algorithm description and a single design solution; translate multiple front-end algorithmic entries into a single unified representation. This representation will also be able to assist designers in instrumenting and possibly improving the input algorithm at the highest level of abstraction. The latter functionality will involve certain algorithm explorations, which may also identify and collect a number of properties of the input algorithm. Such properties will be conveyable further through the tool chain.
STRP7.2 - design partitioning and task transformations (note 1): to assist a designer in partitioning an application algorithm into tasks for efficient execution on a candidate reconfigurable heterogeneous system platform comprising a GPP, a DSP, and reconfigurable hardware. The tools will provide a valid hardware/software partitioning of an algorithm among the different system components.
STRP7.3 - benchmarking. Metric evaluations for both hardware and software components: to perform metric analysis and transformation of an application. A toolset will collect feedback data from different design space exploration tools and, by means of different metric estimations, will provide optimal design solutions with respect to particular requirements. Benchmarking will have to address specific requirements related to a variety of MPSOC architectures, correlated with the target application domains.
STRP7.4 - system synthesis tools producing near-optimal implementations (note 1): employed once a promising reconfigurable heterogeneous system has been identified. These tools will provide the synthesis of a given reconfigurable heterogeneous system by providing binaries for all its hardware and software components.
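To make the partitioning task of STRP7.2 more concrete, the following is a small sketch of one well-known heuristic, not the tool envisioned here. It assumes, purely for illustration, that each task comes with an estimated execution time on the GPP, on the DSP and on reconfigurable hardware, plus the hardware area it would occupy. Tasks start on their faster software target and are then moved to hardware greedily, in order of time saved per unit of area, until a given area budget is exhausted. The task names and all figures are hypothetical.

```python
def partition(tasks, area_budget):
    """Greedy HW/SW partitioning sketch.

    tasks: {name: {"gpp": time, "dsp": time, "hw": time, "hw_area": area}}
           with hw_area > 0 (assumed estimates, e.g. from profiling).
    Returns {name: "gpp" | "dsp" | "hw"}.
    """
    # Step 1: every task starts on its faster software target.
    assignment = {
        name: ("gpp" if est["gpp"] <= est["dsp"] else "dsp")
        for name, est in tasks.items()
    }

    def gain_per_area(name):
        est = tasks[name]
        return (est[assignment[name]] - est["hw"]) / est["hw_area"]

    # Step 2: move tasks to hardware, best time saving per area first.
    remaining = area_budget
    for name in sorted(tasks, key=gain_per_area, reverse=True):
        est = tasks[name]
        if gain_per_area(name) > 0 and est["hw_area"] <= remaining:
            assignment[name] = "hw"
            remaining -= est["hw_area"]
    return assignment


# Hypothetical estimates (arbitrary time and area units).
tasks = {
    "fft":     {"gpp": 10, "dsp": 4,  "hw": 1, "hw_area": 30},
    "viterbi": {"gpp": 20, "dsp": 12, "hw": 2, "hw_area": 40},
    "ui":      {"gpp": 5,  "dsp": 9,  "hw": 6, "hw_area": 10},
    "tcp":     {"gpp": 6,  "dsp": 8,  "hw": 3, "hw_area": 50},
}
result = partition(tasks, area_budget=80)
```

The heuristic treats tasks as independent and ignores communication cost, contention for the same processor, and reconfiguration time; a realistic tool would have to account for these, for instance through the metric feedback described under STRP7.3.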
Note 4: To be addressed by (in cooperation with) complementary technological platforms.
Appendix E. Actual research proposals
To be edited after the networking event.