2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip
Automatic Runtime Customization for Variability Awareness on Multicore Platforms
Gasser Ayad and Ramakrishna Nittala
Department of Control and Computer Engineering (DAUIN), Politecnico di Torino, Turin, Italy
Romain Lemaire
Univ. Grenoble Alpes, F-38000 Grenoble, France; CEA, LETI, MINATEC Campus, F-38054 Grenoble, France
Abstract—Driven by increasingly aggressive CMOS technology scaling, sub-wavelength lithography is introducing more pronounced variability in the technology parameters of the semiconductor fabrication process. That variability results in otherwise identical designs displaying very different performance, power consumption levels and lifespans once fabricated. Hence, process variability may lead to execution uncertainties, impacting the expected quality of service and energy efficiency of the running software. As such uncertainties are intolerable in certain application domains such as automotive and avionic infotainment systems, it has become necessary to customize runtime engines so that task allocation decisions are variability-aware. The purpose of compensating process variability is to avoid performance degradation and energy inefficiency. Customization is meant to take place automatically by exporting the variability-impacted platform characteristics, such as per-core manufactured clock frequency, so that the runtime library can perform variability-aware workload sharing on the target cores of the hardware platform. This can yield noticeable gains not only in system performance and energy consumption, but also in productivity in systems development, testing, integration, and marketing. This paper presents a holistic approach, starting from a system model of the target multicore platform, through building and integrating the runtime library, to the optimization results achieved through the proposed runtime customization paradigm.
I. INTRODUCTION
Multicore architectures are becoming indispensable to high-end embedded computing as application energy-efficiency requirements exceed 10 GOPS/Watt. Unfortunately, sub-65 nm CMOS technology nodes will be increasingly affected by the variation phenomenon, and multicore architectures will be impacted in many ways by the variability of the underlying silicon fabrics. In particular, intra-die process variations result in significant core-to-core as well as chip-to-chip frequency variations, implying heterogeneity in core performance, energy consumption, and reliability. This problem is being addressed at multiple levels of abstraction, from the circuit level up to the system level. Variability is observed as either process or runtime variability. Process variability, also known as static variability, results in otherwise identical designs displaying very different power consumption and lifespans once fabricated, whilst runtime or dynamic variability is concerned with changes that occur after fabrication during the lifespan of the system, such as component degradation or failure, constant change in battery power levels, temperature variations, and transient faults.
Process variations in future technology generations will have a profound impact on the reliability, performance, and power consumption of microprocessor designs. These manufacturing deviations, due to both systematic fabrication errors and random statistical variations, affect gate size, dopant concentration, interconnect width, spacing, and thickness. This translates directly to chips that miss critical circuit design targets including latency, power, and resilience to noise. In current designs, foundry-induced physical deviations already produce significant die-to-die variation [1]. ITRS [2] predicts that circuit performance variability will increase from 48% to 66% in the next ten years [3]. Moreover, many-core die sizes may scale faster than geometric technology scaling, which indicates that manufacturing variability will be increasingly prominent in future designs. Variability tolerance in multicore platforms requires circuits to monitor dynamic variations and to compensate for them, as well as software policies to decide when and how to apply compensation in response to static and dynamic perturbations in the nominal operating characteristics. In order to support optimal energy management, it is typical to allow dynamic clock adjustments on each individual processing element, and to allow certain cores to be clock-gated or power-gated. When there is work to be done, individual cores are powered up to run at full speed, while during periods of inactivity in the workload, cores are power-gated or slowed down to save energy. The effort required to carefully engineer a solution that makes efficient use of a heterogeneous platform is highly dependent on the platform configuration, which may be difficult to predict.
Put differently, for most variability-prone or heterogeneous SoC architectures, many configuration options exist, with different numbers of general purpose and specialized coprocessors, different clock rates, different memory sizes, and different interconnection speeds. Furthermore, the configuration options may change from year to year. Managing the distribution and evolution of complex software systems across such a diversity of deployment platforms is extremely difficult without automated support that masks or abstracts the underlying hardware design complexities for more efficient cycles of development, testing and marketing of software solutions [4]. In other words, application programming for variability-impacted systems has become especially challenging with the level of sophistication featured in today’s chip design technologies. Thus, new and more efficient tools for software development have become a necessity in order to cut software design, implementation, testing and marketing time.
978-1-4799-8670-5/15 $31.00 © 2015 European Union DOI 10.1109/MCSoC.2015.19
tolerance will be presented in section II. The target platform, as well as a methodology for modeling it using the UML/SysML language, is then established in section III, followed by a detailed description of the automatic customization approach in section IV. A parallelizable software benchmark is used in the experimental setup to obtain the customization results in section V. The paper is concluded in section VI, which summarizes the overall outcome of the automatic customization approach and introduces prospects for more comprehensive customization.
In this paper, an innovative approach for automatic variability compensation is presented. The customization flow starts from a high-level model-based description of the target hardware platform and ends in a customized variability-aware runtime library capable of performing well-informed task allocation decisions aimed at cutting overall execution time and thus reducing energy consumption. These benefits are especially valuable in battery-powered domains such as mobile communications, remote data acquisition, wireless controller applications, as well as the promising Internet of Things. These benefits are also relevant to large server farms, where improved power efficiency leads to huge savings in cooling and electric power costs [4], thus supporting Green IT prospects.
II. RELATED WORK
A. High-Level Modeling of Target Hardware Platforms
The abstraction level of the description is thought to be detailed enough to capture characteristics relevant for the runtime manager to make adequate task allocation decisions. Such characteristics typically include types and organization of cores, memory hierarchy, and topology of the interconnect. Post-manufacturing characterization of the platform reveals the variability-impacted operating factors such as the manufactured clock frequency value of each core so that it becomes known how much it differs from the datasheet value.
The increasing amount of hardware resources in next generation MultiProcessor Systems-on-Chip (MPSoC) calls for efficient design methodologies and tools to reduce their development complexity. Presented in [6] is a candidate MPSoC design environment Gaspard2, which uses the MARTE (Modeling and Analysis of Real-Time and Embedded systems) standard profile for high-level systems specification. Gaspard2 adopts a methodology based on Model-Driven Engineering. It promotes separation of concerns, reusability and automatic model refinement from higher abstraction levels to executable descriptions.
At the top of the customization toolchain, a SysML [5] model of the target platform is developed. The modeling tool facilitates exporting the model into a format, such as XML, from which variability parameters can next be parsed and reformatted suitably for integration with the runtime library. The runtime library incorporates a state-of-the-art variability-aware task allocation policy which has been proven efficient, in terms of overall performance and energy consumption, compared to widely known variability-unaware policies. The automatically customized runtime library is tested against a parallelizable benchmark running on top of the target platform, which lies at the bottom of the toolchain, where overall execution time and energy consumption statistics are obtained showing the added value due to customization. The customization flow is shown in figure 1.
In addition, [7] presents a novel methodology for modeling partially dynamic reconfigurable hardware at transaction level. That paper addresses the lack of tools and mechanisms for the design of reconfigurable logic at system level and for the exploration of the different configurations of such architectures. The presented mechanisms have been implemented in ReSP, a transaction-level MPSoC simulation platform that works at a high abstraction level. Components used by ReSP are based on SystemC and TLM hardware and communication description libraries. ReSP provides a non-intrusive framework to manipulate SystemC and TLM objects. Its capabilities augment the platform with the possibility of observing the internal structure of the SystemC component models. This feature enables runtime composition and dynamic management of the architecture under analysis.
B. Awareness of and Compensating the Variability Factor
Recently, much attention has been given to task allocation and scheduling strategies for MPSoCs affected by variability and aging. [8] gives an overview of the concept of variability. Concerning the allocation countermeasures proposed in the literature, a process-variation-aware thread mapping has been recently proposed in [9]. That work aims to maximize performance and targets loop-intensive applications. However, it does not provide a full-fledged solution since it does not take energy consumption into account. Moreover, [10] proposes a statistical scheduling approach to mitigate the impact of parameter variations in a multicore platform. The proposed policy is based on static estimation of task execution times and variability information, but it does not consider power consumption.
Fig. 1: Automatic Customization Toolchain
Chip manufacturing may incur not only inter-core (core-to-core) variability; inter-chip variability is equally likely to occur. The proposed customization approach promises to handle both alike, thus providing a unified way of runtime customization for variability awareness, without case-by-case handling of the uncertainty dictated by the extreme subtleties of ultra-compact CMOS manufacturing processes.
In [11], a time-constrained variability-aware task allocation methodology with the objective of minimizing energy consumption is proposed. The allocation problem was formulated in two sequential steps where the
This paper is organized as follows. Related works from the literature on high level modeling as well as variability
This paper presents an innovative approach of automatic runtime customization for variability compensation and energy efficiency. Automatic customization is achieved by creating a high-level model of the target multicore platform that abstracts the hardware properties, including those that are relevant for the runtime customization. In our study we used a well-known modeling tool as well as a next-generation heterogeneous multicore platform to showcase the customization toolflow and highlight the results. However, the customization approach itself remains generic and applicable using the majority of modeling toolsuites and hardware architectures.
solution computed by a Linear-Programming (LP) approach was fed into a Bin-Packing (BP) algorithm for final task allocation. That paper targets real-time streaming multimedia applications. Also in the scope of streaming applications, [12] focuses on software countermeasures that reshape the application workload to make up for the variability in the underlying multiprocessor fabric. It proposes a workload allocation policy to compensate for core-level speed and power variations. The focus is multimedia processing, which is typically characterized by application-level frame-rate constraints. In that context, the top-priority goal of variability compensation policies is to meet the real-time constraints imposed by the frame rate of the multimedia stream, while minimizing energy as a secondary objective [12].
To showcase the toolchain and highlight the optimization results achieved through our experiments, we developed the system model using one of the leading commercial tools in the systems modeling industry, Atego® Artisan Studio®². We used a timing-accurate simulator of two interconnected instances of the next-generation research platform GENEPY [16], [17] as the target hardware architecture.
Another proposal has been made in [13] of a new formulation of the task allocation problem for variability affected platforms, which manages per-core utilization to achieve a target lifetime while minimizing energy consumption during the execution of rate-constrained multimedia applications. This work devised an adaptive solution that can be applied online and approximates the result of an optimal, offline version. The purpose is to have a runtime that adapts system resource utilization to time-varying and uneven platform degradation, so as to prevent premature chip failure. In this context, task allocation techniques were used to deal with heterogeneous cores and extend chip lifetime while minimizing energy and preserving quality of service.
III. DESCRIPTION AND MODELING OF TARGET PLATFORM
For any given application workload, deciding the most efficient division of labor between general purpose processors and specialized digital co-processors requires an understanding of many low-level details that are difficult to ascertain from the software development perspective. A SysML model has therefore been developed to abstract the variability-implied heterogeneity of the target multicore platform. This model is used to generate an XML representation of the platform, which automatically configures the runtime to decide the optimal dispatching or allocation of the workload on the target clusters. That is, the runtime customization solution is meant to be “automated” based on information made available in a model-based description of the target platform. This section describes the hardware platform GENEPY along with the simulation environment of an extended version of GENEPY, then presents a relevant power model, and closes by explaining the platform modeling approach.
Most closely related to our approach, variability-aware workload allocation policies for independent task sets are presented in [14]. Two policies are considered, aiming at maximizing performance or minimizing power, with the assumption that voltage scaling is available on a per-core basis (this is not supported in our platform). Moreover, [14] assumes that the number of tasks is not larger than the number of cores (in our paper, it is larger). Our results are obtained with a similar version of the policies described in [14], with suitable modifications to suit our system setup.
C. From High-Level Model to Runtime - A Top-down Approach
A. Description of Target Platform and Simulator
According to the aforementioned related works, there is an obvious gap between top-level modeling of target hardware MPSoCs and variability awareness at the runtime level. In more detail, the center of the methodology is the high-level modeling language (UML/SysML)¹ that is used to describe the target platform. High-level modeling allows an architecture-independent description of the application and therefore enables customization for different architectural templates. From a research perspective, this work takes a lead in bringing hardware variability information up to the level of the system design model and thus closes the aforementioned gap; that is, it brings the runtime customization strategy up to the level of system modeling. [15] is a similar research work, but its results address only the impact on workload distribution and lack performance and energy optimization statistics.
The main purpose of this simulation environment is to provide a virtual prototype of the real heterogeneous hardware platform. The use of this simulator is not limited to the GENEPY chip; it offers different options to model a large number of platforms derived from the GENEPY concept. The current architecture of the simulated platform is presented in figure 3. It consists of ten clusters connected by a Network-on-Chip (NoC). The topology is the same as the one implemented in the GENEPY chip. A cluster is an autonomous block of IP cores. By “autonomous” we mean that a cluster includes its own control system and, once it has been booted, it can execute tasks independently. Clusters share common characteristics and instantiate similar IPs. The functions of these IPs are the following: control, processing and data storage. Control is presumed to be performed by a GPP processor in order to provide a high level of flexibility. Processing is supported by a set of DSP cores. Local storage is used both for application data and instructions for DSPs and consists of embedded memories. The clusters also include communication features to exchange data and synchronize themselves through the NoC. The communication mechanisms are handled by a Network Interface (NI) that implements the NoC protocol and hides the low-level details of the network from the application level. A generic overview of a cluster appears in figure 2.
1 SysML is specified as a profile (dialect) of the Unified Modeling Language (UML™), the industry standard for modeling software-intensive systems, so SysML is frequently implemented as a plugin for popular UML modeling tools.
2 Atego and Artisan Studio are registered trademarks of PTC, Inc.
Fig. 2: Cluster Architecture
Fig. 3: Target MPSoC Platform
GENEPY features two different types of clusters, referred to as SMEP and ICYSMEP (figure 3). In the scope of this paper, we opted to conduct the empirical experiments on the SMEP clusters, using the MIPS core (the GPP unit) in each cluster. Each SMEP cluster is composed of the following components:
• MIPS core: which can act as a controller and/or a processing element.
• Two MEPHISTO DSP cores: which support the main processing tasks of the cluster.
• Network Interface (NI): which connects the SMEP cluster to the NoC.
• Smart Memory Engine (SME): which contains the main storage memory for the cluster and a hardware mechanism for data transfer with the MEPHISTO cores and the NI.
Regarding the modeling of programmable cores (MIPS and MEPHISTO), the approach adopted for the GENEPY simulator is to provide two modes:
• Host Code Execution (HCE): compiling the application for the host machine. For instance, if the user has an x86 host architecture, HCE mode means compiling the code for x86. A library is created from the source code and is linked dynamically with the platform. The HCE mode allows fast simulation but does not provide accurate timing.
• Instruction Set Simulation (ISS): compiling the code with the target architecture compiler. In the ISS mode, the binary code is executed on the simulator. At the time of writing this paper, both HCE and ISS modes are available for MIPS cores, whereas MEPHISTO cores can only run in HCE mode. MEPHISTO ISS is expected in a future release.
B. Power Modeling
The power model is a transaction-level-modeling (TLM) instrument, provided by the hardware platform manufacturer, that yields dynamic power (Pdyn) and static power (Psta) values to the runtime on a periodic basis. Through the Time/Pdyn/Psta statistics provided by this model, the runtime can make more accurate decisions, such as target prioritization for offloading, frequency scaling, clock gating or power gating, toward achieving energy efficiency and variability awareness. The power model works out the dynamic and static power values according to the following procedure:
1) The model is provided with the following input arguments: an ID of a cluster/core, its running frequency, the power state (active, idle, clock-gated, or power-gated), as well as a time snapshot.
2) In case of “active” or “idle” state, the model looks up the input frequency value in a file or an array containing the Freq-Pdyn pairs. Should a match be found, the model registers the required value of Pdyn. In a similar fashion, Psta is obtained from another file or list according to the input power state.
3) In case of “clock-gated” (i.e. standby) state, Pdyn is set to zero, and Psta is obtained as in step 2.
4) In case of “power-gated” (i.e. off) state, Pdyn and Psta are both set to zero.
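The four-step lookup procedure of the power model can be illustrated with a minimal Python sketch. This is not the manufacturer's TLM instrument: the table contents, the function name `power_sample`, and the choice to raise an error when a frequency has no Freq-Pdyn entry are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical lookup tables (placeholder values, not GENEPY measurements).
FREQ_TO_PDYN_MW: Dict[int, float] = {100: 10.0, 200: 22.0, 400: 50.0}
STATE_TO_PSTA_MW: Dict[str, float] = {"active": 5.0, "idle": 5.0, "clock-gated": 3.0}


def power_sample(core_id: str, freq_mhz: int, state: str,
                 time_ns: int) -> Tuple[str, int, float, float]:
    """Return (core_id, time_ns, Pdyn, Psta) for one snapshot (step 1 inputs)."""
    if state in ("active", "idle"):
        # Step 2: Pdyn from the Freq-Pdyn pairs, Psta from the power state.
        if freq_mhz not in FREQ_TO_PDYN_MW:
            # Assumption: the source does not say what happens on a miss.
            raise KeyError(f"no Pdyn entry for {freq_mhz} MHz")
        p_dyn = FREQ_TO_PDYN_MW[freq_mhz]
        p_sta = STATE_TO_PSTA_MW[state]
    elif state == "clock-gated":
        # Step 3: no switching activity, static power still drawn.
        p_dyn = 0.0
        p_sta = STATE_TO_PSTA_MW[state]
    elif state == "power-gated":
        # Step 4: the core is off.
        p_dyn = 0.0
        p_sta = 0.0
    else:
        raise ValueError(f"unknown power state: {state}")
    return core_id, time_ns, p_dyn, p_sta
```

A runtime could call such a function periodically per core and accumulate (Pdyn + Psta) multiplied by the sampling interval to estimate energy.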
C. System Modeling of Target Platform
A target platform model describes a specific heterogeneous multicore platform. SysML can be used to model a heterogeneous multicore platform using the methodology described in [10]. SysML blocks characterize a set of abstract hardware types or components of a heterogeneous multicore platform, while SysML value properties abstract the physical hardware characteristics of that type or component, such as manufactured clock frequency and power consumption. A target platform model describes the hierarchical structure of the heterogeneous multicore platform, including details of the underlying processing elements that run the actual code of the benchmark. The hierarchical structure allows individual processing elements and clusters to be assigned values representing their real hardware characteristics. Interconnections between hardware elements are also modeled to provide information on the inter-cluster communication topology (NoC). Figure 3 illustrates the architecture of the target platform model.
Fig. 4: SysML Structural Diagram
Fig. 5: SysML Internal Block Diagram
From the modeling perspective, a methodology is proposed for abstracting existing RTL IPs into SysML components. During the abstraction flow, it is possible to set the level of detail to be maintained in SysML, such as hierarchical structure and data types of the IPs, in order to allow designers to choose the level of abstraction to be preserved in the SysML model. The methodology aims at producing SysML models with both structural and behavioral information. In that sense, the target platform model is composed of structural as well as capability information of the hardware architecture. Structural information abstracts the hardware units (the building blocks) and interconnect topology, while capability information presents the variability-impacted attributes relevant for customizing the runtime for variability awareness, toward achieving gains in overall performance and energy consumption through smart distribution of the workload over the target processing elements.
In Artisan Studio®, the platform structural model, as in figure 4, consists of a number of packages and sub-packages (depicted as yellow folders). Under the “Target” package there is a component block that shares the name of its parent package, “EightMIPSChip”, and is depicted as a red cube. This block is composed of sixteen parts (eight MIPS processors and eight NoC routers) as well as an internal block diagram named “GENEPY Chip”. That diagram, as in figure 5, shows the topology of the platform. Moreover, the capability attributes, such as frequency and power values of each core, can be set through a properties window dedicated to each element.
IV. RUNTIME LIBRARY CUSTOMIZATION
The overall target of the toolchain is the development of a methodology for automatic customization of the runtime library toward smarter mapping of workload tasks to processing elements and better utilization of the inter-cluster communication topology on the target platform. Runtime customization is intended to achieve variability awareness for overall performance optimization and energy saving. The runtime library is customized by automatically generating or deriving a hardware description language from the hardware model presented in section III. This language is in XML format and contains the structural and capability information of the target hardware platform. We also refer to it as the “customization language”.
The parameters used for customization of the runtime library are those that mainly concern the dynamic decisions that cannot be taken at compile time. The customization methodology supports a class of policies for allocation and scheduling of tasks on the available cores. These policies assume a general objective of overall performance maximization and energy saving. A number of policies have been discussed in [11] and [18]. These policies differ in terms of approach (heuristic, probabilistic, etc.), complexity, and effectiveness. For this work we have opted for the probabilistic frequency ranking policy, yet the other policies can be employed in the same way as part of the runtime library.
- between the master core (the task allocator) and the task-designated slave core (the destination).
Customization opportunities are virtually unlimited thanks to the capabilities of system modeling of the target hardware platforms. Properties can be added to the hardware model to predetermine the availability of cores or processing elements. For instance, a customization decision may be to utilize certain processing element(s) in certain clusters, while the other elements are required to be kept in a standby or clock-gated state (for the sake of energy saving and/or task migration in case of failure of one or more of the main processing elements). Customization also enables a choice between types of target processing elements: GPPs or DSPs, as well as determining the type of target clusters.
The XML customization language is generated by Artisan Studio®, which, besides being used for modeling the target hardware platform in SysML, also features SysML-to-XML transformation. The generated XML language contains values of the hardware-related characteristics, such as per-core clock frequency and power consumption, as appears in figure 6, in addition to NoC delays. Another model can also be developed for the target application/benchmark, where tasks can be mapped to certain clusters or processing elements. This mapping information can be exported to the XML language to add another level of customization for the runtime, that is, to allow task-to-processing-element mapping decisions to override the default task allocation decisions rendered by the allocation algorithm of the runtime.
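How per-core values might be pulled out of such an XML description and reformatted for the runtime can be sketched as follows. The XML layout used here (`platform`/`core` elements with `id`, `freq_mhz` and `power_mw` attributes) and the macro naming are hypothetical; the actual schema exported by Artisan Studio (figure 6) is not reproduced in this text, so the sketch only illustrates the parse-and-reformat idea.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical customization-language snippet (placeholder values).
SAMPLE_XML = """
<platform name="EightMIPSChip">
  <core id="01" freq_mhz="400" power_mw="50.0"/>
  <core id="02" freq_mhz="100" power_mw="10.0"/>
</platform>
"""


def parse_cores(xml_text: str) -> Dict[str, Tuple[int, float]]:
    """Map each core ID to its (clock frequency in MHz, power in mW)."""
    root = ET.fromstring(xml_text.strip())
    cores = {}
    for node in root.iter("core"):
        cores[node.get("id")] = (int(node.get("freq_mhz")),
                                 float(node.get("power_mw")))
    return cores


def to_c_defines(cores: Dict[str, Tuple[int, float]]) -> str:
    """Render the values as C preprocessor definitions (assumed naming)."""
    lines = []
    for core_id in sorted(cores):
        freq, power = cores[core_id]
        lines.append(f"#define CORE_{core_id}_FREQ_MHZ {freq}")
        lines.append(f"#define CORE_{core_id}_POWER_MW {power}")
    return "\n".join(lines)
```

Calling `to_c_defines(parse_cores(SAMPLE_XML))` yields text that a C runtime could include at build time; a real tool would also need to carry the NoC delays mentioned above.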
V. EXPERIMENTAL SETUP AND RESULTS
Matrix multiplication was chosen as a convenient benchmark since it is representative of many multimedia kernels and scales easily over a wide range of performance tests, because the work grows as O(n^3) for n × n matrices. Besides, matrix multiplication is inherently parallelizable, as the workload can be easily distributed over the selected target cores as per the decisions of the runtime allocation policy. Experimental simulations were conducted once on all eight clusters of the GENEPY platform, and once on four clusters only (while the remaining four clusters were not started). In each of these two cases, three different clock frequency configurations, also known as spreads, have been tested. First: a full spread (maximum difference) configuration, where the clock frequencies of the cores are distributed over the whole allowed range of 100 MHz to 400 MHz. Second: a tight spread configuration, where frequencies span a smaller range of 200 MHz to 350 MHz. Last: a hotspot spread, where only a single core features a much higher frequency of 370 MHz, while all the other cores show a very low frequency of 120 MHz. In all cases, the first MIPS core, denoted by the ID 00 in figure 3, has been designated as the master core, i.e. it is responsible only for the frequency ranking policy computation and for dispatching tasks to the offload targets (the slave cores).
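Why matrix multiplication distributes easily can be seen from a small Python sketch in which each task computes a contiguous block of result rows. The helper names `split_rows` and `matmul_task` are hypothetical, and the near-equal split used when the row count is not divisible by the task count is an assumption; the benchmark code used in the experiments is not shown in this paper.

```python
from typing import Dict, List


def split_rows(n: int, num_tasks: int) -> List[range]:
    """Split row indices 0..n-1 into num_tasks contiguous, near-equal blocks."""
    base, extra = divmod(n, num_tasks)
    blocks, start = [], 0
    for t in range(num_tasks):
        size = base + (1 if t < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def matmul_task(a: List[List[float]], b: List[List[float]],
                rows: range) -> Dict[int, List[float]]:
    """Compute only the rows of A*B owned by one task."""
    inner, n_cols = len(b), len(b[0])
    return {i: [sum(a[i][k] * b[k][j] for k in range(inner))
                for j in range(n_cols)]
            for i in rows}
```

Each block is independent of the others, so a master core can hand blocks to slave cores in any order and merge the returned rows afterwards.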
Fig. 6: Snippet of The Generated XML Customization Language
Tailoring or customizing the runtime behavior relies on the selected runtime allocation policy. In the scope of this work we target probabilistic frequency ranking, which aims at achieving performance and energy optimization compared to variability-unaware policies such as random allocation or round-robin. The XML language generated from the SysML model is automatically analyzed, through a custom-built parsing tool, to obtain the clock frequency and power values of each core. The parsing tool makes these values available in a header file that can be included with the runtime library source code in order to be fed into the ranking policy algorithm for computing the allocation decisions at execution time.
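A probabilistic frequency ranking allocator can be sketched in a few lines of Python. This is an illustrative approximation, not the policy implemented in the runtime library: it assumes that each core's allocation probability is proportional to its clock frequency, and that NoC distance is handled by a linear per-hop penalty (`hop_penalty`, a made-up parameter).

```python
import random
from typing import Dict, List


def allocation_probs(freq_mhz: Dict[str, int], hops: Dict[str, int],
                     hop_penalty: float = 0.05) -> Dict[str, float]:
    """Weight each slave core by frequency, reduced per NoC hop, then normalize."""
    weights = {core: f * max(0.0, 1.0 - hop_penalty * hops.get(core, 0))
               for core, f in freq_mhz.items()}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no core has a positive allocation weight")
    return {core: w / total for core, w in weights.items()}


def allocate(num_tasks: int, probs: Dict[str, float],
             rng: random.Random) -> List[str]:
    """Draw a destination core for each task according to the probabilities."""
    cores = list(probs)
    return rng.choices(cores, weights=[probs[c] for c in cores], k=num_tasks)
```

With frequencies of 100 MHz and 300 MHz and no hop penalty, the faster core receives about three quarters of the tasks on average, which tends to equalize per-core completion times when tasks are of equal size.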
The goal behind running these scenarios is to show that the frequency ranking (i.e. variability-aware) policy outperforms common variability-ignorant allocation techniques such as random allocation or round-robin allocation, regardless of the degree of frequency variability that the fabrication process has introduced into the chip. In figures 7-a, 7-b, and 7-c, the frequency ranking policy has been tested against a maximum-spread frequency configuration, a tight-spread frequency configuration, and a hotspot frequency configuration, respectively. Frequency ranking has shown a relative advantage, in terms of both execution time and energy consumption, over random allocation and round-robin allocation policies on seven parallel offload destinations.
The workload is divided into a number of equal-size tasks. For each task, the ranking policy determines which destination or slave core is the most appropriate for the next task assignment. A probability of allocation is associated with each core. This probability is proportional to the speed difference between the cores, in order to achieve overall execution time equalization and overcome the intrinsic variations between operating frequencies. For NoC-awareness, an additional reward/penalty is given depending on the distance - number of hops or routers
Similarly, in figures 8-a, 8-b, and 8-c, frequency ranking continues to outperform variability-unaware allocation policies on three parallel offload destinations. These results demonstrate the benefit of automatic customization for variability awareness and compensation, which can be amplified further by employing more complex allocation policies such as the linear programming and bin packing approach presented in [11], or by targeting many-core architectures rather than multicore ones.
Fig. 7: Frequency Ranking Policy on Seven Parallel Destinations - Different Frequency Configurations (panels a, b, c)
Fig. 8: Frequency Ranking Policy on Three Parallel Destinations - Different Frequency Configurations (panels a, b, c)
VI. CONCLUSION
This work has presented the promising prospects of customizing a runtime library in an automated way by introducing the customization information from a high-level SysML model of the target hardware platform. Such information is obtained through characterizations typically conducted after the manufacturing or fabrication process. Customization is meant to provide variability awareness against performance degradation and energy inefficiency. The automatic customization approach is presumably applicable to all target platforms through a broad range of systems modeling and abstraction toolsuites. In addition, automating the customization through a toolchain that starts at a high-level model of the target platform would allow for complexity abstraction and hence more streamlined development, integration, testing, and marketing of embedded solutions.
VII. FUTURE WORK
Performance degradation and energy inefficiency due to dynamic, or aging-induced, variability are becoming too influential to be overlooked. On-chip sensors (for instance, temperature sensors) can be employed to monitor the current state of a cluster, with the possibility of taking corrective actions through actuators (for instance, changing the local frequency). Dynamic variability awareness is highly complex, not only because of the dynamic nature of variability caused by aging and typical wear of the chip elements, but also because task migration or re-allocation is needed to maximize the utilization of the chip toward the highest performance and lowest energy consumption possible, and to prevent potentially critical failures. Automatic runtime customization for dynamic variability awareness is being considered as a possible future extension of this work.
REFERENCES
[1] Ke Meng; Huebbers, F.; Joseph, R.; Ismail, Y., “Modeling and Characterizing Power Variability in Multicore Architectures,” Performance Analysis of Systems & Software, 2007. ISPASS 2007. IEEE International Symposium on, pp. 146-157, 25-27 April 2007, doi: 10.1109/ISPASS.2007.363745.
[2] ITRS, 2007, http://public.itrs.net.
[3] Sartori, J.; Pant, A.; Kumar, R.; Gupta, P., “Variation-aware speed binning of multi-core processors,” Quality Electronic Design (ISQED), 2010 11th International Symposium on, pp. 307-314, 22-24 March 2010, doi: 10.1109/ISQED.2010.5450442.
[4] Gauthier, L.; Gray, I.; Larkham, A.; Ayad, G.; Acquaviva, A.; Nilsen, K., “Explicit Java control of low-power heterogeneous parallel processing in the ToucHMore project,” JTRES ’13 Proceedings of the 11th International Workshop on Java Technologies for Real-time and Embedded Systems, pp. 68-77, doi: 10.1145/2512989.2513001.
[5] SysML.org: SysML Open Source Specification Project, www.sysml.org.
[6] Dekeyser, J.; Ben Atitallah, R.; Gamati, A.; Boulet, P.; Etien, A., “Using the UML Profile for MARTE to MPSoC Co-Design,” 1st International Conference on Embedded Systems and Critical Applications ICESCA’08, Tunis, Tunisia, May 2008.
[7] Beltrame, G.; Fossati, L.; Sciuto, D., “High-Level Modeling and Exploration of Reconfigurable MPSoCs,” Adaptive Hardware and Systems, 2008. AHS ’08. NASA/ESA Conference on, pp. 330-337, 22-25 June 2008, doi: 10.1109/AHS.2008.15.
[8] D. Marculescu and E. Talpes, “Variability and Energy Awareness: A Microarchitecture-Level Perspective,” Dept. of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA.
[9] Hong, S.; Narayanan, S.H.K.; Kandemir, M.; Ozturk, O., “Process variation aware thread mapping for Chip Multiprocessors,” Design, Automation & Test in Europe Conference & Exhibition, 2009. DATE ’09, pp. 821-826, 20-24 April 2009, doi: 10.1109/DATE.2009.5090776.
[10] Feng Wang; Nicopoulos, C.; Xiaoxia Wu; Yuan Xie; Vijaykrishnan, N., “Variation-aware task allocation and scheduling for MPSoC,” Computer-Aided Design, 2007. ICCAD 2007. IEEE/ACM International Conference on, pp. 598-603, 4-8 Nov. 2007, doi: 10.1109/ICCAD.2007.4397330.
[11] Paterna, F.; Benini, L.; Acquaviva, A.; Papariello, F.; Desoli, G., “Variability-tolerant workload allocation for MPSoC energy minimization under real-time constraints,” Embedded Systems for Real-Time Multimedia, 2009. ESTIMedia 2009. IEEE/ACM/IFIP 7th Workshop on, pp. 134-142, 15-16 Oct. 2009, doi: 10.1109/ESTMED.2009.5336824.
[12] Paterna, F.; Acquaviva, A.; Caprara, A.; Papariello, F.; Desoli, G.; Benini, L., “Variability-Aware Task Allocation for Energy-Efficient Quality of Service Provisioning in Embedded Streaming Multimedia Applications,” Computers, IEEE Transactions on, vol. 61, no. 7, pp. 939-953, July 2012, doi: 10.1109/TC.2011.127.
[13] Paterna, F.; Acquaviva, A.; Benini, L., “Aging-Aware Energy-Efficient Workload Allocation for Mobile Multimedia Platforms,” IEEE Transactions on Parallel & Distributed Systems, vol. 24, no. 8, pp. 1489-1499, Aug. 2013, doi: 10.1109/TPDS.2012.256.
[14] R. Teodorescu and J. Torrellas, “Variation-aware application scheduling and power management for chip multiprocessors,” SIGARCH Comput. Archit. News, vol. 36, no. 3, pp. 363-374, 2008, doi: 10.1109/ISCA.2008.40.
[15] Ayad, G.; Acquaviva, A.; Macii, E.; Sahbi, B.; Lemaire, R., “HW-SW integration for energy-efficient/variability-aware computing,” Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp. 607-611, 18-22 March 2013, doi: 10.7873/DATE.2013.133.
[16] Lemaire, R.; Thuries, S.; Heiztmann, F., “A flexible modeling environment for a NoC-based multicore architecture,” High Level Design Validation and Test Workshop (HLDVT), 2012 IEEE International, pp. 140-147, 9-10 Nov. 2012.
[17] Nagel, J.; Lemaire, R.; Thuries, S.; Morgan, M.; Bertrand, F.; Piguet, C., “GENEPY Heterogeneous Multiprocessor Platform,” CSEM Scientific and Technical Report 2012.
[18] Tiwari, A. and Torrellas, J., “Facelift: Hiding and Slowing Down Aging in Multicores,” Proc. IEEE/ACM Intl Symp. Microarchitecture, pp. 129-140, 2008.