Using Predictive Modeling for Cross-Program Design Space Exploration in Multicore Systems∗

Salman Khan, Polychronis Xekalakis†, John Cavazos, and Marcelo Cintra
School of Informatics, University of Edinburgh
{salman.khan, p.xekalakis}@ed.ac.uk, {jcavazos, mc}@inf.ed.ac.uk
Abstract

The vast number of transistors available through modern fabrication technology gives architects an unprecedented amount of freedom in chip-multiprocessor (CMP) designs. However, such freedom translates into a design space that is impossible to explore fully, or even to any significant fraction, through detailed simulation. In this paper we propose to address this problem using predictive modeling, a well-known machine learning technique. More specifically, we build models that, given only a minute fraction of the design space, are able to accurately predict the behavior of the remaining designs orders of magnitude faster than simulating them. In contrast to previous work, our models can predict performance metrics not only for unseen CMP configurations of a given application, but also for unseen configurations of a new application that was not in the set of applications used to build the model, given only a very small number of results for this new application. We perform extensive experiments to show the efficacy of the technique for exploring the design space of CMP's running parallel applications. The technique is used to predict both energy-delay and execution time. Choosing both explicitly parallel applications and applications that are parallelized using the thread-level speculation (TLS) approach, we evaluate performance on a CMP design space with about 95 million points, using 18 benchmarks with up to 1000 training points each. For predicting the energy-delay metric, prediction errors for unseen configurations of the same application range from 2.4% to 4.6%, and for configurations of new applications from 3.1% to 4.9%.
∗ This work was supported in part by the EC under grant HiPEAC IST004408. † The author was supported in part by a Wolfson Microelectronics scholarship.
1 INTRODUCTION

When designing a modern chip-multiprocessor (CMP) the architect is faced with choosing the values for a myriad of system and microarchitectural parameters (e.g., number of cores, issue width of each core, cache sizes, branch predictor type and size, etc.). Simulation is the tool of choice for evaluating each design point in this large design space. Unfortunately, the complexity of modern cores and CMP's, together with the complexity of modern workloads, has increased the time required for accurate simulation of the design space beyond practical levels. Alternative approaches such as analytical modeling, simplified kernel workloads, and reduced input sets are increasingly undesirable, as they cannot accurately capture the intricacies of realistic architectures, workloads, and input sets.

Recently, statistical sampling has received much attention [19, 25]. These techniques have been shown to cut simulation time by a factor of 60 in many cases, with error rates of around 3%. Whether such techniques will keep pace with the increased complexity of future architectures and workloads is an open question.

In this paper we explore a different and complementary approach to reducing the time required to explore the large design space of CMP's running a variety of applications: predictive modeling. Predictive modeling using statistical machine learning (e.g., regression and multilayer neural networks) has recently received much attention [7, 8, 11]. The key idea is to build models that, once trained for a given program on a number of design points (configurations), can predict the behavior of different configurations. Constructing a model requires simulation results for only a small fraction of the entire configuration space. Both creating the models and using them to predict the performance of new configurations take orders of magnitude less time than simulating these configurations.
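The core workflow described above can be illustrated with a minimal sketch: simulate a small random sample of the design space, fit a cheap model to the results, and predict the metric for every remaining design point. The design parameters, the stand-in "simulator," and the use of plain linear regression here are all illustrative assumptions; the paper does not commit to this particular learner at this point.

```python
# Minimal sketch of predictive design-space exploration.
# All parameter choices and the toy "simulator" are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design space: (num_cores, issue_width, log2 of cache KB).
space = np.array([(c, w, s) for c in (2, 4, 8)
                            for w in (1, 2, 4)
                            for s in (5, 6, 7, 8)], dtype=float)

def simulate(x):
    """Stand-in for detailed simulation, which is too slow to run
    for every point in a realistic design space."""
    c, w, s = x
    return 100.0 / (c * w) + 20.0 / s + rng.normal(0.0, 0.1)

# Simulate only a small sample of the space ...
train_idx = rng.choice(len(space), size=8, replace=False)
X_train = space[train_idx]
y_train = np.array([simulate(x) for x in X_train])

# ... fit a simple linear model (with a bias column appended) ...
A_train = np.hstack([X_train, np.ones((len(X_train), 1))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# ... then predict every configuration without simulating it.
A_all = np.hstack([space, np.ones((len(space), 1))])
predictions = A_all @ coef
print(predictions.shape)  # one predicted metric per design point
```

In practice the learner would be a more expressive model (e.g., a multilayer neural network, as cited above), but the train-on-a-fraction, predict-the-rest structure is the same.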
Thus, these techniques can cut down the time needed to explore a large design space by several orders of magnitude. Unfortunately, the simulations required for constructing predictive models are still expensive, and previous work requires a set of simulations for every application [7, 8, 11].

In this paper, we take this technique a step further to cut down model creation time. We show that it is possible to create a model using only a subset of the applications of interest and then use this model to accurately predict the behavior of new configurations for new, unseen applications. This is achieved with a technique we term reaction-based characterization, which requires only a very small number of simulations for every new application not in the training set. The key intuition is that, when given such a new application, the model tries to automatically match this application's behavior against that of applications with similar behavior in the training set. If such an application exists, then the model can accurately predict the behavior of the new application without having to pay the cost of model creation for that application. The idea is depicted in Figure 1. Figure 1(a) shows the case where no predictive modeling is used and the designer must simulate all the configurations for all the applications of interest. Figure 1(b) shows what happens when predictive modeling is used as in previous work. In this case, only a small fraction of the configurations must be simulated (0 < α
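The matching step at the heart of reaction-based characterization can be sketched as follows: each training application has a "reaction" vector (its behavior on a handful of probe configurations), and a new application is simulated on only those probes and matched to the nearest known application. The signature values, application names, and the nearest-neighbour matching rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of reaction-based characterization. The numbers and
# the Euclidean nearest-neighbour match are hypothetical illustrations.
import numpy as np

# Reaction signatures: behavior (e.g., speedup) of each training
# application on a few probe configurations.
signatures = {
    "app_A": np.array([1.0, 1.8, 3.1, 5.2]),
    "app_B": np.array([1.0, 1.2, 1.3, 1.4]),
}

def match_application(new_reactions):
    """Return the training application whose reaction vector is
    closest to that of the new, unseen application."""
    return min(signatures,
               key=lambda a: float(np.linalg.norm(signatures[a] - new_reactions)))

# A new application is simulated on only the probe configurations ...
new_app = np.array([1.0, 1.7, 2.9, 4.8])
# ... and its behavior elsewhere is predicted via the best match.
print(match_application(new_app))  # -> app_A
```

Once the best-matching application is identified, its already-built model can stand in for the new application, avoiding the per-application training simulations that previous work requires.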