Boosting Design Space Explorations with Existing or Automatically Learned Knowledge

Ralf Jahr¹, Horia Calborean²⋆, Lucian Vintan², and Theo Ungerer¹

¹ Institute of Computer Science, University of Augsburg, 86135 Augsburg, Germany
² "Lucian Blaga" University of Sibiu, Computer Science & Engineering Department, E. Cioran Str., No. 4, Sibiu - 550025, Romania
Abstract. During development, processor architectures can be tuned and configured by many different parameters. For benchmarking, automatic design space explorations (DSEs) with heuristic algorithms are a helpful approach to find the best settings for these parameters according to multiple objectives, e.g. performance, energy consumption, or real-time constraints. But if the setup is slightly changed and a new DSE has to be performed, it will start from scratch, resulting in very long evaluation times. To reduce the evaluation times, we extend the NSGA-II algorithm in this article such that automatic DSEs can be supported with a set of transformation rules defined in a highly readable format, the fuzzy control language (FCL). Rules can be specified by an engineer, thereby representing existing knowledge. Beyond this, a decision tree classifying high-quality configurations can be constructed automatically and translated into transformation rules. These can also be seen as a very valuable result of a DSE because they allow drawing conclusions on the influence of parameters and describe regions of the design space with a high density of good configurations. Our evaluations show that automatically generated decision trees can classify near-optimal configurations for the hardware parameters of the Grid ALU Processor (GAP) and M-Sim 2. Further evaluations show that automatically constructed transformation rules can reduce the number of evaluations required to reach the same quality of results as without rules by 43%, leading to a significant saving of time of about 25%. In the demonstrated example, using rules also leads to better results.
1 Introduction
Processor architectures can be influenced by many parameters as long as the tape-out has not been completed (i.e., the architectural design is not final). For benchmarking and during development of a processor it is important to find the best settings for these parameters in a given situation or environment. Due to the size of the design space and the long time necessary for a single simulation, an exhaustive search of the design space is impossible. Instead, heuristic algorithms like NSGA-II [8], SPEA2 [30], or SMPSO [19] can be applied to explore the design space carefully and find near-optimal configurations in reasonable time.
⋆ Horia Calborean was supported by POSDRU financing contract POSDRU 7706.
The genetic algorithm NSGA-II runs with populations of many individuals (e.g. 50 or 100) and tries to improve their quality over time. Each individual represents a configuration. Usually, random individuals are selected for the first generation. The main result of a DSE is basically a set of the best individuals found during the exploration. They approximate the best possible configurations and hence show values for the objective functions very close to the optimum. Nevertheless, they do not allow conclusions to be drawn concerning the parameter values that probably lead to high-quality configurations. This information would be very helpful for understanding the influence of the parameters and their values. To tackle this, an approach is presented to automatically calculate a decision tree with machine learning techniques and translate the tree into understandable rules describing subsets of the total design space with a high density of high-quality configurations.

Multiple design space explorations (DSEs) are often necessary to find the best configurations, e.g. for specific application domains or in setups with small changes. Common algorithms always start from the ground up, so they cannot profit from the intuition of an engineer or from conclusions acquired in previous DSEs. To overcome this, we present an extended version of the Framework for Automatic Design Space Explorations (FADSE [5, 6, 13]), which can incorporate domain knowledge, represented by transformation rules expressed as fuzzy rules, in the DSE process [3]. These rules specify how to transform individuals to move them probably closer to the Pareto front consisting of optimal individuals. They can be specified by the engineer performing the exploration to describe his knowledge or intuition about the explored system, hence hopefully accelerating the DSE process by avoiding the evaluation of configurations of obviously low quality. As an alternative, we show how a decision tree calculated automatically from the results of a prior DSE can be translated into transformation rules, hence creating a feedback loop to profit from the prior DSE.

The main advantages of the presented approach are that (a) the rules calculated from the results of a DSE allow conclusions to be drawn about parameter values with a high probability of resulting in high-quality configurations, (b) DSEs can be accelerated with rules calculated from the results of a prior similar DSE, (c) DSEs can be sped up with rules representing knowledge of the person running the DSE, and (d) all types of rules, either calculated automatically or specified by an engineer, are highly readable and can thus be understood and specified easily.

Section 2 introduces related approaches using machine learning techniques and knowledge in DSEs. Modeling, acquiring, and using knowledge in FADSE is described in Section 3. The potential of the approach is evaluated in Section 4. The paper is concluded in Section 5.
2 Related Work
In contrast to many publications on design space exploration and its acceleration, as introduced in this section, our approach makes use of (a) existing knowledge in (b) a readable notation and (c) NSGA-II as algorithm, which is known to work very well in most situations because of a good trade-off between simulation time and quality of the results [4]. The aim is to choose individuals of the design space such that evaluating them leads, with a higher probability, to an improvement of the approximated Pareto front. In comparison to not using existing knowledge, fewer individuals have to be evaluated to get an approximation of the Pareto front of the same quality.

Related to the work presented in this article are all approaches using knowledge, often represented as a meta-model, to speed up DSEs. To achieve this, machine learning techniques can be used to estimate the location of individuals (in the design space) that will be close to the Pareto front (in the objective space). Based on NSGA-II, Mariani et al. [16, 18] use an Artificial Neural Network (ANN) to predict the quality of individuals, which is then used to decide whether they should be simulated or not. ANNs are also integrated into a multi-level model [17]. The link between the parameters of ANNs and understandable facts is typically hard to establish, hence they are not useful as an interface for engineers to specify facts. Predictive models in a more general manner are also used by Ozisikyilmaz et al. [20] to accelerate DSEs. Mariani et al. [15] apply statistical methods to select promising candidates for the next evaluations on the fly by predicting "the expected improvement of unknown configurations". Although not naming it, Ozisikyilmaz and Mariani try to approximate a model of the response surface, which is also done by Cook and Skadron [7].

Other work introducing alternative algorithms for design space exploration has been presented by, e.g., Sengupta et al. [24, 29], Beltrame et al. [2], Ascia et al. [1], and Palermo et al. [21]. These algorithms do not incorporate existing knowledge either; it should rather be possible to extend them with the ideas of this article. To our knowledge, there is no approach offering a possibility to specify existing knowledge or even a possibility to describe the set of Pareto-optimal individuals in a readable form.
3 Description and Integration of Knowledge in a Design Space Exploration
After describing the basic concepts of design space explorations (Section 3.1), it is explained how to model (Section 3.2) and integrate knowledge in the DSE process with the algorithm NSGA-II (Section 3.3). Finally, a way to automatically acquire knowledge is introduced (Section 3.4).
3.1 Basic Concepts
The aim of a DSE is to find the best points (equivalent to configurations) in the so-called parameter space P ⊂ Zⁿ⁺¹, i.e. configurations consisting of values for all parameters p = (p0, ..., pn). They are evaluated with one or more objective functions, resulting in a point o = (o0, ..., om) in the objective space O ⊂ Rᵐ⁺¹. Evaluating a point p in the parameter space can be described as a projection f : P → O into the objective space:

    f(p) : (p0, ..., pn) ↦ (o0, ..., om)
The dominance relationship defines a partial order on configurations: a configuration i ∈ P dominates another configuration j ∈ P if all objective values of i are at least as good as those of j and at least one of them is strictly better. The true Pareto front is defined as the optimal set consisting of all non-dominated individuals. It is approximated during the DSE by the set of known non-dominated individuals, which is called the approximated Pareto front.
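As an illustration, a minimal Java sketch of this dominance test, assuming all objectives are minimized (method and class names are ours, not FADSE's):

    /** Dominance test for objective vectors; all objectives are minimized
     *  (e.g. CPI and hardware complexity). */
    public class ParetoDominance {
        static boolean dominates(double[] a, double[] b) {
            boolean strictlyBetter = false;
            for (int s = 0; s < a.length; s++) {
                if (a[s] > b[s]) return false;      // worse in one objective: no dominance
                if (a[s] < b[s]) strictlyBetter = true;
            }
            return strictlyBetter;                  // equal or better everywhere, better somewhere
        }
    }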
3.2 Modeling Knowledge
Engineers running a design space exploration (DSE) typically have a rough idea of the area of the parameter space where they expect to find high-quality configurations, for example: "If the width of the processor front-end, the number of execution units, and the number of write-back units are medium, then the processor should have a medium number of commit units." In more general words, a single parameter shall be set to a value from a defined subset of its domain if other parameters have values from specific regions of their domains. Such conditional statements are typically described as fuzzy rules, which make use of so-called linguistic expressions like small, medium, etc. Fuzzy rules can be written in the Fuzzy Control Language (FCL) [12], a domain-specific language. Due to the clear structure and high readability of this language, as well as the major impact on reducing execution time, an engineer can, from our point of view, be expected to specify his knowledge in this form. An FCL file specifies inputs and outputs, which are the parameters used in the design space exploration. For each of them, the used intervals, e.g. low, medium, and high, are described. They can then be used to formulate rules like the following:

    IF frontend_width IS medium AND exec_units IS medium AND wbk_units IS medium
    THEN commit_units IS medium
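For illustration, a minimal FCL sketch showing how such a rule could be embedded in a complete file (variable names are taken from the rule above; the membership points are invented for this example and would have to match the actual parameter domains):

    FUNCTION_BLOCK mutation_knowledge

    VAR_INPUT
        frontend_width : REAL;
        exec_units     : REAL;
        wbk_units      : REAL;
    END_VAR

    VAR_OUTPUT
        commit_units : REAL;
    END_VAR

    FUZZIFY frontend_width
        TERM low    := (1,1) (2,1) (4,0);          // trapezoids as in Fig. 1
        TERM medium := (2,0) (4,1) (6,1) (8,0);
        TERM high   := (6,0) (8,1) (10,1);
    END_FUZZIFY

    // ... analogous FUZZIFY blocks for exec_units and wbk_units ...

    DEFUZZIFY commit_units
        TERM medium := (2,0) (4,1) (6,1) (8,0);
        METHOD : COG;                              // defuzzification method, see Section 3.3
        DEFAULT := 0;
    END_DEFUZZIFY

    RULEBLOCK rules
        AND : MIN;
        ACT : MIN;
        ACCU : MAX;
        RULE 1 : IF frontend_width IS medium AND exec_units IS medium
                 AND wbk_units IS medium THEN commit_units IS medium;
    END_RULEBLOCK

    END_FUNCTION_BLOCK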
In the DSE process the rule is evaluated by an engine for fuzzy rules, e.g. the library jFuzzyLogic. As a first step, the actual values for the three parameters in the above example are set as inputs. Next they are fuzzified, i.e., their degree of membership in each class is calculated. For this task a membership function is used because, as we are in fuzzy logic, a value may not be fully attributed to one class but, e.g., 40% to one class and 60% to another when it lies on the slanted sides of the trapezoids describing the classes (see Fig. 1 for an example). After all rules have been evaluated, the membership values for the outputs are calculated, and distinct values are determined with a defuzzification method.
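A minimal sketch of this evaluation cycle with jFuzzyLogic (the file name refers to the hypothetical FCL sketch above; the API calls are, to the best of our knowledge, the library's standard entry points):

    import net.sourceforge.jFuzzyLogic.FIS;

    public class RuleDemo {
        public static void main(String[] args) {
            // Load the rule file; 'true' reports parse problems verbosely
            FIS fis = FIS.load("mutation_knowledge.fcl", true);

            // Set the actual parameter values of the individual as inputs
            fis.setVariable("frontend_width", 4);
            fis.setVariable("exec_units", 5);
            fis.setVariable("wbk_units", 4);

            // Fuzzify the inputs, evaluate all rules, accumulate the outputs
            fis.evaluate();

            // Defuzzify the output variable to a distinct value
            double commitUnits = fis.getVariable("commit_units").defuzzify();
            System.out.println("Suggested commit_units: " + commitUnits);
        }
    }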
Fig. 1. Trapezoid areas describing exemplary fuzzy classes (membership between 0 and 1 on the y-axis; processor front-end width between 1 and 10 on the x-axis; classes low, medium, and high).
Fig. 2. Sketch of an extended version of NSGA-II with additional mutation operator (initial population → mutation → current population → crossover → mutation → offspring → combination & selection for the next generation).
3.3 Using Knowledge during the DSE Process
We are running FADSE with NSGA-II as DSE algorithm. Its coarse structure with an additional mutation step for the initial population, which will be explained later³, is displayed in Figure 2. To avoid generating individuals that we know from our experience are likely to have low quality, knowledge can be helpful in two situations: (a) when generating the initial population and (b) when mutating individuals. If the initial population already contains individuals that are supposed to be closer to the Pareto front, the algorithm converges more quickly. Whenever a parameter of an individual is mutated, which happens with a defined probability β, a new value is chosen from anywhere in the parameter's domain. Here knowledge can help to select a probably better value.

Knowledge can be expressed in FADSE as transformation rules modeled as fuzzy rules. They are integrated into the mutation operator. Instead of using a randomly generated initial population as starting point for the DSE, a random generation is first processed by the mutation operator (see Fig. 2). This increases the number of individuals supposed to be close to the Pareto front.

When integrating knowledge in the mutation process, one has to keep in mind that the algorithm should not be restricted but supported by the existing knowledge. Hence the influence of the knowledge should be higher in the first generations than at the end of the design space exploration. Therefore, in Listing 1.1, which describes a possible knowledge-supported mutation operator for NSGA-II, the knowledge-supported new value for a parameter is used only with a probability α_generation dependent on the generation count. The mutation operator is also influenced by the standard mutation probability β, typically set to 1/#parameters, e.g. 1/6 ≈ 0.17 for six parameters. To calculate α_generation, a Gaussian distribution starting at 0.8 for the first individual and converging towards β is used.
³ Nevertheless, randomly changing random values does not change anything; the initial population stays random.
Fig. 3. Example for the Gaussian distribution modeling the influence of the rules in the DSE process (probability between 0 and 1 over generations 0 to 10).

    foreach (Parameter p) {
        if (rand(0.0, 1.0) < α_generation) {    // use knowledge?
            if (knowledge_value(p))             // if a value is available
                p.value = knowledge_value(p);
        } else if (rand(0.0, 1.0) < β) {        // standard random mutation?
            p.value = random_value(p);
        }
    }

Listing 1.1. Mutation operator for NSGA-II able to handle existing knowledge
The value β is reached after about 500 individuals or, with 50 individuals per generation, after 10 generations (see Figure 3 for an example).

When fuzzy rules are matched, the results are typically intervals, or, to be more precise, a trapezoid with intervals as width and the membership function as height (Figure 1); a single value for further use is then calculated with a specific method. This process is called defuzzification. One of the most popular defuzzifiers is the Center of Gravity (COG), but many more methods exist [23]. Beyond this, there is a difference between the way we use fuzzy rules and the way they are typically used. Normally, all input values are set, the rules are triggered, and all output parameters are defuzzified and written back. In our mutation operator we work iteratively (following the bit-flip mutation operator implementation): for each parameter we set all input values available so far, trigger the rules, but write back only this single parameter, then move on to the next one.

It is vital to sustain the diversity of the individuals; the algorithm shall not be limited by generating very similar individuals over and over again. An increased risk of getting stuck in a local minimum would be the consequence. This issue was taken into account (a) by using a Gaussian distribution for the probability of applying the transformation rules, hence reducing the influence of knowledge over time, and (b) by the fact that only a fraction of all possible individuals can be matched and thus transformed by fuzzy rules. The latter also has to be kept in mind when designing them.
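Putting the pieces together, a compact Java sketch of the knowledge-supported mutation operator from Listing 1.1 could look as follows. Individual, Parameter, and KnowledgeBase are hypothetical interfaces standing in for the corresponding FADSE types, and the Gaussian decay constants are assumptions matching the description above:

    import java.util.List;
    import java.util.Random;

    interface Parameter { void setValue(int v); int randomValue(Random rnd); }
    interface Individual { List<Parameter> parameters(); }
    interface KnowledgeBase {                       // wraps the fuzzy rule engine
        boolean hasValueFor(Parameter p, Individual i);
        int valueFor(Parameter p, Individual i);    // evaluate rules, defuzzify one output
    }

    public class KnowledgeMutation {
        private final Random rnd = new Random();
        private final double beta;                  // standard mutation probability
        private static final double ALPHA_0 = 0.8; // knowledge influence for the first individual
        private static final double SIGMA = 500.0 / 3.0; // assumption: decay over ~500 individuals

        public KnowledgeMutation(int numParameters) { this.beta = 1.0 / numParameters; }

        /** Gaussian decay of the knowledge influence from ALPHA_0 towards beta. */
        double alpha(int evaluatedIndividuals) {
            double g = Math.exp(-Math.pow(evaluatedIndividuals, 2) / (2 * SIGMA * SIGMA));
            return beta + (ALPHA_0 - beta) * g;
        }

        /** Listing 1.1: knowledge-driven or standard random mutation per parameter. */
        public void mutate(Individual ind, KnowledgeBase kb, int evaluatedIndividuals) {
            for (Parameter p : ind.parameters()) {
                if (rnd.nextDouble() < alpha(evaluatedIndividuals)) {
                    if (kb.hasValueFor(p, ind)) p.setValue(kb.valueFor(p, ind));
                } else if (rnd.nextDouble() < beta) {
                    p.setValue(p.randomValue(rnd));
                }
            }
        }
    }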
3.4 Acquiring Knowledge through Data Mining Techniques
In this section it is assumed that knowledge is not available explicitly but only through a previous similar DSE. These results actually contain information about where to find high-quality individuals in the design space, but this knowledge is represented only by data points and not as facts or rules, so it is only implicitly available. Data mining
techniques can be applied for this transformation. The goal is to find rules describing how to change individuals to improve their quality.

An evident approach is to measure the quality of an individual by its distance to the true Pareto front. Because the true Pareto front is unknown, its approximation calculated from the available results is used instead. In areas with very small or very high slope, the minimum distance between two of these points can be relatively high. Hence, additional imaginary points are calculated with, e.g., linear interpolation and added to the approximation of the Pareto curve to close gaps. The quality of an individual i is then defined by calculating the minimum weighted distance in the objective space to an individual j on the approximated Pareto curve; the weighted distance d(i, j) is defined for i and j with objective values f(i) = a and f(j) = b as

    d(i, j) = √( ∑_{s ∈ {0,...,m}} ((a_s − b_s) / ∆_s)² )   where i, j ∈ P; a, b ∈ O; f(i) = a; f(j) = b

∆_s is the difference between the maximal and minimal value that points of the approximated Pareto front show for objective s; dividing by ∆_s normalizes the objectives. Individuals within a maximal distance ε to the approximated Pareto front are called perfect; they are the candidates to be described by rules. This creates two classes of individuals: the perfect ones and all others, named good. In our evaluations we selected ε so that about one third of the total number of individuals is rated as perfect.

A model is needed to decide a priori, i.e., before calculating the real values of its objectives, whether an individual will be perfect or good. This is a classification task, a quite common situation in which machine learning techniques can be applied. The following paragraphs describe how rules are generated from an automatically constructed decision tree. Other approaches replacing decision trees for classification should be possible, too. For our evaluations we use the data mining tool WEKA [11]. First, the complexity of the parameter space is reduced by selecting only the most influential parameters with the CFS attribute subset evaluator [10]. Second, a decision tree is built with the algorithm C4.5 by Quinlan [22].

A decision tree is a binary tree with nodes and edges. Nodes without outgoing edges are called leaves and are labeled with either perfect or good. A hierarchical set of decisions is mapped to all other nodes, which have two outgoing edges. A decision D_j ∈ D is of the form p_i ≤ a or p_i > b, where p_i ∈ {p0, ..., pn} is a parameter of an individual and a and b are numeric values. The path W_k from the root of the tree to a leaf k, which is labeled perfect or good, can be described as W_k = ⟨D_a, D_b, D_c, ..., D_n⟩ with {D_a, D_b, D_c, ..., D_n} ⊂ D. Decision trees are typically complete and conflict-free, so every individual can be classified and the sets of individuals classified as perfect and good are disjoint. To calculate the class of an individual, the tree is traversed from top to bottom, evaluating the decisions at the nodes and following the edges corresponding to their results. Finally, a leaf is reached and the individual is classified as an element of the corresponding class. If an individual is classified by leaf k, then all decisions on the path W_k can be interpreted as conditions restricting the parameter space. So the conjunction of all conditions on W_k, i.e. D_a ∧ D_b ∧ D_c ∧ ... ∧ D_n, must be true.
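The perfect/good labeling described above can be sketched in a few lines of Java, assuming minimization and a front that has already been densified by interpolation (all names are ours):

    /** Labels an individual "perfect" if its normalized weighted distance to the
     *  approximated (interpolated) Pareto front is at most epsilon. */
    public class QualityLabel {
        static boolean isPerfect(double[] objectives, double[][] front,
                                 double[] delta, double epsilon) {
            double min = Double.POSITIVE_INFINITY;
            for (double[] j : front) {                    // points on the approximated front
                double sum = 0.0;
                for (int s = 0; s < objectives.length; s++) {
                    double d = (objectives[s] - j[s]) / delta[s]; // delta[s]: range of objective s
                    sum += d * d;
                }
                min = Math.min(min, Math.sqrt(sum));
            }
            return min <= epsilon;
        }
    }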
Conditions in transformation rules can have a lower and an upper bound, so conditions on a path in the decision tree can often be compacted (e.g., p_i ≤ 9 ∧ p_i ≤ 12 ∧ p_i > 4 ⇒ 4 < p_i ≤ 9). The resulting decision tree has a height smaller than or equal to the number of parameters. The basic idea for gaining a transformation rule R from the conditions is to pick one condition C_i from the path W and use it as the consequence of the transformation rule R:

    R := C_a ∧ C_b ∧ C_c ∧ ... ∧ C_m → C_i   where {C_a, C_b, C_c, ..., C_m} = W \ C_i

This means that if all conditions of the path W from the root of the decision tree to the leaf except C_i are true, then the parameters of the individual are set in a way that C_i is true, too. Such a rule can be generated for every C_i ∈ W. As each path contains at most as many decisions as the individual has parameters (i.e., n), at most n transformation rules are created for each perfect leaf of the decision tree.

Although these rules have sharp conditions, because the common relations ≤ and < are used, they can easily be written as fuzzy rules. In contrast to the rules specified manually by an engineer, however, they do not use fuzzy interval borders, so the membership function for their conditions is always 0 or 1. If the same fuzzy rule matches several individuals, all of them will, if COG or a similar method is used, end up with the same value set for one or more parameters, because the membership function takes only very high (close to 1) or very low (close to 0) values. This contradicts the purpose of mutation. Hence we developed a new random defuzzifier, which picks a random value from regions with high membership values in order to sustain diversity.

When evaluating an individual, it will match one of the transformation rules only if at most one of the individual's parameters lies outside the intervals described by the rule. To increase the probability of matching a transformation rule, additional rules can be introduced that change a parameter whose value lies in an interval not covered by any rule to a value from the complementary interval. As an example, imagine the domain of a parameter ]0; 8] being covered by rules only in ]0; 4]. To deal with individuals with p_a ∈ ]4; 8], a rule is introduced that moves the value of p_a to the covered region ]0; 4]. As we were able to show in some experiments, this increases the probability of matching a rule and leads to a higher number of perfect individuals generated with the rule set (in our case about 10% more perfect unique individuals, see Section 4.1).

To summarize, we use an automatically constructed decision tree based on the data of a previous DSE to decide for an individual, without evaluating it, whether it will be close to the approximated Pareto front or not. Based on these decision rules we construct a set of transformation rules described by fuzzy rules for further use in the DSE process.
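A minimal sketch of such a random defuzzifier over a sampled membership function (our own illustration, not FADSE's actual implementation; the threshold separating "high" membership regions is an assumption):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    /** Picks a random value from the region where the aggregated membership
     *  function is high, instead of computing the centre of gravity. */
    public class RandomDefuzzifier {
        static double defuzzify(double[] xs, double[] membership,
                                double threshold, Random rnd) {
            List<Double> candidates = new ArrayList<>();
            for (int i = 0; i < xs.length; i++)
                if (membership[i] >= threshold) candidates.add(xs[i]);
            if (candidates.isEmpty())                 // no rule fired strongly enough:
                return xs[rnd.nextInt(xs.length)];    // fall back to a uniform choice
            return candidates.get(rnd.nextInt(candidates.size()));
        }
    }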
4 Evaluation
To show the effectiveness of adopting existing knowledge during a DSE process, we focus on the Grid ALU Processor (GAP), a novel reconfigurable processor architecture for the execution of sequential instruction streams. A superscalar in-order front-end loads instructions. The GAP-specific configuration unit maps them dynamically onto a three-dimensional array of functional units (FUs).
Table 1. Parameter space for GAP's array (left) and instruction cache (right)

        Description        Domain                      Description        Domain
    p0  Rows               {4, 5, 6, ..., 32}      p3  Line size          {4, 8, 16}
    p1  Columns            {4, 5, 6, ..., 31}      p4  Sets               {32, 64, 128, ..., 8192}
    p2  Layers             {1, 2, 4, ..., 64}      p5  Lines per set      {1, 2, 4, ..., 128}
The array is organized in columns, rows, and configuration layers. Detailed information about the processor architecture is given by Uhrig et al. [27] and Shehan et al. [25]. The following three subsections cover different independent scenarios and evaluate the benefit of automatically generated rules.
4.1 Automatically Developing Rules
The goal of this subsection is to show how effectively automatically generated rules describe the set of individuals close to the Pareto front (see Section 3.4). The basic data is gathered from the DSE of the hardware parameters of the Grid ALU Processor (GAP) described in [13]. It comprises 2833 individuals (out of which 2385 are unique) with six parameters (see Table 1) and values for two objectives, i.e. hardware complexity [13] and performance measured in clocks per instruction (CPI). Individuals with a weighted distance smaller than ε = 0.018 to their closest individual on the approximated Pareto front (or to an imaginary individual calculated with linear interpolation) are classified as perfect; these are 994 or about 35% of all evaluated individuals.

As described earlier (Section 3.4), we use the well-known data mining tool WEKA [11] for our studies. The parameters p0 and p3 are eliminated by the complexity reduction with the CFS attribute subset evaluator [10]. The decision tree constructed with C4.5 [22] has 28 leaves, out of which 10 classify perfect individuals. The algorithms for complexity reduction and for constructing the decision tree are state of the art and rely heavily on entropy. Applying 10-fold cross-validation, the tree is able to classify 81.4% of all individuals correctly (average weighted F-measure 0.816; 0.742 for perfect and 0.855 for good). This result is good enough for our intentions; one should keep in mind that the rules should only point at the regions of the parameter space where perfect individuals can be found, not describe them sharply. The following confusion matrix shows that the classification of individuals works well:

             Perfect   Good    ⇐ Classified as...
    Perfect      754    240
    Good         285   1554

The complexity of the tree can be reduced further by sacrificing a bit of precision, i.e., by cutting away leaves which classify only a very small number of individuals. In our case, 64 classifications are lost by selecting only the five most important leaves. The next step is to compact and sort the conditions on the paths from the root to the selected leaves. The paths to the five most important perfect leaves are shown in Figure 4.
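A minimal sketch of this tool chain with the WEKA Java API (the ARFF file name is hypothetical; it would contain one row per evaluated configuration with the perfect/good label as class attribute):

    import weka.attributeSelection.CfsSubsetEval;
    import weka.attributeSelection.GreedyStepwise;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.AttributeSelection;

    public class TreeFromDse {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("gap_dse.arff"); // hypothetical export of the DSE results
            data.setClassIndex(data.numAttributes() - 1);     // class attribute: perfect / good

            // CFS attribute subset selection, as in Section 4.1
            AttributeSelection filter = new AttributeSelection();
            filter.setEvaluator(new CfsSubsetEval());
            filter.setSearch(new GreedyStepwise());
            filter.setInputFormat(data);
            Instances reduced = Filter.useFilter(data, filter);
            reduced.setClassIndex(reduced.numAttributes() - 1);

            // C4.5 decision tree (WEKA's J48 implementation)
            J48 tree = new J48();
            tree.buildClassifier(reduced);
            System.out.println(tree);                 // printed tree, translatable into rules

            // 10-fold cross-validation with confusion matrix
            Evaluation eval = new Evaluation(reduced);
            eval.crossValidateModel(new J48(), reduced, 10, new java.util.Random(1));
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toMatrixString());
        }
    }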
    (p1 ≤ 20) ∧ (3 < p2) ∧ (2⁸ < p4) ∧ (p5 ≤ 2⁴) ⇒ (652/150)
    (p1 ≤ 20) ∧ (5 < p2) ∧ (2¹¹ < p4) ∧ (p5 ≤ 2¹) ⇒ (153/41)
    (p1 ≤ 20) ∧ (3 < p2) ∧ (2⁷ < p4 ≤ 2⁸) ∧ (2¹ < p5 ≤ 2⁴) ⇒ (81/15)
    (p1 ≤ 8) ∧ (p2 ≤ 3) ∧ (2⁷ < p4 ≤ 2¹⁰) ∧ (2⁰ < p5 ≤ 2¹) ⇒ (70/30)
    (11 < p1 ≤ 18) ∧ (3 < p2 ≤ 5) ∧ (2¹¹ < p4) ∧ (p5 ≤ 2²) ⇒ (34/9)

Fig. 4. Paths from the root of the decision tree to the five most important leaves describing perfect individuals. The numbers in brackets are the number of classified and the number of incorrectly classified individuals.
Fig. 5. Classification of the configurations evaluated in the DSE with five rules (hardware complexity on the x-axis, clocks per instruction (CPI) on the y-axis; markers distinguish Perfect ⇒ Perfect, Perfect ⇒ Good, Good ⇒ Perfect, and Good ⇒ Good); only a part of the objective space is displayed.
For a graphical demonstration, all configurations evaluated during the DSE of GAP's hardware parameters have been classified with the conditions described by the five most important paths. Figure 5 shows the results, where, e.g., Perfect ⇒ Good denotes perfect configurations classified as good, i.e., incorrectly. In the most important regions, i.e., very close to the Pareto front and with complexity below 1200, there are almost no incorrectly classified individuals. Far away from the Pareto front, the density of correctly classified individuals is high, too. Although there is a region close to the Pareto front with configurations incorrectly classified as perfect, this should not cause problems in the DSE, because these configurations help to sustain diversity; also, their number is comparably small, as shown by the confusion matrix. The two clusters with perfect configurations classified as good could be caused by picking only five rules to classify perfect individuals.

Following Section 3.4, it is easy to translate these rules into transformation rules and to represent them as fuzzy rules; 20 of them are generated. Three additional rules are introduced to provide full coverage of all values of the domains of parameters p1, p4, and p5.
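As an illustration of this translation step, a small Java sketch deriving the transformation rules from one decision-tree path by using each condition once as the rule's consequence, as described in Section 3.4 (all types are our own illustration):

    import java.util.ArrayList;
    import java.util.List;

    public class RulesFromPath {
        record Condition(String text) {}
        record Rule(List<Condition> premises, Condition consequence) {}

        static List<Rule> rulesFromPath(List<Condition> path) {
            List<Rule> rules = new ArrayList<>();
            for (Condition ci : path) {
                List<Condition> premises = new ArrayList<>(path);
                premises.remove(ci);               // W \ Ci as the premise set
                rules.add(new Rule(premises, ci)); // Ca ∧ ... ∧ Cm -> Ci
            }
            return rules;
        }

        public static void main(String[] args) {
            List<Condition> path = List.of(        // first path of Figure 4
                    new Condition("p1 <= 20"), new Condition("3 < p2"),
                    new Condition("2^8 < p4"), new Condition("p5 <= 2^4"));
            rulesFromPath(path).forEach(r ->
                    System.out.println(r.premises() + " -> " + r.consequence()));
        }
    }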
To evaluate the rule set, random individuals are generated and evaluated with different probabilities α of applying the fuzzy rules. 50 iterations with 1000 unique individuals each were performed; then the average ratio of unique perfect generated individuals was calculated. This measure is used because (a) it is not helpful in the DSE process if the rules generate the same individuals again and again (little diversity) and (b) a high ratio of generated perfect individuals is still desired. If knowledge is totally ignored (α = 0), the ratio is about 11%. This value increases linearly to 51% if the parameter values calculated with knowledge are always used (α = 1). As expected, the influence of the rules decreases with the probability of using the values generated by them. The ratio of unique and perfect individuals is high enough to have a significant impact on the DSE process and low enough to sustain diversity.

Apart from the GAP, we also tried to find transformation rules for M-Sim 2 [14]; clocks per instruction (CPI) and energy consumption are used as objectives. One third of all individuals is defined as perfect. The number of parameters was reduced from 30 to 9. The decision tree has 72 leaves, 32 of which are labeled perfect. It classifies 74% of all individuals correctly, and 10 rules are enough to describe more than 85% of the individuals classified as perfect by the decision tree (F-measure 0.541 for perfect, 0.814 for good). In conclusion, the proposed approach works well for M-Sim 2, too.
4.2 Accelerated General DSE of GAP's Hardware Parameters
Because a DSE with a representative number of benchmarks needs much time, we propose to run a special case, i.e. a DSE with a single benchmark, automatically gain rules representing the outcome of this DSE, and then run the general DSE supported by these rules. We picked stringsearch, one of the smallest programs of the MiBench benchmark suite [9] in terms of execution time. A DSE was performed with it, and from the 1100 unique results a set of four rules was gained automatically as described in Section 3.4. Supported by these rules, the DSE process was then run with 10 benchmarks five times; as reference, we also ran DSEs without rule support.

Figure 6 shows the average hypervolumes of five runs with and without rules in relation to the number of evaluated unique individuals. The hypervolume, also called hyperarea, is the region between the so-called hypervolume reference point and the approximation of the Pareto front. The higher the hypervolume, the better the found approximation of the Pareto front. However, neither can its numerical value be interpreted directly, nor can its optimum be defined: both would require (a) the true Pareto front, which is per se unknown, and (b) a very deep understanding of the evaluated problem, which would even supersede a DSE.

After two generations, the runs supported with transformation rules show better results until the exploration is aborted after 40 generations. The quality of the results gained with rule support is never reached without rules. Typically, a DSE can be aborted if the hypervolume stops showing progress for some generations. For the run without rules this is the case after 871 configurations with a threshold of 5 generations; the hypervolume is nearly the same as after 780 configurations or 24 generations. If the automatically calculated rules are used, the same value for the hypervolume is reached after 423 evaluated configurations or 11 generations.
As a conclusion, in our setup approx. 46% fewer evaluations are necessary to reach the same value for the hypervolume with rule support. Concerning the time necessary to (a) run the simple DSE, calculate rules, and perform the complex DSE in comparison to (b) just running the complex DSE, one always has to keep in mind that with rules, results of superior quality can be reached. So even running the DSE without rules twice as long will not lead to comparable results. The simple DSE finished in about 2 hours; calculating the rules, a semi-automated task, took only about 15 minutes. Without rules, an exploration took on average 18 hours; with rules it took 11 hours to gain the same hypervolume (durations are given as an example to understand the difference in time between the tasks). Including the roughly 2:15 hours for the simple DSE and the rule calculation, the reduction is thus approx. 4:45 hours, which is about a quarter.⁴ Nevertheless, as the rule-supported exploration still shows progress and reaches a higher level of the hypervolume, one should let it run longer and gain profit from the improved results.

⁴ As the explorations with and without rules were run in parallel and with the same number of cores, the interaction of the runs, also caused by results stored in the database (see [13]), is assumed to be small.
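For two objectives, the hypervolume plotted in Figure 6 can be computed with a simple sweep. A minimal sketch, assuming both objectives are minimized and no front point lies beyond the reference point (names are ours):

    import java.util.Arrays;
    import java.util.Comparator;

    /** 2D hypervolume of a front relative to the reference point (refX, refY);
     *  both objectives are minimized, points beyond the reference are ignored. */
    public class Hypervolume2D {
        static double compute(double[][] front, double refX, double refY) {
            double[][] pts = Arrays.stream(front)
                    .filter(p -> p[0] <= refX && p[1] <= refY)
                    .sorted(Comparator.comparingDouble((double[] p) -> p[0]))
                    .toArray(double[][]::new);
            double volume = 0.0, lastY = refY;
            for (double[] p : pts) {                 // sweep from left to right
                if (p[1] < lastY) {                  // only non-dominated points contribute
                    volume += (refX - p[0]) * (lastY - p[1]);
                    lastY = p[1];
                }
            }
            return volume;
        }
    }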
Fig. 6. Average hypervolume for five DSEs of the hardware parameters of GAP in relation to the number of unique individuals which have to be evaluated (default vs. with rules; the marked points at 423, 780, and 871 evaluated configurations correspond to the discussion above). The data points represent the hypervolume gained after completing a generation, starting with the first generation on the very left.
Our results show a speedup of the progress of the DSE of ca. 50% when using the rules gained from the single-benchmark DSE. Because running a DSE with a single benchmark does not consume much time, the break-even point in terms of duration and result quality can be reached easily. Moreover, using the rules from the previous run leads to even better results.
4.3 Specializing the GAP for Application Domains
In recent years, IP cores have become more and more important. They can be configured with optimal parameters for the targeted application domain. A DSE can be used to find optimal parameters, and the vendor could support this by providing a set of rules describing optimal configurations in typical environments.

In a case study, the GAP was configured for (a) encoding/decoding images with JPEG and (b) encrypting/decrypting data with the Rijndael algorithm (AES). A set of five rules, as shown earlier in Figure 4, was used to support the explorations with FADSE; the rules were calculated automatically based on a DSE with multiple benchmarks. With support from the rule set, superior results were found in shorter time and with fewer simulations than in comparative runs without rules.
5 Summary, Conclusion and Outlook
We have presented an approach to accelerate the progress of the algorithm NSGA-II for automatic design space explorations (DSEs). This works by directing the mutation operator, i.e., the component of the algorithm which is used to sustain diversity, towards regions of the design space where good individuals are supposed to be located during the first generations of the exploration. The modified mutation operator presented for NSGA-II could also be integrated into algorithms like OMOPSO [26] and SMPSO [19], because they also incorporate a mutation operator.

Technically speaking, fuzzy rules are used to describe the necessary transformations of individuals to improve their quality. These rules, also called transformation rules, are specified in the fuzzy control language (FCL), a domain-specific language with high readability. Hence they can easily be specified by an engineer according to his knowledge or intuition. It is also possible to use data mining methods to calculate transformation rules. We showed how a decision tree can be constructed automatically from a prior similar DSE and converted into transformation rules. These rules, describing areas of the parameter space with many high-quality configurations, are a very profitable result of the analyzed DSE because they allow conclusions about good values for the parameters.

In the evaluation of the approach, the Grid ALU Processor (GAP) is used as example. When performing DSEs for processor architectures, typically extensive simulations have to be run. We were able to show that a decision tree can be constructed automatically to describe regions of the design space with a high density of high-quality configurations for GAP as well as for M-Sim 2. With the transformation rules derived from this tree, a respectable speed-up of about 25% for similar DSEs with slightly changed setups was gained for the GAP.

As future work we plan to further extend FADSE to make it able to cope with much larger design spaces, with the goal of using it for adaptive code optimizations (see, e.g., [28]), where parameters for multiple code optimizations and accordingly chosen hardware parameters have to be found. We have already tested the proposed technique (generating the transformation rules) on the parameters of the M-SIM 2 simulator with very good results. The algorithm used to construct the classification trees is well known and should scale easily to larger design spaces. When dealing with many parameters, not all of them might influence the output very much, so some of them can be removed from the decision tree because of their low entropy.
References

1. G. Ascia, V. Catania, A. G. D. Nuovo, M. Palesi, and D. Patti. Efficient design space exploration for application specific systems-on-a-chip. Journal of Systems Architecture, 53(10):733–750, 2007.
2. G. Beltrame, D. Bruschi, D. Sciuto, and C. Silvano. Decision-theoretic exploration of multiprocessor platforms. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS '06), pages 205–210, October 2006.
3. H. Calborean. Multi-Objective Optimization of Advanced Computer Architectures using Domain-Knowledge. PhD thesis, "Lucian Blaga" University of Sibiu, Sibiu, Romania, 2011. (PhD supervisor: Prof. Lucian Vintan.)
4. H. Calborean, R. Jahr, T. Ungerer, and L. Vintan. Optimizing a superscalar system using multi-objective design space exploration. In Proceedings of the 18th International Conference on Control Systems and Computer Science (CSCS), volume 1, pages 339–346. Editura Politehnica Press, Bucharest, Romania, May 2011.
5. H. Calborean and L. Vintan. An automatic design space exploration framework for multicore architecture optimizations. In Proceedings of the 9th RoEduNet International Conference, pages 202–207, Sibiu, Romania, June 2010.
6. H. Calborean and L. Vintan. Toward an efficient automatic design space exploration frame for multicore optimization. In ACACES 2010 Poster Abstracts, pages 135–138, Terrassa, Spain, July 2010.
7. H. Cook and K. Skadron. Predictive design space exploration using genetically programmed response surfaces. In Proceedings of the 45th Annual Design Automation Conference (DAC '08), pages 960–965, New York, NY, USA, 2008. ACM.
8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
9. M. Guthaus, J. Ringenberg, D. Ernst, T. Austin, T. Mudge, and T. Brown. MiBench: A free, commercially representative embedded benchmark suite. In 4th IEEE International Workshop on Workload Characterization, pages 3–14, December 2001.
10. M. Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, University of Waikato, 1999.
11. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA data mining software: an update. SIGKDD Explorations Newsletter, 11:10–18, November 2009.
12. IEC 1131 – Programmable controllers, part 7: Fuzzy control programming, January 1997.
13. R. Jahr, T. Ungerer, H. Calborean, and L. Vintan. Automatic multi-objective optimization of parameters for hardware and code optimizations. In W. W. Smari and J. P. McIntire, editors, Proceedings of the 2011 International Conference on High Performance Computing & Simulation (HPCS 2011), pages 308–316. IEEE, July 2011.
14. J. J. Sharkey, D. Ponomarev, and K. Ghose. M-SIM: A flexible, multithreaded architectural simulation environment. Technical Report CS-TR-05-DP01, State University of New York at Binghamton, October 2005.
15. G. Mariani, A. Brankovic, G. Palermo, J. Jovic, V. Zaccaria, and C. Silvano. A correlation-based design space exploration methodology for multi-processor systems-on-chip. In Proceedings of the 47th Design Automation Conference (DAC '10), pages 120–125, New York, NY, USA, 2010. ACM.
16. G. Mariani, G. Palermo, C. Silvano, and V. Zaccaria. Meta-model assisted optimization for design space exploration of multi-processor systems-on-chip. In Proceedings of the 12th Euromicro Conference on Digital System Design, Architectures, Methods and Tools (DSD '09), pages 383–389, Washington, DC, USA, 2009. IEEE Computer Society.
17. G. Mariani, G. Palermo, C. Silvano, and V. Zaccaria. Multi-processor system-on-chip design space exploration based on multi-level modeling techniques. In International Symposium on Systems, Architectures, Modeling, and Simulation (SAMOS '09), pages 118–124, July 2009.
18. G. Mariani, G. Palermo, V. Zaccaria, and C. Silvano. An efficient design space exploration methodology for multi-cluster VLIW architectures based on artificial neural networks. In Proceedings of the IFIP International Conference on Very Large Scale Integration (VLSI-SoC 2008), Rhodes Island, Greece, October 2008.
19. A. Nebro, J. Durillo, J. García-Nieto, C. A. Coello Coello, F. Luna, and E. Alba. SMPSO: A new PSO-based metaheuristic for multi-objective optimization. In Proceedings of the IEEE Symposium Series on Computational Intelligence, pages 66–73, 2009.
20. B. Ozisikyilmaz, G. Memik, and A. Choudhary. Efficient system design space exploration using machine learning techniques. In Proceedings of the 45th Annual Design Automation Conference (DAC '08), pages 966–969, New York, NY, USA, 2008. ACM.
21. G. Palermo, C. Silvano, and V. Zaccaria. Discrete particle swarm optimization for multi-objective design space exploration. In Proceedings of the 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools, pages 641–644, Washington, DC, USA, 2008. IEEE Computer Society.
22. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
23. S. Roychowdhury and W. Pedrycz. A survey of defuzzification strategies. International Journal of Intelligent Systems, 16(6):679–695, 2001.
24. A. Sengupta, R. Sedaghat, and Z. Zeng. Rapid design space exploration by hybrid fuzzy search approach for optimal architecture determination of multi objective computing systems. Microelectronics Reliability, 51:502–512, 2010.
25. B. Shehan, R. Jahr, S. Uhrig, and T. Ungerer. Reconfigurable Grid ALU Processor: Optimization and design space exploration. In Proceedings of the 13th Euromicro Conference on Digital System Design (DSD 2010), Lille, France, 2010.
26. M. Sierra and C. Coello Coello. Improving PSO-based multi-objective optimization using crowding, mutation and ε-dominance. In C. Coello Coello, A. Hernández Aguirre, and E. Zitzler, editors, Evolutionary Multi-Criterion Optimization, volume 3410 of Lecture Notes in Computer Science, pages 505–519. Springer, Berlin/Heidelberg, 2005. doi:10.1007/978-3-540-31880-4_35.
27. S. Uhrig, B. Shehan, R. Jahr, and T. Ungerer. The two-dimensional superscalar GAP processor architecture. International Journal on Advances in Systems and Measurements, 3(1–2):71–81, September 2010.
28. T. Waterman. Adaptive Compilation and Inlining. PhD thesis, Rice University, Houston, TX, USA, 2006. (Adviser: Keith D. Cooper.)
29. Z. Zeng, R. Sedaghat, and A. Sengupta. A framework for fast design space exploration using fuzzy search for VLSI computing architectures. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pages 3176–3179, 2010.
30. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the strength Pareto evolutionary algorithm. Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, 2001.